I fucked with the title a bit. What i linked to was actually a mastodon post linking to an actual thing. but in my defense, i found it because cory doctorow boosted it, so, in a way, i am providing the original source here.

please argue. please do not remove.

  • Melllvar@startrek.website
    English
    arrow-up
    61
    arrow-down
    4
    ·
    11 months ago

    I think we should have a rule that says if an LLM company invokes fair use on the training inputs then the outputs are public domain.

    • Steve@communick.news
      English
      arrow-up
      26
      arrow-down
      1
      ·
      edit-2
      11 months ago

      That’s already been ruled on once.

      A recent lawsuit challenged the human-authorship requirement in the context of works purportedly “authored” by AI. In June 2022, Stephen Thaler sued the Copyright Office for denying his application to register a visual artwork that he claims was authored “autonomously” by an AI program called the Creativity Machine. Dr. Thaler argued that human authorship is not required by the Copyright Act. On August 18, 2023, a federal district court granted summary judgment in favor of the Copyright Office. The court held that “human authorship is an essential part of a valid copyright claim,” reasoning that only human authors need copyright as an incentive to create works. Dr. Thaler has stated that he plans to appeal the decision.

      Why would companies care about copyright of the output? The value is in the tool to create it. The whole issue to me revolves around the AI company profiting on its service. A service built on a massive library of copyrighted works. It seems clear to me, a large portion of their revenue should go equally to the owners of the works in their database.

        • Steve@communick.news
          English
          arrow-up
          12
          arrow-down
          3
          ·
          11 months ago

          That’s just saying you can claim copyright if you lie about authorship. The problem then is, you may step into the realm of fraud.

            • Aatube@kbin.social
              arrow-up
              4
              ·
              11 months ago

              Well, what you initially said sounded like fraud, but the incredibly long page indeed doesn’t talk about fraud. However, it also seems a bit vague. What counts as your contributions to the work? Is it part of the input the model was trained on, “I wrote the prompt”, or making additional changes based on the result?

              • Even_Adder@lemmy.dbzer0.com
                English
                arrow-up
                5
                arrow-down
                1
                ·
                11 months ago

                The vagueness surrounding contributions is particularly troubling. Without clearer guidelines, this seems like a recipe for lawsuits.

    • kromem@lemmy.world
      English
      arrow-up
      1
      ·
      11 months ago

      The outputs are not copyrightable.

      But something not being copyrightable doesn’t necessarily mean openly distributed.

      It does mean OpenAI can’t really restrict or go after other companies training off of GPT-4 outputs though, which is occurring broadly.

  • NevermindNoMind@lemmy.world
    English
    arrow-up
    38
    arrow-down
    3
    ·
    11 months ago

    Google scanned millions of books and made them available online. Courts ruled that was fair use because the purpose and interface didn’t lend themselves to actually reading the books in Google Books, but just searching them for information. If that is fair use, then I don’t see how training an LLM (which doesn’t retain an exact copy of the training data, at least in the vast majority of cases) isn’t fair use. You aren’t going to get an argument from me.

    I think most people who will disagree are reflexively anti AI, and that’s fine. But I just haven’t heard a good argument that AI training isn’t fair use.

    • commie@lemmy.dbzer0.comOP
      English
      arrow-up
      9
      arrow-down
      3
      ·
      11 months ago

      here’s a sidechannel attack on your position: every use, even an infringing use, is fair use until adjudicated, because what fair use means is that a court has agreed that your infringing use is allowed. so of course ai training (broadly) is always fair use. but particular instances of ai training may be found to not be fair use, and so we can’t be sure that you are always going to be right (for the specific ai models that may come into question legally).

      • runefehay@kbin.social
        arrow-up
        4
        arrow-down
        2
        ·
        11 months ago

        I am no lawyer, but I suspect what will be considered either fair use or infringing will probably depend on how the programmed AI model is used.

        For example, if you train it on a book of poetry, asking it questions about the poetry will probably be considered fair use. If you ask the AI to write poetry in the style of the book’s poems and you publish the AI’s poetry, I suspect it might be considered laundering copyright and infringing. Especially if it is substantially similar to specific poems in the book.

        • commie@lemmy.dbzer0.comOP
          English
          arrow-up
          13
          arrow-down
          1
          ·
          11 months ago

          If you ask the AI to write poetry in the style of the book’s poems and you publish the AI’s poetry, I suspect it might be considered laundering copyright and infringing.

          is the image of a cabin in a snowy landscape copyrighted by Thomas kinkade? fuck no. That’s an idea. ideas can’t be copyrighted. a style isn’t a discrete work. it is an idea. it can’t be copyrighted. if I produce something in the style of Keats or Stephen King or Rowling, they can’t sue me for copyright unless I make a substantially infringing use of their work. The style isn’t sufficient, because the style can’t be copyrighted.

  • Cyber Yuki@lemmy.world
    English
    arrow-up
    18
    arrow-down
    2
    ·
    11 months ago

    What constitutes fair use?

    17 U.S.C. § 107

    Notwithstanding the provisions of sections 17 U.S.C. § 106 and 17 U.S.C. § 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.

    GenAI training, at least regarding art, is neither criticism, comment, news reporting, scholarship, nor research.

    AI training is not done by scientists but by engineers of a corporate entity with a long-term profit goal.

    So, by elimination, we can conclude that none of the purposes covered by the fair use doctrine apply to Generative AI training.

    Q.E.D.

    • General_Effort@lemmy.world
      English
      arrow-up
      8
      arrow-down
      1
      ·
      11 months ago

      “Such as” means that these are examples and not an exhaustive list.

      Can you explain how the 3 factors you listed rule out scholarship or research purpose? Regarding the first factor, how do you determine that AI developers are all engineers and never computer scientists?

      • TheFriar@lemm.ee
        English
        arrow-up
        1
        ·
        11 months ago

        I’d argue that the community benefit aspect of the “scholarship or research purposes” language precludes for-profit AI companies from falling under fair use. These aren’t education programs. They’re not research for the greater good. They are private entities trying to create a machine that can copy until it creates. For their own needs, not the greater good. Education has a net positive effect on society, and those stipulations in the law are meant to better serve the whole.

        If these generative AI machines were being built by students, it would fall under these specifications of fair use. But the profit motive changes everything.

        I’d say “fair use” pretty much covers educational and community benefit. Private companies do neither. They are stealing and reproducing for themselves, not society.

          • TheFriar@lemm.ee
            English
            arrow-up
            0
            ·
            11 months ago

            I think their point is the law is written to benefit people. Not private companies or machines.

            If this wide definition of “teaching” were acceptable, then the entire concept would cease to exist.

            “You stole my paper and reproduced it for profit!”

            “NOO, I’m just teaching my employees to write better. It’ll happen eventually, but we’re at the stage where reproducing something incredibly similar to your paper is necessary!”

            • toast@retrolemmy.com
              English
              arrow-up
              1
              ·
              11 months ago

              I agree that all this needs to be examined, and some new laws and regulations should be developed. But, for good or ill, teaching is a covered use as written in the section of the law quoted above, and teaching is part of the process of training.

              If anything, laws will likely have to be rewritten to address changing technologies, but it seems disingenuous to quote a section of the law and then ignore the most relevant word in the entire text.

              • TheFriar@lemm.ee
                English
                arrow-up
                1
                ·
                11 months ago

                I definitely get your point. But you don’t “teach” a machine. You program a machine. In the case of AI, technically the machine is building its own database and sort of growing and adapting as it gets more advanced.

                I get your point, but I just don’t think “teaching” is even what is happening here. Like I said, if the definition were that broad, it would be rendered meaningless. Not to mention, there are so, so, so many examples of the generative AI just reproducing something specifically in the style of a known artist. Writing in the style of a specific author. It does that because we ask it to, but the point is the program is a machine for reproduction. You don’t teach something without sentience. You teach living things, you write code and make a program act in a specific way. And right now, the programs are blatantly reproducing signature pieces of work.

                Now, OP mentioned we are “teaching” the machines to do things on its own. But my point is that’s not teaching. It’s reproducing and stealing. It’s not creating anything, it’s spitting out elements of what it’s absorbed. And because these machines can’t think, can’t add their own style—because what’s super fucked up is we are pretty much just discussing the machines replacing artists at the moment—these things are about experience and personality. Neither of which AI has. They ingest everything and spit back out what we ask for. And they’re spitting out elements of this or that—and in these cases, it’s intellectual property of artists and writers. And the most depressing aspect of this whole thing is that we have pretty much moved beyond the “wait, out of everything, we are teaching machines to take human creativity and expression away from…humans?” stage and just moved on to talking about whether it’s technically legal.

                I agree, laws will definitely have to be rewritten. But for the sake of argument, I don’t think the letter of the law can be as broad as you’re suggesting. Interesting thought experiment for us, though. Because…no one gives a shit about our takes on the matter lol

                Or the take of artists and writers. But that’s a whole different problem.

  • snooggums@kbin.social
    arrow-up
    21
    arrow-down
    8
    ·
    11 months ago

    Selling an AI model (or usage of that model) that allows for producing works that are clearly based upon those copyrighted works and would be considered copyright infringement if a person did the same thing is not fair use.

    If a person creating the same thing as generative AI would be infringing, then it isn’t magically not infringing because it is on the internet or done by a program. Basically, AI needs to follow the same rules and restrictions as a person would. That does mean that the AI also needs to be trained to not create copyright infringing works if the use of the AI is being sold.

    As a downloadable model that anyone can use at no cost? Sure, whatever is fine. Then it is on the person who uses it and tries to infringe. But if someone pays a company to use their AI to create infringing work, that is on the company and they are just as at fault as if they sold T shirts that infringed on copyright.

    • commie@lemmy.dbzer0.comOP
      English
      arrow-up
      8
      arrow-down
      19
      ·
      11 months ago

      If a person creating the same thing as generative AI would be infringing, then it isn’t magically not infringing because it is on the internet or done by a program

      no one is arguing otherwise.

    • commie@lemmy.dbzer0.comOP
      English
      arrow-up
      8
      arrow-down
      22
      ·
      11 months ago

      That does mean that the AI also needs to be trained to not create copyright infringing works if the use of the AI is being sold.

      no it doesn’t.

    • commie@lemmy.dbzer0.comOP
      English
      arrow-up
      6
      arrow-down
      22
      ·
      11 months ago

      if someone pays a company to use their AI to create infringing work, that is on the company and they are just as at fault as if they sold T shirts that infringed on copyright.

      wrong.

    • commie@lemmy.dbzer0.comOP
      English
      arrow-up
      7
      arrow-down
      24
      ·
      11 months ago

      Selling an AI model (or usage of that model) that allows for producing works that are clearly based upon those copyrighted works and would be considered copyright infringement if a person did the same thing is not fair use

      it is.

      • Aatube@kbin.social
        arrow-up
        26
        arrow-down
        1
        ·
        edit-2
        11 months ago

        I think you might want to elaborate

        instead of making 4 replies in 3 minutes
        each averaging
        2.75 words

        • commie@lemmy.dbzer0.comOP
          English
          arrow-up
          13
          arrow-down
          5
          ·
          11 months ago

          I don’t see how selling a model or the use of a model infringes on a specific copyright. whose copyright has been infringed? how can you prove that? take AI out of the question. if you wanted to prove that some other author has infringed the copyright on your novel, how would you do that? if you want to prove that some quote unquote artist has infringed on your copyright, how would you do that? if any of your methods for proving that a person has infringed on your copyright is applicable to an AI, then that’s what that is. but if you can’t prove it, if the AI just learned about how style works, if an AI just saw your work but never actually copied it, then it’s not infringing.

        • commie@lemmy.dbzer0.comOP
          English
          arrow-up
          6
          arrow-down
          22
          ·
          edit-2
          11 months ago

          instead of making 4 replies in 3 minutes

          each averaging

          2.75 words

          this is irrelevant to the truth of my claim.

      • snooggums@kbin.social
        arrow-up
        10
        arrow-down
        3
        ·
        11 months ago

        I’m sorry, are you saying that selling a book that has the same characters as a recently released book doing the same things but with wording differences is somehow fair use? Like a book called Harry Potter and the Something Rock with the exact same plot points but worded slightly different is fair use?

        Do you even understand what copyright is?

        • commie@lemmy.dbzer0.comOP
          English
          arrow-up
          8
          arrow-down
          2
          ·
          11 months ago

          are you saying that selling a book that has the same characters as a recently released book doing the same things but with wording differences is somehow fair use? Like a book called Harry Potter and the Something Rock with the exact same plot points but worded slightly different is fair use?

          no. I was saying selling an AI model or access to it that is capable of producing that work is not, itself, copyright infringement.

          in fact, do you know what a clean room is? if I provided to a writing team every English language work except those written by JK Rowling and they produced a work exactly like you’re describing, the resultant work would not be infringing copyright. it should not be any different for AI where you cannot prove what materials it was provided.

          • snooggums@kbin.social
            arrow-up
            6
            arrow-down
            4
            ·
            11 months ago

            Copyright doesn’t care if the writer is unaware of the source material because intent doesn’t matter.

            • commie@lemmy.dbzer0.comOP
              English
              arrow-up
              9
              arrow-down
              3
              ·
              11 months ago

              intent does matter for fair use claims, and knowledge matters for bare infringement.

  • MoogleMaestro@kbin.social
    arrow-up
    14
    arrow-down
    7
    ·
    edit-2
    11 months ago

    It isn’t fair use. See most of the FAQ @ fairuse faq.

    “Fair Use” is often the subject of discussion when talking about online copyright with regards to online video content or music sampling, but it’s notably a flawed defense as it generally has no legal definition for how much of certain content can be used or referenced. The very first line of that faq has the following note:

    How do I get permission to use somebody else’s work?
    You can ask for it. If you know who the copyright owner is, you may contact the owner directly. If you are not certain about the ownership or have other related questions, you may wish to request that the Copyright Office conduct a search of its records or you may search yourself. See the next question for more details.

    All that artists, writers and others are asking LLM producers to do is a) ask for permission or b) attribute the artist’s work in some kind of ledger, respecting the copyright of their work. Every work you make (write/play/draw/whatever) has a copyright that should be respected by companies, is not waived by EULA or TOS (ever), and must be respected in order for author attribution as a concept to work at all. There is plenty of free, permissively licensed content on the internet that can be used instead to train an LLM, but simply asking for permission or giving attribution would at least be a step in the right direction for these companies and for the industry as a whole.

    Defenders of AI will note that the “use” of art in LLM is limited and thus protected by fair use, but that is debatable based on the content of the above listed FAQ.

    How much of someone else’s work can I use without getting permission?
    Under the fair use doctrine of the U.S. copyright statute, it is permissible to use limited portions of a work including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports. There are no legal rules permitting the use of a specific number of words, a certain number of musical notes, or percentage of a work. Whether a particular use qualifies as fair use depends on all the circumstances. See, Fair Use Index, and Circular 21, Reproductions of Copyrighted Works by Educators and Librarians.

    You can see that the use cases above (commentary, criticism, news reporting and scholarly reports) do not qualify LLM companies to use or train their models with copyrighted data for privatized industry. Additionally, you’ll note that “market disruptive” uses cannot be protected by fair use in its definition, meaning that displacing artists with AI automatically makes LLM use of copyrighted material an infraction of copyright that is not protected by the fair use clause.

    Regardless, this will need to be proved in court and even if it passes certain criteria, it will not apply to all infractions. Fair use is a defense, not a protection, and thus LLM producers will have to spend time in court in order to defend individual infractions. There’s no way for them to catch all copyright infringement with one ruling, it needs to be proved on a case-by-case basis.

    IANAL but this is my 2 cents on the matter.

    • commie@lemmy.dbzer0.comOP
      English
      arrow-up
      8
      arrow-down
      4
      ·
      11 months ago

      this will need to be proved in court

      this is true of all fair use. this is almost the definition of fair use. Fair use can only exist after a judge has adjudicated it. before that, it is questionable.

    • Infiltrated_ad8271@kbin.social
      arrow-up
      1
      ·
      edit-2
      11 months ago

      You can see that the use cases above (commentary, criticism, news reporting and scholarly reports) do not qualify LLM companies to use or train their models

      Seems quite obvious that the text you quoted refers exclusively to plagiarism. This does not include things like being inspired by it, referencing it, parodying it and of course not training AI either, because what matters is whether the result is protected content.

      You can argue that memorizing and sharing training data is a copyright violation, and that’s a fair point, but it’s also worth noting that this happens in a small minority of cases, is accidental, and is being addressed.

  • cyd@lemmy.world
    English
    arrow-up
    7
    arrow-down
    2
    ·
    11 months ago

    Agreed. I would also argue that trained model weights are not copyrightable.

    • kromem@lemmy.world
      English
      arrow-up
      1
      ·
      11 months ago

      They aren’t.

      Courts have already ruled that copyright requires human creation, and weights are not decided by humans but by the training algorithms.

      • cyd@lemmy.world
        English
        arrow-up
        1
        arrow-down
        1
        ·
        11 months ago

        I didn’t know it was already settled law. But in that case, why are models like llama still released under licenses? If they are non-copyrightable, licenses should be unenforceable and therefore irrelevant.

        • kromem@lemmy.world
          English
          arrow-up
          1
          ·
          11 months ago

          The license is related to access.

          Basically it’s gated and not publicly available, and the only way to open the gate is to say “I promise not to do anything outside what you are limiting me to do.”

          A second person that gets access without agreeing to that can use the weights however they want (what copyright would relate to), but the person who gave them access to the weights would have been in breach of their agreement.

          So separate things with different scopes.

    • General_Effort@lemmy.world
      English
      arrow-up
      1
      ·
      11 months ago

      Ignore the petapixel article. Even the headline is false.

      Fair Use in the US comes directly from the constitution. I don’t think other countries have anything quite like it. Also, in the English Law system much of the law is made by judges (case law). Japan’s system is a Civil Law system which seeks to keep law-making with the actual legislative bodies. Law-makers in these countries have to be much more on the ball.

      About 15 years ago, law-makers in these countries began explicitly allowing AI training (actually, data-mining in general). In Japan this first happened in 2009 and was expanded in 2018.

      It is an advantage for these countries that the matter is already cleared up by their functioning legislatures, but if there is something they didn’t think of (or lobbyists messed things up) then tough luck. The US still has its constitutional protections which means that any necessary corrections can still be made at “runtime”. I think the internet as we know it might not have been possible in any other country.

    • commie@lemmy.dbzer0.comOP
      English
      arrow-up
      15
      arrow-down
      5
      ·
      edit-2
      11 months ago

      in the ethical sense, everything is fair use. period.

      in the legal sense, everything is fair use until it’s proven in court not to be.

        • Falcon@lemmy.world
          English
          arrow-up
          3
          arrow-down
          1
          ·
          11 months ago

          If and only if the trained model is accessible without licence.

          E.g. I don’t want Amazon rolling out an LLM for $100 a month based on freely accessible tutorials written by small developers.

          But yeah duck copyright

        • commie@lemmy.dbzer0.comOP
          English
          arrow-up
          13
          arrow-down
          4
          ·
          11 months ago

          if anybody gets a copy of it, they have no ethical obligation not to share it, and every ethical justification for sharing it.

            • commie@lemmy.dbzer0.comOP
              English
              arrow-up
              11
              arrow-down
              4
              ·
              11 months ago

              this reads like an appeal to ridicule. if you have an objection to what I said please state it.

              • Batman@lemmy.world
                English
                arrow-up
                4
                arrow-down
                1
                ·
                11 months ago

                Every web request costs someone money. If you aren’t paying them you are being provided a service. They’ve given you knowledge/ material in their possession free of charge. You are taking advantage of that good will by using the content for purposes not intended. That is a moral failing.

                To be clear the ownership of the material is not important, just the access is immoral, as the harm is already done.

                I’ll add the caveat that it can be moral if they’ve specifically told you that you can via the website’s robots.txt file, which websites of consequence all have. But the assumption has to be they don’t intend this because that is how consent works.
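
                For anyone who wants to see what that check looks like in practice, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt contents, the `ExampleTrainingBot` user agent, the `may_fetch` helper and the example.com URL are all made up for illustration; a real crawler would fetch the site’s own file, and robots.txt only expresses the site’s stated wishes rather than enforcing anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one (made-up) training crawler is refused,
# every other user agent is allowed.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/tutorial"
    print(may_fetch(ROBOTS_TXT, "ExampleTrainingBot", page))
    print(may_fetch(ROBOTS_TXT, "SomeBrowser", page))
```

                With this made-up file the training bot is refused and an ordinary client is allowed, which is the “they’ve specifically told you” case, assuming the crawler bothers to ask.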

                • commie@lemmy.dbzer0.comOP
                  English
                  arrow-up
                  5
                  arrow-down
                  2
                  ·
                  11 months ago

                  They’ve given you knowledge/ material in their possession free of charge.

                  this is a very common human activity

                • commie@lemmy.dbzer0.comOP
                  English
                  arrow-up
                  5
                  arrow-down
                  3
                  ·
                  11 months ago

                  the assumption has to be they don’t intend this

                  why? if someone publishes something on port 80, why should I ever assume they mean anything but for me to have and use that data?

                • commie@lemmy.dbzer0.comOP
                  English
                  arrow-up
                  3
                  arrow-down
                  1
                  ·
                  11 months ago

                  You are taking advantage of that good will by using the content for purposes not intended. That is a moral failing.

                  only if there were some sort of agreement about what the acceptable uses are and what is not acceptable.

                • commie@lemmy.dbzer0.comOP
                  English
                  arrow-up
                  2
                  arrow-down
                  1
                  ·
                  11 months ago

                  If you aren’t paying them you are being provided a service.

                  if you ARE paying them, you’re being provided a service, too

                • commie@lemmy.dbzer0.comOP
                  English
                  arrow-up
                  11
                  arrow-down
                  1
                  ·
                  11 months ago

                  an appeal to ridicule is also called a horse laugh fallacy. it’s like writing lol instead of actually explaining what’s wrong with the position to which you’re objecting. this response also reads like an appeal to ridicule. if you can’t explain what’s wrong with my position, maybe you shouldn’t be speaking about my position.

        • commie@lemmy.dbzer0.comOP
          English
          arrow-up
          5
          arrow-down
          5
          ·
          11 months ago

          Just because a court hasn’t yet deemed that specific action illegal doesn’t mean it’s not illegal when you do it. Doesn’t matter if the crime is theft, rape, murder, etc.

          theft, rape, and murder are criminal matters. copyright is civil, and, yes, the courts can adjudicate every individual case.