• theherk@lemmy.world · 1 day ago

    Seems like a reasonable approach. Make people accountable for the code they submit, no matter the tools used.

    • ell1e@leminal.space · 1 day ago

      If the accountability cannot be practically fulfilled, the reasonable policy becomes a ban.

      What good is it to say “oh yeah you can submit LLM code, if you agree to be sued for it later instead of us”? I’m not a lawyer and this isn’t legal advice, but sometimes I feel like that’s what the Linux Foundation policy says.

      • ViatorOmnium@piefed.social · 1 day ago

        But this was already the case. When someone submitted code to Linux, they always had to assume responsibility for the legality of the submitted code; that’s one of the points of the mandatory Signed-off-by tag.
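
        For context, the Signed-off-by mechanism is a commit-message trailer asserting the Developer Certificate of Origin, which `git commit -s` appends automatically. A sketch of what such a trailer looks like (the subject line, name, and email here are illustrative, not from any real patch):

        ```
        Fix out-of-bounds read in foo_parse()

        The length check compared against the wrong field, allowing a
        one-byte over-read on malformed input.

        Signed-off-by: Jane Developer <jane@example.org>
        ```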

        • badgermurphy@lemmy.world · 1 day ago

          But now, even the person submitting the license-breaching content may be unaware that they are doing it, so the problem is surely worse: contributors can easily and unwittingly end up on the wrong side of the law.

          • Traister101@lemmy.today · 1 day ago

            That’s their problem. If they are using an LLM and cannot verify the output, they shouldn’t be using an LLM.

            • jj4211@lemmy.world · 1 day ago

              The problem is that, broadly, most GenAI users don’t take that risk seriously. So far no one can point to a court case where a rights holder successfully sued someone over LLM infringement.

              The best chance is Getty and their case, with very blatantly obvious infringement. They lost in the UK, so that’s not a good sign.

            • hperrin@lemmy.ca · 23 hours ago

              Nobody can verify that the output of an LLM isn’t from its training data except those with access to its training data.

            • badgermurphy@lemmy.world · 1 day ago

              It is their problem until the second they submit it; then it is the project’s problem. You can lay the blame for the bad actions wherever you want, but the reality is that the work of verifying the legality and validity of these submissions is being abdicated, crippling projects under the increased workload of going through ever more submissions that amount to junk.

              What is the solution for that? The fact that it is the fault of the lazy submitter doesn’t clean up the mess they left.

              • Traister101@lemmy.today · 1 day ago

                Frankly I expect the kernel dudes to be pretty good about this; their style guides alone are quite strict, and any funny business in a PR that isn’t marked correctly is, I think, likely a ban from making PRs at all. How it worked beforehand, as already stated by others, is the author says “I promise this follows the rules” and that’s basically the end of it. Giving an official avenue for generated code is a great way to reduce the negatives of what will happen anyway. We know this from decades of real-life experience trying to ban things like alcohol or drugs: time after time, providing a legal avenue with some rules makes things safer. Why wouldn’t we see a similar effect here?

                • badgermurphy@lemmy.world · 24 hours ago

                  I do think that some projects will fare better than others, particularly ones like you mentioned, where the team is robust and capable of handling the filtering of increased submissions from these new sources.

                  I believe we are going to end up seeing some new mechanism for project submissions to deal with the growing imbalance between submission volume and the work hours available for review, just as became necessary when viruses, malware, and spam first came into being. It has quickly become incredibly easy for anyone to make a PR, but not at all easier to review one, so something is going to have to give in the FOSS world.

    • hperrin@lemmy.ca · 1 day ago

      No, it’s not a reasonable approach. Making people be the authors of the code they submit is reasonable, because then it can be released under the GPL. AI-generated code is public domain.

      • theherk@lemmy.world · 1 day ago

        I suppose there should be no code generators, assemblers, compilers, linkers, or LSPs then either? Just etching 1s and 0s?

        • hperrin@lemmy.ca · 1 day ago

          The copyright office has made it explicitly clear that those tools do not interfere with the traditional elements of authorship, and that the use of LLMs does. So, if you don’t want to take my word for it, take the US Copyright Office’s word for it.

          • theherk@lemmy.world · edited · 23 hours ago

            As the agency overseeing the copyright registration system, the Office has extensive experience in evaluating works submitted for registration that contain human authorship combined with uncopyrightable material, including material generated by or with the assistance of technology. It begins by asking “whether the ‘work’ is basically one of human authorship, with the computer [or other device] merely being an assisting instrument, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression or elements of selection, arrangement, etc.) were actually conceived and executed not by man but by a machine.” In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of “mechanical reproduction” or instead of an author’s “own original mental conception, to which [the author] gave visible form.” The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work. This is necessarily a case-by-case inquiry.

            If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it. For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the “traditional elements of authorship” are determined and executed by the technology—not the human user. Based on the Office’s understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output. For example, if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare’s style. But the technology will decide the rhyming pattern, the words in each line, and the structure of the text. When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result, that material is not protected by copyright and must be disclaimed in a registration application.

            In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.” Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of ” and do “not affect” the copyright status of the AI-generated material itself.

            This policy does not mean that technological tools cannot be part of the creative process. Authors have long used such tools to create their works or to recast, transform, or adapt their expressive authorship. For example, a visual artist who uses Adobe Photoshop to edit an image remains the author of the modified image, and a musical artist may use effects such as guitar pedals when creating a sound recording. In each case, what matters is the extent to which the human had creative control over the work’s expression and “actually formed” the traditional elements of authorship.

            https://www.copyright.gov/ai/ai_policy_guidance.pdf

            What this makes clear is that it certainly isn’t black or white, as you claim. Nevertheless, automation converting an input to an output simply cannot be the only mechanism used in determining authorship.

            And that wouldn’t change my statement anyway, but rather supports it. The person submitting a patch must be accountable for its contents.

            An outright ban would need to carefully define how an input gets converted to an output, and that may not be so clear. To draw the line clearly, one would potentially have to end the use of many tools that have been used for many years in the kernel, including snippet generation, spelling and grammar correction, and IDE autocompletion. So such a reductive view simply will not suffice.


            Additionally, copyrightability and licensability are wholly different questions. And it does not violate the GPL to include public domain content, since the license applies to the aggregate work.

            • hperrin@lemmy.ca · edited · 23 hours ago

              If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it. For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the “traditional elements of authorship” are determined and executed by the technology—not the human user. Based on the Office’s understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output.

              For example, if a user instructs a text-generating technology to “write a poem about copyright law in the style of William Shakespeare,” she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare’s style. But the technology will decide the rhyming pattern, the words in each line, and the structure of the text. When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result, that material is not protected by copyright and must be disclaimed in a registration application.

              That seems very clear to me. Generative AI output is not human authored, and therefore not copyrighted.

              The policy I use also makes very clear the definition of AI generated material:

              https://sciactive.com/human-contribution-policy/#Definitions

              I’m not exactly sure how you can possibly think there is an equivalence between a tool like a spelling and grammar checker and a generative AI, but there’s a reason the copyright office will register works that have been authored using spelling and grammar checkers, but not works that have been authored using LLMs.

              • theherk@lemmy.world · 23 hours ago

                Just read the next two paragraphs. Don’t just stop because you got to something that you like. The equivalence I draw is clear. You don’t like it, and that’s okay. But one would have to clarify exactly what the ban entails, and that wouldn’t be as clear as you might think. LLMs only? Transformers specifically? What about graph generation, or other ML models? Is it just ML? If so, is that because a matrix lattice was used to get from input to output? Could other deterministic math functions trigger the same ban? What if a spell checker used an RNG to select the best replacement from a list of correct options? What if a compiler introduces into its assembled output an optimization not of the author’s writing?

                Do you see why they say “The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work. This is necessarily a case-by-case inquiry”?

                And that still affects copyrightability, not license compliance.

                • hperrin@lemmy.ca · edited · 22 hours ago

                  Do you want to explain to me what, in those two paragraphs, means that the use of spell checkers and LLMs is equivalent with regard to copyrightability? It seems like those paragraphs make it clear that the use of spell checkers is not the same as LLMs.

                  The policy I use bans “generative AI model” output. Generative AI is a pretty well defined term:

                  https://en.wikipedia.org/wiki/Generative_AI

                  https://www.merriam-webster.com/dictionary/generative%20AI

                  If you have trouble determining whether something is a generative AI model, you can usually just look up how it is described in the promotional materials or on Wikipedia.

                  Type: Large language model, Generative pre-trained transformer

                  - https://en.wikipedia.org/wiki/Claude_(language_model)

                  I never said it violates GPL to include public domain code. I’m not sure where you got that from. What I said is that public domain code can’t really be released under the GPL. You can try, but it’s not enforceable. As in, you can release it under that license, but I can still do whatever I want with it, license be damned, because it’s public domain.

                  I did that with this vibe coded project:

                  https://github.com/hperrin/gnata

                  I just took it and re-released it as public domain, because that’s what it is anyway.

                  • ∃∀λ@programming.dev · 10 hours ago

                    public domain code can’t really be released under the GPL

                    Disney created films based on old fairy tales. Disney has a copyright on those films even though they include elements from the public domain, because the films also include the artists’ original expression. The Linux kernel (probably) contains public domain AI-generated code alongside original work from its many contributors. If you wanted to get the entire project into the public domain, you’d have to get permission from nearly all its contributors or wait for their copyright term to expire. The small snippets of code which were AI-generated are public domain. The bulk of the project isn’t, and the project as a whole isn’t.

                    As much as I dislike AI, I can’t say I understand forbidding AI-generated contributions on the grounds that the submitted code is public domain. I suppose somebody can come along and “steal” the public domain snippets, but I suspect it’s difficult to definitively tell apart the human-written code from AI-generated and strip out the human-written bits. If they do, what’s the issue? It wasn’t yours to begin with and you can still keep it in your project. Moreover, now that the magical plagiarism machines exist, who’s going to be lifting code in this way, anyway?

      • ziproot@lemmy.ml · 1 day ago

        Isn’t that the rule? The author has to be a human?

        The new guidelines mandate that AI agents cannot use the legally binding “Signed-off-by” tag, requiring instead a new “Assisted-by” tag for transparency. Ultimately, the policy legally anchors every single line of AI-generated code and any resulting bugs or security flaws firmly onto the shoulders of the human submitting it.
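
        Under that policy as described, the trailer block on an AI-assisted patch would presumably look something like this (the names and the tool annotation here are illustrative, not the policy’s exact wording):

        ```
        Signed-off-by: Jane Developer <jane@example.org>
        Assisted-by: ExampleCodeLLM (generated the initial error-path refactoring)
        ```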