RFC 239: Policy on LLM assistance in contributions #239
Fixes #202
gsnedders left a comment
Thanks for trying to tackle this!
I think we should also write something about use of LLMs for review, if nothing else.
> A few examples of policies on LLM use in FOSS contributions:
>
> - permissive
>   - [ghostty/AI_POLICY.md at main · ghostty-org/ghostty](https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md)
>   - [Policy about LLM generated code from PRs · Issue #28335 · opencv/opencv](https://github.com/opencv/opencv/issues/28335)
>   - [CONTRIBUTING.md: Guidelines relevant to AI-assisted contributions by gasche · Pull Request #14052 · ocaml/ocaml](https://github.com/ocaml/ocaml/pull/14052)
>   - [LLVM AI Tool Use Policy — LLVM 23.0.0git documentation](https://llvm.org/docs/AIToolPolicy.html)
> - prohibitive
>   - [Code of Conduct ⚡ Zig Programming Language](https://ziglang.org/code-of-conduct/#strict-no-llm-no-ai-policy)
>   - [Getting Started - The Servo Book](https://book.servo.org/contributing/getting-started.html#ai-contributions)
I feel like there are others which are notably relevant here: Chromium's and Firefox's, given they are two of the five repos which have approval to land changes in WPT without further review. (WebKit and Test262 do not currently have policies; TC39's explicitly does not apply to code.)
> > Commits generated entirely by an LLM must be attributed to the LLM in the
> > "Author" field.
This feels problematic. If we attribute a PR to Claude, Gemini, or OpenAI's GPT and I then try to contact the author… well, I don't think Anthropic, Google, or OpenAI are going to be very helpful.
Both Chromium's and Firefox's policies are crystal clear that humans are still the authors and must self-review before submitting.
Therefore, when there's still a human very much in the loop who is required to self-review, it does not seem reasonable to consider the LLM the author, and the Chromium policy is explicit that "Authors must attest that the code they submit is their original creation, regardless of whether AI tooling was used".
Strong agree on this point for all the reasons you give. The Author field's purpose is to give a contact for problems or questions, not to assign blame. Listing an LLM is worthless there.
And also, yeah, assigning authorship to an LLM is abdicating your responsibility as an engineer to commit useful code that you understand.
> > Contributions that contain substantial amounts of tool-generated content must
> > be labeled as such.
Neither Chromium nor Firefox requires this today, and it's entirely plausible we've already had commits land in WPT via exports which don't meet this bar.
That said, Chromium's policy here is currently:
> To aid reviewers, authors should flag areas that they are not confident about that had AI assistance.
This is maybe a weaker form, and hopefully something more in line with existing contributions.
> > ### For Trusted External Review
> >
> > Some external projects conduct review which the WPT maintainers recognize as
> > authoritative. From rendering engines like Gecko to dedicated test suites
> > like WASM, patches merged in these projects are incorporated into WPT without
> > further review. The policy outlined by this document does not apply to these
> > contributions; the external projects are trusted to determine their own
> > mechanisms for quality assurance.
This feels like it should probably be at least in part in another RFC that tries to define our existing policies?
As far as I'm aware, there are currently five repos which have approval to incorporate changes based on downstream review: Chromium, Firefox, Servo, Test262, and WebKit.
My understanding of the unwritten policy is we trust downstream reviewers; I can't even find the various places where we've elucidated parts of the policy over the years.
> ## Details
>
> Proposed text:
>
> > contributions; the external projects are trusted to determine their own
> > mechanisms for quality assurance.
>
> ## Risks
I think it's worthwhile including at least a few more technical risks:
- Contributions of tests generated by an LLM that looked closely at a specific implementation's code, so that they match that implementation rather than the spec. (This is, of course, already an issue, but it could become more of a problem if we get more, larger contributions.)
- Contributions not matching the spec at all. I've seen this mostly with attempts to generate tests that assert the ordering of things which end up using HTML's parallelism and HTML's event loops; that case is especially annoying because it can lead to flaky tests.
This initial draft takes a maximalist approach (and a permissive stance) to promote a robust and grounded discussion.