rfcs/llm_policy.md: 85 additions, 0 deletions
# RFC 239: Policy on LLM assistance in contributions

## Summary

Introduce guidelines for acceptable use of large language models (LLMs) when
contributing to web-platform-tests.

## Background

[#202 Set policy for LLM-generated
tests](https://github.com/web-platform-tests/rfcs/issues/202) includes evidence
of public interest in a formal policy for LLM usage in authoring contributions
to WPT.

The Chrome team is exploring applications of LLMs for detecting coverage gaps
and for filling those gaps with generated code. ([Project
repository](https://github.com/GoogleChromeLabs/wpt-gen), [April 2026
presentation](https://www.youtube.com/watch?v=9r0PBbJFLoM))

A few examples of policies on LLM use in FOSS contributions:

- permissive
  - [ghostty/AI_POLICY.md at main · ghostty-org/ghostty](https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md)
  - [Policy about LLM generated code from PRs · Issue #28335 · opencv/opencv](https://github.com/opencv/opencv/issues/28335)
  - [CONTRIBUTING.md: Guidelines relevant to AI-assisted contributions by gasche · Pull Request #14052 · ocaml/ocaml](https://github.com/ocaml/ocaml/pull/14052)
  - [LLVM AI Tool Use Policy — LLVM 23.0.0git documentation](https://llvm.org/docs/AIToolPolicy.html)
- prohibitive
  - [Code of Conduct ⚡ Zig Programming Language](https://ziglang.org/code-of-conduct/#strict-no-llm-no-ai-policy)
  - [Getting Started - The Servo Book](https://book.servo.org/contributing/getting-started.html#ai-contributions)

Review comment on lines +20 to +29:

I feel like there are two others which are notably relevant here: Chromium's and Firefox's, given they are two of the five repos which have approval to land changes in WPT without further review. (WebKit and Test262 do not currently have policies; TC39's explicitly does not apply to code.)


## Details

Proposed text:

Review comment:

Proposed for where?


> ### For Individual Contributors
>
> #### Disclosure
>
> Contributions that contain substantial amounts of tool-generated content must
> be labeled as such.

Review comment on lines +39 to +40:

Neither Chromium nor Firefox requires this today, and it's entirely plausible we've already had commits land in WPT via exports which don't meet this bar.

That said, Chromium's policy here is currently: "To aid reviewers, authors should flag areas that they are not confident about that had AI assistance."

This is perhaps a weaker form, and hopefully more in line with existing contributions.

>
> #### Attribution
>
> Commits generated entirely by an LLM must be attributed to the LLM in the
> "Author" field.

Review comment on lines +44 to +45:

This feels problematic. If we attribute a PR to Claude, Gemini, or OpenAI's GPT and I later try to contact the author… well, I don't think Anthropic, Google, or OpenAI are going to be very helpful.

Both Chromium's and Firefox's policies are crystal clear that humans are still the authors and must self-review before submitting.

Therefore, when there's still a human very much in the loop who is required to self-review, it does not seem reasonable to consider the LLM the author. The Chromium policy is explicit: "Authors must attest that the code they submit is their original creation, regardless of whether AI tooling was used".

Reply:

Strong agree on this point, for all the reasons you give. The Author field's purpose is to give a contact for problems or questions, not to assign blame. Listing an LLM there is worthless.

And also, yeah, assigning authorship to an LLM is abdicating your responsibility as an engineer to commit useful code that you understand.

>
> #### Understanding
>
> Every pull request must be initiated by one human. That person must author
> the pull request description, understand every change proposed, and be
> prepared to engage in technical discussion regarding those changes.
>
> ### For Trusted External Review
>
> Some external projects conduct review which the WPT maintainers recognize as
> authoritative. From rendering engines like Gecko to dedicated test suites
> like WASM, patches merged in these projects are incorporated into WPT without
> further review. The policy outlined by this document does not apply to these
> contributions; the external projects are trusted to determine their own
> mechanisms for quality assurance.

Review comment on lines +53 to +60:

This feels like it should probably be, at least in part, in another RFC that tries to define our existing policies?

As far as I'm aware, there are currently five repos which have approval to incorporate changes based on downstream review: Chromium, Firefox, Servo, Test262, and WebKit.

My understanding of the unwritten policy is that we trust downstream reviewers; I can't even find the various places where we've elucidated parts of the policy over the years.


## Risks

Review comment:

I think it's worth including at least a few more technical risks:

  1. Contributions of LLM-generated tests that closely follow a specific implementation's code, matching that implementation rather than the spec. (This is, of course, already an issue, but it could well become more of a problem if we get more, and larger, contributions.)
  2. Contributions not matching the spec at all. I've seen this mostly when trying to generate tests that assert the ordering of things which end up using HTML's parallelism and HTML's event loops; that case is especially annoying because it can lead to flaky tests.


### Discouraging volunteers

All but the most permissive policy is effectively another hurdle to
contributing to the project. Friction in the contribution process could deter
people who might otherwise volunteer their time to help improve the project.

In some sense, adding friction is the goal of this policy. New technology has
removed barriers which previously restricted unqualified individuals from
participation. Rather than introducing more restrictions on good-faith actors,
an ideal policy will buttress eroded structural barriers with more intentional
social ones.

### Encouraging low-value contributions

All but the most restrictive policy could be interpreted as an invitation to
take shortcuts which undermine the quality of contributions.

However, it will not be possible to strictly enforce any policy. It inevitably
falls to contributors to follow the rules and to administrators to police
transgressions. Respect in public works projects is never guaranteed; policies
exist only to make expectations clear (the same dynamic that guides the design
and enforcement of codes of conduct).