docs: Added tutorial regarding RLTrainer #801
onel wants to merge 3 commits into PrimeIntellect-ai:main from
Conversation
Hmmm I'm probably not gonna add this, sorry -- I don't really want to encourage people to use RLTrainer lol, it's unmaintained + not performance-optimized + not adding new features. Prime-rl should be used instead. It has all the same features, much better performance, and is heavily battle-tested + supports similar configuration. I'm keeping RLTrainer in the repo because some people already do use it + like it (much to my chagrin), and it's a convenient example for reading simple RL code for instructional purposes. But if someone wants to run a trainer as-is without modifying or reading the code, they should not use or think about RLTrainer at all.
@ChrisDelClea this might be helpful to you 👆

@willccbb makes perfect sense. do you want me to look into adding a similar tutorial for Prime-rl? would that be helpful?
You're welcome to propose any ideas! Though FYI we're also working on our own content here + might be easier to wait for that and make suggestions for changes as needed? |
Both sound good. I'll wait for you to create new content. If you have a certain type of documentation that's not priority or the team doesn't have bandwidth LMK, I can help with that. Closing this PR for now. |
Description
As per the comment in the issue, I created a tutorial on how to use RLTrainer end to end: prerequisites, setup, running, and analyzing results.
Issue #748
Type of Change
Testing
`uv run pytest` locally.
Checklist
Additional Notes
Used an AI agent to write the content but checked and corrected the final output.
Note
Low Risk
Documentation-only addition with no runtime or API changes; main risk is potential user confusion if commands/config examples drift from current tooling.
Overview
Adds a new end-to-end documentation page, `docs/rltrainer-tutorial.md`, that walks users through running an RL training job with `vf.RLTrainer`: workspace setup via `prime lab setup --vf-rl`, configuring a sample `reverse-text` TOML, launching training, and understanding outputs. Includes guidance on interpreting metrics/checkpoints plus common troubleshooting and advanced configuration knobs (batch sizing, generation params, LoRA vs full finetune, wandb logging), along with next-step links to other environments and `prime-rl`.
Written by Cursor Bugbot for commit ce6879b.
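The overview mentions a sample `reverse-text` TOML covering knobs like batch sizing, generation params, LoRA vs full finetune, and wandb logging. As a rough illustration only, such a config might look like the sketch below; every key name here is an assumption for illustration, not the actual schema consumed by `vf.RLTrainer`:

```toml
# Hypothetical RL training config sketch.
# All key names are illustrative assumptions, not the real RLTrainer schema.

[model]
name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative small model choice
lora = true                          # LoRA vs full-finetune knob from the overview

[env]
id = "reverse-text"                  # environment named in the PR overview

[training]
batch_size = 8                       # batch-sizing knob mentioned above
max_steps = 100

[generation]
temperature = 1.0
max_tokens = 256

[logging]
wandb = false                        # wandb logging knob mentioned above
```

The real tutorial's TOML (and the trainer's accepted keys) should be taken from `docs/rltrainer-tutorial.md` and the verifiers source, not from this sketch.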