
docs: Added tutorial regarding RLTrainer#801

Closed
onel wants to merge 3 commits into PrimeIntellect-ai:main from onel:askmanu/rl-trainer-1

Conversation


@onel onel commented Jan 29, 2026

Description

As per the comment in the issue, I created an end-to-end tutorial on how to use RLTrainer, covering prerequisites, setup, running training, and analyzing the results.

Issue #748

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running `uv run pytest` locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Used an AI agent to write the content but checked and corrected the final output.


Note

Low Risk
Documentation-only addition with no runtime or API changes; main risk is potential user confusion if commands/config examples drift from current tooling.

Overview
Adds a new end-to-end documentation page, docs/rltrainer-tutorial.md, that walks users through running an RL training job with vf.RLTrainer (workspace setup via prime lab setup --vf-rl, configuring a sample reverse-text TOML, launching training, and understanding outputs).

Includes guidance on interpreting metrics/checkpoints plus common troubleshooting and advanced configuration knobs (batch sizing, generation params, LoRA vs full finetune, wandb logging), along with next-step links to other environments and prime-rl.
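For concreteness, the kind of "configuration knobs" the tutorial covers could look roughly like the sketch below. This is a hypothetical illustration only: the key names, section names, and values are assumptions made for this example and are not the actual schema consumed by `vf.RLTrainer`; consult the tutorial page itself for the real TOML layout.

```toml
# Hypothetical RLTrainer config sketch for a reverse-text environment.
# All keys below are illustrative assumptions, not the real schema.

[model]
name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed example model
lora = true                           # LoRA adapter vs. full finetune

[training]
batch_size = 8        # batch sizing knob mentioned in the overview
learning_rate = 1e-5

[generation]
max_tokens = 256      # generation params for rollouts
temperature = 1.0

[logging]
wandb = true          # optional wandb logging
```

The point is only that batch sizing, generation parameters, the LoRA/full-finetune choice, and wandb logging are all surfaced as config-level settings rather than code changes.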

Written by Cursor Bugbot for commit ce6879b.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ onel
❌ askmanu[bot]

@onel onel changed the title from "Added tutorial regarding RLTrainer" to "docs: Added tutorial regarding RLTrainer" Jan 29, 2026
@willccbb
Member

willccbb commented Feb 2, 2026

Hmmm I'm probably not gonna add this, sorry -- I don't really want to encourage people to use RLTrainer lol, it's unmaintained + not performance-optimized + not adding new features. Prime-rl should be used instead. It has all the same features, much better performance, and is heavily battle-tested + supports similar configuration. I'm keeping RLTrainer in the repo because some people already do use it + like it (much to my chagrin), and it's a convenient example for reading simple RL code for instructional purposes. But if someone wants to run a trainer as-is without modifying or reading the code, they should not use or think about RLTrainer at all.

@onel
Author

onel commented Feb 2, 2026

> Hmmm I'm probably not gonna add this, sorry -- I don't really want to encourage people to use RLTrainer lol, it's unmaintained + not performance-optimized + not adding new features. Prime-rl should be used instead. It has all the same features, much better performance, and is heavily battle-tested + supports similar configuration. I'm keeping RLTrainer in the repo because some people already do use it + like it (much to my chagrin), and it's a convenient example for reading simple RL code for instructional purposes. But if someone wants to run a trainer as-is without modifying or reading the code, they should not use or think about RLTrainer at all.

@ChrisDelClea this might be helpful to you 👆

@willccbb makes perfect sense. do you want me to look into adding a similar tutorial for Prime-rl? would that be helpful?

@willccbb
Member

willccbb commented Feb 2, 2026

You're welcome to propose any ideas! Though FYI we're also working on our own content here + might be easier to wait for that and make suggestions for changes as needed?

@onel
Author

onel commented Feb 2, 2026

Both sound good. I'll wait for you to create new content.

If there's a type of documentation that isn't a priority or that the team doesn't have bandwidth for, LMK and I can help with that.
Let's move the discussion to #748

Closing this PR for now.

@onel onel closed this Feb 2, 2026
