docs: Added tutorial regarding RLTrainer #801
onel wants to merge 3 commits into PrimeIntellect-ai:main from
Conversation
Hmmm I'm probably not gonna add this, sorry -- I don't really want to encourage people to use RLTrainer lol, it's unmaintained + not performance-optimized + not adding new features. Prime-rl should be used instead. It has all the same features, much better performance, and is heavily battle-tested + supports similar configuration. I'm keeping RLTrainer in the repo because some people already do use it + like it (much to my chagrin), and it's a convenient example for reading simple RL code for instructional purposes. But if someone wants to run a trainer as-is without modifying or reading the code, they should not use or think about RLTrainer at all.
@ChrisDelClea this might be helpful to you 👆

@willccbb makes perfect sense. do you want me to look into adding a similar tutorial for Prime-rl? would that be helpful?
You're welcome to propose any ideas! Though FYI we're also working on our own content here + might be easier to wait for that and make suggestions for changes as needed? |
Both sound good. I'll wait for you to create new content. If you have a certain type of documentation that's not priority or the team doesn't have bandwidth LMK, I can help with that. Closing this PR for now. |
Description
As per the comment in the issue, I created a tutorial on how to use RLTrainer end to end: prerequisites, setup, running, and analyzing results.
Issue #748
Type of Change
Testing
`uv run pytest` locally.
Checklist
Additional Notes
Used an AI agent to write the content but checked and corrected the final output.
Note
Low Risk
Documentation-only addition with no runtime or API changes; main risk is potential user confusion if commands/config examples drift from current tooling.
Overview
Adds a new end-to-end documentation page, `docs/rltrainer-tutorial.md`, that walks users through running an RL training job with `vf.RLTrainer`: workspace setup via `prime lab setup --vf-rl`, configuring a sample `reverse-text` TOML, launching training, and understanding outputs. Includes guidance on interpreting metrics/checkpoints plus common troubleshooting and advanced configuration knobs (batch sizing, generation params, LoRA vs full finetune, wandb logging), along with next-step links to other environments and `prime-rl`.
Written by Cursor Bugbot for commit ce6879b.
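The overview mentions a sample `reverse-text` TOML covering knobs like batch sizing, generation params, LoRA vs full finetune, and wandb logging. As a rough illustration only, such a config might look like the sketch below; every key name here is an assumption for illustration, not the actual schema consumed by `vf.RLTrainer`:

```toml
# Hypothetical RL training config sketch.
# All key names are illustrative assumptions, not the real RLTrainer schema.

[model]
name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative small model choice
lora = true                          # LoRA vs full-finetune knob from the overview

[env]
id = "reverse-text"                  # environment named in the PR overview

[training]
batch_size = 8                       # batch-sizing knob mentioned above
max_steps = 100

[generation]
temperature = 1.0
max_tokens = 256

[logging]
wandb = false                        # wandb logging knob mentioned above
```

The real tutorial's TOML (and the trainer's accepted keys) should be taken from `docs/rltrainer-tutorial.md` and the verifiers source, not from this sketch.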