Polar is an RL rollout framework for real-world agent harnesses.
- Harness as Environment. Bring your agent harnesses as RL-ready environments without code changes.
- Smart Rollout Pipeline. Save GPU hours with Polar's parallel rollout staging and runtime prewarm.
- Rollout as a Service. Server mode by design: scale async RL with any training framework.
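
To make the staging and prewarm idea concrete, here is a minimal sketch, not Polar's implementation. It assumes a rollout splits into a "prepare runtime" stage and an "execute agent" stage, and that prewarming means preparing the next runtimes while the current agent is still running. The function names, the sleep-based timings, and the `prewarm_depth` parameter are all hypothetical.

```python
import asyncio
import time

PREPARE_SECONDS = 0.05  # hypothetical cost of starting a runtime
EXECUTE_SECONDS = 0.05  # hypothetical cost of running the agent in it


async def prepare_runtime(task_id: int) -> str:
    """Stand-in for runtime preparation; returns a handle to a ready runtime."""
    await asyncio.sleep(PREPARE_SECONDS)
    return f"runtime-{task_id}"


async def execute_agent(runtime: str) -> str:
    """Stand-in for running an agent inside a prepared runtime."""
    await asyncio.sleep(EXECUTE_SECONDS)
    return f"trajectory-from-{runtime}"


async def run_sequential(task_ids: list[int]) -> list[str]:
    """Baseline: each task prepares its runtime, then executes, one at a time."""
    results = []
    for task_id in task_ids:
        runtime = await prepare_runtime(task_id)
        results.append(await execute_agent(runtime))
    return results


async def run_prewarmed(task_ids: list[int], prewarm_depth: int = 2) -> list[str]:
    """Staged: a producer prepares runtimes ahead while a consumer executes."""
    ready: asyncio.Queue = asyncio.Queue(maxsize=prewarm_depth)

    async def producer() -> None:
        for task_id in task_ids:
            await ready.put(await prepare_runtime(task_id))
        await ready.put(None)  # sentinel: no more runtimes will arrive

    async def consumer() -> list[str]:
        results = []
        while (runtime := await ready.get()) is not None:
            results.append(await execute_agent(runtime))
        return results

    _, results = await asyncio.gather(producer(), consumer())
    return results


def timed(coro_fn, task_ids):
    """Run one strategy to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(task_ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tasks = list(range(4))
    _, seq_time = timed(run_sequential, tasks)
    _, pre_time = timed(run_prewarmed, tasks)
    print(f"sequential: {seq_time:.2f}s, prewarmed: {pre_time:.2f}s")
```

Both strategies return the same trajectories in the same order. The staged version finishes sooner because only the first preparation sits on the critical path, which is the kind of idle time the prewarm feature is presumably meant to remove.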
The Rollout Server manages client requests and dispatches them to distributed Gateway Nodes, which asynchronously prepare runtimes, execute agents, build trajectories, and evaluate them. A proxy listens to each agent harness, sitting between the agnostic agent execution processes and the local inference servers.
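
The proxy arrangement described above can be pictured with a small sketch. It is an illustration under assumptions, not Polar's proxy: it assumes the proxy's job is to forward each model call unchanged and record the exchange so a trajectory can be built later. `RecordingProxy`, `fake_inference_server`, and `unmodified_agent` are hypothetical names, and a plain Python callable stands in for the network endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: a completion function maps a prompt to a model reply.
CompletionFn = Callable[[str], str]


@dataclass
class RecordingProxy:
    """Forwards each request to the backend and records the exchange."""

    backend: CompletionFn
    turns: list[tuple[str, str]] = field(default_factory=list)

    def __call__(self, prompt: str) -> str:
        reply = self.backend(prompt)
        self.turns.append((prompt, reply))  # raw material for a trajectory
        return reply


def fake_inference_server(prompt: str) -> str:
    """Stand-in for a local inference server."""
    return prompt.upper()


def unmodified_agent(complete: CompletionFn) -> str:
    """Stand-in for an agent harness; it only knows it has an endpoint to call."""
    plan = complete("plan: add 2 and 3")
    return complete(f"answer given {plan!r}")


if __name__ == "__main__":
    proxy = RecordingProxy(fake_inference_server)
    unmodified_agent(proxy)
    print(len(proxy.turns), "turns recorded")
```

The agent function is identical whether it receives the proxy or the backend directly, which is the sense in which a harness could be observed without changing its code.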
uv venv
uv pip install -e .
uv pip install --prerelease=allow sglang==0.5.10
bash scripts/patch/patch_sglang.sh

The patch applies the necessary TITO and prompt token ID emission changes to the pinned sglang version. We'll remove it once upstream support lands. vLLM integration is on the way.
Polar is trainer-agnostic, so the choice of trainer and training backend is highly flexible given Polar's server boundaries.
Currently, we provide a Slime integration for demo purposes; see the Slime bridge installation guide.
uv pip install -e ".[swebench]"
cd web && npm install && npm run build

- ⭐ Choose your Agent Harness: pick a built-in harness, or use the generic shell harness with wrapped agents.
- 🚀 Trajectory Construction and Eval: see the builder and evaluator guides for registered strategies.
- 🔧 Deployment Topology: configure the Polar service.
- ▶️ Request for Rollout: client-side task submission via the rollout API.
A typical local run uses five commands. Each takes the same topology.yaml.
polar serve_rollout -c topology.yaml # central orchestrator (port 8080)
polar serve_gateway -c topology.yaml --node-id <node> # one per gateway node (port 8100+)
polar dashboard -c topology.yaml [--port 8090] # observability & monitoring dashboard
polar submit <task.json|yaml> -c topology.yaml # submit a task and tail it
polar status -c topology.yaml # one-shot health / topology check
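
When a topology has several gateway nodes, it can help to generate the command lines instead of typing them. The helper below is a small sketch that only assembles and prints the commands listed above; it does not start anything. The node IDs `node-0` and `node-1` and the file name `task.yaml` are placeholders, and the assumption that node IDs must match entries in topology.yaml is not confirmed here.

```python
import shlex


def local_run_commands(
    topology: str, node_ids: list[str], task_file: str
) -> list[list[str]]:
    """Build argv lists for a local run, in the order the commands are listed above."""
    shared = ["-c", topology]  # every command takes the same topology file
    commands = [["polar", "serve_rollout", *shared]]
    # One gateway process per node ID.
    commands += [
        ["polar", "serve_gateway", *shared, "--node-id", node_id]
        for node_id in node_ids
    ]
    commands.append(["polar", "dashboard", *shared])
    commands.append(["polar", "submit", task_file, *shared])
    commands.append(["polar", "status", *shared])
    return commands


if __name__ == "__main__":
    # Placeholder values; replace them with your own topology, nodes, and task.
    for argv in local_run_commands("topology.yaml", ["node-0", "node-1"], "task.yaml"):
        print(shlex.join(argv))
```

The server commands are long-running, so in practice each would go in its own terminal or background process before submitting a task.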
- Calculator: minimal smoke test.
- Count Stars: minimal test for VLM.
- SWE-bench Verified: benchmark-style evaluation on SWE-bench Verified tasks.
- SWE-Gym Slime GRPO: training path that connects Polar rollouts to Slime.
This project is under active development. We are adding new examples for different tasks / models on diverse hardware setups. Contributions are welcome!
Important: If you find Polar useful, please consider citing our work:
@article{zhang2026prorl,
title={ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents},
author={Zhang, Hao and Liu, Mingjie and Zhang, Shaokun and Han, Songyang and Hu, Jian and Jin, Zhenghui and Zhang, Yuchi and Diao, Shizhe and Lu, Ximing and Xu, Binfeng and others},
journal={arXiv preprint arXiv:2603.18815},
year={2026}
}

