Add multi-agent support: Actor, Protocol, MultiAgentEnv, MultiAgentRu… #784
base: main
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
from .envs.actor import Actor  # noqa # isort: skip
from .envs.protocol import EpisodeRequest, GenerateResult, Protocol  # noqa # isort: skip
from .envs.multiagent_env import MultiAgentEnv  # noqa # isort: skip
from .rubrics.multiagent_rubric import MultiAgentRubric  # noqa # isort: skip
New multi-agent classes lack documentation updates
Medium Severity · Bugbot Rules
This PR adds major new user-facing classes (Actor, Protocol, MultiAgentEnv, MultiAgentRubric, MultiAgentOrchestrator) exported in __all__, but no corresponding documentation updates are included. Per the review rules, any PR adding core user-facing functionality needs to update relevant documentation in docs/environments.md, docs/training.md, and docs/reference.md.
Additional Locations (2)
verifiers/envs/protocol.py
Outdated
env_name = inp.get("task") or self._get_default_env()
env = self.get_env(env_name)
if env.rubric:
    await env.rubric.score_rollout(state, score_sem=score_sem)
spawn() uses wrong scoring method for multi-agent rubrics
Medium Severity
The spawn() method calls score_rollout() for scoring when score=True (the default). However, MultiAgentRubric stores per-actor reward functions in actor_reward_funcs, which score_rollout() (inherited from parent Rubric) does not use—it only processes functions in self.funcs. Additionally, score_rollout() does not compute advantages, which are required for GRPO training. Spawned multi-agent states would have incomplete rewards and missing advantages.
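The gap Bugbot describes can be illustrated with a minimal sketch. This is not the actual verifiers implementation — the class bodies below are simplified stand-ins — but it shows why an inherited score_rollout() that only iterates self.funcs silently drops rewards stored in actor_reward_funcs:

```python
# Minimal sketch (not the verifiers implementation) of why a parent-class
# score_rollout() misses per-actor reward functions.

class Rubric:
    def __init__(self, funcs=None):
        self.funcs = funcs or []  # shared reward functions

    def score_rollout(self, state):
        # Only iterates self.funcs; knows nothing about per-actor functions.
        return sum(f(state) for f in self.funcs)

class MultiAgentRubric(Rubric):
    def __init__(self, funcs=None, actor_reward_funcs=None):
        super().__init__(funcs)
        # Per-actor rewards live here; the inherited score_rollout() ignores them.
        self.actor_reward_funcs = actor_reward_funcs or {}

    def score_actor(self, state, actor_id):
        shared = self.score_rollout(state)
        per_actor = sum(f(state) for f in self.actor_reward_funcs.get(actor_id, []))
        return shared + per_actor

rubric = MultiAgentRubric(
    funcs=[lambda s: 1.0],
    actor_reward_funcs={"guesser": [lambda s: 0.5]},
)
shared_only = rubric.score_rollout({})      # drops the guesser's 0.5 reward
full_score = rubric.score_actor({}, "guesser")
```

Under this sketch, a spawn() path that calls score_rollout() would behave like shared_only above: the per-actor component never enters the score, which matches the incomplete-rewards symptom described in the finding.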
…lay different models per actor; this should show up in the CLI log display. Examples in twenty_questions/poker.
… Also, you can currently set up different models with different system prompts to play each other. Still validating the environments.
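The per-actor model setup the author describes might look like the following. This is a hypothetical sketch — the actor names, model names, and dictionary fields are illustrative, not taken from the verifiers API:

```python
# Hypothetical configuration sketch: two actors with different models and
# different system prompts playing each other (names are illustrative only).
actors = {
    "guesser": {
        "model": "model-a",
        "system_prompt": "Ask yes/no questions to identify the secret word.",
    },
    "answerer": {
        "model": "model-b",
        "system_prompt": "Answer only 'yes' or 'no' about the secret word.",
    },
}

# Each actor's trajectory would be tagged with its actor id so that rewards
# and advantages can be computed per actor rather than per game.
actor_ids = sorted(actors)
```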
Description
Type of Change
Testing
Ran uv run pytest locally.
Checklist
Additional Notes
Note
Adds first-class multi-agent support across the library with per-actor rollouts, scoring, and training.
- Actor, Protocol, MultiAgentEnv, and MultiAgentRubric for multi-agent turn management, spawning, per-actor trajectory tagging, and per-actor rewards/advantages
- MultiAgentEnv.generate() flattens game states into per-actor states and computes per-actor GRPO advantages; results now include actor_id
- MultiAgentOrchestrator drives training via Protocol.generate() and builds microbatches from flattened, per-actor trajectories
- New exports in verifiers/__init__.py; eval_utils.save_rollout_results writes actor_id
- Example environments: rock_paper_scissors (simultaneous moves with custom rollout) and twenty_questions (alternating turns, asymmetric actors), each with datasets and rubrics for per-actor rewards

Written by Cursor Bugbot for commit 1e5f474.
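The "per-actor GRPO advantages" mentioned in the summary can be sketched generically: group-relative advantages are normalized within each actor's group of rollouts rather than across the whole batch, so actors with different reward scales don't distort each other. This is a minimal illustration of the idea, not the library's implementation:

```python
# Sketch of per-actor group-relative (GRPO-style) advantages: normalize each
# rollout's reward against the mean/std of its own actor's group only.
from statistics import mean, pstdev

def per_actor_advantages(rollouts):
    """rollouts: list of {"actor_id": str, "reward": float} dicts."""
    by_actor = {}
    for r in rollouts:
        by_actor.setdefault(r["actor_id"], []).append(r)
    for group in by_actor.values():
        rewards = [r["reward"] for r in group]
        mu, sigma = mean(rewards), pstdev(rewards)
        for r in group:
            # Fall back to 1.0 when the group has zero variance.
            r["advantage"] = (r["reward"] - mu) / (sigma or 1.0)
    return rollouts

rollouts = per_actor_advantages([
    {"actor_id": "guesser", "reward": 1.0},
    {"actor_id": "guesser", "reward": 0.0},
    {"actor_id": "answerer", "reward": 0.5},
])
```

Note that the lone "answerer" rollout gets an advantage of zero: with a single-element group there is no baseline to beat, which is one reason flattening game states into per-actor groups (as the summary describes) matters for training signal.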