@mikasenghaas mikasenghaas commented Jan 28, 2026

Description

This PR introduces the EnvClient and EnvServer which expose the run_rollout and run_group methods of an environment from a separate process (pool). This is especially useful for multi-env training (e.g. in prime-rl) and multi-env evals (e.g. in vf-eval or online evals).

Example

Running vf-eval spawns environments in env-server mode by default:

uv run vf-eval gsm8k -n5 -r3 

Design

Env Server Mode

You can put an environment into "env server mode" by calling

env = vf.load_environment(env_id, **env_args)
await env.start_server()

This will implicitly start an env server as a sidecar (in a subprocess) and try to route all calls to run_rollout and run_group to the env server.
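The routing idea can be pictured with a toy sketch. Everything in it is hypothetical (`ToyEnvironment`, `ToyClient`, the `_client` attribute) and only stands in for the real classes: no subprocess or socket is involved, just the delegation step that happens once server mode is switched on.

```python
import asyncio
from typing import List, Optional


class ToyClient:
    """Hypothetical stand-in for an env client; a real one would talk to a server process."""

    async def run_rollout(self, prompt: str) -> dict:
        return {"prompt": prompt, "handled_by": "server"}


class ToyEnvironment:
    """Hypothetical environment that forwards calls once server mode is on."""

    def __init__(self) -> None:
        self._client: Optional[ToyClient] = None  # None means "run in-process"

    async def start_server(self) -> None:
        # The real start_server spawns a sidecar process; this toy only attaches a client.
        self._client = ToyClient()

    async def run_rollout(self, prompt: str) -> dict:
        if self._client is not None:
            # Server mode: delegate the call to the client.
            return await self._client.run_rollout(prompt)
        return {"prompt": prompt, "handled_by": "local"}


async def demo() -> List[str]:
    env = ToyEnvironment()
    before = await env.run_rollout("2+2")
    await env.start_server()
    after = await env.run_rollout("2+2")
    return [before["handled_by"], after["handled_by"]]


print(asyncio.run(demo()))  # ['local', 'server']
```

The caller's code does not change between the two modes; only the place where the rollout executes does.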

EnvServer

An EnvServer is initialized like a regular environment, with an env_id and env_args, plus the address to serve on

server = ZMQEnvServer(
    env_id=args.env_id,
    env_args=args.env_args,
    address=address,
)

try:
    await server.run()
finally:
    await server.close()

EnvClient

An EnvClient communicates with an env server over the configured address

env = ZMQEnvClient(address=address)

await env.run_rollout(...) # same as Environment.run_rollout
await env.run_group(...) # same as Environment.run_group
await env.evaluate(...) # same as Environment.evaluate

Sidecar Pattern

To sidecar an env server (e.g. from vf-eval) simply wrap the run_server class method in a Process and connect the client to the same address

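from multiprocessing import Process

# Run the server in its own process, bound to the shared address
env_server = Process(
    target=ZMQEnvServer.run_server,
    args=(config.env_id, config.env_args),
    kwargs=dict(address=address),
)
env_server.start()
env = ZMQEnvClient(address=address)

try:
    results = await env.evaluate(...)
finally:
    # Ask the server to exit, then force-kill it if it is still alive after 5 seconds
    env_server.terminate()
    env_server.join(timeout=5)
    if env_server.is_alive():
        env_server.kill()
        env_server.join()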

Misc Changes

  • vf.setup_logging(...) supports logging to file now as well
  • We now store error info in the serializable RolloutOutput to be able to display error chains as before
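As a rough illustration of the first point, optional file logging can be built from the standard library alone. The helper below is a hypothetical sketch: its name, the `log_file` parameter, and the logger name are assumptions, and it does not reflect the actual signature of `vf.setup_logging`.

```python
import logging
import os
import tempfile
from typing import Optional


def setup_logging_sketch(level: str = "INFO", log_file: Optional[str] = None) -> logging.Logger:
    """Hypothetical helper: always log to the console, and to a file if a path is given."""
    logger = logging.getLogger("env_server_sketch")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Usage: write log records to the console and to a temporary file.
log_path = os.path.join(tempfile.mkdtemp(), "eval.log")
logger = setup_logging_sketch("INFO", log_file=log_path)
logger.info("rollout finished")
```

A file handler is handy in a sidecar setup, where the server process's console output may not be visible to whoever launched the client.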

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

High Risk
Introduces a new multiprocess, networked execution path (ZMQ/msgpack) and refactors core rollout scheduling/serialization, which can affect correctness, performance, and cleanup behavior across evaluation runs.

Overview
Adds an environment “server mode” for evaluation/training. Environment can now spawn a sidecar ZMQEnvServer process and route run_rollout/run_group over a new EnvClient/ZMQEnvClient using ZMQ + msgpack, and vf-eval is updated to start/stop the server around each run.

Refactors rollout execution and serialization. Generation/scoring no longer use separate generation vs scoring semaphores; a single concurrency limit is applied via with_sem, tasks are always cleaned up on exit, and run_rollout/run_group now return pre-serialized RolloutOutput objects (builder now accumulates outputs, not states).
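To make the single-limit idea concrete, here is a minimal sketch of a `with_sem`-style helper. The signature is an assumption, not the one in this PR, and `rollout_and_score` is a hypothetical placeholder for generation plus scoring running under one shared semaphore.

```python
import asyncio


async def with_sem(sem: asyncio.Semaphore, coro):
    """Await `coro` only while holding a slot in `sem` (hypothetical signature)."""
    async with sem:
        return await coro


async def demo(limit: int = 3, n_tasks: int = 10) -> int:
    sem = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def rollout_and_score(i: int) -> int:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stands in for generation + scoring
        running -= 1
        return i

    results = await asyncio.gather(
        *(with_sem(sem, rollout_and_score(i)) for i in range(n_tasks))
    )
    assert results == list(range(n_tasks))
    return peak  # highest number of rollouts that ran at the same time


peak = asyncio.run(demo())
print(peak)  # 3
```

Because the wrapped coroutine is only awaited after a slot is acquired, the number of rollouts in flight never exceeds the limit.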

Changes error and logging surfaces. Rollout error is now a structured ErrorInfo (type + chain strings) instead of a repr string, ErrorChain string/repr semantics are swapped to preserve prior displays, and logging supports optional file output; tests/docs/CLI config are updated accordingly. Dependencies add pyzmq and msgpack.
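One way to picture a structured error of this kind (a type plus chain strings) is the sketch below. `ErrorInfoSketch` and `to_error_info` are hypothetical names, and the field layout is an assumption rather than the definition used in this PR. The point is that plain strings survive serialization across a process boundary, whereas exception objects generally do not.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ErrorInfoSketch:
    """Hypothetical shape: the exception type plus one string per link in the chain."""

    type: str
    chain: list


def to_error_info(exc: BaseException) -> ErrorInfoSketch:
    chain = []
    current = exc
    while current is not None:
        chain.append(f"{type(current).__name__}: {current}")
        # Follow an explicit `raise ... from ...` first, then the implicit context.
        current = current.__cause__ or current.__context__
    return ErrorInfoSketch(type=type(exc).__name__, chain=chain)


try:
    try:
        raise TimeoutError("model call timed out")
    except TimeoutError as inner:
        raise RuntimeError("rollout failed") from inner
except RuntimeError as outer:
    info = to_error_info(outer)

payload = json.dumps(asdict(info))  # plain strings, safe to send between processes
print(payload)
```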

Written by Cursor Bugbot for commit ed2b7d9.

@mikasenghaas mikasenghaas changed the base branch from overhaul-results-saving to main January 28, 2026 17:16
@mikasenghaas
Member Author

  • still missing some unit tests, which I will add (those will replace the __name__ == "__main__" blocks in the env server/client impls, which I used for debugging)

@willccbb
Member

I think removing the scoring concurrency is fine. Users can make this part of their rubrics if they want (via class_objects or globals); I have used this for multi-part judge rubrics and it works well.

@mikasenghaas mikasenghaas mentioned this pull request Jan 29, 2026

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


@mikasenghaas mikasenghaas changed the title env server v2 env server Jan 29, 2026
@willccbb willccbb merged commit 53e50f7 into main Jan 30, 2026
6 checks passed

3 participants