
@rasdani (Contributor) commented Jan 25, 2026

Description

Extract sandbox lifecycle management into a reusable SandboxMixin for CliAgentEnv. The mixin can also be used in SandboxEnv.
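A minimal sketch of the mixin idea follows. It assumes a hypothetical FakeSandboxClient in place of the real ThreadedAsyncSandboxClient and a hypothetical BaseEnv in place of MultiTurnEnv. The method names (create_sandbox, delete_sandbox, teardown_sandboxes) are illustrative, not the PR's API; only the active_sandboxes tracking idea comes from this description.

```python
import asyncio


class FakeSandboxClient:
    """Hypothetical stand-in for a sandbox API client (not ThreadedAsyncSandboxClient)."""

    def __init__(self) -> None:
        self._next_id = 0
        self.live: set[str] = set()

    async def create(self) -> str:
        self._next_id += 1
        sandbox_id = f"sbx-{self._next_id}"
        self.live.add(sandbox_id)
        return sandbox_id

    async def delete(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)


class SandboxMixin:
    """Sketch: track every sandbox this env creates so leftovers can be bulk-deleted."""

    def __init__(self, *args, sandbox_client: FakeSandboxClient, **kwargs) -> None:
        # Cooperative init: consume our own argument, pass the rest down the MRO.
        super().__init__(*args, **kwargs)
        self.sandbox_client = sandbox_client
        self.active_sandboxes: set[str] = set()

    async def create_sandbox(self) -> str:
        sandbox_id = await self.sandbox_client.create()
        # Record the id right away so an interrupt before deletion cannot orphan it.
        self.active_sandboxes.add(sandbox_id)
        return sandbox_id

    async def delete_sandbox(self, sandbox_id: str) -> None:
        await self.sandbox_client.delete(sandbox_id)
        # Only forget the id once the delete call has succeeded.
        self.active_sandboxes.discard(sandbox_id)

    async def teardown_sandboxes(self) -> None:
        # Bulk cleanup: delete whatever is still tracked, concurrently.
        # return_exceptions=True keeps one failed delete from hiding the others;
        # failed ids simply stay in active_sandboxes.
        leftovers = list(self.active_sandboxes)
        await asyncio.gather(
            *(self.delete_sandbox(s) for s in leftovers), return_exceptions=True
        )


class BaseEnv:
    """Hypothetical stand-in for MultiTurnEnv."""

    def __init__(self, name: str) -> None:
        self.name = name


class DemoEnv(SandboxMixin, BaseEnv):
    """Hypothetical env combining the mixin with a base environment."""
```

Listing the mixin first in the bases lets its cooperative __init__ take sandbox_client and forward the remaining arguments to the environment base class; the base-class order CliAgentEnv actually uses is an assumption here.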

Changes

  • Add SandboxMixin class with ThreadedAsyncSandboxClient, retry logic, and bulk cleanup
  • CliAgentEnv now inherits SandboxMixin alongside MultiTurnEnv
  • Add active_sandboxes tracking to prevent orphaned sandboxes on Ctrl+C
  • Add @vf.teardown hooks for bulk sandbox cleanup and client teardown
  • Add retry with exponential backoff for sandbox API calls
  • Use connection-pooled ThreadedAsyncSandboxClient instead of per-call AsyncSandboxClient
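The retry behavior listed above can be sketched as a small helper. with_retry is hypothetical: it uses full-jitter exponential backoff, and its defaults and retried exception types are chosen for illustration rather than taken from the PR's actual retry policy.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retry(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Retry an async call, doubling the delay cap after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out retries from many concurrent rollouts.
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Because asyncio.CancelledError is a BaseException and is not in retry_on, Ctrl+C still interrupts the loop instead of being retried.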

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches sandbox lifecycle and async concurrency/streaming paths (creation, teardown, retries, and cancellation), where subtle timing issues can cause leaks or hangs; changes are scoped to experimental envs but affect infrastructure behavior.

Overview
Refactors experimental sandboxed agent environments by introducing SandboxMixin (with a pooled ThreadedAsyncSandboxClient, retry/backoff, active sandbox tracking, and bulk teardown) and making CliAgentEnv inherit it. CliAgentEnv now centralizes sandbox creation and deletion, adds per-rollout timeout hooks, adds extensive request/stream/LLM flow logging, and puts timeouts on HTTP futures so intercept handlers cannot hang indefinitely.
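Bounding how long an intercept handler waits can be sketched with asyncio.wait_for. This assumes, hypothetically, that the handler awaits a future which the rollout loop resolves with the model response; wait_for_response and the 504 status are illustrative choices, not taken from the PR.

```python
import asyncio


async def wait_for_response(future: asyncio.Future, timeout_s: float) -> tuple[int, str]:
    """Hypothetical handler core: wait for the response future, at most timeout_s."""
    try:
        body = await asyncio.wait_for(future, timeout=timeout_s)
        return 200, body
    except asyncio.TimeoutError:
        # wait_for has already cancelled the future at this point.
        return 504, "timed out waiting for model response"
```

On timeout, wait_for cancels the awaited future, so whichever side would later call set_result should check future.done() first to avoid asyncio.InvalidStateError.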

Improves reliability around interrupts: MultiTurnEnv.rollout now explicitly runs cleanup on asyncio.CancelledError, and CliAgentEnv adds @vf.teardown hooks to bulk-delete leftover sandboxes and tear down the sandbox client threadpool.
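The cancellation handling can be sketched as follows. RolloutSketch and its turn loop are hypothetical stand-ins, not MultiTurnEnv.rollout. The key detail is that asyncio.CancelledError derives from BaseException (Python 3.8+), so an except Exception block would not catch it; cleanup has to run explicitly before the error is re-raised.

```python
import asyncio


class RolloutSketch:
    """Hypothetical stand-in showing cleanup on cancellation."""

    def __init__(self) -> None:
        self.cleaned_up = 0

    async def cleanup(self, state: dict) -> None:
        self.cleaned_up += 1  # stand-in for releasing sandboxes, clients, etc.

    async def rollout(self, state: dict) -> dict:
        try:
            while True:
                await asyncio.sleep(0.01)  # stand-in for one model/env turn
                state["turns"] = state.get("turns", 0) + 1
                if state["turns"] >= state.get("max_turns", 3):
                    break
        except asyncio.CancelledError:
            # Ctrl+C or an outer timeout cancelled us mid-turn: release resources,
            # then re-raise so the caller still sees the cancellation.
            await self.cleanup(state)
            raise
        await self.cleanup(state)
        return state
```

If cleanup can be reached from more than one path, it should be safe to call twice, which is why idempotent cleanup matters.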

Enhances HarborEnv with per-task timeout/resource overrides from task.toml, optional capture of per-episode artifacts from /logs, and more robust verifier execution (background-job based, with tenacity retries). Verifier tests are skipped when an earlier infrastructure error has occurred.

Documentation updates clarify that @vf.cleanup methods should be idempotent and update CliAgentEnv parameter docs accordingly.
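An idempotent cleanup hook can be sketched as follows. IdempotentCleanupSketch, cleanup_sandbox, and the "sandbox_id" state key are hypothetical; the pattern is to consume the resource handle on the first call so that any later call becomes a no-op.

```python
import asyncio


class IdempotentCleanupSketch:
    """Hypothetical example of a cleanup hook that is safe to call more than once."""

    def __init__(self) -> None:
        self.deleted: list[str] = []

    async def _delete_sandbox(self, sandbox_id: str) -> None:
        await asyncio.sleep(0)  # stand-in for the real delete API call
        self.deleted.append(sandbox_id)

    async def cleanup_sandbox(self, state: dict) -> None:
        # pop() removes the id, so a second call (e.g. normal exit after a
        # cancellation path already ran cleanup) finds nothing and returns.
        sandbox_id = state.pop("sandbox_id", None)
        if sandbox_id is None:
            return
        await self._delete_sandbox(sandbox_id)
```

Popping before the delete means a failed delete is not retried by this hook; a bulk teardown over tracked sandboxes, as described for CliAgentEnv, would act as the backstop.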

Written by Cursor Bugbot for commit 3e1ec44.

@rasdani force-pushed the daniel/sandbox-mixin branch from e830b56 to a0e709d on January 25, 2026 at 21:59
@rasdani force-pushed the daniel/sandbox-mixin branch from a0e709d to e46ccc6 on January 28, 2026 at 03:22
@rasdani force-pushed the daniel/sandbox-mixin branch from e46ccc6 to 2c7a502 on January 28, 2026 at 05:17

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

