[fix] Resolve broken snapshot access in Daytona#4464
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR adds Daytona snapshot resolution with caching and retry logic. The sandbox creation flow now accepts a snapshot name via ChangesSnapshot Name Resolution with Caching and Retry
Sequence DiagramsequenceDiagram
participant Caller
participant CreateSandbox
participant SnapshotCache
participant DaytonaAPI
participant DaytonaCreate
Caller->>CreateSandbox: create_sandbox(snapshot_name, target)
CreateSandbox->>SnapshotCache: lookup (snapshot_name, target)
alt Cache Hit
SnapshotCache-->>CreateSandbox: snapshot_id
else Cache Miss
CreateSandbox->>DaytonaAPI: GET /snapshots
DaytonaAPI-->>CreateSandbox: snapshot list
CreateSandbox->>SnapshotCache: store (snapshot_name, target) → snapshot_id
end
CreateSandbox->>DaytonaCreate: create ephemeral sandbox (snapshot_id, target)
alt Creation Success
DaytonaCreate-->>CreateSandbox: sandbox_id
else Not Found Error
CreateSandbox->>SnapshotCache: evict (snapshot_name, target)
CreateSandbox->>DaytonaAPI: GET /snapshots (fresh lookup)
DaytonaAPI-->>CreateSandbox: snapshot list
CreateSandbox->>SnapshotCache: store (snapshot_name, target) → snapshot_id
CreateSandbox->>DaytonaCreate: retry create sandbox (snapshot_id, target)
DaytonaCreate-->>CreateSandbox: sandbox_id
end
CreateSandbox-->>Caller: sandbox_id
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d05b3bb0-34d6-4675-97a6-009b70d62d0d
📒 Files selected for processing (1)
sdks/python/agenta/sdk/engines/running/runners/daytona.py
| response = httpx.get( | ||
| f"{api_url.rstrip('/')}/snapshots", | ||
| params={"limit": 25}, | ||
| headers={"Authorization": f"Bearer {api_key}"}, | ||
| timeout=10.0, | ||
| ) | ||
| response.raise_for_status() | ||
| items = response.json().get("items", []) |
There was a problem hiding this comment.
Potential pagination issue with hardcoded limit=25.
If the Daytona account has more than 25 snapshots, the target snapshot may not appear in the first page of results, causing a false "not found" error. Consider increasing the limit or implementing pagination to ensure the snapshot is found.
Additionally, the httpx call lacks error handling—API failures will propagate as raw httpx.HTTPStatusError or httpx.TimeoutException rather than a clear RuntimeError.
🛡️ Proposed fix: increase limit and add error handling
+ try:
response = httpx.get(
f"{api_url.rstrip('/')}/snapshots",
- params={"limit": 25},
+ params={"limit": 100},
headers={"Authorization": f"Bearer {api_key}"},
timeout=10.0,
)
response.raise_for_status()
+ except httpx.HTTPStatusError as e:
+ raise RuntimeError(
+ f"Failed to list Daytona snapshots: HTTP {e.response.status_code}"
+ ) from e
+ except httpx.RequestError as e:
+ raise RuntimeError(f"Failed to connect to Daytona API: {e}") from e
items = response.json().get("items", [])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| response = httpx.get( | |
| f"{api_url.rstrip('/')}/snapshots", | |
| params={"limit": 25}, | |
| headers={"Authorization": f"Bearer {api_key}"}, | |
| timeout=10.0, | |
| ) | |
| response.raise_for_status() | |
| items = response.json().get("items", []) | |
| try: | |
| response = httpx.get( | |
| f"{api_url.rstrip('/')}/snapshots", | |
| params={"limit": 100}, | |
| headers={"Authorization": f"Bearer {api_key}"}, | |
| timeout=10.0, | |
| ) | |
| response.raise_for_status() | |
| except httpx.HTTPStatusError as e: | |
| raise RuntimeError( | |
| f"Failed to list Daytona snapshots: HTTP {e.response.status_code}" | |
| ) from e | |
| except httpx.RequestError as e: | |
| raise RuntimeError(f"Failed to connect to Daytona API: {e}") from e | |
| items = response.json().get("items", []) |
There was a problem hiding this comment.
Pull request overview
Fixes Daytona sandbox creation when a snapshot marked general: true is visible cross-org but cannot be resolved by name from the current org by adding a name→ID resolution step (with caching) and retrying sandbox creation if a cached ID becomes stale.
Changes:
- Added a TTL+LRU cache for
(snapshot_name, target_region) -> snapshot_idmappings to avoid repeated snapshot listing calls. - Implemented
_resolve_snapshot_id()to list Daytona snapshots and pick an active snapshot matching name + region, then cache its ID. - Updated sandbox creation to use the resolved snapshot ID and invalidate+retry once on snapshot “not found/not available” errors.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| response = httpx.get( | ||
| f"{api_url.rstrip('/')}/snapshots", | ||
| params={"limit": 25}, | ||
| headers={"Authorization": f"Bearer {api_key}"}, | ||
| timeout=10.0, | ||
| ) | ||
| response.raise_for_status() | ||
| items = response.json().get("items", []) | ||
|
|
||
| for item in items: | ||
| if ( | ||
| item.get("name") == name | ||
| and item.get("state") == "active" | ||
| and target in (item.get("regionIds") or []) | ||
| ): | ||
| snapshot_id = item["id"] | ||
| self._snapshot_id_cache.put(cache_key, snapshot_id) | ||
| return snapshot_id | ||
|
|
| target = os.getenv("DAYTONA_TARGET") or os.getenv("AGENTA_REGION") or "eu" | ||
| snapshot_id = self._resolve_snapshot_id(snapshot_ref, target) | ||
|
|
| try: | ||
| sandbox = _create(snapshot_id) | ||
| except Exception as e: | ||
| # Snapshot may have been rebuilt with a new ID mid-cache; | ||
| # invalidate and retry once with a fresh lookup. | ||
| message = str(e).lower() | ||
| if "not found" in message or "not available" in message: |
| def _resolve_snapshot_id(self, name: str, target: str) -> str: | ||
| """Resolve a snapshot name to its ID for the given region. | ||
|
|
||
| Sandbox creation by snapshot *name* only resolves snapshots owned by | ||
| the requesting org, even when the snapshot is marked ``general: true`` | ||
| and visible in the dashboard. Resolving by *ID* works cross-org, so we | ||
| list snapshots, find one matching name + region + active state, and | ||
| cache the result. | ||
| """ |
Railway Preview Environment
|
No description provided.