[fix] Resolve broken snapshot access in Daytona#4464

Open
junaway wants to merge 1 commit into main from fix/broken-daytona-snapshots

Conversation

@junaway (Contributor) commented May 27, 2026

No description provided.

Copilot AI review requested due to automatic review settings May 27, 2026 14:48

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | May 27, 2026 2:49pm |

@dosubot (bot) added labels May 27, 2026: size:M (This PR changes 30-99 lines, ignoring generated files), bug (Something isn't working), python (Pull requests that update Python code).

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Sandbox creation now resolves the snapshot name to a snapshot ID automatically and retries with a fresh lookup when the cached ID is no longer available.
  • Performance

    • Added a cache for snapshot lookups, so repeated sandbox creation skips the snapshot listing call.

Walkthrough

This PR adds Daytona snapshot resolution with caching and retry logic. The sandbox creation flow now accepts a snapshot name via DAYTONA_SNAPSHOT, resolves it to a concrete snapshot ID for the target region using a new API call, caches the mapping with a 24-hour TTL, and retries sandbox creation with a fresh lookup if the cached ID becomes unavailable.

Changes

Snapshot Name Resolution with Caching and Retry

| Layer / File(s) | Summary |
| --- | --- |
| Cache Infrastructure Setup (sdks/python/agenta/sdk/engines/running/runners/daytona.py) | Adds httpx and TTLLRUCache imports and initializes a class-level TTL LRU cache field (_snapshot_id_cache) to store resolved (snapshot_name, target) pairs for 24 hours. |
| Snapshot Resolution via Daytona API (sdks/python/agenta/sdk/engines/running/runners/daytona.py) | Implements _resolve_snapshot_id() method that calls Daytona's /snapshots API, filters snapshots by name, active status, and target region membership, caches results, and raises RuntimeError if no suitable snapshot is found. |
| Sandbox Creation with Snapshot Resolution and Retry (sdks/python/agenta/sdk/engines/running/runners/daytona.py) | Updates _create_sandbox() to require DAYTONA_SNAPSHOT, resolve it to a concrete snapshot ID via target region, and wraps creation with retry logic that evicts stale cache entries and re-resolves snapshot IDs on "not found"/"not available" errors. |

Sequence Diagram

sequenceDiagram
  participant Caller
  participant CreateSandbox
  participant SnapshotCache
  participant DaytonaAPI
  participant DaytonaCreate
  Caller->>CreateSandbox: create_sandbox(snapshot_name, target)
  CreateSandbox->>SnapshotCache: lookup (snapshot_name, target)
  alt Cache Hit
    SnapshotCache-->>CreateSandbox: snapshot_id
  else Cache Miss
    CreateSandbox->>DaytonaAPI: GET /snapshots
    DaytonaAPI-->>CreateSandbox: snapshot list
    CreateSandbox->>SnapshotCache: store (snapshot_name, target) → snapshot_id
  end
  CreateSandbox->>DaytonaCreate: create ephemeral sandbox (snapshot_id, target)
  alt Creation Success
    DaytonaCreate-->>CreateSandbox: sandbox_id
  else Not Found Error
    CreateSandbox->>SnapshotCache: evict (snapshot_name, target)
    CreateSandbox->>DaytonaAPI: GET /snapshots (fresh lookup)
    DaytonaAPI-->>CreateSandbox: snapshot list
    CreateSandbox->>SnapshotCache: store (snapshot_name, target) → snapshot_id
    CreateSandbox->>DaytonaCreate: retry create sandbox (snapshot_id, target)
    DaytonaCreate-->>CreateSandbox: sandbox_id
  end
  CreateSandbox-->>Caller: sandbox_id
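
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the PR's code: `create_with_snapshot_retry`, `resolve`, and `create` are hypothetical names, a plain dict stands in for the TTL cache, and the error matching on "not found" / "not available" follows the walkthrough's description.

```python
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str]  # (snapshot_name, target)


def create_with_snapshot_retry(
    name: str,
    target: str,
    cache: Dict[CacheKey, str],
    resolve: Callable[[str, str], str],  # hypothetical: fresh lookup via the snapshots API
    create: Callable[[str, str], str],   # hypothetical: creates a sandbox, returns its ID
) -> str:
    key = (name, target)

    # Cache hit: reuse the stored ID. Cache miss: resolve and store it.
    snapshot_id = cache.get(key)
    if snapshot_id is None:
        snapshot_id = resolve(name, target)
        cache[key] = snapshot_id

    try:
        return create(snapshot_id, target)
    except Exception as exc:
        message = str(exc).lower()
        if "not found" not in message and "not available" not in message:
            raise  # unrelated failure: do not retry

        # The cached ID is stale: evict, look up again, retry exactly once.
        cache.pop(key, None)
        snapshot_id = resolve(name, target)
        cache[key] = snapshot_id
        return create(snapshot_id, target)
```

A second failure is deliberately not caught, so a snapshot that is truly missing surfaces as an error instead of looping.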

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | No pull request description was provided by the author, so the description is completely empty and fails to convey any information about the changeset. | Add a description explaining the snapshot access issue, how the snapshot ID resolution mechanism works, and why caching was added to improve performance. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[fix] Resolve broken snapshot access in Daytona' directly aligns with the changeset, which implements snapshot ID resolution to fix broken snapshot access in the Daytona runner module. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 60.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d05b3bb0-34d6-4675-97a6-009b70d62d0d

📥 Commits

Reviewing files that changed from the base of the PR and between fec4391 and 0049015.

📒 Files selected for processing (1)
  • sdks/python/agenta/sdk/engines/running/runners/daytona.py

Comment on lines +184 to +191
response = httpx.get(
    f"{api_url.rstrip('/')}/snapshots",
    params={"limit": 25},
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10.0,
)
response.raise_for_status()
items = response.json().get("items", [])

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential pagination issue with hardcoded limit=25.

If the Daytona account has more than 25 snapshots, the target snapshot may not appear in the first page of results, causing a false "not found" error. Consider increasing the limit or implementing pagination to ensure the snapshot is found.

Additionally, the httpx call has no error handling, so API failures propagate as raw httpx.HTTPStatusError or httpx.TimeoutException rather than as a clear RuntimeError.

🛡️ Proposed fix: increase limit and add error handling
+        try:
             response = httpx.get(
                 f"{api_url.rstrip('/')}/snapshots",
-                params={"limit": 25},
+                params={"limit": 100},
                 headers={"Authorization": f"Bearer {api_key}"},
                 timeout=10.0,
             )
             response.raise_for_status()
+        except httpx.HTTPStatusError as e:
+            raise RuntimeError(
+                f"Failed to list Daytona snapshots: HTTP {e.response.status_code}"
+            ) from e
+        except httpx.RequestError as e:
+            raise RuntimeError(f"Failed to connect to Daytona API: {e}") from e
         items = response.json().get("items", [])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- response = httpx.get(
-     f"{api_url.rstrip('/')}/snapshots",
-     params={"limit": 25},
-     headers={"Authorization": f"Bearer {api_key}"},
-     timeout=10.0,
- )
- response.raise_for_status()
- items = response.json().get("items", [])
+ try:
+     response = httpx.get(
+         f"{api_url.rstrip('/')}/snapshots",
+         params={"limit": 100},
+         headers={"Authorization": f"Bearer {api_key}"},
+         timeout=10.0,
+     )
+     response.raise_for_status()
+ except httpx.HTTPStatusError as e:
+     raise RuntimeError(
+         f"Failed to list Daytona snapshots: HTTP {e.response.status_code}"
+     ) from e
+ except httpx.RequestError as e:
+     raise RuntimeError(f"Failed to connect to Daytona API: {e}") from e
+ items = response.json().get("items", [])
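
The review also mentions pagination as an alternative to raising the limit. A minimal sketch of that option follows, assuming a hypothetical `fetch_page(page, limit)` callable that returns one page's `items` list. The PR does not show how Daytona's API paginates, so the 1-based page number and the "short page means last page" stopping rule are assumptions for illustration only.

```python
from typing import Callable, Dict, Iterator, List

Snapshot = Dict[str, object]


def iter_snapshots(
    fetch_page: Callable[[int, int], List[Snapshot]],  # hypothetical page fetcher
    limit: int = 25,
    max_pages: int = 20,  # safety cap so a misbehaving API cannot loop forever
) -> Iterator[Snapshot]:
    """Yield snapshots page by page until a short page signals the end."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page, limit)
        yield from items
        if len(items) < limit:
            return  # assumed convention: a short (or empty) page is the last one
```

Because the function is a generator, a caller that stops at the first match (for example with `next(...)`) only fetches as many pages as it needs.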


Copilot AI left a comment

Pull request overview

Fixes Daytona sandbox creation when a snapshot marked general: true is visible cross-org but cannot be resolved by name from the current org. The PR adds a name→ID resolution step (with caching) and retries sandbox creation if a cached ID becomes stale.

Changes:

  • Added a TTL+LRU cache for (snapshot_name, target_region) -> snapshot_id mappings to avoid repeated snapshot listing calls.
  • Implemented _resolve_snapshot_id() to list Daytona snapshots and pick an active snapshot matching name + region, then cache its ID.
  • Updated sandbox creation to use the resolved snapshot ID and invalidate+retry once on snapshot “not found/not available” errors.
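
To make the caching behaviour in the first bullet concrete, here is a minimal sketch of a TTL+LRU cache keyed by `(snapshot_name, target_region)`. `TinyTTLLRUCache` is a hypothetical stand-in and not the SDK's `TTLLRUCache`; only the `put` method name appears in this PR, while `get`, `pop`, the constructor arguments, and the injectable `clock` are assumptions.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional, Tuple


class TinyTTLLRUCache:
    """Hypothetical stand-in for a TTL+LRU cache, for illustration only."""

    def __init__(
        self,
        capacity: int,
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._ttl = ttl
        self._clock = clock
        self._data: "OrderedDict[Hashable, Tuple[float, str]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop the entry and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: str) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def pop(self, key: Hashable) -> None:
        self._data.pop(key, None)  # explicit invalidation, e.g. for a stale ID
```

The TTL bounds how long a rebuilt snapshot's old ID can linger, and explicit `pop` covers the case where the ID goes stale before the TTL expires.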

Comment on lines +184 to +202
response = httpx.get(
    f"{api_url.rstrip('/')}/snapshots",
    params={"limit": 25},
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10.0,
)
response.raise_for_status()
items = response.json().get("items", [])

for item in items:
    if (
        item.get("name") == name
        and item.get("state") == "active"
        and target in (item.get("regionIds") or [])
    ):
        snapshot_id = item["id"]
        self._snapshot_id_cache.put(cache_key, snapshot_id)
        return snapshot_id
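
The matching loop in this snippet can be exercised in isolation. The sketch below pulls the same three conditions into a hypothetical helper, `pick_snapshot_id`, and runs it against made-up sample data; the field names (`name`, `state`, `regionIds`, `id`) come from the diff, while the sample values are invented for illustration.

```python
from typing import Any, Dict, List, Optional


def pick_snapshot_id(
    items: List[Dict[str, Any]], name: str, target: str
) -> Optional[str]:
    """Return the ID of the first active snapshot matching name and region."""
    for item in items:
        if (
            item.get("name") == name
            and item.get("state") == "active"
            # `or []` guards against a missing or null regionIds field
            and target in (item.get("regionIds") or [])
        ):
            return item["id"]
    return None


# Hypothetical payload: only the last entry satisfies all three conditions for "eu".
sample_items = [
    {"id": "a1", "name": "agenta-runner", "state": "building", "regionIds": ["eu"]},
    {"id": "b2", "name": "agenta-runner", "state": "active", "regionIds": None},
    {"id": "c3", "name": "agenta-runner", "state": "active", "regionIds": ["us", "eu"]},
]
```

Note that the first match wins, so if several active snapshots share a name and region, the result depends on the order the API returns them in.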

Comment on lines +229 to +231
target = os.getenv("DAYTONA_TARGET") or os.getenv("AGENTA_REGION") or "eu"
snapshot_id = self._resolve_snapshot_id(snapshot_ref, target)
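
One detail of the fallback chain in this snippet: because it uses `or`, a variable that is set to an empty string is treated the same as an unset one. A small sketch, using a hypothetical helper `resolve_target` that takes the environment as a parameter so the behaviour can be checked without touching real environment variables:

```python
import os
from typing import Mapping


def resolve_target(env: Mapping[str, str] = os.environ, default: str = "eu") -> str:
    # Same precedence as the snippet: DAYTONA_TARGET, then AGENTA_REGION, then "eu".
    # Empty strings are falsy, so they fall through to the next candidate.
    return env.get("DAYTONA_TARGET") or env.get("AGENTA_REGION") or default
```

For example, `resolve_target({"DAYTONA_TARGET": "", "AGENTA_REGION": "us"})` yields `"us"`, and an empty mapping yields the default.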

Comment on lines +277 to +283
try:
    sandbox = _create(snapshot_id)
except Exception as e:
    # Snapshot may have been rebuilt with a new ID mid-cache;
    # invalidate and retry once with a fresh lookup.
    message = str(e).lower()
    if "not found" in message or "not available" in message:
Comment on lines +167 to +175
def _resolve_snapshot_id(self, name: str, target: str) -> str:
    """Resolve a snapshot name to its ID for the given region.

    Sandbox creation by snapshot *name* only resolves snapshots owned by
    the requesting org, even when the snapshot is marked ``general: true``
    and visible in the dashboard. Resolving by *ID* works cross-org, so we
    list snapshots, find one matching name + region + active state, and
    cache the result.
    """

@github-actions (Contributor)

Railway Preview Environment

Preview URL: https://gateway-production-7c33.up.railway.app/w
Image tag: pr-4464-a93039e
Status: Failed
Railway logs: Open logs
Logs: View workflow run
Updated at: 2026-05-27T15:00:05.077Z

Labels

bug (Something isn't working), python (Pull requests that update Python code), size:M (This PR changes 30-99 lines, ignoring generated files)

3 participants