[fix] Resolve broken snapshot access in Daytona by junaway · Pull Request #4464 · Agenta-AI/agenta

junaway · 2026-05-27T14:48:58Z

No description provided.

vercel · 2026-05-27T14:49:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	May 27, 2026 2:49pm

coderabbitai · 2026-05-27T14:49:24Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Sandbox creation now resolves the configured snapshot name to a snapshot ID automatically and retries with a fresh lookup if the cached ID is no longer available.
Performance
- Resolved snapshot IDs are cached, so repeated sandbox creations skip the snapshot lookup.

Walkthrough

This PR adds Daytona snapshot resolution with caching and retry logic. The sandbox creation flow now accepts a snapshot name via DAYTONA_SNAPSHOT, resolves it to a concrete snapshot ID for the target region using a new API call, caches the mapping with a 24-hour TTL, and retries sandbox creation with a fresh lookup if the cached ID becomes unavailable.

Changes

Snapshot Name Resolution with Caching and Retry

Layer / File(s)	Summary
Cache Infrastructure Setup `sdks/python/agenta/sdk/engines/running/runners/daytona.py`	Adds `httpx` and `TTLLRUCache` imports and initializes a class-level TTL LRU cache field (`_snapshot_id_cache`) to store resolved `(snapshot_name, target)` pairs for 24 hours.
Snapshot Resolution via Daytona API `sdks/python/agenta/sdk/engines/running/runners/daytona.py`	Implements `_resolve_snapshot_id()` method that calls Daytona's `/snapshots` API, filters snapshots by name, active status, and target region membership, caches results, and raises `RuntimeError` if no suitable snapshot is found.
Sandbox Creation with Snapshot Resolution and Retry `sdks/python/agenta/sdk/engines/running/runners/daytona.py`	Updates `_create_sandbox()` to require `DAYTONA_SNAPSHOT`, resolve it to a concrete snapshot ID via target region, and wraps creation with retry logic that evicts stale cache entries and re-resolves snapshot IDs on "not found"/"not available" errors.

Sequence Diagram

sequenceDiagram
  participant Caller
  participant CreateSandbox
  participant SnapshotCache
  participant DaytonaAPI
  participant DaytonaCreate
  Caller->>CreateSandbox: create_sandbox(snapshot_name, target)
  CreateSandbox->>SnapshotCache: lookup (snapshot_name, target)
  alt Cache Hit
    SnapshotCache-->>CreateSandbox: snapshot_id
  else Cache Miss
    CreateSandbox->>DaytonaAPI: GET /snapshots
    DaytonaAPI-->>CreateSandbox: snapshot list
    CreateSandbox->>SnapshotCache: store (snapshot_name, target) → snapshot_id
  end
  CreateSandbox->>DaytonaCreate: create ephemeral sandbox (snapshot_id, target)
  alt Creation Success
    DaytonaCreate-->>CreateSandbox: sandbox_id
  else Not Found Error
    CreateSandbox->>SnapshotCache: evict (snapshot_name, target)
    CreateSandbox->>DaytonaAPI: GET /snapshots (fresh lookup)
    DaytonaAPI-->>CreateSandbox: snapshot list
    CreateSandbox->>SnapshotCache: store (snapshot_name, target) → snapshot_id
    CreateSandbox->>DaytonaCreate: retry create sandbox (snapshot_id, target)
    DaytonaCreate-->>CreateSandbox: sandbox_id
  end
  CreateSandbox-->>Caller: sandbox_id

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	No pull request description was provided by the author, so the description is completely empty and fails to convey any information about the changeset.	Add a description explaining the snapshot access issue, how the snapshot ID resolution mechanism works, and why caching was added to improve performance.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[fix] Resolve broken snapshot access in Daytona' directly aligns with the changeset, which implements snapshot ID resolution to fix broken snapshot access in the Daytona runner module.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 60.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d05b3bb0-34d6-4675-97a6-009b70d62d0d

📥 Commits

Reviewing files that changed from the base of the PR and between fec4391 and 0049015.

📒 Files selected for processing (1)

sdks/python/agenta/sdk/engines/running/runners/daytona.py

coderabbitai · 2026-05-27T14:53:16Z

+        response = httpx.get(
+            f"{api_url.rstrip('/')}/snapshots",
+            params={"limit": 25},
+            headers={"Authorization": f"Bearer {api_key}"},
+            timeout=10.0,
+        )
+        response.raise_for_status()
+        items = response.json().get("items", [])


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential pagination issue with hardcoded limit=25.

If the Daytona account has more than 25 snapshots, the target snapshot may not appear in the first page of results, causing a false "not found" error. Consider increasing the limit or implementing pagination to ensure the snapshot is found.

Additionally, the httpx call lacks error handling—API failures will propagate as raw httpx.HTTPStatusError or httpx.TimeoutException rather than a clear RuntimeError.

🛡️ Proposed fix: increase limit and add error handling

+        try:
             response = httpx.get(
                 f"{api_url.rstrip('/')}/snapshots",
-                params={"limit": 25},
+                params={"limit": 100},
                 headers={"Authorization": f"Bearer {api_key}"},
                 timeout=10.0,
             )
             response.raise_for_status()
+        except httpx.HTTPStatusError as e:
+            raise RuntimeError(
+                f"Failed to list Daytona snapshots: HTTP {e.response.status_code}"
+            ) from e
+        except httpx.RequestError as e:
+            raise RuntimeError(f"Failed to connect to Daytona API: {e}") from e
         items = response.json().get("items", [])

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        response = httpx.get(
-            f"{api_url.rstrip('/')}/snapshots",
-            params={"limit": 25},
-            headers={"Authorization": f"Bearer {api_key}"},
-            timeout=10.0,
-        )
-        response.raise_for_status()
-        items = response.json().get("items", [])
+        try:
+            response = httpx.get(
+                f"{api_url.rstrip('/')}/snapshots",
+                params={"limit": 100},
+                headers={"Authorization": f"Bearer {api_key}"},
+                timeout=10.0,
+            )
+            response.raise_for_status()
+        except httpx.HTTPStatusError as e:
+            raise RuntimeError(
+                f"Failed to list Daytona snapshots: HTTP {e.response.status_code}"
+            ) from e
+        except httpx.RequestError as e:
+            raise RuntimeError(f"Failed to connect to Daytona API: {e}") from e
+        items = response.json().get("items", [])
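
Raising the limit from 25 to 100 only moves the cutoff; an account with more snapshots than one page holds could still hit the false "not found" described above. A hedged sketch of paging until a match is found follows. `fetch_page` is a hypothetical callable that would wrap the `httpx.get` call, and both the page number argument and the `totalPages` field are assumptions about the response shape, not confirmed details of the Daytona API. Only `items`, `name`, `state`, `regionIds`, and `id` appear in the PR.

```python
from typing import Callable, Optional


def find_snapshot_id(
    fetch_page: Callable[[int], dict],
    name: str,
    target: str,
    max_pages: int = 50,
) -> Optional[str]:
    """Scan pages until an active snapshot for name and region is found."""
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        items = payload.get("items", [])
        for item in items:
            if (
                item.get("name") == name
                and item.get("state") == "active"
                and target in (item.get("regionIds") or [])
            ):
                return item["id"]
        # ASSUMPTION: the response may report a page count as "totalPages".
        # When it is absent, the loop stops at the first empty page instead.
        total_pages = payload.get("totalPages")
        if not items or (total_pages is not None and page >= total_pages):
            break
    return None
```

The `max_pages` bound is a guard against an API that never returns an empty page; a caller would raise `RuntimeError` when the function returns `None`.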

Copilot

Pull request overview

Fixes Daytona sandbox creation for snapshots marked general: true that are visible cross-org but cannot be resolved by name from the current org. The PR adds a name→ID resolution step (with caching) and retries sandbox creation if a cached ID becomes stale.

Changes:

- Added a TTL+LRU cache for (snapshot_name, target_region) -> snapshot_id mappings to avoid repeated snapshot listing calls.
- Implemented _resolve_snapshot_id() to list Daytona snapshots and pick an active snapshot matching name + region, then cache its ID.
- Updated sandbox creation to use the resolved snapshot ID and invalidate+retry once on snapshot “not found/not available” errors.
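
For readers unfamiliar with the cache type, a TTL+LRU cache can be sketched as below. This is an illustrative class, not Agenta's `TTLLRUCache`: the class name, the `get` and `evict` methods, and the injected clock are assumptions made for the example (only `put` appears in the diff).

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class SimpleTTLLRUCache:
    """Illustrative TTL + LRU cache; not the SDK's TTLLRUCache."""

    def __init__(
        self,
        capacity: int,
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._ttl = ttl
        self._clock = clock
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: str) -> None:
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        while len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used

    def evict(self, key: Hashable) -> None:
        self._data.pop(key, None)
```

With `ttl=24 * 3600` and keys of the form `(snapshot_name, target_region)`, this matches the behaviour described in the list above: entries expire after a day, and an explicit `evict` supports the invalidate-and-retry path.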


+        response = httpx.get(
+            f"{api_url.rstrip('/')}/snapshots",
+            params={"limit": 25},
+            headers={"Authorization": f"Bearer {api_key}"},
+            timeout=10.0,
+        )
+        response.raise_for_status()
+        items = response.json().get("items", [])
+
+        for item in items:
+            if (
+                item.get("name") == name
+                and item.get("state") == "active"
+                and target in (item.get("regionIds") or [])
+            ):
+                snapshot_id = item["id"]
+                self._snapshot_id_cache.put(cache_key, snapshot_id)
+                return snapshot_id
+


+            target = os.getenv("DAYTONA_TARGET") or os.getenv("AGENTA_REGION") or "eu"
+            snapshot_id = self._resolve_snapshot_id(snapshot_ref, target)
+


+            try:
+                sandbox = _create(snapshot_id)
+            except Exception as e:
+                # Snapshot may have been rebuilt with a new ID mid-cache;
+                # invalidate and retry once with a fresh lookup.
+                message = str(e).lower()
+                if "not found" in message or "not available" in message:


+    def _resolve_snapshot_id(self, name: str, target: str) -> str:
+        """Resolve a snapshot name to its ID for the given region.
+
+        Sandbox creation by snapshot *name* only resolves snapshots owned by
+        the requesting org, even when the snapshot is marked ``general: true``
+        and visible in the dashboard. Resolving by *ID* works cross-org, so we
+        list snapshots, find one matching name + region + active state, and
+        cache the result.
+        """


github-actions · 2026-05-27T15:00:06Z

Railway Preview Environment


Preview URL	https://gateway-production-7c33.up.railway.app/w
Image tag	`pr-4464-a93039e`
Status	Failed
Railway logs	Open logs
Logs	View workflow run
Updated at 2026-05-27T15:00:05.077Z

[fix] Resolve broken snapshot access in Daytona

0049015

Copilot AI review requested due to automatic review settings · May 27, 2026 14:48

dosubot added the labels size:M (this PR changes 30-99 lines, ignoring generated files), bug (something isn't working), and python (pull requests that update Python code) · May 27, 2026

Copilot started reviewing on behalf of junaway · May 27, 2026 14:49

coderabbitai reviewed · May 27, 2026

Copilot AI reviewed · May 27, 2026

junaway wants to merge 1 commit into main from fix/broken-daytona-snapshots
