Skip to content

Serialize expensive editor operations with leases#1184

Open
liuzqk wants to merge 1 commit into
CoplayDev:betafrom
liuzqk:fix/editor-operation-lease
Open

Serialize expensive editor operations with leases#1184
liuzqk wants to merge 1 commit into
CoplayDev:betafrom
liuzqk:fix/editor-operation-lease

Conversation

@liuzqk
Copy link
Copy Markdown

@liuzqk liuzqk commented May 31, 2026

Summary

Add a small cross-process operation lease for expensive editor operations that target the same Unity instance.

  • add an atomic file lease shared by MCP server processes for the same Unity instance
  • return structured operation_busy + hint=retry responses when another agent owns the operation window
  • guard refresh_unity for the full refresh/wait operation
  • guard the run_tests startup path while preflight and async job dispatch run
  • allow reentrant acquisition for the same tool invocation so run_tests preflight can still refresh dirty assets without dropping the outer lease
  • document the intentionally small scope and non-goals

Why

This is aimed at the multi-agent workflow where two or more MCP clients point at the same Unity project. In that setup, agents can independently decide to refresh assets, request compilation, or start tests before Unity has updated its own busy state. The result is avoidable contention: duplicate refreshes, overlapping compile waits, or test-start attempts that make the project feel much slower.

The lease gives the server a small coordination layer before those expensive operations reach Unity. It does not serialize normal read-only tools, and it does not try to become a full job queue. Clients still get a retryable response, so they can back off and retry instead of piling more work onto the editor.

Validation

  • uv run --extra dev python -m pytest tests/integration/test_editor_operation_lease.py tests/integration/test_run_tests_async.py tests/integration/test_refresh_unity_retry_recovery.py -q
  • git diff --check

Summary by CodeRabbit

Release Notes

  • New Features

    • Multi-agent coordination prevents concurrent operations on the same Editor instance. Multiple agents can now safely interact with the same Unity Editor without operations interfering with each other.
    • Refresh and test operations now return a retry response when already in-progress under another agent.
  • Tests

    • Added integration tests for operation coordination and busy responses.
  • Documentation

    • Added operation lease specification document.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 31, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This pull request introduces a file-based lease mechanism to coordinate "Unity editor operations" across concurrent MCP agents. It prevents destructive operations (refresh_unity and run_tests) from executing simultaneously by acquiring exclusive leases keyed to a Unity instance and operation type, with atomic file creation, configurable TTL, expiry detection, and reentrancy support for same-owner reacquisition.

Changes

Editor Operation Lease Coordination

Layer / File(s) Summary
Lease Data Model and Acquisition Logic
Server/src/services/tools/editor_operation_lease.py
Introduces EditorOperationLeaseInfo (immutable lease metadata) and EditorOperationLease (mutable handle with token and release tracking). Implements try_acquire_editor_operation_lease() with atomic O_EXCL file creation, payload validation, expiry detection via explicit timestamp or file mtime+TTL, and reentrant acquire when owner matches. Provides operation_owner_from_context() to derive owner from process/context identity and operation_busy_response() to format retry responses. Includes storage path resolution, TTL parsing with safety clamping, and defensive JSON payload parsing with fallback field derivation.
refresh_unity Tool Integration
Server/src/services/tools/refresh_unity.py
Wraps refresh_unity execution with lease acquisition keyed to the target Unity instance. Returns operation_busy response immediately if another owner holds the lease; otherwise acquires the lease, calls internal _refresh_unity_locked(...) helper, and releases the lease in a try/finally block to guarantee cleanup.
run_tests Tool Integration
Server/src/services/tools/run_tests.py
Integrates lease acquisition before preflight checks and test dispatch. Attempts to acquire a "run_tests" lease for the target Unity instance; returns operation_busy response on contention. Wraps request preparation and Unity command dispatch in try/finally to guarantee lease release regardless of operation success or failure.
Integration Tests and Lease Verification
Server/tests/integration/test_editor_operation_lease.py
Tests lease blocking behavior (same instance/operation, different owners return busy), lease file expiry and reclamation after TTL, per-context ownership derivation, reentrancy allowing same owner to acquire multiple operations concurrently while blocking other owners, and cross-tool contention (run_tests blocked by refresh lease and vice versa). Verifies lease release after dispatch via immediate reacquisition of a different operation. Uses autouse fixture to isolate lease storage per test in temp directory.
Design Specification and Verification Plan
docs/plans/2026-05-31-multi-agent-operation-lease-spec.md
Documents the multi-agent operation lease mechanism: problem (concurrent agents causing overlapping destructive operations), lease data model (instance/operation/owner/PID/token/start/expiry), atomic file-based acquisition in per-user directory, structured retry response format, per-operation hold duration guidance, and verification checklist (unit tests for blocking/expiry/reentrancy, tool-level tests for contention and lease release).

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


A rabbit hops through systems neat and tight,
With leases granted, holding operation's might,
Two tools once chaotic, now serene and slow,
File locks ensure no conflict's raging flow. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: introducing lease serialization for expensive editor operations.
Description check ✅ Passed The description is comprehensive, covering summary, motivation, and validation. However, it does not follow the provided template structure with required sections like Type of Change, Testing/Screenshots, or Documentation Updates checkboxes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@liuzqk liuzqk marked this pull request as ready for review May 31, 2026 07:01
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/plans/2026-05-31-multi-agent-operation-lease-spec.md (1)

68-69: ⚡ Quick win

Add trailing newline at end of file.

The file should end with a newline character after line 68. This follows POSIX standards and is a common best practice for text files.

📝 Suggested fix
 - Tool-test that `run_tests` releases the startup lease after dispatch.
+
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/2026-05-31-multi-agent-operation-lease-spec.md` around lines 68 -
69, The file ends without a trailing newline; add a single newline character at
the end of docs/plans/2026-05-31-multi-agent-operation-lease-spec.md so the last
line ("Tool-test that `run_tests` releases the startup lease after dispatch.")
is terminated by a newline character to satisfy POSIX/file-format conventions.
Server/src/services/tools/editor_operation_lease.py (1)

126-134: 💤 Low value

Minor race window between payload read and lease reclaim.

Between reading the payload at line 126 and unlinking at line 129, another process could reclaim the expired lease. If the original owner releases and a new owner creates in that window, the stale payload's expires_unix_ms still appears expired, causing the unlink to delete a valid lease.

The window is narrow and the impact is limited (occasional duplicate work, handled by retry semantics), but for a more robust guarantee, consider re-reading/re-checking expiry immediately before unlink or using atomic rename.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Server/src/services/tools/editor_operation_lease.py` around lines 126 - 134,
There’s a race between reading the payload with _read_payload and deleting it
with path.unlink: before unlinking, re-read the file and re-check expiry with
_is_lease_expired (and/or compare the freshly read payload metadata/body against
the earlier `existing` to confirm it hasn’t been replaced) so you only unlink
when the file is still the same expired lease; alternatively perform an atomic
operation (rename/move to a tombstone name and validate) before removing to
avoid deleting a newly-created valid lease — update the block that calls
_read_payload, _is_lease_expired, path.unlink and _payload_to_info to perform
this extra verification step (use instance_id or payload contents/timestamp to
detect changes).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/plans/2026-05-31-multi-agent-operation-lease-spec.md`:
- Around line 33-61: Document that the lease supports reentrant acquisition by
the same owner (e.g., allow nested acquisition within a single tool invocation
so operations like run_tests preflight refresh can acquire the same lease held
for refresh_unity without releasing it), and add a Lease duration subsection
that lists default TTLs and that they are configurable (specify defaults for
refresh_unity and run_tests startup phase and note that expiry time in the lease
file is used to reclaim crashes); update the Design section to mention the lease
file fields (owner, operation, PID, token, start time, expiry time), reentrancy
behavior, default TTL values, and that TTLs are configurable.

---

Nitpick comments:
In `@docs/plans/2026-05-31-multi-agent-operation-lease-spec.md`:
- Around line 68-69: The file ends without a trailing newline; add a single
newline character at the end of
docs/plans/2026-05-31-multi-agent-operation-lease-spec.md so the last line
("Tool-test that `run_tests` releases the startup lease after dispatch.") is
terminated by a newline character to satisfy POSIX/file-format conventions.

In `@Server/src/services/tools/editor_operation_lease.py`:
- Around line 126-134: There’s a race between reading the payload with
_read_payload and deleting it with path.unlink: before unlinking, re-read the
file and re-check expiry with _is_lease_expired (and/or compare the freshly read
payload metadata/body against the earlier `existing` to confirm it hasn’t been
replaced) so you only unlink when the file is still the same expired lease;
alternatively perform an atomic operation (rename/move to a tombstone name and
validate) before removing to avoid deleting a newly-created valid lease — update
the block that calls _read_payload, _is_lease_expired, path.unlink and
_payload_to_info to perform this extra verification step (use instance_id or
payload contents/timestamp to detect changes).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ddad71a6-1bfb-4ad8-8e6a-700f22c7a746

📥 Commits

Reviewing files that changed from the base of the PR and between 2d2bdc5 and a5db77f.

📒 Files selected for processing (5)
  • Server/src/services/tools/editor_operation_lease.py
  • Server/src/services/tools/refresh_unity.py
  • Server/src/services/tools/run_tests.py
  • Server/tests/integration/test_editor_operation_lease.py
  • docs/plans/2026-05-31-multi-agent-operation-lease-spec.md

Comment on lines +33 to +61
## Design

The Python server creates one atomic file lease per Unity instance in a shared
per-user directory. The lease file contains owner, operation, PID, token, start
time, and expiry time.

Acquire uses exclusive file creation so separate MCP server processes agree on
one owner. If the file exists and is not expired, the tool returns:

```json
{
"success": false,
"error": "operation_busy",
"hint": "retry",
"data": {
"reason": "operation_busy",
"operation": "refresh_unity",
"owner": "session:...",
"retry_after_ms": 2000
}
}
```

`refresh_unity` holds the lease for the whole refresh/wait operation.

`run_tests` holds the lease only while it performs preflight and starts the
async Unity test job. The actual test execution continues to be represented by
Unity's test state and `get_test_job`; this keeps the change small while
removing the most common race window at job startup.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Document reentrancy support and TTL/expiry values.

The design section is missing two important aspects mentioned in the PR objectives and stack context:

  1. Reentrancy support: The PR objectives state "Reentrant acquisition is allowed for the same tool invocation so nested operations can proceed without releasing an outer lease." This is a key design feature that should be documented to help readers understand how nested operations (e.g., run_tests preflight refreshing dirty assets) interact with the lease mechanism.

  2. TTL/expiry values: While line 37 mentions "expiry time" as a lease field, the design section doesn't specify default TTL values or mention that they're configurable (as indicated in the stack context). Consider adding a subsection that specifies the default lease duration for each operation type.

📝 Suggested additions to the Design section

Add after line 39:

### Reentrancy

The lease allows reentrant acquisition by the same owner within a single tool invocation. This enables nested operations (e.g., `run_tests` performing a preflight refresh) to proceed without deadlock or requiring the outer lease to be released.

### Lease duration

Each operation type has a default TTL:
- `refresh_unity`: [specify duration, e.g., 300 seconds]
- `run_tests`: [specify duration for the startup phase, e.g., 60 seconds]

Expired leases are automatically reclaimed to recover from crashed MCP server processes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/2026-05-31-multi-agent-operation-lease-spec.md` around lines 33 -
61, Document that the lease supports reentrant acquisition by the same owner
(e.g., allow nested acquisition within a single tool invocation so operations
like run_tests preflight refresh can acquire the same lease held for
refresh_unity without releasing it), and add a Lease duration subsection that
lists default TTLs and that they are configurable (specify defaults for
refresh_unity and run_tests startup phase and note that expiry time in the lease
file is used to reclaim crashes); update the Design section to mention the lease
file fields (owner, operation, PID, token, start time, expiry time), reentrancy
behavior, default TTL values, and that TTLs are configurable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant