fix: thread abort signals through host waits#94
Conversation
|
/coder-agents-review |
There was a problem hiding this comment.
The abort signal threading is a solid structural improvement: deterministic cleanup via ResourceScope, consistent abort propagation through host waits, and the run-completion test verifies sentinel bytes stay hidden after abort. Clean decomposition of the generic waitForScopedOperation helper.
This is a first-pass (Netero) review only. These are mechanical findings; the full review panel has not yet reviewed this PR and will review after these findings are addressed.
1 P1, 1 P2, 2 P3, 1 Note.
The P1 is coverage: src/util/abort.ts exports three functions used by five modules, including an invariant that crashes on misuse, with zero unit tests. Every other utility in src/util/ has a corresponding test file. The P2 is the broader test ratio: 416 new source lines across 6 files, 37 test lines covering one path.
"44 lines, 3 exported functions, an invariant that crashes the process on misuse, and not a single test. Other utility modules in src/util/ have corresponding test files; abort.ts breaks that pattern." (Netero)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Strong response to R1: all five findings addressed in a single commit, the settlement pattern centralized into shared utilities, and test coverage went from 37 lines to 327 lines. The createResourceScopedSettlers + waitForScopedOperation composition is clean and well-tested. Multiple reviewers independently verified the settlement ordering, abort propagation, and scope cleanup semantics.
1 P2, 9 P3, 3 Nit. 19-reviewer panel.
The P2 is a resource accumulation issue: each aborted wait --exit request appends a .then() handler to the shared ptyExitPromise that is never cleaned up by the ResourceScope. Under automation with short client timeouts, this grows without bound. The idle-wait path correctly registers its poll interval on the scope; the exit path does not have an equivalent.
The P3s cluster around three themes: (1) test coverage for the pre-aborted signal fast path, which three reviewers independently flagged; (2) documentation gaps on the new shared utilities and MethodContext; (3) type-level enforcement and error identity consistency.
"ptyExitPromise is a singleton for the host lifetime. Each aborted wait --exit request appends another .then() handler. An agent polling wait --exit with short client timeouts accumulates one closure per attempt." (Hisoka)
src/host/hostMain.ts:734
P2 [ATT94-6] The exit-wait path creates waitCondition = ptyExitPromise.then(...) without registering anything on waitScope. When the client disconnects and waitForScopedOperation rejects with AbortError, the scope closes empty, but the .then() handler on ptyExitPromise still holds a closure over state and fires when the PTY eventually exits.
For a single aborted wait this is trivial. But ptyExitPromise is a singleton for the host lifetime. Each aborted wait --exit appends another .then() handler. An agent polling with short client timeouts accumulates one closure per attempt. At thousands of abandoned waits, all closures fire synchronously on PTY exit, each calling state.snapshot() and allocating a result object nobody reads.
The idle-wait path (line 752) correctly adds checkInterval to waitScope, so scope close stops the poll. The exit path has no equivalent cleanup. Consider wrapping ptyExitPromise.then() in a cancellable pattern (e.g., racing against a scope-owned deferred that rejects on scope close). (Hisoka)
🤖
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All 13 R2 findings verified fixed across 13 reviewers. The R3 fixes are thorough: ptyExitWaiters Set eliminates the .then() leak, pollDelay normalizes abort error identity, runAbortCallback guards onAbort throws, requestScope.close() is wrapped in try/catch, and the TimeoutConfig discriminated union enforces the timeout pairing at compile time. Multiple reviewers (Hisoka, Takumi, Knov, Pariston) independently traced the settlement races and confirmed correctness.
2 P3, 2 Nit. 13-reviewer panel (9 clean).
The two P3s are minor: a test gap where the most common production path through waitForScopedOperation (operation resolves normally) has zero coverage, and the new makeAbortReason/makeAbortError naming distinction needs a one-liner JSDoc each so callers pick the right one.
"Delete that line and all 11 tests still pass." (Bisky, on
void operation.then(settlers.resolve, settlers.reject))
PS. The commit subject "fix: clean up abort review findings" describes the process, not the change. Something like fix: guard scope cleanup, normalize abort reasons, add abort JSDoc names what shipped.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All 4 R3 findings verified fixed. CI green (11/11). 9-reviewer panel, 7 clean. No correctness issues found across 4 rounds.
Four rounds, 32 findings total: 22 fixed, 7 dropped, 3 unposted Notes. Every posted finding was addressed within one round. The abort threading is structurally sound: multiple reviewers independently traced the settlement races, scope cleanup ordering, and resource lifecycle across all paths. Test coverage is healthy at 1.3:1 source-to-test ratio with 29 targeted tests.
One Nit: ptyExitWaiters could use a comment explaining why it exists alongside ptyExitPromise.
"The threads I pulled all held." (Hisoka)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
ATT94-32 verified. All prior findings resolved. 9-reviewer panel, 8 clean. No correctness issues across 5 rounds.
2 Nit: both are JSDoc completeness on abort.ts exports. throwIfAborted is the only exported function without a comment, and the createResourceScopedSettlers JSDoc documents the reject-path AggregateError but omits the resolve-path close failure mode.
Five rounds, 34 findings total: 23 fixed, 7 dropped, 4 open (2 Nit posted, 3 unposted Notes). The abort signal threading is structurally sound and well-tested.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
1 similar comment
|
/coder-agents-review |
There was a problem hiding this comment.
R5 Nits verified fixed. 9-reviewer panel, 8 clean. No correctness issues.
2 P3, both test coverage for edge paths in the abort utilities. The resolve-close-failure asymmetry (ATT94-35) is a re-raise of ATT94-17 (dropped R2) with new evidence: the existing reject-close-failure test at line 88 proves cleanup CAN throw, invalidating the original drop reasoning. The timeout-cleanup gap (ATT94-36) is the most common production path (command succeeds before timeout) with no assertion that the timer is cleaned up.
"The reject-close-failure test at line 88 explicitly proves cleanup CAN throw. The resolve path has different behavior under the same condition, and no test pins it." (Bisky)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Review blocked: the two open findings from R6 have no response and no code push.
- ATT94-35 (P3):
createResourceScopedSettlersresolve-close-failure path is untested. The reject-close-failure path has a test; the resolve path (which silently converts a success into a rejection) does not. - ATT94-36 (P3): No
waitForScopedOperationtest where the operation wins the race against a configured timeout. This is the normal production path for every successfulrun,wait, andwaitForRender.
Further review is blocked until the author responds or pushes fixes for these findings.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
ATT94-35 and ATT94-36 verified fixed. 9-reviewer panel, all clean. Netero clean. No new findings.
Eight rounds, 36 findings: 27 fixed, 7 dropped, 3 unposted Notes. Every posted finding addressed. No correctness issues found by any reviewer at any point in the review. The abort signal threading is structurally sound, well-documented, and thoroughly tested.
🤖 This review was automatically generated with Coder Agents.
Summary
AbortSignalthrough host-side wait, render polling, run-completion, lifecycle polling, and RPC request paths.ResourceScopeso aborted or completed waits release resources deterministically.src/util/abort.ts, including preserving original operation errors when cleanup also fails.sendRpcaborts, and RPC request-context abort on client disconnect.wait --exitcleanup cancellable, normalizing lifecycle delay abort errors, documenting shared contracts, and resetting unrelatedmise.lockdrift.Fixes #84.
Notes
Validation
npm run typecheck -- --pretty falsenpm run test -- test/unit/host/runCompletionCoordinator.test.ts test/unit/host/hostMain.test.tsnpm run lintnpm run format:checknpm run test -- test/integration/wait-render.test.tsnpm run test -- test/integration/lifecycle.test.tsnpm run test -- test/unit/util/abort.test.ts test/unit/host/rpcClient.test.ts test/unit/host/rpcServer.test.ts test/unit/host/runCompletionCoordinator.test.ts test/unit/host/hostMain.test.tsnpm run verifyGenerated with
mux• Model:openai:gpt-5.5• Thinking:high