Skip to content

fix: thread abort signals through host waits#94

Merged
ThomasK33 merged 3 commits into
mainfrom
agent-tty-fqde
May 12, 2026
Merged

fix: thread abort signals through host waits#94
ThomasK33 merged 3 commits into
mainfrom
agent-tty-fqde

Conversation

@ThomasK33
Copy link
Copy Markdown
Member

@ThomasK33 ThomasK33 commented May 12, 2026

Summary

  • Thread AbortSignal through host-side wait, render polling, run-completion, lifecycle polling, and RPC request paths.
  • Register timer/socket/listener cleanup with ResourceScope so aborted or completed waits release resources deterministically.
  • Centralize scoped settlement/abort helpers in src/util/abort.ts, including preserving original operation errors when cleanup also fails.
  • Add targeted coverage for abort utilities, run-completion aborts, sendRpc aborts, and RPC request-context abort on client disconnect.
  • Address review follow-ups by making wait --exit cleanup cancellable, normalizing lifecycle delay abort errors, documenting shared contracts, and resetting unrelated mise.lock drift.

Fixes #84.

Notes

  • The stale RPC socket liveness probe now has a bounded 1s probe. If the connect attempt neither succeeds nor fails promptly, the socket is treated as stale so host startup does not hang indefinitely.

Validation

  • npm run typecheck -- --pretty false
  • npm run test -- test/unit/host/runCompletionCoordinator.test.ts test/unit/host/hostMain.test.ts
  • npm run lint
  • npm run format:check
  • npm run test -- test/integration/wait-render.test.ts
  • npm run test -- test/integration/lifecycle.test.ts
  • npm run test -- test/unit/util/abort.test.ts test/unit/host/rpcClient.test.ts test/unit/host/rpcServer.test.ts test/unit/host/runCompletionCoordinator.test.ts test/unit/host/hostMain.test.ts
  • npm run verify

Generated with mux • Model: openai:gpt-5.5 • Thinking: high

@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abort signal threading is a solid structural improvement: deterministic cleanup via ResourceScope, consistent abort propagation through host waits, and the run-completion test verifies sentinel bytes stay hidden after abort. Clean decomposition of the generic waitForScopedOperation helper.

This is a first-pass (Netero) review only. These are mechanical findings; the full review panel has not yet reviewed this PR and will review after these findings are addressed.

1 P1, 1 P2, 2 P3, 1 Note.

The P1 is coverage: src/util/abort.ts exports three functions used by five modules, including an invariant that crashes on misuse, with zero unit tests. Every other utility in src/util/ has a corresponding test file. The P2 is the broader test ratio: 416 new source lines across 6 files, 37 test lines covering one path.

"44 lines, 3 exported functions, an invariant that crashes the process on misuse, and not a single test. Other utility modules in src/util/ have corresponding test files; abort.ts breaks that pattern." (Netero)

🤖 This review was automatically generated with Coder Agents.

Comment thread src/util/abort.ts
Comment thread src/host/hostMain.ts Outdated
Comment thread src/host/rpcServer.ts
Comment thread src/host/rpcServer.ts
Comment thread src/host/hostMain.ts Outdated
@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong response to R1: all five findings addressed in a single commit, the settlement pattern centralized into shared utilities, and test coverage went from 37 lines to 327 lines. The createResourceScopedSettlers + waitForScopedOperation composition is clean and well-tested. Multiple reviewers independently verified the settlement ordering, abort propagation, and scope cleanup semantics.

1 P2, 9 P3, 3 Nit. 19-reviewer panel.

The P2 is a resource accumulation issue: each aborted wait --exit request appends a .then() handler to the shared ptyExitPromise that is never cleaned up by the ResourceScope. Under automation with short client timeouts, this grows without bound. The idle-wait path correctly registers its poll interval on the scope; the exit path does not have an equivalent.

The P3s cluster around three themes: (1) test coverage for the pre-aborted signal fast path, which three reviewers independently flagged; (2) documentation gaps on the new shared utilities and MethodContext; (3) type-level enforcement and error identity consistency.

"ptyExitPromise is a singleton for the host lifetime. Each aborted wait --exit request appends another .then() handler. An agent polling wait --exit with short client timeouts accumulates one closure per attempt." (Hisoka)


src/host/hostMain.ts:734

P2 [ATT94-6] The exit-wait path creates waitCondition = ptyExitPromise.then(...) without registering anything on waitScope. When the client disconnects and waitForScopedOperation rejects with AbortError, the scope closes empty, but the .then() handler on ptyExitPromise still holds a closure over state and fires when the PTY eventually exits.

For a single aborted wait this is trivial. But ptyExitPromise is a singleton for the host lifetime. Each aborted wait --exit appends another .then() handler. An agent polling with short client timeouts accumulates one closure per attempt. At thousands of abandoned waits, all closures fire synchronously on PTY exit, each calling state.snapshot() and allocating a result object nobody reads.

The idle-wait path (line 752) correctly adds checkInterval to waitScope, so scope close stops the poll. The exit path has no equivalent cleanup. Consider wrapping ptyExitPromise.then() in a cancellable pattern (e.g., racing against a scope-owned deferred that rejects on scope close). (Hisoka)

🤖

🤖 This review was automatically generated with Coder Agents.

Comment thread src/util/abort.ts
Comment thread src/host/rpcServer.ts
Comment thread src/host/lifecycle.ts Outdated
Comment thread src/util/abort.ts
Comment thread src/host/runCompletionCoordinator.ts Outdated
Comment thread src/host/rpcServer.ts Outdated
Comment thread src/util/abort.ts Outdated
Comment thread mise.lock Outdated
Comment thread src/host/hostMain.ts Outdated
Comment thread src/host/runCompletionCoordinator.ts Outdated
@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 13 R2 findings verified fixed across 13 reviewers. The R3 fixes are thorough: ptyExitWaiters Set eliminates the .then() leak, pollDelay normalizes abort error identity, runAbortCallback guards onAbort throws, requestScope.close() is wrapped in try/catch, and the TimeoutConfig discriminated union enforces the timeout pairing at compile time. Multiple reviewers (Hisoka, Takumi, Knov, Pariston) independently traced the settlement races and confirmed correctness.

2 P3, 2 Nit. 13-reviewer panel (9 clean).

The two P3s are minor: a test gap where the most common production path through waitForScopedOperation (operation resolves normally) has zero coverage, and the new makeAbortReason/makeAbortError naming distinction needs a one-liner JSDoc each so callers pick the right one.

"Delete that line and all 11 tests still pass." (Bisky, on void operation.then(settlers.resolve, settlers.reject))

PS. The commit subject "fix: clean up abort review findings" describes the process, not the change. Something like fix: guard scope cleanup, normalize abort reasons, add abort JSDoc names what shipped.

🤖 This review was automatically generated with Coder Agents.

Comment thread test/unit/util/abort.test.ts
Comment thread src/util/abort.ts
Comment thread src/host/rpcClient.ts Outdated
Comment thread src/util/abort.ts
@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 4 R3 findings verified fixed. CI green (11/11). 9-reviewer panel, 7 clean. No correctness issues found across 4 rounds.

Four rounds, 32 findings total: 22 fixed, 7 dropped, 3 unposted Notes. Every posted finding was addressed within one round. The abort threading is structurally sound: multiple reviewers independently traced the settlement races, scope cleanup ordering, and resource lifecycle across all paths. Test coverage is healthy at 1.3:1 source-to-test ratio with 29 targeted tests.

One Nit: ptyExitWaiters could use a comment explaining why it exists alongside ptyExitPromise.

"The threads I pulled all held." (Hisoka)

🤖 This review was automatically generated with Coder Agents.

Comment thread src/host/hostMain.ts
@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATT94-32 verified. All prior findings resolved. 9-reviewer panel, 8 clean. No correctness issues across 5 rounds.

2 Nit: both are JSDoc completeness on abort.ts exports. throwIfAborted is the only exported function without a comment, and the createResourceScopedSettlers JSDoc documents the reject-path AggregateError but omits the resolve-path close failure mode.

Five rounds, 34 findings total: 23 fixed, 7 dropped, 4 open (2 Nit posted, 3 unposted Notes). The abort signal threading is structurally sound and well-tested.

🤖 This review was automatically generated with Coder Agents.

Comment thread src/util/abort.ts
Comment thread src/util/abort.ts
@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

1 similar comment
@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R5 Nits verified fixed. 9-reviewer panel, 8 clean. No correctness issues.

2 P3, both test coverage for edge paths in the abort utilities. The resolve-close-failure asymmetry (ATT94-35) is a re-raise of ATT94-17 (dropped R2) with new evidence: the existing reject-close-failure test at line 88 proves cleanup CAN throw, invalidating the original drop reasoning. The timeout-cleanup gap (ATT94-36) is the most common production path (command succeeds before timeout) with no assertion that the timer is cleaned up.

"The reject-close-failure test at line 88 explicitly proves cleanup CAN throw. The resolve path has different behavior under the same condition, and no test pins it." (Bisky)

🤖 This review was automatically generated with Coder Agents.

Comment thread test/unit/util/abort.test.ts
Comment thread test/unit/util/abort.test.ts
@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review blocked: the two open findings from R6 have no response and no code push.

  • ATT94-35 (P3): createResourceScopedSettlers resolve-close-failure path is untested. The reject-close-failure path has a test; the resolve path (which silently converts a success into a rejection) does not.
  • ATT94-36 (P3): No waitForScopedOperation test where the operation wins the race against a configured timeout. This is the normal production path for every successful run, wait, and waitForRender.

Further review is blocked until the author responds or pushes fixes for these findings.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATT94-35 and ATT94-36 verified fixed. 9-reviewer panel, all clean. Netero clean. No new findings.

Eight rounds, 36 findings: 27 fixed, 7 dropped, 3 unposted Notes. Every posted finding addressed. No correctness issues found by any reviewer at any point in the review. The abort signal threading is structurally sound, well-documented, and thoroughly tested.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33 ThomasK33 merged commit 1da0a0e into main May 12, 2026
11 checks passed
@ThomasK33 ThomasK33 deleted the agent-tty-fqde branch May 12, 2026 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Thread AbortSignal through host waits and polling paths

1 participant