Fix AuggieCLIProbe hanging indefinitely when auggie CLI is unresponsive #481

Merged
ratulsarna merged 4 commits into steipete:main from bryant24hao:fix/auggie-cli-timeout
Mar 13, 2026
Conversation

@bryant24hao
Contributor

Summary

  • AuggieCLIProbe.runAuggieAccountStatus() used raw Process + waitUntilExit() with no timeout, which could block the refresh cycle indefinitely if the auggie CLI hangs
  • Replace with SubprocessRunner.run() which provides a 15-second timeout and automatic SIGTERM→SIGKILL cleanup
  • This is the same class of bug described in #474 (Augment CLI fetch can hang refresh due to unbounded waitUntilExit)

Context

I encountered two orphaned codex processes spawned by CodexBar that had been stuck at 100% CPU for 9 days. While investigating, I noticed AuggieCLIProbe still uses the same unbounded waitUntilExit() pattern that was identified as problematic in #474.

The existing SubprocessRunner already implements the correct timeout + kill pattern and is used by other providers (e.g., Antigravity). This PR simply switches AuggieCLIProbe to use it as well.
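
For readers unfamiliar with the timeout + kill pattern, here is a minimal sketch of the general idea. It is not CodexBar's SubprocessRunner: the function name `runBounded`, the error type `BoundedRunError`, and the 2-second grace period before SIGKILL are all assumptions made for illustration. Only the 15-second default and the SIGTERM→SIGKILL order come from the description above.

```swift
import Foundation

// Hypothetical error type for this sketch; the real SubprocessRunnerError may differ.
enum BoundedRunError: Error, Equatable {
    case timedOut(String)
    case nonZeroExit(Int32)
}

/// Runs an executable and returns its stdout, giving up after `timeout` seconds.
/// Suitable for small outputs only: stdout is read after exit, so a child that
/// fills the pipe buffer would stall until the timeout fires.
func runBounded(
    _ executable: String,
    arguments: [String] = [],
    timeout: TimeInterval = 15,
    label: String
) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    let stdout = Pipe()
    process.standardInput = FileHandle.nullDevice
    process.standardOutput = stdout
    process.standardError = FileHandle.nullDevice

    // Signal a semaphore on exit instead of calling waitUntilExit(),
    // so the wait below can be bounded.
    let exited = DispatchSemaphore(value: 0)
    process.terminationHandler = { _ in exited.signal() }
    try process.run()

    if exited.wait(timeout: .now() + timeout) == .timedOut {
        process.terminate() // sends SIGTERM
        // Assumed grace period before escalating to SIGKILL.
        if exited.wait(timeout: .now() + 2) == .timedOut {
            kill(process.processIdentifier, SIGKILL)
            exited.wait()
        }
        throw BoundedRunError.timedOut(label)
    }

    let data = stdout.fileHandleForReading.readDataToEndOfFile()
    guard process.terminationStatus == 0 else {
        throw BoundedRunError.nonZeroExit(process.terminationStatus)
    }
    return String(decoding: data, as: UTF8.self)
}
```

A call such as `try runBounded("/bin/sleep", arguments: ["30"], timeout: 1, label: "sleep")` would throw `.timedOut("sleep")` after about a second rather than waiting the full 30 seconds. This version still parks the calling thread while it waits, so it is meant for synchronous callers rather than Swift concurrency tasks.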

Test plan

  • swift build passes
  • Existing tests pass, apart from unrelated OAuth keychain failures
  • Manual: verify Augment usage still displays correctly with auggie CLI installed

🤖 Generated with Claude Code

Replace raw Process + waitUntilExit() with SubprocessRunner which provides
a 15-second timeout and SIGTERM→SIGKILL cleanup on hang.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ratulsarna
Collaborator

Looks correct overall. This fixes the exact class of bug from #474 by removing the unbounded waitUntilExit() path and routing through the shared timeout-bounded subprocess runner.

My only ask before merge: please add a focused regression test for the timeout case (hung CLI / runner timeout) and verify we fall back to web instead of stalling refresh. This is the kind of behavior fix that’s easy to regress later.

- SubprocessRunnerTests: verify hung process throws .timedOut
- AugmentCLIFetchStrategyFallbackTests: verify all SubprocessRunnerError
  and AuggieCLIError variants trigger correct fallback behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bryant24hao
Contributor Author

Thanks for the review and the suggestion! I've added focused regression tests in 93d655f:

  • SubprocessRunnerTests.throwsTimedOutWhenProcessHangs — verifies a hung process triggers .timedOut instead of blocking indefinitely
  • AugmentCLIFetchStrategyFallbackTests — 7 tests covering all SubprocessRunnerError and AuggieCLIError variants, confirming timeout/infrastructure errors fall back to web while parseError does not

All 9 tests passing. Ready for another look when you get a chance!
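
The fallback rule those tests exercise can be sketched roughly as below. This is an illustration under assumptions, not the actual strategy code: the enum names `SketchRunnerError` and `SketchCLIError`, the `launchFailed` and `notInstalled` cases, the function `shouldFallBackToWeb`, and the choice to not fall back on unknown errors are all hypothetical. Only `.timedOut` and `parseError`, and the rule that the latter does not fall back, come from the comment above.

```swift
import Foundation

// Hypothetical stand-ins for SubprocessRunnerError and AuggieCLIError.
// Only `timedOut` and `parseError` are named in the PR; the rest are invented.
enum SketchRunnerError: Error {
    case timedOut(String)
    case launchFailed(String) // hypothetical infrastructure error
}

enum SketchCLIError: Error {
    case notInstalled         // hypothetical infrastructure error
    case parseError(String)
}

/// Decides whether a failed CLI fetch should be retried via the web path.
func shouldFallBackToWeb(after error: Error) -> Bool {
    // Any runner-level failure (timeout, launch problem) means the CLI
    // could not deliver an answer, so try the web path instead.
    if error is SketchRunnerError {
        return true
    }
    if let cliError = error as? SketchCLIError {
        switch cliError {
        case .parseError:
            // The CLI answered but the output was unreadable: do not fall back.
            return false
        case .notInstalled:
            return true
        }
    }
    // Assumption for this sketch: unknown errors do not trigger fallback.
    return false
}
```

Keeping the decision in one pure function like this makes it cheap to test each error variant without spawning a process.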

bryant24hao and others added 2 commits March 13, 2026 12:57
- docComments: separate implementation comment from @test declaration
- hoistPatternLet: use `case let .timedOut(label)` pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test relied on Task.sleep firing before process.waitUntilExit(),
but waitUntilExit() blocks the cooperative thread pool, starving the
timeout task on low-core CI runners. The timeout→fallback behavior
is already covered by AugmentCLIFetchStrategyFallbackTests.
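
For context on the starvation issue described in that commit message, one common way to wait for a process from async code without blocking a thread is to bridge `terminationHandler` into a continuation. The sketch below shows that technique only; it is not what this PR did (the PR removed the flaky test), and the function name `awaitExit` is hypothetical.

```swift
import Foundation

/// Starts `process` and suspends until it exits, returning its exit status.
/// Unlike waitUntilExit(), this does not park a cooperative-pool thread,
/// so a concurrent Task.sleep-based timeout can still make progress.
func awaitExit(of process: Process) async throws -> Int32 {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Int32, Error>) in
        // Install the handler before launching so a fast exit is not missed.
        process.terminationHandler = { finished in
            continuation.resume(returning: finished.terminationStatus)
        }
        do {
            try process.run()
        } catch {
            // Launch failed, so the handler will never fire; resume exactly once here.
            process.terminationHandler = nil
            continuation.resume(throwing: error)
        }
    }
}
```

This helper has no timeout of its own. A caller would still need to race it against a deadline and terminate the process when the deadline wins.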

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ratulsarna ratulsarna merged commit b015660 into steipete:main Mar 13, 2026
4 checks passed