Skip to content

feat(desktop): client-side stall detection for chat tool calls#7555

Open
eulicesl wants to merge 2 commits into
BasedHardware:mainfrom
eulicesl:feat/desktop-chat-stall-detection
Open

feat(desktop): client-side stall detection for chat tool calls#7555
eulicesl wants to merge 2 commits into
BasedHardware:mainfrom
eulicesl:feat/desktop-chat-stall-detection

Conversation

@eulicesl
Copy link
Copy Markdown

Summary

  • Problem: A running tool call had only two states — running / completed. When a tool is slow or hangs, the user sees an indefinite spinner with no signal that anything is wrong and no way to bail out. This is a common source of "the chat is stuck / broken" perception.
  • What changed: Adds client-side stall detection. A StallDetector actor watches two clocks per turn (the inter-event gap and each in-flight tool's own duration) and promotes UI state through .running → .slow → .stalled at configurable thresholds. The UI surfaces a "still working…" annotation on slow tools and a message-level Cancel banner on stalled ones, wired to the existing AgentBridge.interrupt().
  • What did NOT change (scope boundary): No bridge/protocol or backend changes — the detector reads the timestamped events the bridge already emits. No tool-cancellation timeouts (it only surfaces UI + offers manual Cancel). No telemetry changes.

Linked Issue / Bounty

  • N/A — independent contribution.

Root Cause (bug fixes only)

  • N/A — new feature (UX reliability affordance).

How It Works

  • StallDetector is pure logic: the current time is passed in as a parameter on every operation (step(kind:atMs:), tick(atMs:)), so it has no wall-clock dependency and is fully deterministic under test. Production wraps it in a periodic tick task while a turn is active so promotions still fire during silent gaps when no events arrive.
  • StallThresholds holds the promotion cutoffs (slowGapMs / stalledGapMs, default 8s / 20s) as a Sendable/Equatable struct so they can be tuned later with no behavioral churn.
  • ToolCallStatus is extended from { running, completed } to { running, slow, stalled, completed, failed } with an isInFlight helper. The exhaustive switches that consume it are updated across the chat, onboarding, and task-chat surfaces (this is the bulk of the changed-line count — mechanical enum-extension ripple).
  • ChatProvider feeds every bridge callback to the detector and maps detector transitions onto the tool-call rows; the stalled state drives the Cancel banner.

Change Type

  • Feature

Scope

  • Desktop app (Swift/macOS)

Security Impact

  • New permissions or capabilities introduced? No
  • Auth or token handling changed? No
  • Data access scope changed (screen data, OCR, user data)? No
  • New or changed network calls? No
  • Plugin or tool execution surface changed? No — Cancel routes to the existing AgentBridge.interrupt(). The detector observes only event timing, never content.

Testing

  • Built locally — xcrun swift build -c debug --package-path DesktopBuild complete!
  • Manual verification: compile + unit tests on macOS. See "Human Verification" for the runtime gap.
  • Existing tests pass
  • New tests added — StallDetectorTests (13 tests: inter-event promotion, per-tool timers tracked independently, reset-on-event, threshold guard) and ToolCallStatusTests (3 tests: the extended-enum + isInFlight contract).

Human Verification

  • Verified scenarios: clean debug build against current main; 16 new tests pass; full test target compiles.
  • Edge cases checked (via tests): a tool's own timer promoting independently of the inter-event gap; multiple parallel tools tracked separately; repeated tick at the same timestamp emitting no duplicate transition; thresholds precondition.
  • What I did not verify (and why): I did not drive the running app to screenshot each .slow / .stalled state. Deterministically forcing a real stall end-to-end needs a fake-bridge harness (out of scope for this PR), so runtime UI evidence is a known gap.

Evidence

  • Test output:
    Test Suite 'StallDetectorTests' passed  — Executed 13 tests, 0 failures
    Test Suite 'ToolCallStatusTests' passed — Executed 3 tests, 0 failures
    Build complete!
    
  • No runtime screenshot — honest gap (see Human Verification). Happy to add one if a reviewer wants it.

Docs

  • N/A — no docs impact

Performance, Privacy, and Reliability

  • Performance: one actor + a 500ms tick task active only during an in-flight turn; negligible. Privacy: the detector sees event kind and timing only — never prompt, tool input, or output text. Reliability: strictly additive — when nothing stalls, behavior is identical to today.

Migration / Backward Compatibility

  • Backward compatible? Yes
  • Migration needed? No — the ToolCallStatus extension is additive; persistence treats the new in-flight states as transient and never relies on them being present in stored history.

Risks and Mitigations

  • Risk: Extending ToolCallStatus could miss an exhaustive switch. Mitigation: Swift's exhaustiveness checking forces every consumer to handle the new cases; the compile is green, and ToolCallStatusTests locks the isInFlight contract.
  • Risk: A periodic tick task leaking past turn end. Mitigation: the tick task is cancelled via defer on turn-scope exit (success or throw).
  • Known limitation: thresholds (8s / 20s) are first-pass defaults, intended to be tuned against real timing data later.

AI Disclosure

  • Tools used: Claude Code (Claude Opus).
  • AI-assisted portions: implementation and this PR description.
  • How I verified correctness: ran the local debug build and the unit-test suite (all green); manually reviewed every diff and confirmed the detector never touches message content.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 30, 2026

Greptile Summary

Adds client-side stall detection for chat tool calls on macOS. A new StallDetector actor watches two independent timers — an inter-event gap and a per-tool duration — and promotes ToolCallStatus through .running → .slow → .stalled at configurable thresholds (8 s / 20 s). The UI gains an amber spinner for slow tools and a Cancel banner for stalled ones, wired to the existing AgentBridge.interrupt().

  • StallDetector / StallThresholds: Pure-logic actor with deterministic time injection; well-covered by 13 new unit tests.
  • ToolCallStatus extension: Adds .slow, .stalled, .failed + isInFlight helper; exhaustive switch ripple updated across chat, onboarding, task-chat, and persistence surfaces.
  • ChatProvider integration: Feeds every bridge callback and a 500 ms tick task into the detector; one bug — stall promotions are silently dropped for tools whose bridge-provided toolUseId is nil because the synthetic \"untracked-\\(name)\" key never matches the nil stored in the content block.

Confidence Score: 3/5

The core stall detection logic and UI wiring are sound, but the integration between the detector's synthetic tool IDs and the content-block matching has a functional gap that prevents the banner from appearing for tools without a bridge-supplied toolUseId.

The applyStallTransitions method registers tools under a synthetic key untracked-name when toolUseId is nil, but then matches content blocks by tuid == id where tuid is also nil — so the comparison is always false. Any tool the bridge delivers without a toolUseId will never show the amber spinner or Cancel banner, which is the primary deliverable of this feature. The fix is a one-line change, but until it lands the feature is partially inoperative for that code path.

desktop/Desktop/Sources/Providers/ChatProvider.swift — specifically applyStallTransitions (the synthetic-ID / nil-tuid mismatch). All other changed files look correct.

Important Files Changed

Filename Overview
desktop/Desktop/Sources/Chat/StallDetector.swift New actor implementing pure-logic stall detection with deterministic time injection; two independent timers (inter-event gap + per-tool duration) producing typed Transition values; well-tested and idempotent.
desktop/Desktop/Sources/Providers/ChatProvider.swift Integrates StallDetector into the chat turn; contains a bug in applyStallTransitions where tools registered under a synthetic "untracked-{name}" key never match their content blocks (toolUseId is nil), silently dropping all stall promotions for those tools.
desktop/Desktop/Sources/MainWindow/Pages/ChatPage.swift Extends ToolCallsGroup with aggregateStatus, statusIcon helper, and ToolCallStalledBanner; Cancel button wired via onCancel closure chain to AgentBridge.interrupt(); exhaustive switches updated for new enum cases.
desktop/Desktop/Sources/Rewind/Core/TaskChatMessageStorage.swift Persistence encoding updated to map .slow/.stalled to "running" and adds .failed round-trip; forward-compat default for unknown status codes.
desktop/Desktop/Tests/StallDetectorTests.swift 13 deterministic tests covering inter-event promotion, per-tool timer independence, reset-on-event, idempotent ticks, and parallel tools; no test for the untracked-id integration path.

Sequence Diagram

sequenceDiagram
    participant Bridge as AgentBridge
    participant CP as ChatProvider (MainActor)
    participant SD as StallDetector (actor)
    participant UI as ToolCallsGroup

    Bridge->>CP: toolActivityHandler("started", toolUseId)
    CP->>CP: addToolActivity(.running)
    CP->>SD: await step(.toolStarted(id), atMs)
    SD-->>CP: [] no transitions yet

    loop every 500ms stallTickTask
        CP->>SD: await tick(atMs now)
        alt gap under 8s
            SD-->>CP: []
        else gap reaches 8s
            SD-->>CP: [.tool(id, .running to .slow)]
            CP->>CP: "applyStallTransitions block status = .slow"
            CP-->>UI: amber spinner
        else gap reaches 20s
            SD-->>CP: [.tool(id, .slow to .stalled)]
            CP->>CP: "applyStallTransitions block status = .stalled"
            CP-->>UI: Cancel banner shown
        end
    end

    UI->>CP: onCancelTurn tapped
    CP->>Bridge: stopAgent interrupt

    Bridge->>CP: toolActivityHandler("completed", toolUseId)
    CP->>CP: addToolActivity(.completed)
    CP->>SD: await step(.toolCompleted(id), atMs)
    SD-->>CP: [.tool(id, .stalled to .running)]
Loading

Reviews (1): Last reviewed commit: "feat(desktop): client-side stall detecti..." | Re-trigger Greptile

Comment thread desktop/Desktop/Sources/Providers/ChatProvider.swift Outdated
eulicesl added a commit to eulicesl/omi-fork that referenced this pull request May 31, 2026
Greptile P1 on BasedHardware#7555: tools that arrive without a `toolUseId` were
registered in the StallDetector under a synthetic `untracked-<name>`
key, but their content block stored `toolUseId: nil`. The transition
match used `tuid == id`, so `nil == "untracked-<name>"` was always
false and every stall transition for those tools was silently dropped
— the spinner never went amber and the Cancel banner never appeared,
the exact case the feature exists to fix.

Route both the registration site and the match site through a single
`ChatProvider.stallTrackingId(toolUseId:name:)` helper so the key is
derived identically and cannot diverge again. Add tests locking the
derivation (real id passes through; nil falls back to the name key).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@kodjima33 kodjima33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Desktop feature: client-side stall detection, well-tested actor pattern. Approve only — feature >500 lines and mergeStateStatus BLOCKED, author needs to rebase.

eulicesl and others added 2 commits May 31, 2026 11:08
A running tool call previously had only two states (running /
completed), so a slow or hung tool showed an indefinite spinner with
no signal and no way out.

Adds a `StallDetector` actor that watches two clocks per turn — the
inter-event gap and each in-flight tool's own duration — and promotes
state through `.running → .slow → .stalled` at configurable thresholds
(`StallThresholds`, default 8s / 20s). The detector is pure logic:
time is passed in on every call, so it is fully deterministic and
testable without wall-clock waits.

`ToolCallStatus` is extended to `{ running, slow, stalled, completed,
failed }` with an `isInFlight` helper, and the exhaustive switches that
consume it are updated across the chat, onboarding, and task-chat
surfaces. `ChatProvider` feeds every bridge event to the detector and
runs a periodic tick task so promotions fire during silent gaps. The
UI surfaces a "still working…" annotation on slow tools and a
message-level Cancel banner on stalled ones, wired to
`AgentBridge.interrupt()`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile P1 on BasedHardware#7555: tools that arrive without a `toolUseId` were
registered in the StallDetector under a synthetic `untracked-<name>`
key, but their content block stored `toolUseId: nil`. The transition
match used `tuid == id`, so `nil == "untracked-<name>"` was always
false and every stall transition for those tools was silently dropped
— the spinner never went amber and the Cancel banner never appeared,
the exact case the feature exists to fix.

Route both the registration site and the match site through a single
`ChatProvider.stallTrackingId(toolUseId:name:)` helper so the key is
derived identically and cannot diverge again. Add tests locking the
derivation (real id passes through; nil falls back to the name key).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eulicesl eulicesl force-pushed the feat/desktop-chat-stall-detection branch from 147b9b7 to a10264a Compare May 31, 2026 15:10
@eulicesl
Copy link
Copy Markdown
Author

Rebased onto the latest main, checks green, and all review threads resolved — this one's ready to merge whenever convenient. Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants