Skip to content

fix(desktop): close saveMessage vs poll race with a pending-save counter#7559

Open
eulicesl wants to merge 4 commits into
BasedHardware:mainfrom
eulicesl:feat/desktop-chat-save-race-fix
Open

fix(desktop): close saveMessage vs poll race with a pending-save counter#7559
eulicesl wants to merge 4 commits into
BasedHardware:mainfrom
eulicesl:feat/desktop-chat-save-race-fix

Conversation

@eulicesl
Copy link
Copy Markdown

Summary

  • Problem: A duplicate / "ghost" chat message can appear after a turn completes. isSending is released before the AI message's save finishes (deliberately, to unblock the next query), so pollForNewMessages can run during that save's await, observe the just-saved AI message before its local UUID has been reconciled to the server id, and append it again as if it arrived from another platform.
  • What changed: Adds PendingSaveCounter — a @MainActor multi-holder counter. Every saveMessage call site brackets its save with begin()/end(), and the poll guard gains !pendingSaves.isActive, suppressing the poll for the full lifetime of any in-flight save. This closes the race window directly instead of relying on text heuristics.
  • What did NOT change (scope boundary): The pre-existing 200-char text-prefix merge in pollForNewMessages stays as a secondary safety net. The five save sites are not consolidated (see Risks). No backend, telemetry, or protocol changes.

Linked Issue / Bounty

  • N/A — independent contribution.

Root Cause (bug fixes only)

  • Root cause: isSending is cleared before the AI-response save resolves, so the !isSending poll guard is already satisfied while the save is still in flight. The poll then reads the saved-but-not-yet-reconciled message as new-from-another-platform and duplicates it.
  • Why it wasn't caught before: An existing 200-char text-prefix merge (keyed on sender + unsynced flag) masks most occurrences, so the duplicate only surfaces on short, common replies ("Yes", "Got it", "Sure") that can repeat verbatim across turns — rare and hard to reproduce by hand.

How It Works

  • PendingSaveCounter mirrors the existing ReentrancyGate pattern but supports multiple concurrent holders (several saves can be in flight at once). end() is bounded at 0 so a stray call can't drive the count negative and wedge the poll off.
  • Each of the five saveMessage sites calls begin() before launching the save and end() on both the success and catch paths (via MainActor.run where the surrounding Task closure isn't @MainActor).
  • The poll's existing guard !isSending, !isLoading, !isLoadingSessions gains !pendingSaves.isActive.

Change Type

  • Bug fix

Scope

  • Desktop app (Swift/macOS)

Security Impact

  • New permissions or capabilities introduced? No
  • Auth or token handling changed? No
  • Data access scope changed (screen data, OCR, user data)? No
  • New or changed network calls? No — same saveMessage / poll calls, only their interleaving is gated.
  • Plugin or tool execution surface changed? No

Testing

  • Built locally — xcrun swift build -c debug --package-path DesktopBuild complete!
  • Manual verification: compile + unit tests on macOS. See "Human Verification".
  • Existing tests pass
  • New tests added — PendingSaveCounterTests (6 tests): fresh-is-inactive, begin activates, end decrements, multiple holders stack (the concurrent-save case), end() bounded at zero, and interleaved/out-of-order begin/end.

Human Verification

  • Verified scenarios: clean debug build against current main; all 6 counter tests pass.
  • Edge cases checked (via tests): concurrent saves keeping the poll suppressed until the last one lands; stray end() not driving the count negative; out-of-order begin/end pairs.
  • What I did not verify (and why): there is no end-to-end reproducer of the duplicate-on-race in this PR. Reproducing it deterministically needs a mockable APIClient.saveMessage delay driven through the full ChatProvider lifecycle, which is larger test-infrastructure work. The synchronization primitive's contract is verified in isolation; the cross-flow behavior is asserted by construction (the poll cannot run while any save holds the counter).

Evidence

  • Test output:
    Test Suite 'PendingSaveCounterTests' passed — Executed 6 tests, 0 failures
    Build complete!
    
  • N/A — change is not directly user-visible (it removes an intermittent duplicate); no screenshot.

Docs

  • N/A — no docs impact

Performance, Privacy, and Reliability

  • Performance: a single integer counter check added to the poll guard; negligible. Privacy: no content is inspected — the counter only tracks save in-flight state. Reliability: strictly tightens an existing race; when no save is in flight, behavior is identical to today.

Migration / Backward Compatibility

  • Backward compatible? Yes
  • Migration needed? No — purely in-memory synchronization; no schema or stored-data change.

Risks and Mitigations

  • Risk: An unbalanced begin() without a matching end() would suppress the poll indefinitely. Mitigation: every site calls end() on both success and catch paths; end() is bounded at 0; the counter is unit-tested for the interleaved/stray-end cases.
  • Risk: Five separate save sites are easy to under-bracket. Mitigation: each site is documented inline and brackets symmetrically; consolidating them is deliberately deferred because they persist different message types at different lifecycle points and collapsing them could mask other races.

AI Disclosure

  • Tools used: Claude Code (Claude Opus).
  • AI-assisted portions: implementation and this PR description.
  • How I verified correctness: ran the local debug build and the unit-test suite (all green); manually traced each of the five save sites to confirm balanced begin()/end() on success and error paths.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 31, 2026

Greptile Summary

This PR introduces PendingSaveCounter, a @MainActor integer counter that brackets all five saveMessage call sites in ChatProvider and adds !pendingSaves.isActive to the pollForNewMessages guard, closing the race window where isSending is cleared before the AI-message save resolves and the poll could duplicate the just-saved message.

  • PendingSaveCounter.swift: New synchronization primitive with begin()/end() (bounded at zero to prevent negative counts) and isActive for the poll guard; well-documented contract and six unit tests cover fresh/single/stacked/bounded/interleaved scenarios.
  • ChatProvider.swift: All five fire-and-forget and inline save sites are symmetrically bracketed; for the four async Task sites, end() is correctly dispatched through await MainActor.run {} to satisfy the @MainActor isolation requirement.
  • Site 4 (critical inline do/catch): end() is duplicated in the success and catch branches rather than using defer, which leaves a small maintenance surface if a future change adds a guard or early-exit that forgets to call end().

Confidence Score: 4/5

The change is purely additive in-memory synchronization and cannot regress existing behaviour when no saves are in flight.

All five saveMessage sites are correctly bracketed with matching begin()/end() on both success and error paths. The only gap is site 4 duplicating end() in the do/catch instead of using defer — a future edit adding an early exit could forget end() and suppress the poll indefinitely, but this is a maintenance concern rather than a current defect.

The inline do/catch at site 4 in ChatProvider.swift is the only spot worth a second look to ensure future edits maintain the end()/defer discipline.

Important Files Changed

Filename Overview
desktop/Desktop/Sources/PendingSaveCounter.swift New @mainactor counter primitive — clean, well-documented, bounded at zero; contract matches all five call sites.
desktop/Desktop/Sources/Providers/ChatProvider.swift Adds pendingSaves to pollForNewMessages guard and brackets all five saveMessage sites; site 4 duplicates end() instead of using defer.
desktop/Desktop/Tests/PendingSaveCounterTests.swift Six synchronous contract tests on @mainactor cover all counter states correctly.

Reviews (1): Last reviewed commit: "fix(desktop): close saveMessage vs poll ..." | Re-trigger Greptile

Comment thread desktop/Desktop/Sources/Providers/ChatProvider.swift
Comment thread desktop/Desktop/Sources/PendingSaveCounter.swift
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9fbcdd2656

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread desktop/Desktop/Sources/Providers/ChatProvider.swift Outdated
eulicesl added a commit to eulicesl/omi-fork that referenced this pull request May 31, 2026
Addresses three review findings on BasedHardware#7559:

- Use `defer { pendingSaves.end() }` at the critical inline AI-save site
  instead of calling end() in both the success and catch branches, so a
  future early-exit can't skip the release and permanently suppress the
  poll (Greptile P2).
- Re-check `!pendingSaves.isActive` after getMessages returns. A proactive
  message appended via appendAssistantMessage while the fetch was in
  flight could otherwise be processed before its UUID reconciles — the
  top-of-poll guard alone doesn't cover saves that begin during the
  fetch (Codex P2).
- Add a debug `assert(count > 0)` in PendingSaveCounter.end() so an
  unbalanced end() surfaces immediately in debug; the release guard
  still bounds it at zero. Updated the bound test to verify the counter
  returns to zero and stays usable across rounds without relying on a
  stray end() (Greptile P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae0aec65a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread desktop/Desktop/Sources/Providers/ChatProvider.swift Outdated
Copy link
Copy Markdown
Collaborator

@kodjima33 kodjima33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Desktop bug fix: closes saveMessage/poll race with a counter-gated guard, unit tested. Approve only — mergeStateStatus BLOCKED, author needs to rebase before merge.

eulicesl and others added 3 commits May 31, 2026 11:05
`isSending` is released *before* the AI message save completes (so the
next query isn't blocked), which opens a window where
`pollForNewMessages` can observe the just-saved AI message before its
local UUID is reconciled to the server id — and append it again as if
it arrived from another platform. An existing 200-char text-prefix
merge catches most cases but is fragile on short, common replies
("Yes", "Got it") that repeat verbatim across turns.

Adds `PendingSaveCounter`, a `@MainActor` multi-holder counter (mirrors
the existing `ReentrancyGate` pattern, bounded at 0 so stray `end()`
calls can't go negative). Every saveMessage call site brackets its save
with `begin()` / `end()`, and the poll guard gains `!pendingSaves.isActive`
— suppressing the poll for the full lifetime of any in-flight save
rather than relying on text heuristics. The pre-existing merge stays as
a secondary safety net.

The five save sites (user follow-up, AI-from-notification, user turn
start, AI success — the critical one — and partial-AI-on-error) are
each documented inline. They are intentionally not consolidated: they
persist different message types at different lifecycle points, and
collapsing them risks masking other races.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses three review findings on BasedHardware#7559:

- Use `defer { pendingSaves.end() }` at the critical inline AI-save site
  instead of calling end() in both the success and catch branches, so a
  future early-exit can't skip the release and permanently suppress the
  poll (Greptile P2).
- Re-check `!pendingSaves.isActive` after getMessages returns. A proactive
  message appended via appendAssistantMessage while the fetch was in
  flight could otherwise be processed before its UUID reconciles — the
  top-of-poll guard alone doesn't cover saves that begin during the
  fetch (Codex P2).
- Add a debug `assert(count > 0)` in PendingSaveCounter.end() so an
  unbalanced end() surfaces immediately in debug; the release guard
  still bounds it at zero. Updated the bound test to verify the counter
  returns to zero and stays usable across rounds without relying on a
  stray end() (Greptile P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the poll-suppression hardening: suppressing the poll while
a save is in flight could permanently drop that cycle, since
pollForNewMessages only fires on activation / Cmd+R (no periodic poll).
Messages from other platforms sent while the app was inactive would
then stay unfetched until the next activation (Codex P2).

PendingSaveCounter gains an `onDrained` callback fired when the count
returns to 0. ChatProvider sets `pollDeferredDuringSave` whenever a poll
bails due to an in-flight save (top guard or post-fetch recheck) and, on
drain, re-runs the poll exactly once — only when a cycle was actually
deferred, so the common no-save path adds no extra polling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eulicesl eulicesl force-pushed the feat/desktop-chat-save-race-fix branch from 24ac362 to f2ad021 Compare May 31, 2026 15:07
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2ad021073

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread desktop/Desktop/Sources/Providers/ChatProvider.swift Outdated
Follow-up to the drain-retry: onDrained cleared pollDeferredDuringSave
before the retry had passed pollForNewMessages's other guards. If the
retry then bailed on isSending/streaming (a new turn started mid-flight),
the deferral was lost and the original poll's other-platform messages
weren't fetched until the next activation (Codex P2).

Clear the flag only inside pollForNewMessages, once it gets past all the
deferral-relevant guards and commits to a fetch — not in onDrained. A
retry that bails early now leaves the flag set, so the next drain (e.g.
the AI-response save, which completes after isSending is released)
retries and succeeds. The post-fetch recheck still re-sets the flag if a
save sneaks in during getMessages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eulicesl
Copy link
Copy Markdown
Author

Desktop bug fix: closes saveMessage/poll race with a counter-gated guard, unit tested. Approve only — mergeStateStatus BLOCKED, author needs to rebase before merge.

Rebase Completed

@eulicesl
Copy link
Copy Markdown
Author

Rebased onto the latest main, checks green, and all review threads resolved — this one's ready to merge whenever convenient. Thanks for the review!

Copy link
Copy Markdown
Collaborator

@kodjima33 kodjima33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Desktop bug fix (pending-save counter closes saveMessage↔poll race). CLEAN, 308 lines, scoped tests. Approve only — author below the 3-prior-merged-PRs bar in the default gate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants