fix(desktop): close saveMessage vs poll race with a pending-save counter#7559
fix(desktop): close saveMessage vs poll race with a pending-save counter#7559eulicesl wants to merge 4 commits into
Conversation
Greptile SummaryThis PR introduces
Confidence Score: 4/5The change is purely additive in-memory synchronization and cannot regress existing behaviour when no saves are in flight. All five saveMessage sites are correctly bracketed with matching begin()/end() on both success and error paths. The only gap is site 4 duplicating end() in the do/catch instead of using defer — a future edit adding an early exit could forget end() and suppress the poll indefinitely, but this is a maintenance concern rather than a current defect. The inline do/catch at site 4 in ChatProvider.swift is the only spot worth a second look to ensure future edits maintain the end()/defer discipline. Important Files Changed
Reviews (1): Last reviewed commit: "fix(desktop): close saveMessage vs poll ..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9fbcdd2656
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Addresses three review findings on BasedHardware#7559: - Use `defer { pendingSaves.end() }` at the critical inline AI-save site instead of calling end() in both the success and catch branches, so a future early-exit can't skip the release and permanently suppress the poll (Greptile P2). - Re-check `!pendingSaves.isActive` after getMessages returns. A proactive message appended via appendAssistantMessage while the fetch was in flight could otherwise be processed before its UUID reconciles — the top-of-poll guard alone doesn't cover saves that begin during the fetch (Codex P2). - Add a debug `assert(count > 0)` in PendingSaveCounter.end() so an unbalanced end() surfaces immediately in debug; the release guard still bounds it at zero. Updated the bound test to verify the counter returns to zero and stays usable across rounds without relying on a stray end() (Greptile P2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ae0aec65a5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
kodjima33
left a comment
There was a problem hiding this comment.
Desktop bug fix: closes saveMessage/poll race with a counter-gated guard, unit tested. Approve only — mergeStateStatus BLOCKED, author needs to rebase before merge.
`isSending` is released *before* the AI message save completes (so the
next query isn't blocked), which opens a window where
`pollForNewMessages` can observe the just-saved AI message before its
local UUID is reconciled to the server id — and append it again as if
it arrived from another platform. An existing 200-char text-prefix
merge catches most cases but is fragile on short, common replies
("Yes", "Got it") that repeat verbatim across turns.
Adds `PendingSaveCounter`, a `@MainActor` multi-holder counter (mirrors
the existing `ReentrancyGate` pattern, bounded at 0 so stray `end()`
calls can't go negative). Every saveMessage call site brackets its save
with `begin()` / `end()`, and the poll guard gains `!pendingSaves.isActive`
— suppressing the poll for the full lifetime of any in-flight save
rather than relying on text heuristics. The pre-existing merge stays as
a secondary safety net.
The five save sites (user follow-up, AI-from-notification, user turn
start, AI success — the critical one — and partial-AI-on-error) are
each documented inline. They are intentionally not consolidated: they
persist different message types at different lifecycle points, and
collapsing them risks masking other races.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses three review findings on BasedHardware#7559: - Use `defer { pendingSaves.end() }` at the critical inline AI-save site instead of calling end() in both the success and catch branches, so a future early-exit can't skip the release and permanently suppress the poll (Greptile P2). - Re-check `!pendingSaves.isActive` after getMessages returns. A proactive message appended via appendAssistantMessage while the fetch was in flight could otherwise be processed before its UUID reconciles — the top-of-poll guard alone doesn't cover saves that begin during the fetch (Codex P2). - Add a debug `assert(count > 0)` in PendingSaveCounter.end() so an unbalanced end() surfaces immediately in debug; the release guard still bounds it at zero. Updated the bound test to verify the counter returns to zero and stays usable across rounds without relying on a stray end() (Greptile P2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the poll-suppression hardening: suppressing the poll while a save is in flight could permanently drop that cycle, since pollForNewMessages only fires on activation / Cmd+R (no periodic poll). Messages from other platforms sent while the app was inactive would then stay unfetched until the next activation (Codex P2). PendingSaveCounter gains an `onDrained` callback fired when the count returns to 0. ChatProvider sets `pollDeferredDuringSave` whenever a poll bails due to an in-flight save (top guard or post-fetch recheck) and, on drain, re-runs the poll exactly once — only when a cycle was actually deferred, so the common no-save path adds no extra polling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
24ac362 to
f2ad021
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2ad021073
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Follow-up to the drain-retry: onDrained cleared pollDeferredDuringSave before the retry had passed pollForNewMessages's other guards. If the retry then bailed on isSending/streaming (a new turn started mid-flight), the deferral was lost and the original poll's other-platform messages weren't fetched until the next activation (Codex P2). Clear the flag only inside pollForNewMessages, once it gets past all the deferral-relevant guards and commits to a fetch — not in onDrained. A retry that bails early now leaves the flag set, so the next drain (e.g. the AI-response save, which completes after isSending is released) retries and succeeds. The post-fetch recheck still re-sets the flag if a save sneaks in during getMessages. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rebase Completed |
|
Rebased onto the latest |
kodjima33
left a comment
There was a problem hiding this comment.
Desktop bug fix (pending-save counter closes saveMessage↔poll race). CLEAN, 308 lines, scoped tests. Approve only — author below the 3-prior-merged-PRs bar in the default gate.
Summary
isSendingis released before the AI message's save finishes (deliberately, to unblock the next query), sopollForNewMessagescan run during that save'sawait, observe the just-saved AI message before its local UUID has been reconciled to the server id, and append it again as if it arrived from another platform.PendingSaveCounter— a@MainActormulti-holder counter. EverysaveMessagecall site brackets its save withbegin()/end(), and the poll guard gains!pendingSaves.isActive, suppressing the poll for the full lifetime of any in-flight save. This closes the race window directly instead of relying on text heuristics.pollForNewMessagesstays as a secondary safety net. The five save sites are not consolidated (see Risks). No backend, telemetry, or protocol changes.Linked Issue / Bounty
Root Cause (bug fixes only)
isSendingis cleared before the AI-response save resolves, so the!isSendingpoll guard is already satisfied while the save is still in flight. The poll then reads the saved-but-not-yet-reconciled message as new-from-another-platform and duplicates it.How It Works
PendingSaveCountermirrors the existingReentrancyGatepattern but supports multiple concurrent holders (several saves can be in flight at once).end()is bounded at 0 so a stray call can't drive the count negative and wedge the poll off.saveMessagesites callsbegin()before launching the save andend()on both the success and catch paths (viaMainActor.runwhere the surroundingTaskclosure isn't@MainActor).!isSending, !isLoading, !isLoadingSessionsgains!pendingSaves.isActive.Change Type
Scope
Security Impact
saveMessage/ poll calls, only their interleaving is gated.Testing
xcrun swift build -c debug --package-path Desktop→Build complete!PendingSaveCounterTests(6 tests): fresh-is-inactive, begin activates, end decrements, multiple holders stack (the concurrent-save case),end()bounded at zero, and interleaved/out-of-order begin/end.Human Verification
main; all 6 counter tests pass.end()not driving the count negative; out-of-order begin/end pairs.APIClient.saveMessagedelay driven through the fullChatProviderlifecycle, which is larger test-infrastructure work. The synchronization primitive's contract is verified in isolation; the cross-flow behavior is asserted by construction (the poll cannot run while any save holds the counter).Evidence
Docs
Performance, Privacy, and Reliability
Migration / Backward Compatibility
Risks and Mitigations
begin()without a matchingend()would suppress the poll indefinitely. Mitigation: every site callsend()on both success and catch paths;end()is bounded at 0; the counter is unit-tested for the interleaved/stray-end cases.AI Disclosure
begin()/end()on success and error paths.