confrisk: preparatory changes#1154
Conversation
|
@claude please review |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces preparatory fixes for dynamic confirmation work by improving the robustness of static loop-in and notification delivery flows. The changes focus on preventing system stalls caused by slow subscribers, ensuring proper cleanup of transient resources like invoices during failure scenarios, and improving the reliability of state tracking for loop-in swaps. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces non-blocking notification delivery for subscribers, asynchronous deposit finalization, and automatic invoice cleanup on initialization failures. Specifically, a per-subscriber FIFO queue is added to prevent slow subscribers from blocking the main loop, and FinalizeDepositAction is updated to run asynchronously. Additionally, InitHtlcAction now ensures that pre-created invoices are canceled if an error occurs before persistence. Feedback on the changes highlights a potential memory leak in the FIFO queue implementation where sliced elements are not zeroed out, which could prevent garbage collection of pointer types.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| ntfn := q.pending[0] | ||
| q.pending = q.pending[1:] | ||
| q.Unlock() |
There was a problem hiding this comment.
In Go, slicing a slice (e.g., q.pending = q.pending[1:]) does not zero out the retired element. If T is a pointer type, the underlying array will retain a reference to the popped notification, preventing it from being garbage collected until that index in the underlying array is overwritten or the queue is garbage collected. To avoid this memory leak, explicitly zero out the popped element before slicing.
ntfn := q.pending[0]
var zero T
q.pending[0] = zero
q.pending = q.pending[1:]
q.Unlock()
Code ReviewOverall this is a well-scoped preparatory PR that addresses real failure modes. The fixes are independent and the PR description maps cleanly to the diffs. A few things worth flagging before merge:
|
adfd82a to
283bc5f
Compare
f321b97 to
cb83ba9
Compare
|
@claude please re-review after I added queued notification handling of reservations as well. |
Code ReviewPR: confrisk: preparatory changes The overall direction is sound — each fix targets a real failure mode, test coverage is strong, and the changes are well-scoped. A few areas worth discussing:
|
| Area | Verdict |
|---|---|
| SQL state query bug fix | ✅ Correct and important |
| Async deposit finalization | ✅ Clean fix, good tests |
| Invoice cleanup on init failure | ✅ Correct, minor naming suggestion |
| Notification queue design | |
| Test coverage | ✅ Good |
The main items worth resolving before merge are the PR description/code mismatch on reservation notification queuing, and the recover() approach for the closed-channel race.
starius
left a comment
There was a problem hiding this comment.
LGTM! 💾
A couple of comments left.
|
|
||
| func toJointStringStates(states []fsm.StateType) string { | ||
| return "{" + strings.Join(toStrings(states), ",") + "}" | ||
| return strings.Join(toStrings(states), ",") |
There was a problem hiding this comment.
Good fix! Can you adjust the commit message to explain that now it covers the first and the last states as well, and add a test for first state, please?
There was a problem hiding this comment.
Thanks, I added context to the commit message and added tests for it!
| return | ||
| } | ||
|
|
||
| f.cancelSwapInvoice(ctx) |
There was a problem hiding this comment.
This fix is correct for pre-storage situations.
What happens with an invoice in SignHtlcTx/MonitorInvoiceAndHtlcTx → OnError → UnlockDeposits paths?
There was a problem hiding this comment.
good catch, I made UnlockDepositsAction call cancelSwapInvoice() so we always cancel the invoice on failed swaps.
| case <-deadlineChan: | ||
| // If the server didn't pay the invoice on time, we | ||
| // cancel the invoice and keep monitoring the htlc tx | ||
| // confirmation. We also need to unlock the deposits to | ||
| // re-enable them for loop-ins and withdrawals. | ||
| cancelInvoice() | ||
|
|
||
| event := f.UnlockDepositsAction(ctx, nil) | ||
| if event != fsm.OnError { | ||
| f.Errorf("unable to unlock deposits after " + | ||
| "payment deadline") | ||
| } | ||
|
|
There was a problem hiding this comment.
This guard is dead code. UnlockDepositsAction returns fsm.OnError on both paths — the success path returns fsm.OnError (line 879), and the failure path returns f.HandleError(err), which is also fsm.OnError. So event != fsm.OnError is never true and the "unable to unlock deposits after payment deadline" error is never logged, including when the unlock genuinely fails.
Concretely: when the payment deadline fires we cancel the invoice, try to unlock the deposits, and keep monitoring the htlc. If TransitionDeposits(..., deposit.Deposited) fails here we get no signal for it — the contextual log can't fire — and the FSM keeps running as though the deposits were freed while they actually stay locked until the swap resolves another way (e.g. the htlc timeout sweep). The deposits are the user's funds, so this is a "don't silently fail an operation" issue: low severity, but strictly worse than what's trivially available, since we can just check the real error.
I confirmed the premise on a live regtest run: on a failed init the loop-in FSM entered UnlockDeposits, the deposit transitioned LoopingIn -> Deposited (a successful unlock), and the FSM still emitted OnError -> Failed — i.e. UnlockDepositsAction returns fsm.OnError even on success, which is exactly why this guard can never fire.
Suggested fix: https://gist.github.com/starius/915dee5b117c07f2042170d992274ec6
There was a problem hiding this comment.
Thank you, this is now fixed by
- extract the actual deposit transition into unlockDeposits(ctx) error,
- stop calling UnlockDepositsAction from the payment-deadline path,
- call unlockDeposits(ctx) directly there so we can inspect the real error.
A static address loop-in that creates an invoice but then fails before storage cannot be recovered after restart, and persisted error paths that unlock deposits should not leave the swap invoice live either. The payment-deadline path also needs to report the real deposit unlock error instead of checking the FSM event returned by UnlockDepositsAction, because that action returns OnError on both success and failure. Cancel private swap invoices on pre-storage init failures, monitor timeouts, and stored unlock paths using a detached timeout-limited helper. Factor deposit unlocking into an error-returning helper so deadline handling can log transition failures, and add coverage for early-init cancellation plus generic unlock invoice cleanup and error propagation.
Final deposit states should not stall while deposit locks are held, because a blocked manager receive loop can otherwise hold up the deposit FSM; if shutdown happens before notification delivery, startup recovery can still resume from the final state. Send finalization notifications from a goroutine so final states are recorded without waiting on the manager receive loop, and add tests for blocked manager delivery and shutdown races.
The final-state query previously formatted the state list with braces, which broke the comma-delimited match for both boundary entries: the first state was compared with a leading brace and the last state with a trailing brace. Final-state callers need swaps at both ends of the list, including HtlcTimeoutSwept and Failed, to remain visible once they reach a terminal status. Return the final-state list as plain comma-separated state names, update the store comment to match the query format, and extend the store test so it inserts and retrieves swaps in the first final state, a middle final state, and the last final state without relying on result order.
A slow optional notification subscriber must not stall the manager lock, while recovery and sweep work requests still need reliable local delivery to keep daemon progress deterministic. Treating all subscribers as blocking makes best-effort fanout a backpressure source for unrelated required work. Make reservation fanout nonblocking for slow subscribers, keep recovery and sweep notifications on a required cancellation-aware delivery path, and cover both slow best-effort subscribers and required subscribers in manager tests.
cb83ba9 to
e1f0fd0
Compare
Required notification fanout should not block the manager lock, but subscribers still need ordered delivery once brief backpressure clears. Sending must-deliver notifications directly can couple manager progress to subscriber receive timing, while queueing optional reservation notifications would contradict their best-effort delivery semantics. Add bounded per-subscriber queues for must-deliver notifications, let those queues own channel shutdown instead of relying on recover for closed-channel sends, keep reservation fanout best-effort, and cover queued delivery, queue cleanup, and capacity drops in manager tests.
e1f0fd0 to
0bb06b0
Compare
|
@claude re-review this please |
Code Review:
|
| Area | Verdict |
|---|---|
SQL state-query fix (toJointStringStates) |
✅ Real bug fix, well-tested |
| Notification queue implementation | ✅ Correct; minor documentation gaps |
| Async deposit finalization | ✅ Correct; goroutine leak worth confirming |
| Invoice cleanup on init failure | ✅ Correct; inline unlockDeposits bypass worth clarifying |
| Test coverage | ✅ Comprehensive; one minor timing concern |
Test data bug fix (testReservationId2) |
✅ Pre-existing bug correctly fixed |
The most actionable items before merge:
- Clarify whether the payment-deadline path in
MonitorInvoiceAndHtlcTxActionshould callcancelSwapInvoice()(currently only callscancelInvoice()on the subscription). - Confirm
quitChanis always closed on all shutdown paths to bound the goroutine fromFinalizeDepositAction. - Add a comment to
toJointStringStatesexplaining the no-braces requirement to prevent future regression.
extracted commits from #1141
This PR pulls out general correctness and notification-delivery fixes ahead of the dynamic confirmation work.
It fixes a few independent issues in static address loop-in/deposit handling and hardens notification fanout so slow subscribers cannot
block unrelated notification delivery.
What
Why
These changes reduce failure modes that are not specific to dynamic confirmations, but become more important as static loop-ins rely more
heavily on notification-driven flows.
Moving them first keeps the dynamic confirmation PR focused on the actual feature behavior while landing independent fixes separately.