test: fix intermittent IS lock timeouts by syncing mempools before waiting #7241

UdjinM6 wants to merge 4 commits into dashpay:develop from
Conversation
…waiting

When mocktime is set, NodeClock::now() returns a frozen value. After bump_mocktime(), the trickle relay timer fires once, then m_next_inv_send_time is set ahead of the frozen clock and never fires again. If a TX inv misses that single relay window, it stays stuck in the sender's mempool indefinitely — regardless of real-time elapsed.

Add sync_mempools() after bump_mocktime() in all test paths that send a TX and immediately wait for an IS lock. This ensures all nodes (especially MNs that need to sign) have the TX before polling begins.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
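The frozen-clock failure mode described above can be illustrated with a small, self-contained simulation. This is a toy model only: the real logic lives in net_processing, and the reschedule interval used here is an invented stand-in.

```python
import random

class TrickleTimer:
    """Toy model of the trickle relay timer under mocktime.

    Illustration only: not the actual NodeClock or net_processing code,
    and the reschedule interval is an invented stand-in."""
    def __init__(self, now):
        self.next_inv_send_time = now  # due immediately on first poll

    def maybe_relay(self, now):
        """Relay pending invs if the timer is due, then reschedule ahead of 'now'."""
        if now >= self.next_inv_send_time:
            self.next_inv_send_time = now + random.randint(1, 5)
            return True
        return False

mocktime = 1_000_000          # frozen clock: only explicit bumps advance it
timer = TrickleTimer(mocktime)

mocktime += 30                # bump_mocktime(30): the timer fires once...
assert timer.maybe_relay(mocktime)

# ...then the clock stays frozen, so polling without further bumps never
# fires again, no matter how much real time passes.
assert not any(timer.maybe_relay(mocktime) for _ in range(100))

# sync_mempools() bumps mocktime on every poll iteration, which keeps
# unfreezing the timer.
fired = []
for _ in range(10):
    mocktime += 3             # per-iteration bump during sync_mempools()
    fired.append(timer.maybe_relay(mocktime))
assert any(fired)
```

A single bump buys exactly one relay window; repeated small bumps during polling are what keep the timer live, which is why the fix moves the sync after the bump.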
Walkthrough

The test framework's wait_for_instantlock was generalized from a single txid/node call to accept multiple txids, an explicit nodes list, a timeout, and a skip_sync flag. The implementation now optionally bumps mock time and syncs mempools before polling, and it requires InstantLock=true for all provided txids on each node in the nodes list. Multiple functional tests were updated to remove per-node wait loops, individual mock-time bumps, and some mempool syncs, replacing them with centralized calls to the new wait_for_instantlock signature (sometimes using skip_sync=True).

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TH as TestHarness
    participant N as Node(s)
    participant M as Mempools
    participant IS as InstantSend/Quorum
    TH->>N: send/broadcast tx(s)
    alt skip_sync is False
        TH->>TH: wait_for_instantlock(*txids, nodes=..., timeout, skip_sync=False)
        TH->>N: bump_mocktime(+30s)
        TH->>M: sync_mempools()
    else skip_sync is True
        TH->>TH: wait_for_instantlock(*txids, nodes=..., skip_sync=True)
    end
    N->>IS: propagate tx(s)
    IS-->>N: instantlock produced
    TH->>N: poll getrawtransaction/getmempool for InstantLock on all txids/nodes
```
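The generalized helper described in the walkthrough might look roughly like this. It is a sketch reconstructed from the walkthrough and review comments, not the actual dashpay code; `FakeNode` is invented so the example is self-contained, and the mocktime bump/sync step is only noted in a comment.

```python
import time

class FakeNode:
    """Invented stand-in for a TestNode; just records which txids are locked."""
    def __init__(self):
        self.locked = set()

    def getrawtransaction(self, txid, verbose):
        return {"instantlock": txid in self.locked}

def wait_for_instantlock(*txids, nodes, timeout=60, skip_sync=False, poll=0.05):
    """Poll until every txid reports instantlock=True on every node.

    In the real helper, skip_sync=False would first bump mocktime and sync
    mempools so all masternodes hold the tx before polling begins; that
    step is omitted in this sketch."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if all(node.getrawtransaction(txid, True)["instantlock"]
               for txid in txids for node in nodes):
            return True
        time.sleep(poll)
    raise AssertionError("InstantSend lock not seen on all nodes in time")

nodes = [FakeNode(), FakeNode()]
for n in nodes:
    n.locked.update({"txid_a", "txid_b"})
assert wait_for_instantlock("txid_a", "txid_b", nodes=nodes, timeout=1)
```

Note how a single call covers the full cross-product of txids and nodes, which is exactly what replaces the per-node wait loops in the updated tests.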
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62d47f7476
```python
# 3 nodes should be enough to create an IS lock even if nodes 4 and 5 (which have no tx itself)
# are the only "neighbours" in intra-quorum connections for one of them.
self.bump_mocktime(30)
self.sync_mempools(self.nodes[:3])
```
Sync three masternodes before waiting for the normal IS lock
In this test node 0 is only the control node while nodes 1-4 are the masternodes (feature_llmq_is_retroactive.py:11-13), and mine_cycle_quorum() uses the DIP0024 test quorum, which needs 3 members to sign. self.nodes[:3] only covers node0, node1, and node2, so this wait guarantees the tx reached at most two signers. If node3 is the third quorum member and it misses the relay window, wait_for_instantlock() can still time out here, leaving the intermittent failure in the main "trying normal IS lock" path.
thepastaclaw left a comment

Code Review
The PR correctly fixes the identified intermittent IS lock timeouts by replacing wait_for_tx with sync_mempools, which auto-advances mocktime during polling and thus unfreezes the trickle relay timer. However, the same unpatched send→bump_mocktime→wait_for_instantlock pattern exists in feature_llmq_is_cl_conflicts.py (no whitelist, no mitigation), and the now-unused wait_for_tx method should be removed.
Reviewed commit: 62d47f7
🟡 2 suggestion(s) | 💬 1 nitpick(s)
3 additional findings
🟡 suggestion: Same unpatched send→bump_mocktime→wait_for_instantlock pattern
test/functional/feature_llmq_is_cl_conflicts.py (lines 244-248)
This test has no -whitelist on any node, yet follows the exact pattern the PR is fixing: sendrawtransaction(rawtx2) → bump_mocktime(60) → wait_for_instantlock on all nodes. Under frozen mocktime, the trickle relay timer fires once after the bump then freezes, so rawtx2 may never reach masternodes on other nodes. The larger bump (60 vs 30) doesn't help because m_next_inv_send_time still gets set ahead of the frozen clock after the single fire. sync_mempools() would fix this because it auto-advances mocktime by 3s on each poll iteration (line 854-855 of test_framework.py), repeatedly unfreezing the relay timer.
💡 Suggested change

```python
self.nodes[0].sendrawtransaction(rawtx2)
# bump mocktime to force tx relay
self.bump_mocktime(60)
self.sync_mempools()
for node in self.nodes:
    self.wait_for_instantlock(rawtx2_txid, node)
```
🟡 suggestion: Same pattern in ZMQ test at three locations (lines 298, 351, 385)
test/functional/interface_zmq_dash.py (lines 298-300)
Three instances of send→bump_mocktime(30)→wait_for_instantlock without sync_mempools exist in this file. Node 0 has -whitelist=127.0.0.1 (line 120), which bypasses trickle relay for relay from node 0 to directly connected peers, significantly reducing the risk. However, the PR itself adds sync_mempools to feature_llmq_is_retroactive.py which also has -whitelist=127.0.0.1 on node 0, suggesting whitelist alone may not be fully sufficient (e.g., if relay between masternodes over intra-quorum connections is needed). Worth considering for consistency, but lower priority since the whitelist provides substantial mitigation.
💬 nitpick: wait_for_tx is now dead code
test/functional/feature_llmq_is_retroactive.py (lines 39-46)
All four call sites of wait_for_tx have been replaced with sync_mempools. Grep confirms the method name only appears at its definition (line 39). It should be removed to avoid confusion.
…ead code

Add sync_mempools() to two more tests with the same vulnerable pattern:
- feature_llmq_is_cl_conflicts.py: no whitelist, waits on all nodes
- interface_zmq_dash.py: three locations (low risk due to whitelist, but fixed for consistency)

Remove dead wait_for_tx() method from feature_llmq_is_retroactive.py whose call sites were all replaced with sync_mempools in the prior commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
thepastaclaw
left a comment
Incremental Review (62d47f7 → a0e4260)
Reviewed the incremental changes — no new issues found.
Changes reviewed:
- `feature_llmq_is_cl_conflicts.py` — `sync_mempools()` correctly added between `bump_mocktime(60)` and the `wait_for_instantlock` loop at line 247
- `feature_llmq_is_retroactive.py` — removed `wait_for_tx` method confirmed as dead code (no remaining callers)
- `interface_zmq_dash.py` — three new `sync_mempools()` calls at lines 300, 354, 389 follow the established `bump_mocktime → sync_mempools → wait_for_instantlock` pattern
The single-node test (feature_llmq_singlenode.py) correctly omits sync_mempools() as there's nothing to sync.
✅ All changes look correct.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c79c2d981d
thepastaclaw left a comment

Code Review
The refactoring cleanly centralizes the bump+sync+wait pattern into wait_for_instantlock and all callers are correctly converted to the new variadic signature. Prior findings (missing sync in feature_llmq_is_cl_conflicts.py, dead wait_for_tx code) are fully addressed. However, the nodes parameter now conflates sync scope with check scope: callers passing nodes=[single_node] make sync_mempools a no-op (one node is trivially synced with itself), potentially reintroducing the propagation gap this PR aims to fix.
Reviewed commit: 003e50b
🟡 3 suggestion(s)
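The single-node no-op described in this review can be demonstrated with a toy version of the sync loop. The mempool sets and bump counting here are invented for illustration; the framework's real sync_mempools is more involved.

```python
def sync_mempools_sketch(mempools, bump, max_iters=100):
    """Poll until all mempool sets match, bumping mocktime each iteration,
    as the review describes the framework doing. Returns iterations used.

    Toy model for illustration; not the real test_framework implementation."""
    for i in range(max_iters):
        if all(m == mempools[0] for m in mempools):
            return i
        bump(3)  # the per-iteration mocktime bump that unfreezes relay
    raise AssertionError("mempools never converged")

bumps = []
# With a single node the check is trivially satisfied on the first pass,
# so no mocktime bumps ever happen: the relay timer stays frozen and the
# tx may never reach the quorum members that must sign the IS lock.
assert sync_mempools_sketch([{"tx1"}], bumps.append) == 0
assert bumps == []
```

This is why passing `nodes=[self.nodes[0]]` quietly removes the repeated-bump behavior the PR relies on, even though the call still "succeeds".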
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `test/functional/test_framework/test_framework.py`:
- [SUGGESTION] lines 1949-1954: `nodes` parameter conflates sync scope with check scope — single-node callers skip effective mempool sync
The `nodes` parameter controls both which nodes are passed to `sync_mempools(nodes)` and which nodes are checked for the IS lock. When callers pass `nodes=[self.nodes[0]]`, `sync_mempools` with a single node returns immediately (trivially satisfied — one node's mempool always equals itself). No multi-hop mocktime bumping occurs.
The original code at sites like `rpc_verifyislock.py:44` and `interface_zmq_dash.py:384` did `sync_mempools()` (all nodes), which kept bumping mocktime by 3s each iteration until full propagation. Since `check_instantlock`'s `wait_until` loop does NOT bump mocktime, quorum members that didn't receive the tx in the single `bump_mocktime(30)` relay window will never get it — the exact failure mode this PR addresses.
Consider always syncing all connected nodes' mempools regardless of which nodes are checked for the IS lock — e.g., `self.sync_mempools()` instead of `self.sync_mempools(nodes)`.
- [SUGGESTION] lines 1958-1968: Combined `all()` check gives multi-tx callers a shared timeout budget instead of per-pair budgets
The refactored helper waits for `all(node.getrawtransaction(txid, True)["instantlock"] for txid in txids for node in nodes)` inside a single `wait_until(..., timeout=60)`. Before this change, call sites like `feature_llmq_is_cl_conflicts.py` waited each tx/node pair independently, so each pair got its own 60-second budget. Now `self.wait_for_instantlock(rawtx1_txid, rawtx4_txid)` gives every node and both txids only one shared 60-second window.
In practice this is unlikely to cause issues since IS locks typically arrive within seconds of each other, but it's a semantic change worth noting. If timing becomes a problem on slow CI, consider iterating per-txid inside the helper.
In `test/functional/interface_zmq_dash.py`:
- [SUGGESTION] line 384: `nodes=[self.nodes[0]]` drops all-node sync that was in original code
The original code did `self.sync_mempools()` (all 6 nodes) then `self.wait_for_instantlock(collateral, self.nodes[0])`. The new code does `self.wait_for_instantlock(collateral, nodes=[self.nodes[0]])` which only syncs node0's mempool (a no-op). The governance collateral tx needs to reach quorum members for the IS lock to form. Simplest fix: use the default `nodes=self.nodes`.
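The shared-versus-per-txid timeout budget difference noted for the refactored helper can be sketched as follows. `wait_until`, `locks`, and `is_locked` are invented for illustration; they only mimic the shape of the framework's polling.

```python
import time

def wait_until(predicate, timeout):
    """Minimal polling helper in the spirit of the framework's wait_until."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return
        time.sleep(0.01)
    raise TimeoutError("predicate not satisfied in time")

# Invented lock table: which (txid, node) pairs have an IS lock.
locks = {("tx1", "n1"): True, ("tx2", "n1"): True}
is_locked = lambda t, n: locks.get((t, n), False)
txids, nodes = ["tx1", "tx2"], ["n1"]

# Refactored style: one shared budget covers every (txid, node) pair.
wait_until(lambda: all(is_locked(t, n) for t in txids for n in nodes), timeout=1)

# Pre-refactor style: each txid gets its own full timeout window.
for t in txids:
    wait_until(lambda t=t: all(is_locked(t, n) for n in nodes), timeout=1)
```

With N txids, the second style grants up to N full windows while the first grants one; as the review says, that rarely matters in practice since IS locks tend to arrive within seconds of each other.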
…[0]]

wait_for_instantlock(nodes=[self.nodes[0]]) makes the internal sync_mempools a no-op since it only syncs the sender with itself. The MN signer nodes still need the TX, so drop the nodes= restriction to sync all nodes including dynamically added masternodes.

Fixes: feature_llmq_singlenode.py (2 sites), feature_notifications.py, interface_zmq_dash.py (governance collateral)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed 003e50b to d8ac781
Incremental Review (003e50b → d8ac781)

Reviewed the incremental changes — no new issues found. Changes reviewed: a single commit removing the nodes= restriction. Prior findings resolved.
Reviewed commit: d8ac781
Issue being fixed or feature implemented

When mocktime is set, NodeClock::now() returns a frozen value. After bump_mocktime(), the trickle relay timer fires once, then m_next_inv_send_time is set ahead of the frozen clock and never fires again. If a TX inv misses that single relay window, it stays stuck in the sender's mempool indefinitely — regardless of real-time elapsed.

What was done?

Add sync_mempools() after bump_mocktime() in all test paths that send a TX and immediately wait for an IS lock. This ensures all nodes (especially MNs that need to sign) have the TX before polling begins.

How Has This Been Tested?

Run the tests; they should no longer fail in these parts.

Breaking Changes

n/a

Checklist: