Note: This draft was AI-generated and may contain mistakes. Reproduces on main (7b63ffa)
This was found while triaging a rare crash from chanmon consistency fuzzing. The
fuzz case was reduced to a focused functional test that does not use splice
commands.
Problem
MonitorUpdateCompletionAction::FreeDuplicateClaimImmediately is documented as
an in-memory action that should not be written to disk. In one duplicate HTLC
claim path, however, it can be pushed into monitor_update_blocked_actions while
a previous-hop monitor update is still in flight. If the ChannelManager is
serialized in that state and then reloaded, deserialization finds the queued
FreeDuplicateClaimImmediately action and hits the debug assertion:
Non-event-generating channel freeing should not appear in our queue
Affected behavior / minimal sequence
The minimized functional reproducer uses three nodes, A, B, and C:
- Route a payment A -> B -> C.
- C claims the payment and sends B an
update_fulfill_htlc.
- B learns the preimage and tries to claim the previous-hop HTLC from A.
- B's A-B monitor update is left
InProgress, so B does not yet emit the
fulfill back to A.
- C goes on-chain with its commitment and HTLC-success transaction.
- B's B-C monitor observes the same preimage on-chain and replays the claim.
- The replay is a duplicate claim against A-B, but A-B still has an in-flight
monitor update, so the duplicate-free action is queued.
- Serializing and reloading B hits the assertion during
ChannelManager read.
Expected behavior
Reload should not assert on valid duplicate monitor replay. Either the
duplicate-free action should be handled immediately, or it should be represented
in persisted state in a way that the reload path accepts and processes
correctly.
Actual behavior
The duplicate-free action can be serialized in monitor_update_blocked_actions.
On reload, the read path treats that variant as invalid queued state and triggers
the debug assertion above.
Why this looks like a project bug
This does not appear to depend on fuzz-harness-only behavior. The focused
reproducer uses normal functional-test helpers for channels, payments, async
monitor persistence, block connection, and node reload. It also does not require
splicing.
The on-chain duplicate preimage path is expected in general: a monitor can later
report an HTLC resolution that the ChannelManager already learned offchain.
The bug appears to be the combination of that duplicate claim with an in-flight
previous-hop monitor update, which causes an action documented as non-persistent
to enter serialized manager state.
Impact
The verified impact is a debug-assertion crash on reload. The assertion is a
debug_assert!, so the exact panic is not expected in normal release builds.
A remote downstream peer can influence the payment, message, and on-chain parts
of the sequence, but the exact state also requires local timing: async monitor
persistence must leave the previous-hop update in flight, and the node must
serialize and reload in that window. I have not verified funds loss. I also have
not yet verified whether release builds always recover cleanly, or whether they
can leave a channel temporarily or permanently blocked after the invalid action
is persisted.
Focused reproducer test
#[test]
#[should_panic(expected = "Non-event-generating channel freeing should not appear in our queue")]
fn test_duplicate_onchain_claim_reloaded_with_inflight_prev_hop_update() {
let chanmon_cfgs = create_chanmon_cfgs(3);
let node_cfgs = create_node_cfgs(3, &chanmon_cfgs);
let persister;
let chain_mon;
let node_b_reload;
let legacy_cfg = test_legacy_channel_config();
let node_chanmgrs = create_node_chanmgrs(
3,
&node_cfgs,
&[Some(legacy_cfg.clone()), Some(legacy_cfg.clone()), Some(legacy_cfg)],
);
let mut nodes = create_network(3, &node_cfgs, &node_chanmgrs);
let node_b_id = nodes[1].node.get_our_node_id();
let node_c_id = nodes[2].node.get_our_node_id();
let chan_id_ab = create_announced_chan_between_nodes(&nodes, 0, 1).2;
let chan_bc = create_announced_chan_between_nodes(&nodes, 1, 2);
// This payment gives B both an inbound edge, A-B, and an outbound edge, B-C. The crash
// requires B to first learn the preimage from C, then later learn the same preimage again
// from the B-C monitor after C has gone on-chain.
let (payment_preimage, payment_hash, ..) =
route_payment(&nodes[0], &[&nodes[1], &nodes[2]], 1_000_000);
nodes[2].node.claim_funds(payment_preimage);
check_added_monitors(&nodes[2], 1);
expect_payment_claimed!(nodes[2], payment_hash, 1_000_000);
// Leave B's A-B preimage monitor update in flight. This is the critical persistence state:
// B has enough information to claim from A, but the monitor update that makes the preimage
// durable for the inbound edge has not completed.
chanmon_cfgs[1].persister.set_update_ret(ChannelMonitorUpdateStatus::InProgress);
let cs_updates = get_htlc_update_msgs(&nodes[2], &node_b_id);
nodes[1].node.handle_update_fulfill_htlc(node_c_id, cs_updates.update_fulfill_htlcs[0].clone());
check_added_monitors(&nodes[1], 1);
// If B emitted the fulfill here, the A-B monitor update would not be blocked and the later
// duplicate claim would free inline instead of being parked for monitor completion.
assert!(nodes[1].node.get_and_clear_pending_msg_events().is_empty());
// C's local commitment still contains the HTLC because B has only received the fulfill, not
// completed C's commitment_signed exchange. Mining C's commitment plus HTLC-success makes B's
// monitor observe the same preimage through the on-chain path.
let cs_txn = get_local_commitment_txn!(nodes[2], chan_bc.2);
assert!(cs_txn.len() >= 2, "Expected commitment + HTLC-success tx, got {}", cs_txn.len());
mine_transaction(&nodes[1], &cs_txn[0]);
check_closed_broadcast(&nodes[1], 1, true);
check_added_monitors(&nodes[1], 1);
let events = nodes[1].node.get_and_clear_pending_events();
assert!(events.iter().any(|e| matches!(e, Event::ChannelClosed { .. })));
mine_transaction(&nodes[1], &cs_txn[1]);
connect_blocks(&nodes[1], ANTI_REORG_DELAY);
// Draining events here forces B to process the monitor HTLC event before serialization. The
// replayed preimage is a duplicate against A-B, but A-B still has an in-flight monitor update,
// so the duplicate-free action is queued instead of run immediately.
assert!(nodes[1].node.get_and_clear_pending_events().is_empty());
// Reload with the manager serialized after the duplicate action was queued. During read, the
// queued action is found in monitor_update_blocked_actions even though that action class is
// expected to be purely in-memory and non-persistent.
let mon_ab = get_monitor!(nodes[1], chan_id_ab).encode();
let mon_bc = get_monitor!(nodes[1], chan_bc.2).encode();
let manager_b = nodes[1].node.encode();
reload_node!(nodes[1], &manager_b, &[&mon_ab, &mon_bc], persister, chain_mon, node_b_reload);
}
Note: This draft was AI-generated and may contain mistakes. Reproduces on main (7b63ffa)
This was found while triaging a rare crash from chanmon consistency fuzzing. The
fuzz case was reduced to a focused functional test that does not use splice
commands.
Problem
MonitorUpdateCompletionAction::FreeDuplicateClaimImmediatelyis documented asan in-memory action that should not be written to disk. In one duplicate HTLC
claim path, however, it can be pushed into
monitor_update_blocked_actionswhilea previous-hop monitor update is still in flight. If the
ChannelManagerisserialized in that state and then reloaded, deserialization finds the queued
FreeDuplicateClaimImmediatelyaction and hits the debug assertion:Affected behavior / minimal sequence
The minimized functional reproducer uses three nodes, A, B, and C:
update_fulfill_htlc.InProgress, so B does not yet emit thefulfill back to A.
monitor update, so the duplicate-free action is queued.
ChannelManagerread.Expected behavior
Reload should not assert on valid duplicate monitor replay. Either the
duplicate-free action should be handled immediately, or it should be represented
in persisted state in a way that the reload path accepts and processes
correctly.
Actual behavior
The duplicate-free action can be serialized in
monitor_update_blocked_actions.On reload, the read path treats that variant as invalid queued state and triggers
the debug assertion above.
Why this looks like a project bug
This does not appear to depend on fuzz-harness-only behavior. The focused
reproducer uses normal functional-test helpers for channels, payments, async
monitor persistence, block connection, and node reload. It also does not require
splicing.
The on-chain duplicate preimage path is expected in general: a monitor can later
report an HTLC resolution that the
ChannelManageralready learned offchain.The bug appears to be the combination of that duplicate claim with an in-flight
previous-hop monitor update, which causes an action documented as non-persistent
to enter serialized manager state.
Impact
The verified impact is a debug-assertion crash on reload. The assertion is a
debug_assert!, so the exact panic is not expected in normal release builds.A remote downstream peer can influence the payment, message, and on-chain parts
of the sequence, but the exact state also requires local timing: async monitor
persistence must leave the previous-hop update in flight, and the node must
serialize and reload in that window. I have not verified funds loss. I also have
not yet verified whether release builds always recover cleanly, or whether they
can leave a channel temporarily or permanently blocked after the invalid action
is persisted.
Focused reproducer test