Skip to content

Persisted duplicate-claim completion action can trip reload assertion #4738

Description

@joostjager

Note: This draft was AI-generated and may contain mistakes. Reproduces on main (7b63ffa)

This was found while triaging a rare crash from chanmon consistency fuzzing. The
fuzz case was reduced to a focused functional test that does not use splice
commands.

Problem

MonitorUpdateCompletionAction::FreeDuplicateClaimImmediately is documented as
an in-memory action that should not be written to disk. In one duplicate HTLC
claim path, however, it can be pushed into monitor_update_blocked_actions while
a previous-hop monitor update is still in flight. If the ChannelManager is
serialized in that state and then reloaded, deserialization finds the queued
FreeDuplicateClaimImmediately action and hits the debug assertion:

Non-event-generating channel freeing should not appear in our queue

Affected behavior / minimal sequence

The minimized functional reproducer uses three nodes, A, B, and C:

  1. Route a payment A -> B -> C.
  2. C claims the payment and sends B an update_fulfill_htlc.
  3. B learns the preimage and tries to claim the previous-hop HTLC from A.
  4. B's A-B monitor update is left InProgress, so B does not yet emit the
    fulfill back to A.
  5. C goes on-chain with its commitment and HTLC-success transaction.
  6. B's B-C monitor observes the same preimage on-chain and replays the claim.
  7. The replay is a duplicate claim against A-B, but A-B still has an in-flight
    monitor update, so the duplicate-free action is queued.
  8. Serializing and reloading B hits the assertion during ChannelManager read.

Expected behavior

Reload should not assert on valid duplicate monitor replay. Either the
duplicate-free action should be handled immediately, or it should be represented
in persisted state in a way that the reload path accepts and processes
correctly.

Actual behavior

The duplicate-free action can be serialized in monitor_update_blocked_actions.
On reload, the read path treats that variant as invalid queued state and triggers
the debug assertion above.

Why this looks like a project bug

This does not appear to depend on fuzz-harness-only behavior. The focused
reproducer uses normal functional-test helpers for channels, payments, async
monitor persistence, block connection, and node reload. It also does not require
splicing.

The on-chain duplicate preimage path is expected in general: a monitor can later
report an HTLC resolution that the ChannelManager already learned offchain.
The bug appears to be the combination of that duplicate claim with an in-flight
previous-hop monitor update, which causes an action documented as non-persistent
to enter serialized manager state.

Impact

The verified impact is a debug-assertion crash on reload. The assertion is a
debug_assert!, so the exact panic is not expected in normal release builds.

A remote downstream peer can influence the payment, message, and on-chain parts
of the sequence, but the exact state also requires local timing: async monitor
persistence must leave the previous-hop update in flight, and the node must
serialize and reload in that window. I have not verified funds loss. I also have
not yet verified whether release builds always recover cleanly, or whether they
can leave a channel temporarily or permanently blocked after the invalid action
is persisted.

Focused reproducer test

#[test]
#[should_panic(expected = "Non-event-generating channel freeing should not appear in our queue")]
fn test_duplicate_onchain_claim_reloaded_with_inflight_prev_hop_update() {
	let chanmon_cfgs = create_chanmon_cfgs(3);
	let node_cfgs = create_node_cfgs(3, &chanmon_cfgs);

	let persister;
	let chain_mon;
	let node_b_reload;

	let legacy_cfg = test_legacy_channel_config();
	let node_chanmgrs = create_node_chanmgrs(
		3,
		&node_cfgs,
		&[Some(legacy_cfg.clone()), Some(legacy_cfg.clone()), Some(legacy_cfg)],
	);
	let mut nodes = create_network(3, &node_cfgs, &node_chanmgrs);

	let node_b_id = nodes[1].node.get_our_node_id();
	let node_c_id = nodes[2].node.get_our_node_id();

	let chan_id_ab = create_announced_chan_between_nodes(&nodes, 0, 1).2;
	let chan_bc = create_announced_chan_between_nodes(&nodes, 1, 2);

	// This payment gives B both an inbound edge, A-B, and an outbound edge, B-C. The crash
	// requires B to first learn the preimage from C, then later learn the same preimage again
	// from the B-C monitor after C has gone on-chain.
	let (payment_preimage, payment_hash, ..) =
		route_payment(&nodes[0], &[&nodes[1], &nodes[2]], 1_000_000);

	nodes[2].node.claim_funds(payment_preimage);
	check_added_monitors(&nodes[2], 1);
	expect_payment_claimed!(nodes[2], payment_hash, 1_000_000);

	// Leave B's A-B preimage monitor update in flight. This is the critical persistence state:
	// B has enough information to claim from A, but the monitor update that makes the preimage
	// durable for the inbound edge has not completed.
	chanmon_cfgs[1].persister.set_update_ret(ChannelMonitorUpdateStatus::InProgress);
	let cs_updates = get_htlc_update_msgs(&nodes[2], &node_b_id);
	nodes[1].node.handle_update_fulfill_htlc(node_c_id, cs_updates.update_fulfill_htlcs[0].clone());
	check_added_monitors(&nodes[1], 1);
	// If B emitted the fulfill here, the A-B monitor update would not be blocked and the later
	// duplicate claim would free inline instead of being parked for monitor completion.
	assert!(nodes[1].node.get_and_clear_pending_msg_events().is_empty());

	// C's local commitment still contains the HTLC because B has only received the fulfill, not
	// completed C's commitment_signed exchange. Mining C's commitment plus HTLC-success makes B's
	// monitor observe the same preimage through the on-chain path.
	let cs_txn = get_local_commitment_txn!(nodes[2], chan_bc.2);
	assert!(cs_txn.len() >= 2, "Expected commitment + HTLC-success tx, got {}", cs_txn.len());

	mine_transaction(&nodes[1], &cs_txn[0]);
	check_closed_broadcast(&nodes[1], 1, true);
	check_added_monitors(&nodes[1], 1);
	let events = nodes[1].node.get_and_clear_pending_events();
	assert!(events.iter().any(|e| matches!(e, Event::ChannelClosed { .. })));

	mine_transaction(&nodes[1], &cs_txn[1]);
	connect_blocks(&nodes[1], ANTI_REORG_DELAY);
	// Draining events here forces B to process the monitor HTLC event before serialization. The
	// replayed preimage is a duplicate against A-B, but A-B still has an in-flight monitor update,
	// so the duplicate-free action is queued instead of run immediately.
	assert!(nodes[1].node.get_and_clear_pending_events().is_empty());

	// Reload with the manager serialized after the duplicate action was queued. During read, the
	// queued action is found in monitor_update_blocked_actions even though that action class is
	// expected to be purely in-memory and non-persistent.
	let mon_ab = get_monitor!(nodes[1], chan_id_ab).encode();
	let mon_bc = get_monitor!(nodes[1], chan_bc.2).encode();
	let manager_b = nodes[1].node.encode();
	reload_node!(nodes[1], &manager_b, &[&mon_ab, &mon_bc], persister, chain_mon, node_b_reload);
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions