Skip to content

Fix peer storage size cap to account for encryption overhead#4746

Open
shivv23 wants to merge 2 commits into
lightningdevkit:mainfrom
shivv23:fix/peer-storage-size-limit
Open

Fix peer storage size cap to account for encryption overhead#4746
shivv23 wants to merge 2 commits into
lightningdevkit:mainfrom
shivv23:fix/peer-storage-size-limit

Conversation

@shivv23

@shivv23 shivv23 commented Jun 24, 2026

Copy link
Copy Markdown

Description

send_peer_storage() caps the sum of serialized_length() at MAX_PEER_STORAGE_SIZE (65531), but the final wire message includes additional bytes not accounted for:

  • 2 bytes: CollectionLength vec prefix from encode()
  • 16 bytes: ChaCha20Poly1305 AEAD tag
  • 32 bytes: random_bytes nonce seed appended during encryption

Without headroom for this ~50 byte overhead, the encrypted blob can exceed BOLT #1's 65531-byte message limit and BOLT #8's 65535-byte transport limit, causing a panic at peer_channel_encryptor.rs:520.

Fix

Introduced a PEER_STORAGE_OVERHEAD constant (2 + 16 + 32 = 50) and changed the size check to subtract it from the cap, so serialized_length() is capped at MAX_PEER_STORAGE_SIZE - PEER_STORAGE_OVERHEAD. This ensures the encrypted wire message fits within 65535 bytes.

Background fix

The dummy_monitor() test helper creates ChannelMonitors whose initial current_counterparty_commitment_number is 1 << 48 (a sentinel from the production constructor meaning "counterparty commitment not yet received"). This value (2^48) overflows the 48-bit field used during serialization, causing be48_to_array to panic. Fixed in dummy_monitor() by resetting to a valid 48-bit value after construction, preserving the production sentinel.

Testing

Added test_peer_storage_blob_within_size_limit (#[cfg(peer_storage)]) which creates 200 dummy monitors (~100KB total) and verifies the emitted PeerStorage message data length does not exceed MAX_PEER_STORAGE_SIZE (65531).

Fixes #4698

send_peer_storage() caps the sum of serialized_length() at
MAX_PEER_STORAGE_SIZE (65531), but the final wire message
includes a 2-byte CollectionLength prefix, 16-byte AEAD tag,
and 32-byte nonce seed. Without headroom, the encrypted blob
can exceed BOLT lightningdevkit#1's 65531-byte limit and BOLT lightningdevkit#8's 65535-byte
transport limit, causing a panic.

Also fixes a latent bug in dummy_monitor where
current_counterparty_commitment_number was set to 1 << 48
(2^48), which overflows the 48-bit field in be48_to_array.
@ldk-reviews-bot

ldk-reviews-bot commented Jun 24, 2026

Copy link
Copy Markdown

I've assigned @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Comment thread lightning/src/chain/channelmonitor.rs Outdated
counterparty_fulfilled_htlcs: new_hash_map(),

current_counterparty_commitment_number: 1 << 48,
current_counterparty_commitment_number: (1 << 48) - 1,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is in the production ChannelMonitor::new() constructor (line 1872), not in dummy_monitor() as the PR description claims. dummy_monitor() (line 6973) merely calls ChannelMonitor::new(), so this alters the initial value for every monitor created in production.

The original 1 << 48 is a deliberate sentinel (INITIAL_COMMITMENT_NUMBER + 1) meaning "the initial counterparty commitment has not been provided yet", distinct from INITIAL_COMMITMENT_NUMBER = (1 << 48) - 1 which means "at the initial counterparty commitment". This distinction is observable in is_closed_without_updates() (line 4435, == INITIAL_COMMITMENT_NUMBER).

In normal operation the value is always overwritten by provide_initial_counterparty_commitment_tx() before the monitor is ever serialized (see the invariant documented at channel.rs:6531-6535), so 1 << 48 is never actually written via be48_to_array in production — the panic only occurs because dummy_monitor() skips provide_initial_counterparty_commitment_tx().

Recommend fixing this in dummy_monitor() (e.g. by providing an initial counterparty commitment, or setting the field there) rather than weakening the documented sentinel/invariant in the production constructor. At minimum the PR description should be corrected, since this is a behavioral change to production code, not a test helper.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldk-claude-review-bot thanks for the thorough review. Point by point:

1. current_counterparty_commitment_number sentinel in production constructor — Correct, this should not have been changed in ChannelMonitor::new(). Reverted back to 1 << 48 and moved the fix into dummy_monitor() instead, where it sets the field to 0 after construction. The production sentinel is now preserved.

2. Tightness of the cap — The math is intentional: data.len() = sum(serialized_length) + 2 (CollectionLength) + 16 (tag) + 32 (nonce) = sum + 50. With the cap at 65531 - 50 = 65481, the wire message is exactly 2 + 2 + 65531 = 65535 — at, not over, LN_MAX_MSG_LEN.

3. #[cfg(peer_storage)] gating — This is required because send_peer_storage() itself is #[cfg(peer_storage)], so the test must be gated identically to compile. It runs in CI when RUSTFLAGS="--cfg peer_storage" is set.

@ldk-claude-review-bot

ldk-claude-review-bot commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

The latest revision resolved my previously-flagged issue. No new code bugs found.

Summary

The previously-flagged issue is resolved. In the latest commit (4fed9891b), the fix was relocated from the production ChannelMonitor::new() constructor to dummy_monitor():

  • lightning/src/chain/channelmonitor.rs:1955 — production constructor is back to the original 1 << 48 sentinel (unchanged).
  • lightning/src/chain/channelmonitor.rs:7039dummy_monitor() now overrides current_counterparty_commitment_number = 0, a valid 48-bit value that serializes without panicking. This is the correct place for the fix and preserves the documented production sentinel/invariant.

The send_peer_storage overhead constant fix (50 = 2 + 16 + 32) remains correct and tight, and the regression test is sound.

Minor (non-blocking)

  • The PR description still states the value was changed to (1 << 48) - 1, but the code actually sets it to 0. Stale description, not a code bug.
  • The new test test_peer_storage_blob_within_size_limit is gated behind #[cfg(peer_storage)], so it won't run in default CI configurations.

No new issues found.

…er instead of production constructor

The production constructor's 1 << 48 is a sentinel meaning 'counterparty
commitment not yet received'. Changing it to (1 << 48) - 1 aligns it with
INITIAL_COMMITMENT_NUMBER, which is observable in is_closed_without_updates().
Instead, fix the value in dummy_monitor() after construction so the sentinel
is preserved in production code.
@codecov

codecov Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.93%. Comparing base (e225350) to head (4fed989).
⚠️ Report is 3224 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4746      +/-   ##
==========================================
+ Coverage   84.51%   86.93%   +2.41%     
==========================================
  Files         137      161      +24     
  Lines       77446   111636   +34190     
  Branches    77446   111636   +34190     
==========================================
+ Hits        65456    97053   +31597     
- Misses       9949    12077    +2128     
- Partials     2041     2506     +465     
Flag Coverage Δ
fuzzing-fake-hashes 8.43% <ø> (?)
fuzzing-real-hashes 32.40% <ø> (?)
tests 86.26% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

/// in `peer_channel_encryptor`).
#[test]
#[cfg(peer_storage)]
fn test_peer_storage_blob_within_size_limit() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make sure this test tests right on the boundary. Currently it also passes if I manually set overhead to 0 again.

// sentinel meaning "counterparty commitment not yet received". This value (2^48) exceeds the
// 48-bit field in be48_to_array and causes a panic on serialization. Since dummy monitors
// may be serialized in tests, reset it to a valid value.
monitor.inner.lock().unwrap().current_counterparty_commitment_number = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok, even though in reality commitment numbers start high and go down. So 0 isn't really the default for a fresh channel. But it is consistent with the dummy holder commitment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Peer storage can send more than wire limit size

4 participants