Skip to content

Upstream VssStoreBuilder and VssStore to lightning-persister#4323

Open
tnull wants to merge 13 commits intolightningdevkit:mainfrom
tnull:2026-01-upstream-vss-store
Open

Upstream VssStoreBuilder and VssStore to lightning-persister#4323
tnull wants to merge 13 commits intolightningdevkit:mainfrom
tnull:2026-01-upstream-vss-store

Conversation

@tnull
Copy link
Contributor

@tnull tnull commented Jan 19, 2026

Since the beginning of VSS we intended to upstream the corresponding KVStore implementation to lightning-persister.

Here, we finally do it. I'll put this in draft until lightningdevkit/ldk-node#755 lands so we can directly upstream the version already upgraded for pubkey-auth.

@tnull tnull requested review from TheBlueMatt and tankyleo January 19, 2026 15:03
@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Jan 19, 2026

🎉 This PR is now ready for review!
Please choose at least one reviewer by assigning them on the right bar.
If no reviewers are assigned within 10 minutes, I'll automatically assign one.
Once the first reviewer has submitted a review, a second will be assigned if required.

@tnull tnull marked this pull request as draft January 19, 2026 15:03
@tnull tnull force-pushed the 2026-01-upstream-vss-store branch 3 times, most recently from 0817a96 to 2e8b7cd Compare January 19, 2026 15:10
@codecov
Copy link

codecov bot commented Jan 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.17%. Comparing base (d830f10) to head (9686cb9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4323      +/-   ##
==========================================
- Coverage   86.18%   86.17%   -0.01%     
==========================================
  Files         160      160              
  Lines      107536   107536              
  Branches   107536   107536              
==========================================
- Hits        92680    92670      -10     
- Misses      12231    12237       +6     
- Partials     2625     2629       +4     
Flag Coverage Δ
tests 86.17% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Maybe should wait on vss-client removing the reqwest and url deps as well? Maybe more importantly base64 which is a solo-maintainer crate and can be implemented in 50 LoC.

@tnull
Copy link
Contributor Author

tnull commented Jan 20, 2026

Nice. Maybe should wait on vss-client removing the reqwest and url deps as well? Maybe more importantly base64 which is a solo-maintainer crate and can be implemented in 50 LoC.

The former is done here lightningdevkit/vss-client#56, for url I'll have to first spend some more time on my Url implementation and upstream it, so depending how timelines work out I wouldn't want to hold for that.

@ldk-claude-review-bot
Copy link
Collaborator

ldk-claude-review-bot commented Mar 23, 2026

Review Summary

Two new issues found beyond prior review comments:

Inline comments posted

  1. lightning-persister/src/vss_store.rs:992-1003 — Test vss_read_write_remove_list_persist will panic: it uses #[test] (no tokio runtime) but VssStore::new and all KVStoreSync methods call tokio::task::block_in_place, which requires a multi-threaded tokio runtime context.

  2. lightning-persister/src/vss_store.rs:789-792 — V0 schema version is never persisted, making detection heuristic-based on every startup. If bdk_wallet data is ever deleted from a V0 store while other data remains, the store will be misclassified as V1, causing data loss (wrong key format + wrong AAD for decryption).

Previously flagged issues still open (not re-posted)

  • .unwrap() on runtime build (line 110) — should propagate via ?
  • Missing tokio sync feature in Cargo.toml — relies on transitive enablement from vss-client-ng
  • Drop::drop block_in_place panics outside tokio runtime (line 378)
  • Version counter overflow check after fetch_add (line 178)
  • Lazy-remove JoinHandle silently dropped (lines 270, 356)
  • Unnecessary lightning-macros dependency (Cargo.toml line 20)

@tnull
Copy link
Contributor Author

tnull commented Mar 23, 2026

Now rebased / updated to incorporate latest VSS changes from LDK Node. Should be good for review.

LDK Node draft PR: lightningdevkit/ldk-node#837

@tnull tnull force-pushed the 2026-01-upstream-vss-store branch from d72fb40 to 124e08d Compare March 23, 2026 11:58
tnull added 6 commits March 23, 2026 14:24
Since the beginning of VSS we intended to upstream the corresponding
`KVStore` implementation to `lightning-persister`. Here, we finally do
it.

Signed-off-by: Elias Rohrer <dev@tnull.de>
Bump `vss-client-ng` 0.4 -> 0.5, add `build_with_sigs_auth()` method,
rename `build()` to `build_with_lnurl()`, replace `RandEntropySource`
with `VssEntropySource` backed by LDK's `RandomBytes`, drop `rand`
from the `vss` feature in favor of `getrandom`, re-export `vss_client`
from the crate root, remove `AuthProviderSetupFailed` error variant, and
update tests to use `VssStoreBuilder`.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
Fix test import path: `crate::io::test_utils` -> `crate::test_utils`.

Co-Authored-By: HAL 9000
Stop `list_all_keys` on empty page tokens

Handle both `None` and empty `next_page_token` values so VSS
pagination terminates regardless of which sentinel the server uses.

Co-Authored-By: HAL 9000
Reject missing VSS object values explicitly

Convert malformed responses into `io::Error`s instead of panicking when
`value` is missing from `get_object` replies.

Co-Authored-By: HAL 9000
@tnull tnull force-pushed the 2026-01-upstream-vss-store branch from 124e08d to dd583b0 Compare March 23, 2026 13:51
tnull added 7 commits March 23, 2026 14:56
Use an MSRV-compatible closure in `write_internal`

Replace the async closure syntax in `write_internal` with the older
`move || async move` pattern supported on Rust 1.75.

Co-Authored-By: HAL 9000
Move owned buffers into `StorableBuilder`

Pass the existing `Vec<u8>` to `StorableBuilder::build` instead of
cloning it before every write.

Co-Authored-By: HAL 9000
Use an MSRV-compatible closure in `remove_internal`

Replace the async closure syntax in `remove_internal` with the older
`move || async move` pattern supported on Rust 1.75.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
Wait for the VSS server and use `checkout@v4`

Poll the server before running tests so the workflow does not race the
startup path, and align the new job with the repository's current
`actions/checkout` version.

Co-Authored-By: HAL 9000
Drop the leftover workflow debug listing

Remove the stray `ls -la` output so the VSS test job only emits the
signal needed for debugging failures.

Co-Authored-By: HAL 9000
Keep the V0 prefix and extraction logic aligned with stores that write
empty-namespace keys without any `#` separator.

Co-Authored-By: HAL 9000
@tnull tnull force-pushed the 2026-01-upstream-vss-store branch from dd583b0 to 9686cb9 Compare March 23, 2026 13:56
bitcoin = "0.32.2"
lightning = { version = "0.3.0", path = "../lightning" }
lightning-macros = { version = "0.2.0", path = "../lightning-macros", optional = true }
tokio = { version = "1.35", optional = true, default-features = false, features = ["rt-multi-thread"] }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The code uses tokio::sync::Mutex extensively (7+ call sites in vss_store.rs), but the tokio dependency only declares features = ["rt-multi-thread"]. The sync feature is required for tokio::sync::Mutex to be available.

This likely compiles today only because vss-client-ng transitively enables tokio/sync. If that crate ever changes its tokio feature requirements, this will break.

Suggested change
tokio = { version = "1.35", optional = true, default-features = false, features = ["rt-multi-thread"] }
tokio = { version = "1.35", optional = true, default-features = false, features = ["rt-multi-thread", "sync"] }

.worker_threads(INTERNAL_RUNTIME_WORKERS)
.max_blocking_threads(INTERNAL_RUNTIME_WORKERS)
.build()
.unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: .unwrap() will panic if the OS fails to create the runtime (e.g., resource limits, thread creation failure). Since this function returns io::Result<Self>, the error should be propagated instead:

Suggested change
.unwrap();
.build()?;

Comment on lines +992 to +1003
#[test]
fn vss_read_write_remove_list_persist() {
let vss_base_url = std::env::var("TEST_VSS_BASE_URL").unwrap();
let mut rng = rng();
let rand_store_id: String = (0..7).map(|_| rng.sample(Alphanumeric) as char).collect();
let mut vss_seed = [0u8; 32];
rng.fill_bytes(&mut vss_seed);
let vss_xprv = Xpriv::new_master(bitcoin::Network::Testnet, &vss_seed).unwrap();
let vss_store = VssStoreBuilder::new(vss_xprv, vss_base_url, rand_store_id)
.build_with_fixed_headers(HashMap::new())
.unwrap();
do_read_write_remove_list_persist(&vss_store);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: This test will panic at runtime. VssStore::new (line 128) and all KVStoreSync methods (lines 207, 238, 272, 290) use tokio::task::block_in_place, which panics if called from outside a multi-threaded tokio runtime. This #[test] runs on a plain OS thread with no tokio runtime context.

The second test (vss_read_write_remove_list_persist_in_runtime_context) correctly uses #[tokio::test(flavor = "multi_thread", worker_threads = 1)].

Either make this test #[tokio::test(flavor = "multi_thread")] as well, or handle the non-runtime case in the constructor/sync methods:

fn block_on_internal<F: Future>(runtime: &tokio::runtime::Runtime, fut: F) -> F::Output {
    match tokio::runtime::Handle::try_current() {
        Ok(_) => tokio::task::block_in_place(|| runtime.handle().block_on(fut)),
        Err(_) => runtime.handle().block_on(fut),
    }
}

If the intent is that VssStore always requires a tokio runtime context (even for sync usage), this needs to be documented on the struct/builder, and this test should be updated.

Comment on lines +789 to +792
let wallet_data_present = !response.key_versions.is_empty();
if wallet_data_present {
// If the wallet data is present, it means we're not running for the first time.
Ok(VssSchemaVersion::V0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential data loss: When V0 is detected, the schema version is NOT written to storage. This means every future startup re-runs the detection heuristic (checking for bdk_wallet# prefix). If a V0 store ever loses all its bdk_wallet data (e.g., wallet reset, selective key deletion) while retaining other data (channel monitors, etc.), the next startup will misclassify it as a fresh V1 store. This causes:

  1. All existing V0 keys become inaccessible (V1 uses obfuscated namespace prefixes, V0 uses plaintext)
  2. Decryption fails (V1 uses the store key as AAD, V0 uses empty AAD)

Consider persisting VssSchemaVersion::V0 to storage (like V1 does) to make the detection durable:

if wallet_data_present {
    // Persist V0 schema version so future startups don't rely on the heuristic
    let schema_version = VssSchemaVersion::V0;
    let encoded_version = schema_version.encode();
    let storable_builder = StorableBuilder::new(VssEntropySource(entropy_source));
    let vss_version = -1;
    let aad = store_key.as_bytes();
    let storable = storable_builder.build(encoded_version, vss_version, &data_encryption_key, aad);
    let request = PutObjectRequest {
        store_id: store_id.clone(),
        global_version: None,
        transaction_items: vec![KeyValue { key: store_key, version: vss_version, value: storable.encode_to_vec() }],
        delete_items: vec![],
    };
    // Best-effort write; if it fails, we'll re-detect next time
    let _ = client.put_object(&request).await;
    Ok(schema_version)
} else {

// `PeerManager::process_pending_events` -> `ChannelManager::get_and_clear_pending_msg_events`
// would deadlock when trying to acquire sync `Mutex` locks that are held by the thread
// currently being blocked waiting on the VSS operation to finish.
internal_runtime: Option<tokio::runtime::Runtime>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does it make sense to upstream this hack or should we go ahead and split the store struct into an async and sync version? I assume pretty soon LDK Node will only need/want the async version anyway?

Ok(keys)
}

async fn execute_locked_write<
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we DRY this up with the filesystem store at all?

// Check if any `bdk_wallet` data was written by listing keys under the respective
// (unobfuscated) prefix.
const V0_BDK_WALLET_PREFIX: &str = "bdk_wallet#";
let request = ListKeyVersionsRequest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, its a bit weird to upstream LDK Node-specific migration logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Goal: Merge

Development

Successfully merging this pull request may close these issues.

4 participants