fix(partitions): persist data on vsr backups, add data integrity test#3512

Open
hubcio wants to merge 2 commits into master from data-integrity-vsr

@hubcio hubcio commented Jun 19, 2026

Contributor

VSR's hash chain and checksum-keyed recovery require a committed op's
on-disk bytes to match on every replica. Two defects broke that: the
partition commit path drained the pipeline, which only the primary
fills, so backups journaled replicated prepares but never flushed them
(0-byte segments); and append re-stamped base_timestamp from a local
now(), diverging bytes even once persisted.

commit_journal now falls back to the journal when the pipeline is
empty, so backups persist like the metadata plane. base_timestamp
reuses the prepare's monotonic timestamp, stamped once by the primary
and replicated. The 3-node data integrity test is un-ignored and gates
this.
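
To illustrate why a locally stamped base_timestamp breaks byte-identity, here is a minimal sketch. The `Prepare` struct, the record layout, and all function names are hypothetical and are not taken from the codebase; only the idea (reuse the timestamp the primary stamped on the prepare) comes from the description above.

```rust
/// Hypothetical, simplified prepare header: only the fields this sketch needs.
#[derive(Clone, Copy)]
pub struct Prepare {
    pub op: u64,
    /// Stamped once by the primary and replicated verbatim.
    pub timestamp: u64,
}

/// Encode a record as op | base_timestamp | payload (little-endian).
/// The layout is illustrative only, not the real on-disk format.
pub fn encode_record(prepare: &Prepare, base_timestamp: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + payload.len());
    out.extend_from_slice(&prepare.op.to_le_bytes());
    out.extend_from_slice(&base_timestamp.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Behaviour described as the defect: every replica stamps its own clock reading.
pub fn append_with_local_clock(prepare: &Prepare, local_now: u64, payload: &[u8]) -> Vec<u8> {
    encode_record(prepare, local_now, payload)
}

/// Behaviour described as the fix: reuse the replicated prepare timestamp,
/// so the encoded bytes depend only on replicated inputs.
pub fn append_with_prepare_timestamp(prepare: &Prepare, payload: &[u8]) -> Vec<u8> {
    encode_record(prepare, prepare.timestamp, payload)
}
```

With the same prepare and payload, `append_with_prepare_timestamp` yields identical bytes on every replica, whereas `append_with_local_clock` differs as soon as two nodes read different clock values.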

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label Jun 19, 2026

codecov Bot commented Jun 19, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.24%. Comparing base (4a48008) to head (aff604f).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3512      +/-   ##
============================================
- Coverage     74.27%   74.24%   -0.03%     
  Complexity      937      937              
============================================
  Files          1259     1259              
  Lines        125969   125917      -52     
  Branches     101644   101636       -8     
============================================
- Hits          93558    93489      -69     
+ Misses        29396    29369      -27     
- Partials       3015     3059      +44     
| Components | Coverage Δ |
|---|---|
| Rust Core | 75.17% <ø> (+0.01%) ⬆️ |
| Java SDK | 58.57% <ø> (ø) |
| C# SDK | 71.40% <ø> (-0.71%) ⬇️ |
| Python SDK | 88.88% <ø> (ø) |
| PHP SDK | 84.29% <ø> (ø) |
| Node SDK | 91.22% <ø> (ø) |
| Go SDK | 40.36% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| core/consensus/src/impls.rs | 79.02% <ø> (-0.22%) ⬇️ |
| core/consensus/src/plane_helpers.rs | 91.46% <ø> (ø) |
| core/journal/src/prepare_journal.rs | 86.64% <ø> (ø) |
| core/metadata/src/impls/metadata.rs | 39.07% <ø> (+0.17%) ⬆️ |
| core/partitions/src/iggy_partition.rs | 42.64% <ø> (+0.45%) ⬆️ |
| core/partitions/src/journal.rs | 32.85% <ø> (+0.37%) ⬆️ |
| core/shard/src/lib.rs | 73.77% <ø> (-0.17%) ⬇️ |

... and 17 files with indirect coverage changes

@hubcio hubcio force-pushed the data-integrity-vsr branch from 26b7423 to fa2c7f2 Compare June 19, 2026 11:22
VSR's hash chain and checksum-keyed recovery require a committed
op's on-disk bytes to match on every replica. Backups persisted
0-byte segments: the partition commit path drained the in-memory
pipeline, which only the primary fills. And the append path
stamped base_timestamp from a local now(), diverging bytes per
node even once persisted.

Backups now source committable ops from the journal when the
pipeline is empty, and commit_messages flushes only the committed
prefix (op <= commit_max), keeping the uncommitted tail resident.
A backup thus never writes uncommitted bytes to its segment and
never drops the headers a later commit needs - which would
otherwise wedge commit_min below commit_max. base_timestamp
reuses the prepare's monotonic timestamp, stamped once by the
primary and replicated. A 3-node data-integrity test gates the
cross-replica byte-identity.
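
The journal fallback and committed-prefix flush described above can be sketched roughly as follows. This is an assumption-laden model, not the actual `commit_messages` code: the `Journal` alias, the `VecDeque` pipeline, and both function names are hypothetical, and the sketch ignores how the primary keeps its own journal in sync.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical in-memory journal keyed by op number.
pub type Journal = BTreeMap<u64, Vec<u8>>;

/// Remove and return every journal entry with op <= commit_max, in op order.
/// Entries above commit_max (the uncommitted tail) stay resident.
pub fn take_committed_prefix(journal: &mut Journal, commit_max: u64) -> Vec<(u64, Vec<u8>)> {
    // split_off keeps keys < next in `journal` and returns keys >= next.
    let tail = match commit_max.checked_add(1) {
        Some(next) => journal.split_off(&next),
        None => BTreeMap::new(),
    };
    let committed = std::mem::replace(journal, tail);
    committed.into_iter().collect()
}

/// Choose the source of committable ops: the pipeline when it has entries
/// (the primary), otherwise the journal (a backup, whose pipeline is empty).
pub fn committable_ops(
    pipeline: &mut VecDeque<(u64, Vec<u8>)>,
    journal: &mut Journal,
    commit_max: u64,
) -> Vec<(u64, Vec<u8>)> {
    if pipeline.is_empty() {
        return take_committed_prefix(journal, commit_max);
    }
    let mut out = Vec::new();
    // Pop only the committed prefix; anything above commit_max stays queued.
    while pipeline.front().map_or(false, |(op, _)| *op <= commit_max) {
        out.push(pipeline.pop_front().unwrap());
    }
    out
}
```

On a backup holding ops 1 through 5 with commit_max = 3, this returns ops 1, 2, 3 for flushing and leaves 4 and 5 in the journal until a later commit covers them.
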
@hubcio hubcio force-pushed the data-integrity-vsr branch 3 times, most recently from 220f3d6 to a82ddbf Compare June 19, 2026 17:59
A disconnect storm spawns many concurrent in-process Logout submits
on shard 0, breaking two invariants that held only under serial
submission.

dispatch_prepare_and_await (metadata plane) snapshotted view and
commit_min before on_replicate().await and asserted them unchanged
after. A sibling on_ack legitimately advances commit_min while a
task is parked in the await, and a view change advances the view,
tripping the debug_assert. The snapshots fed only the assert; no
release-mode logic read them. Both conditions are handled
downstream - a view change drops the reply_sender so the receiver
resolves Canceled, and loopback acks are op-routed - so the assert
and snapshots are removed.
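
As a rough analogue of the "dropped reply_sender resolves Canceled" behaviour, the sketch below uses the standard library's blocking `std::sync::mpsc` channel rather than whatever async oneshot channel the codebase actually uses (that choice, and the names `SubmitOutcome`, `new_reply_channel`, and `await_reply`, are assumptions for illustration). The point is only that the waiter learns about cancellation from the channel itself, so it needs no view or commit_min snapshot.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical outcome of waiting for a commit reply.
#[derive(Debug, PartialEq)]
pub enum SubmitOutcome {
    Committed(u64),
    Canceled,
}

/// Create the reply channel a submitter would park on.
pub fn new_reply_channel() -> (Sender<u64>, Receiver<u64>) {
    channel()
}

/// Wait for the reply. If every sender was dropped without sending
/// (modelling a view change discarding the pending request), recv()
/// returns an error, which is mapped to Canceled.
pub fn await_reply(rx: Receiver<u64>) -> SubmitOutcome {
    match rx.recv() {
        Ok(op) => SubmitOutcome::Committed(op),
        Err(_) => SubmitOutcome::Canceled,
    }
}
```
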

A new primary floors its view-change pipeline rebuild at the max
commit across the DoViewChange quorum. The DVC carried commit_min
(locally applied), which can lag commit_max (known committed) by
more than the pipeline depth: on_ack pops the committable prefix at
once then applies it across an await per entry, while concurrent
submits push op ahead. The rebuild range then exceeds
PIPELINE_PREPARE_QUEUE_MAX, and either the assert or the pipeline
push panics the shard in release builds. Carry commit_max instead: every
replica holds op - commit_max <= pipeline depth, so the rebuilt
range stays within capacity, while quorum intersection keeps the
floor at or below the winner's op. The committed-but-unapplied tail
is replayed by CommitJournal. Only the value written to the
DoViewChange commit field changes; the wire layout is untouched.
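
A small worked example of the floor arithmetic, under stated assumptions: the `DoViewChange` struct is reduced to two fields, the queue size of 8 is an illustrative value rather than the real constant, and the winner is taken to be the message with the highest op (that is, all messages are assumed to share the same log view).

```rust
/// Illustrative capacity only; the real constant's value is not given here.
pub const PIPELINE_PREPARE_QUEUE_MAX: u64 = 8;

/// Hypothetical, reduced DoViewChange: highest op and the carried commit value.
#[derive(Clone, Copy)]
pub struct DoViewChange {
    pub op: u64,
    pub commit: u64,
}

/// Number of ops the new primary would rebuild: winner's op minus the
/// max commit across the quorum. None if the quorum is empty or the
/// floor is above the winner's op.
pub fn rebuild_len(quorum: &[DoViewChange]) -> Option<u64> {
    let floor = quorum.iter().map(|d| d.commit).max()?;
    let winner_op = quorum.iter().map(|d| d.op).max()?;
    winner_op.checked_sub(floor)
}

/// True when the rebuilt range fits in the pipeline.
pub fn fits_pipeline(quorum: &[DoViewChange]) -> bool {
    rebuild_len(quorum).map_or(false, |len| len <= PIPELINE_PREPARE_QUEUE_MAX)
}
```

For instance, a replica at op 20 with commit_min 5 and commit_max 18 yields a rebuild range of 15 when the DVC carries commit_min, which exceeds the assumed capacity of 8, but a range of 2 when it carries commit_max.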
@hubcio hubcio force-pushed the data-integrity-vsr branch from a82ddbf to aff604f Compare June 19, 2026 19:54