fix(depot): teach doctor to diagnose reclaimed history via shard-seeded replay by NathanFlurry · Pull Request #5222 · rivet-dev/rivet

NathanFlurry · 2026-06-11T21:13:02Z

No description provided.

…ed replay

NathanFlurry · 2026-06-11T21:13:09Z

Stack for rivet-dev/rivet

docs(depot): add high-level depot SQLite overview, consolidate docs #5224 sxlkwqvz · 2026-06-11 21:13:04
feat(depot): materialize capped database forks on bucket fork access #5223 uouzzvly · 2026-06-11 21:13:03
fix(depot): teach doctor to diagnose reclaimed history via shard-seeded replay #5222 ulwkqwsz · 2026-06-11 21:13:02 👈
feat(depot): watermark-aligned delta reclaim and retention engine #5221 mkqlmnty · 2026-06-11 21:13:00
test(depot): add branch history snapshot helper for exact retention assertions #5220 tlrrrnmu · 2026-06-11 21:12:58
main

Get stack: forklift get 5222
Push local edits: forklift submit
Merge when ready: forklift merge 5222

claude · 2026-06-11T21:16:33Z

Code Review

Overview

This PR teaches the depot doctor to correctly diagnose databases with reclaimed history. When the reclaimer prunes delta history at or below the hot watermark, the doctor previously saw the gaps as corruption. The fix seeds the sequential replay image from hot shard coverage at the watermark and only replays deltas above it, then threads replay_floor_txid through the analysis functions so gap-detection and sub-replay search are bounded to the safe range.

Code Quality

Correctness

The replay_floor_txid.min(selected_txid) re-application inside build_sequential_replay (line 1586) is not redundant even though it was already applied in load_storage_facts. Sub-replays in find_first_bad_txid and find_first_resolver_divergence pass a mid-range commit.txid as selected_txid, so the inner clamp is necessary to prevent the floor from exceeding the target commit. Correct.
The min_txid.max(storage.replay_floor_txid.saturating_add(1)) adjustment in both find_first_bad_txid and find_first_resolver_divergence prevents sub-replays below the floor. Correct.
analyze_commit_chain now starts the contiguity check at replay_floor_txid + 1 rather than txid 1, and includes replay_floor_txid in the JSON output for diagnostics. Correct.

Minor concern — error message lacks context

Line 1591:

.context("hot watermark commit row is missing for sequential replay")?;

If the watermark commit row is absent (e.g., a race where reclaim removes a keep-set row unexpectedly), the error gives no indication of which txid was expected. Consider:

.with_context(|| format!("hot watermark commit row missing for txid {} in sequential replay", replay_floor_txid))?;

Minor concern — O(pages × shard_pages) coverage check

Lines 1627–1633 iterate 1..=current_size and for each page scan all shard pages to detect uncovered pages. For large databases with many pages, this is quadratic. Building a HashSet<u32> of covered pages before the loop would flatten it to O(total_shard_pages + current_size). Given this runs in a diagnostic tool rather than the hot path, it is acceptable as-is, but worth noting.

No issue — separate transaction for watermark read

read_hot_watermark_txid issues its own transaction after the prefix scans in load_storage_facts complete. There is a narrow TOCTOU window if the watermark advances between these two reads. For a doctor diagnostic tool this is fine; just flagging it as an inherent trade-off of reading at different logical timestamps.

Test Coverage

The new doctor_reclaimed.rs test covers the core scenario well:

Two ancient-timestamp commits are folded into shard coverage by hot compaction, then punched by reclaim.
A third commit above the watermark survives with delta history intact.
The doctor is run and the verdict must be Healthy.

Suggestion — assert replay_floor_txid in the report

The test asserts the verdict is Healthy but does not verify that replay_floor_txid appears with the expected value in the analysis JSON. An assertion on report.analysis["commit_chain_analysis"]["replay_floor_txid"] would make the test a regression guard for the new field itself, not just the verdict.

Conventions

Comment style follows project conventions (complete sentences, no em dashes, lowercase log messages).
replay_floor_txid doc comment on the struct field is appropriately concise.
Test file lives under tests/ as required by the depot CLAUDE.md.
No new raw serde_bare encodings introduced.
No _ => fallthrough arms on any match introduced.

Summary

The approach is sound and edge cases are handled correctly: no shards when floor > 0 falls through to a zero-filled seed (correctly diagnosed as corrupted), sub-replays are clamped to above the floor, and contiguity analysis is bounded above the floor. The two items above are minor polish rather than blocking issues.

fix(depot): teach doctor to diagnose reclaimed history via shard-seed…

e0e6ac1

…ed replay

This was referenced Jun 11, 2026

test(depot): add branch history snapshot helper for exact retention assertions #5220

Open

feat(depot): watermark-aligned delta reclaim and retention engine #5221

Open

This was referenced Jun 11, 2026

feat(depot): materialize capped database forks on bucket fork access #5223

Open

docs(depot): add high-level depot SQLite overview, consolidate docs #5224

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(depot): teach doctor to diagnose reclaimed history via shard-seeded replay#5222

fix(depot): teach doctor to diagnose reclaimed history via shard-seeded replay#5222
NathanFlurry wants to merge 1 commit into
stack/feat-depot-watermark-aligned-delta-reclaim-and-retention-engine-mkqlmntyfrom
stack/fix-depot-teach-doctor-to-diagnose-reclaimed-history-via-shard-seeded-replay-ulwkqwsz

NathanFlurry commented Jun 11, 2026

Uh oh!

NathanFlurry commented Jun 11, 2026

Uh oh!

claude Bot commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NathanFlurry commented Jun 11, 2026

Uh oh!

NathanFlurry commented Jun 11, 2026

Uh oh!

claude Bot commented Jun 11, 2026

Code Review

Overview

Code Quality

Test Coverage

Conventions

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant