fix(p2p): drive tx protection release from synced blocks instead of wall clock#23978
Merged
PhilWindle merged 1 commit intoJun 10, 2026
Merged
Conversation
22b27d3 to
d061358
Compare
…all clock The tx pool protects txs referenced by an in-flight block proposal by keying them to the proposal's slot and removing them from the pending indices, then releases the protection once the slot has passed. Release used to be driven by a wall-clock slot monitor polling the epoch cache. That monitor could fire after a proposal's checkpoint landed on L1 but before the block stream delivered blocks-added: the just-landed txs were unprotected into pending, where eviction or nullifier-conflict resolution could delete them before mined-marking ran, and a later reorg had nothing to restore. Make the protection lifecycle fully event-driven and delete the slot monitor: - Release on local validation failure: a proposal that fails validation releases only the protection entries still keyed to its slot, so a tx also referenced by a live proposal at another slot stays protected. - Collect via synced block slots: prepareForSlot now runs inside the blocks-added handler with the last synced block's slot, after mined-marking, so the unprotect-before-mined race is impossible by construction. - Mining now clears a tx's protection entry, since a mined tx supersedes any protection. Protections are in-memory only and are not rehydrated on restart. Proposers are unaffected: the sequencer still calls prepareForSlot(targetSlot) directly.
d061358 to
1405dbf
Compare
PhilWindle
approved these changes
Jun 10, 2026
PhilWindle
pushed a commit
that referenced
this pull request
Jun 11, 2026
## Motivation #23967 stopped the checkpoint-replay storm at its source (a prune must not advance checkpoint-bearing cursors onto an uncheckpointed block). This PR makes the local tips store never *silently* report checkpoint zero for a real block, removes the machinery and degenerate fields that made the silent path possible, and closes the gaps found in review: a store-upgrade path that would brick p2p nodes, a transient proven-tip overshoot on prune, and world-state fabricating checkpoint ids it never had. Fixes A-1174 ## Commits **1. `fix(p2p): resolve checkpoint tips from stored ids and fail loudly on corruption`** - `chain-proven`/`chain-finalized` carry their `CheckpointId`; the store records a checkpoint id per cursor. `getCheckpointId` resolves genesis → stored id → (back-compat) `block→checkpoint` mapping → **throw**, so a real-block cursor with no resolvable checkpoint fails loudly instead of degrading to a replay. - A `proven`/`finalized` cursor can legitimately lead the locally-checkpointed frontier (batch lag / `startingBlock`); carrying the id lets it resolve without a local mapping, so the throw only fires on genuine corruption (never on a legitimate skipped-history prune). **2. `refactor(p2p): drop the block→checkpoint mapping and checkpoint-object store`** - With per-cursor stored ids, the `block→checkpoint` mapping and the checkpoint-object store are dead weight (they existed only to feed `getCheckpointId`). Both backing maps are removed from the KV and memory stores. - `chain-pruned` now carries `checkpointed: L2TipId` (the source's confirmed checkpointed tip) instead of a bare `CheckpointId`. The prune handler clamps any checkpoint-bearing cursor that leads that tip down to it, always landing on a block with a recorded id — no genesis-clamp, no mapping lookup. `handleChainFinalized` collapses to pruning block hashes below the lowest live tip. **3. `refactor(stdlib): drop proposedCheckpoint from the local L2 tips provider`** - `proposedCheckpoint` is degenerate in the local stores (always equal to `checkpointed`) and no consumer reads it — the one reader, `p2p_client`'s `maybeCallPrepareForSlot`, was a dead branch (always false). `L2TipsProvider.getL2Tips` now returns `LocalL2Tips = Omit<L2Tips, 'proposedCheckpoint'>`; the local stores stop maintaining the cursor; the dead p2p branch is removed. - `L2BlockSource.getL2Tips` is a separate interface and keeps the full `L2Tips` with `proposedCheckpoint`, which the sequencer and node still read from the archiver. **4. `fix(p2p): bump p2p store schema version for the per-tip checkpoint id layout`** - An upgraded p2p store would keep its old tips with an empty per-cursor id map, making `getL2Tips` throw on every read with no way to self-heal — failing `P2PClient.start()` outright. Bumping the store schema version (3 → 4) resets the store on upgrade instead. **5. `fix(p2p): clamp the proven tip to the source proven tip on prune`** - `chain-pruned` carried only the checkpointed tip, so a prune that rolled back the proven chain clamped the local proven cursor onto the (higher) checkpointed tip, transiently reporting unproven blocks as proven until the corrective `chain-proven` event landed at the end of the same sync iteration. The event now also carries `proven: L2TipId` and each cursor clamps to its own source tip. **6. `refactor(world-state): stop fabricating checkpoint ids in the world-state tips provider`** - The stream's local-data-provider contract demanded full checkpoint-bearing tips, forcing world-state to hardcode genesis checkpoint ids and a `checkpointed` tip at block zero (violating `finalized ≤ checkpointed`) that nothing ever read. The provider contract is narrowed to `LocalChainTips` — the tips the stream actually consumes, with `checkpointed` required only when emitting checkpoint events — and the stream fails loudly if checkpoint emission is enabled without one. World-state now reports only the proposed/proven/finalized blocks it genuinely tracks; `L2TipsProvider` keeps the full `LocalL2Tips` shape for the p2p/pxe tips stores. **7. `fix(prover-node): consume the reshaped chain-pruned and checkpoint-bearing events`** - The CheckpointStore redesign landed a prover-node consumer of `chain-pruned` (written against the old event shape) on the merge train while this branch was in flight. The prune handler now reads `event.checkpointed.checkpoint` (same semantics as the old field) and the test event constructors are updated to the new shapes. **8. `refactor(world-state): report unresolvable tip hashes as undefined instead of fabricating them`** - World-state's `getL2Tips` fabricated values for hashes it could not resolve: an empty string for the proven/finalized tips and a non-null assertion for the proposed tip. The honest missing case is real (a proven tip ahead of the synced range has no archive leaf to resolve from), so `LocalChainTips` now carries block ids with an optional hash and world-state returns resolved hashes as-is. Local tips stores are unaffected (`LocalL2Tips` with required hashes remains assignable), and the stream reads only block numbers from local tips. ## Breaking / operational notes - `PXE_DATA_SCHEMA_VERSION` bumped: existing PXE DBs are wiped and resync on first open. - p2p store schema version bumped: existing p2p data dirs (including the tx pool) are wiped and resync on upgrade. - `L2BlockStreamEvent` shape changes (internal API): `chain-proven`/`chain-finalized` gain `checkpoint`; `chain-pruned` carries `checkpointed`/`proven` tips instead of a bare `checkpoint`. ## Deferred `maybeCallPrepareForSlot`'s target-slot preparation (prepare the target slot when a proposed checkpoint exists) never worked, because the local store cannot represent a proposed checkpoint ahead of the checkpointed tip. This is handled in #23978.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The prepare-for-slot loop in p2p client was not synced with the
L2BlockStreamevents, meanining theunprotectcall could trigger before the blocks-added flagged the txs as mined.One solution could've been to add a new event to the blockstream on
slot-synced, but it's easier to just remove the polling, and unprotect slots when a block proposal that protected the txs fails. As a safeguard, we still call unprotect based on slot numbers on mined blocks.Problem
The tx pool protects txs referenced by an in-flight block proposal: on gossip receipt, the proposal's txs are keyed to its slot and removed from the pending indices, so the local builder cannot re-select them and eviction cannot drop them while the proposal may still land.
prepareForSlot(S)releases protections from slots beforeS, revalidates the txs, and returns them to pending.Release was driven by a wall-clock slot monitor polling the epoch cache every tick. Three problems:
blocks-added. The just-landed txs are unprotected into pending, where eviction or nullifier-conflict resolution can delete them; when the block then syncs there is nothing left to mark mined, and after a later reorghandlePrunedBlockshas nothing to restore — the tx is lost to the pool.proposedCheckpointfrom the local tips store, where it can never lead the checkpointed tip — removed in fix(p2p)!: resolve checkpoint tips from stored ids #23968 as dead code.)Change
The protection lifecycle becomes fully event-driven and the slot monitor is deleted:
blocks-addedmarks txs mined, superseding protection;chain-prunedun-mines them back to pending. Proposals that landed as proposed-but-unconfirmed checkpoints and are later orphaned are fully handled by this existing lifecycle.prepareForSlotnow runs inside theblocks-addedhandler with the slot of the last synced block, after mined-marking. Any block landing at slot S releases protections from all earlier slots — covering proposals that never reached L1 at all (no quorum, proposer crash, dropped L1 tx), for which no chain event can ever fire. Because it is ordered after mined-marking in the same handler, the unprotect-before-mined race is impossible by construction.prepareForSlot(targetSlot)directly before building, which remains the one legitimate ahead-of-chain preparation.Trade-off accepted
During a multi-slot stall with no blocks landing anywhere, non-proposer pools retain protections until the first block lands (the wall clock used to release them mid-stall). There is no user-visible cost — there is no chain to include txs in during a stall — and the memory held is bounded and self-healing on the first
blocks-added. Proposers self-serve via the direct sequencer call. Protections are in-memory only, so a restart clears them.Fixes A-1173