fix(p2p): drive tx protection release from synced blocks instead of wall clock by spalladino · Pull Request #23978 · AztecProtocol/aztec-packages

spalladino · 2026-06-10T00:52:08Z

The prepare-for-slot loop in p2p client was not synced with the L2BlockStream events, meanining the unprotect call could trigger before the blocks-added flagged the txs as mined.

One solution could've been to add a new event to the blockstream on slot-synced, but it's easier to just remove the polling, and unprotect slots when a block proposal that protected the txs fails. As a safeguard, we still call unprotect based on slot numbers on mined blocks.

Problem

The tx pool protects txs referenced by an in-flight block proposal: on gossip receipt, the proposal's txs are keyed to its slot and removed from the pending indices, so the local builder cannot re-select them and eviction cannot drop them while the proposal may still land. prepareForSlot(S) releases protections from slots before S, revalidates the txs, and returns them to pending.

Release was driven by a wall-clock slot monitor polling the epoch cache every tick. Three problems:

Race against mined-marking. The monitor can fire after a proposal's checkpoint lands on L1 but before the block stream delivers blocks-added. The just-landed txs are unprotected into pending, where eviction or nullifier-conflict resolution can delete them; when the block then syncs there is nothing left to mark mined, and after a later reorg handlePrunedBlocks has nothing to restore — the tx is lost to the pool.
Clock dependency. The epoch cache is wall-clock derived and explicitly depends on system clock sync; unprotection correctness should depend on observed chain state instead.
Pipelining blind spot. Gossiped proposals carry future target slots during proposer pipelining, so wall-clock release frees them late. (The old target-slot branch that tried to address this read proposedCheckpoint from the local tips store, where it can never lead the checkpointed tip — removed in fix(p2p)!: resolve checkpoint tips from stored ids #23968 as dead code.)

Change

The protection lifecycle becomes fully event-driven and the slot monitor is deleted:

Protect on gossip receipt of a block proposal (unchanged).
Release on local validation failure: a proposal that fails validation immediately releases the protections it created — only entries still keyed to that proposal's slot, so a tx also referenced by a live proposal at another slot stays protected.
Resolve via chain events (unchanged): blocks-added marks txs mined, superseding protection; chain-pruned un-mines them back to pending. Proposals that landed as proposed-but-unconfirmed checkpoints and are later orphaned are fully handled by this existing lifecycle.
Collect silent deaths via synced block slots: prepareForSlot now runs inside the blocks-added handler with the slot of the last synced block, after mined-marking. Any block landing at slot S releases protections from all earlier slots — covering proposals that never reached L1 at all (no quorum, proposer crash, dropped L1 tx), for which no chain event can ever fire. Because it is ordered after mined-marking in the same handler, the unprotect-before-mined race is impossible by construction.
Proposers are unaffected: the sequencer already calls prepareForSlot(targetSlot) directly before building, which remains the one legitimate ahead-of-chain preparation.

Trade-off accepted

During a multi-slot stall with no blocks landing anywhere, non-proposer pools retain protections until the first block lands (the wall clock used to release them mid-stall). There is no user-visible cost — there is no chain to include txs in during a stall — and the memory held is bounded and self-healing on the first blocks-added. Proposers self-serve via the direct sequencer call. Protections are in-memory only, so a restart clears them.

Fixes A-1173

…all clock The tx pool protects txs referenced by an in-flight block proposal by keying them to the proposal's slot and removing them from the pending indices, then releases the protection once the slot has passed. Release used to be driven by a wall-clock slot monitor polling the epoch cache. That monitor could fire after a proposal's checkpoint landed on L1 but before the block stream delivered blocks-added: the just-landed txs were unprotected into pending, where eviction or nullifier-conflict resolution could delete them before mined-marking ran, and a later reorg had nothing to restore. Make the protection lifecycle fully event-driven and delete the slot monitor: - Release on local validation failure: a proposal that fails validation releases only the protection entries still keyed to its slot, so a tx also referenced by a live proposal at another slot stays protected. - Collect via synced block slots: prepareForSlot now runs inside the blocks-added handler with the last synced block's slot, after mined-marking, so the unprotect-before-mined race is impossible by construction. - Mining now clears a tx's protection entry, since a mined tx supersedes any protection. Protections are in-memory only and are not rehydrated on restart. Proposers are unaffected: the sequencer still calls prepareForSlot(targetSlot) directly.

## Motivation #23967 stopped the checkpoint-replay storm at its source (a prune must not advance checkpoint-bearing cursors onto an uncheckpointed block). This PR makes the local tips store never *silently* report checkpoint zero for a real block, removes the machinery and degenerate fields that made the silent path possible, and closes the gaps found in review: a store-upgrade path that would brick p2p nodes, a transient proven-tip overshoot on prune, and world-state fabricating checkpoint ids it never had. Fixes A-1174 ## Commits **1. `fix(p2p): resolve checkpoint tips from stored ids and fail loudly on corruption`** - `chain-proven`/`chain-finalized` carry their `CheckpointId`; the store records a checkpoint id per cursor. `getCheckpointId` resolves genesis → stored id → (back-compat) `block→checkpoint` mapping → **throw**, so a real-block cursor with no resolvable checkpoint fails loudly instead of degrading to a replay. - A `proven`/`finalized` cursor can legitimately lead the locally-checkpointed frontier (batch lag / `startingBlock`); carrying the id lets it resolve without a local mapping, so the throw only fires on genuine corruption (never on a legitimate skipped-history prune). **2. `refactor(p2p): drop the block→checkpoint mapping and checkpoint-object store`** - With per-cursor stored ids, the `block→checkpoint` mapping and the checkpoint-object store are dead weight (they existed only to feed `getCheckpointId`). Both backing maps are removed from the KV and memory stores. - `chain-pruned` now carries `checkpointed: L2TipId` (the source's confirmed checkpointed tip) instead of a bare `CheckpointId`. The prune handler clamps any checkpoint-bearing cursor that leads that tip down to it, always landing on a block with a recorded id — no genesis-clamp, no mapping lookup. `handleChainFinalized` collapses to pruning block hashes below the lowest live tip. **3. `refactor(stdlib): drop proposedCheckpoint from the local L2 tips provider`** - `proposedCheckpoint` is degenerate in the local stores (always equal to `checkpointed`) and no consumer reads it — the one reader, `p2p_client`'s `maybeCallPrepareForSlot`, was a dead branch (always false). `L2TipsProvider.getL2Tips` now returns `LocalL2Tips = Omit<L2Tips, 'proposedCheckpoint'>`; the local stores stop maintaining the cursor; the dead p2p branch is removed. - `L2BlockSource.getL2Tips` is a separate interface and keeps the full `L2Tips` with `proposedCheckpoint`, which the sequencer and node still read from the archiver. **4. `fix(p2p): bump p2p store schema version for the per-tip checkpoint id layout`** - An upgraded p2p store would keep its old tips with an empty per-cursor id map, making `getL2Tips` throw on every read with no way to self-heal — failing `P2PClient.start()` outright. Bumping the store schema version (3 → 4) resets the store on upgrade instead. **5. `fix(p2p): clamp the proven tip to the source proven tip on prune`** - `chain-pruned` carried only the checkpointed tip, so a prune that rolled back the proven chain clamped the local proven cursor onto the (higher) checkpointed tip, transiently reporting unproven blocks as proven until the corrective `chain-proven` event landed at the end of the same sync iteration. The event now also carries `proven: L2TipId` and each cursor clamps to its own source tip. **6. `refactor(world-state): stop fabricating checkpoint ids in the world-state tips provider`** - The stream's local-data-provider contract demanded full checkpoint-bearing tips, forcing world-state to hardcode genesis checkpoint ids and a `checkpointed` tip at block zero (violating `finalized ≤ checkpointed`) that nothing ever read. The provider contract is narrowed to `LocalChainTips` — the tips the stream actually consumes, with `checkpointed` required only when emitting checkpoint events — and the stream fails loudly if checkpoint emission is enabled without one. World-state now reports only the proposed/proven/finalized blocks it genuinely tracks; `L2TipsProvider` keeps the full `LocalL2Tips` shape for the p2p/pxe tips stores. **7. `fix(prover-node): consume the reshaped chain-pruned and checkpoint-bearing events`** - The CheckpointStore redesign landed a prover-node consumer of `chain-pruned` (written against the old event shape) on the merge train while this branch was in flight. The prune handler now reads `event.checkpointed.checkpoint` (same semantics as the old field) and the test event constructors are updated to the new shapes. **8. `refactor(world-state): report unresolvable tip hashes as undefined instead of fabricating them`** - World-state's `getL2Tips` fabricated values for hashes it could not resolve: an empty string for the proven/finalized tips and a non-null assertion for the proposed tip. The honest missing case is real (a proven tip ahead of the synced range has no archive leaf to resolve from), so `LocalChainTips` now carries block ids with an optional hash and world-state returns resolved hashes as-is. Local tips stores are unaffected (`LocalL2Tips` with required hashes remains assignable), and the stream reads only block numbers from local tips. ## Breaking / operational notes - `PXE_DATA_SCHEMA_VERSION` bumped: existing PXE DBs are wiped and resync on first open. - p2p store schema version bumped: existing p2p data dirs (including the tx pool) are wiped and resync on upgrade. - `L2BlockStreamEvent` shape changes (internal API): `chain-proven`/`chain-finalized` gain `checkpoint`; `chain-pruned` carries `checkpointed`/`proven` tips instead of a bare `checkpoint`. ## Deferred `maybeCallPrepareForSlot`'s target-slot preparation (prepare the target slot when a proposed checkpoint exists) never worked, because the local store cannot represent a proposed checkpoint ahead of the checkpointed tip. This is handled in #23978.

spalladino force-pushed the spl/event-driven-tx-protection branch 2 times, most recently from 22b27d3 to d061358 Compare June 10, 2026 01:05

spalladino changed the base branch from spl/fix-tips-checkpoint-ids-fail-loud to merge-train/spartan-v5 June 10, 2026 01:05

spalladino force-pushed the spl/event-driven-tx-protection branch from d061358 to 1405dbf Compare June 10, 2026 15:24

spalladino mentioned this pull request Jun 10, 2026

fix(p2p)!: resolve checkpoint tips from stored ids #23968

Merged

PhilWindle approved these changes Jun 10, 2026

View reviewed changes

PhilWindle merged commit 3867206 into merge-train/spartan-v5 Jun 10, 2026
18 checks passed

PhilWindle deleted the spl/event-driven-tx-protection branch June 10, 2026 16:05

AztecBot mentioned this pull request Jun 10, 2026

feat: merge-train/spartan-v5 #23975

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(p2p): drive tx protection release from synced blocks instead of wall clock#23978

fix(p2p): drive tx protection release from synced blocks instead of wall clock#23978
PhilWindle merged 1 commit into
merge-train/spartan-v5from
spl/event-driven-tx-protection

spalladino commented Jun 10, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

spalladino commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Change

Trade-off accepted

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

spalladino commented Jun 10, 2026 •

edited

Loading