
feat: merge-train/spartan-v5 #23975

Open
AztecBot wants to merge 8 commits into v5-next from merge-train/spartan-v5

@AztecBot AztecBot commented Jun 9, 2026


BEGIN_COMMIT_OVERRIDE
fix(p2p): stop checkpoint-replay storm when pruning to an uncheckpointed block (#23967)
refactor(sequencer)!: always enforce timetable with concrete block duration (#23821)
fix(e2e): drop removed enforceTimeTable option from optimistic proving test (#23976)
feat: persist peer bans for a configurable duration (A-1157) (#23922)
refactor!: rename node JSON-RPC to aztec_* prefixes (#23909)
fix(p2p): drive tx protection release from synced blocks instead of wall clock (#23978)
END_COMMIT_OVERRIDE

…ted block (#23967)

## Motivation

On a staging HA validator, an archiver orphan prune triggered a storm of
thousands of duplicate `chain-checkpointed` events out of p2p's
`L2BlockStream`.

The local tips store keeps a block number per cursor and derives the
checkpoint number from a `block -> checkpoint` map that is only
populated for the last block of each confirmed checkpoint.
`handleChainPruned` moved the `checkpointed` and `proposedCheckpoint`
cursors to the prune target unconditionally. That target is the new tip
of the *proposed* chain and can be an uncheckpointed block with no
mapping (in the incident, a block belonging to a not-yet-confirmed
checkpoint, sitting ahead of the checkpointed tip). `getCheckpointId`
then resolved that cursor to checkpoint zero, the stream computed
`nextCheckpointToEmit = 0 + 1 = 1`, and it replayed every checkpoint
from 1 up to the source tip.

## Approach

A prune is a rollback, so checkpoint-bearing cursors may only move
*backward*. `handleChainPruned` now sets `proposed` unconditionally and
clamps `checkpointed`/`proposedCheckpoint`/`proven` to the prune target
only when they are ahead of it (generalizing the guard `proven` already
had). In the incident the checkpointed cursor is left untouched and
keeps resolving to its real checkpoint, so there is nothing to replay.
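
The backward-only clamp can be illustrated with a minimal sketch. The `Tips` shape and the `applyPrune` name are hypothetical simplifications for illustration, not the actual stdlib tips store; the real `handleChainPruned` also has to deal with the `block -> checkpoint` map.

```typescript
// Hypothetical, simplified model of the local tips store (one block number per cursor).
type Cursor = 'proposed' | 'checkpointed' | 'proposedCheckpoint' | 'proven';
type Tips = Record<Cursor, number>;

/**
 * Apply a prune to `target`: `proposed` always moves to the target,
 * while the checkpoint-bearing cursors only ever move backward.
 */
function applyPrune(tips: Tips, target: number): Tips {
  return {
    proposed: target,
    checkpointed: Math.min(tips.checkpointed, target),
    proposedCheckpoint: Math.min(tips.proposedCheckpoint, target),
    proven: Math.min(tips.proven, target),
  };
}

// Shape of the incident: the prune target (block 12) is an uncheckpointed
// block sitting ahead of the checkpointed tip (block 10).
const before: Tips = { proposed: 15, checkpointed: 10, proposedCheckpoint: 10, proven: 8 };
const after = applyPrune(before, 12);
// `after.checkpointed` is still 10, so it keeps resolving to a mapped block.
```

With the old unconditional assignment, `checkpointed` would have become 12, a block with no checkpoint mapping, which is what resolved to checkpoint zero.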

Surfacing a missing mapping *loudly* (rather than silently reporting
checkpoint zero) is intentionally deferred to a stacked follow-up: doing
it safely requires per-tip checkpoint ids so the store can fail loudly
on genuine corruption without bricking legitimate skipped-history prunes
(which would otherwise throw on the next `getL2Tips`). This PR is the
minimal, behavior-preserving fix for the storm.

## Changes

- **stdlib**: `handleChainPruned` clamps checkpoint-bearing cursors
backward only instead of forcing them onto the (possibly uncheckpointed)
prune target.
- **stdlib (tests)**: store-level regression that a prune to an
uncheckpointed block ahead of the checkpointed tip leaves the cursor
intact and resolving to its real checkpoint; stream-level regression
asserting no `chain-checkpointed` replay after such a prune.

Fixes A-1167
…ration (#23821)

## Motivation

The production sequencer kept two legacy escape hatches:
`enforceTimeTable=false` (unbounded block building with no deadlines)
and `blockDurationMs=undefined` (single-block-per-slot mode). Both
existed only to satisfy tests and the sandbox, complicated the timetable
with dead branches, and let most e2e tests run under timing that
production never uses.

## Approach

The timetable now always enforces sub-slot deadlines with a concrete
`blockDurationMs` (required config, default 3000 ms). The only
non-enforced path left is the `AutomineSequencer`: the local sandbox
switches to it, which makes `AnvilTestWatcher` deletable — it was
already inert across the e2e suite since every e2e path runs anvil in
interval mining. The e2e PIPELINING preset flips to enforced real timing
at exactly 2 blocks per slot.

## Fee prediction changes

`sequencer-client/src/global_variable_builder/fee_provider.ts` now
treats the current L1 fee snapshot as part of the predicted-fee set
exposed by the node. `getPredictedMinFees()` returns the current minimum
fees first, followed by the future-slot predictions from the fee
predictor. This matters for local automine because a freshly mined
checkpoint can make the node's current minimum fees higher than the
predictor's future samples; including the current value prevents clients
from quoting below what tx validation will accept.

`getCurrentMinFees()` also bypasses viem's cached block number by
calling `getBlockNumber({ cacheTime: 0 })`, so automine fee snapshots
observe newly mined L1 blocks immediately instead of reusing a stale L1
block number.
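
As a rough sketch of the ordering described above: the `MinFees` shape (with plain numbers), the `predictedMinFees` helper, and the `safeQuote` client policy are all assumptions made for illustration; only the "current first, then future-slot predictions" ordering comes from this PR.

```typescript
// Simplified, hypothetical fee shape; the node's real type may differ.
interface MinFees {
  feePerDaGas: number;
  feePerL2Gas: number;
}

/** Current minimum fees first, then the predictor's future-slot samples. */
function predictedMinFees(current: MinFees, future: MinFees[]): MinFees[] {
  return [current, ...future];
}

/** Assumed client policy: quote the per-field maximum over the whole set. */
function safeQuote(fees: MinFees[]): MinFees {
  return fees.reduce((acc, f) => ({
    feePerDaGas: Math.max(acc.feePerDaGas, f.feePerDaGas),
    feePerL2Gas: Math.max(acc.feePerL2Gas, f.feePerL2Gas),
  }));
}

// A freshly mined checkpoint pushed the current minimum above the predictions.
const current: MinFees = { feePerDaGas: 0, feePerL2Gas: 120 };
const future: MinFees[] = [
  { feePerDaGas: 0, feePerL2Gas: 100 },
  { feePerDaGas: 0, feePerL2Gas: 95 },
];
const quote = safeQuote(predictedMinFees(current, future));
```

Without the current snapshot in the set, the same policy would quote 100 here, below the 120 that validation would require in this example.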

## Public simulation global variables

`aztec-node/src/aztec-node/server.ts` no longer calls
`buildGlobalVariables()` for `simulatePublicCalls`. Instead it computes
a simulation target slot from local chain state and calls
`buildCheckpointGlobalVariables()` with that slot, then combines those
checkpoint globals with the requested simulation block number.

The target slot is the max of:

- the slot corresponding to the next L1 timestamp from the epoch cache,
- the slot after the proposed checkpoint loaded from the block source,
- the latest proposed block slot when it is ahead of the proposed
checkpoint.

This keeps public simulation aligned with the same checkpoint-global
construction used for block building, while avoiding a rollup-contract
lookup on the simulation path.
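
The target-slot rule can be sketched as follows. The `SlotInputs` field names and the use of plain numbers for slots are hypothetical; the sketch only mirrors the three candidates listed above.

```typescript
// Hypothetical inputs; names are illustrative, not the node's actual fields.
interface SlotInputs {
  /** Slot corresponding to the next L1 timestamp (from the epoch cache). */
  nextL1TimestampSlot: number;
  /** Slot of the proposed checkpoint loaded from the block source. */
  proposedCheckpointSlot: number;
  /** Slot of the latest proposed block, if any. */
  latestProposedBlockSlot?: number;
}

function simulationTargetSlot(inputs: SlotInputs): number {
  const candidates = [inputs.nextL1TimestampSlot, inputs.proposedCheckpointSlot + 1];
  // The latest proposed block only counts when it is ahead of the proposed checkpoint.
  if (
    inputs.latestProposedBlockSlot !== undefined &&
    inputs.latestProposedBlockSlot > inputs.proposedCheckpointSlot
  ) {
    candidates.push(inputs.latestProposedBlockSlot);
  }
  return Math.max(...candidates);
}

const targetSlot = simulationTargetSlot({
  nextL1TimestampSlot: 10,
  proposedCheckpointSlot: 11,
  latestProposedBlockSlot: 14,
});
```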

## Automine sequencer: proving and recovery

The `AutomineSequencer` now drives epoch proving for the sandbox without
a prover. `aztecNode.prove(upToCheckpoint?)` synthetically settles each
checkpointed epoch — computing its out-hash, writing the outbox root and
proven checkpoint to L1 via cheat codes, then calling `markAsProven` —
with partial-epoch support (it can settle a prefix up to a requested
checkpoint). Because that settlement mines no L1 block, it then mines
one empty block so the archiver (which short-circuits its L1 sync while
the block hash is unchanged) observes the new proven tip immediately,
mirroring a real epoch proof landing an L1 verify tx. An optional
auto-settle loop (`AUTOMINE_ENABLE_PROVE_EPOCH`, on by default for the
local network) proves epochs as they close, replacing the standalone
`EpochTestSettler` that used to race the build loop.

On a wrong-slot or failed publish the sequencer returns the failed
block's txs to the pool and retries the build rather than reorging L1,
using a new `archiver.discardProposedCheckpointsAfter` to drop
proposed-but-uncheckpointed blocks during recovery.

## Changes

- **stdlib**: `blockDuration` required in the checkpoint timing model,
single-block branches removed; `DEFAULT_BLOCK_DURATION_MS = 3000` as
single source of truth.
- **sequencer-client**: `SequencerTimetable` loses the `enforce` field;
`canStartNextBlock` always returns a concrete deadline; config drops
`enforceTimeTable`.
- **automine sequencer**: `prove(upToCheckpoint?)` synthetically settles
epochs (partial-epoch capable) and an optional auto-settle loop
(`AUTOMINE_ENABLE_PROVE_EPOCH`) advances the proven tip, mining an empty
L1 block so the archiver observes it; failed/wrong-slot publishes return
txs to the pool and retry instead of reorging L1 (new
`archiver.discardProposedCheckpointsAfter`).
- **fees**: node fee predictions now include the current minimum fees as
the first entry before future-slot predictions, and current fee
snapshots bypass cached L1 block numbers so local automine fee quotes
see freshly mined checkpoints.
- **p2p**: `blockDurationMs` required in proposal/attestation
validators, the pipelining window, and gossipsub topic scoring.
- **foundation / aztec-node / validator-client**:
`SEQ_ENFORCE_TIME_TABLE` env var removed; dead `blockDurationMs ===
undefined` branches simplified.
- **aztec (sandbox)**: local network runs the `AutomineSequencer` by
default, including p2p-enabled local runs; local-network is not a mode
for connecting to an existing Aztec network. `AnvilTestWatcher` deleted,
and the standalone `EpochTestSettler` is replaced by the
AutomineSequencer auto-settle loop.
- **end-to-end (tests)**: PIPELINING preset sets `blockDurationMs: 3000`
(2 blocks/slot); ~30 `enforceTimeTable` call sites removed; watcher
manual-proving call sites replaced with
`cheatCodes.rollup.markAsProven()`; bench given explicit slot headroom;
block-building regression test fixed for a min-txs remainder livelock
that enforced deadlines exposed.
- **docs**: sequencer-client and gossipsub READMEs updated to the
always-enforced model; sandbox/local-network docs updated to describe
automine block production and the removal of `SEQ_ENFORCE_TIME_TABLE`
for v5.

Breaking: `SEQ_ENFORCE_TIME_TABLE` is removed and
`SEQ_BLOCK_DURATION_MS` now defaults to 3000 ms (previously unset,
meaning single block per slot). The `SEQ_ENFORCE_TIME_TABLE` wiring in
`spartan/` (deploy script, terraform, env files) is removed as well.

Fixes A-1148
@PhilWindle PhilWindle requested a review from a team as a code owner June 9, 2026 20:49
@PhilWindle PhilWindle enabled auto-merge June 9, 2026 20:50
…g test (#23976)

## Problem

CI on `merge-train/spartan-v5` (commit 609014a,
[log](http://ci.aztec-labs.com/1781038207169152)) failed in the
`yarn-project` build at the `yarn tsgo -b --emitDeclarationOnly` step:

```
end-to-end/src/e2e_epochs/epochs_optimistic_proving.parallel.test.ts(222,9): error TS2353:
  Object literal may only specify known properties, and 'enforceTimeTable'
  does not exist in type 'EpochsTestOpts'.
```
(also at lines 366, 473, 558, 646, 780)

## Root cause

PR #23821 (*always enforce timetable with concrete block duration*) made
timetable enforcement unconditional and removed the `enforceTimeTable`
option from `EpochsTestOpts`/`SetupOptions`, deleting ~30
`enforceTimeTable: true` call sites.
`epochs_optimistic_proving.parallel.test.ts` landed on the v5 line
separately and still passed `enforceTimeTable: true` at six sites, so it
no longer type-checks.

## Fix

- Remove the six now-invalid `enforceTimeTable: true` properties. Each
call site already sets a concrete `blockDurationMs: 8000`, so the change
is behavior-preserving — the same deletion the PR applied to every other
e2e test. Verified in CI: `yarn-project` now type-checks and
`epochs_optimistic_proving.parallel.test.ts` passes.
- Temporarily `it.skip` the HA test `should distribute work across
multiple HA nodes` in `composed/ha/e2e_ha_full.test.ts`, which fails
under the always-enforced timetable (sequencer misses slots:
`BlockOrCheckpointSlotExpiredError` / `no_blocks_built` / `Fork not
found`). Skipped at Santiago's request, to be re-enabled after the HA
block-building interaction with #23821 is fixed.
@PhilWindle PhilWindle added this pull request to the merge queue Jun 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 10, 2026
@PhilWindle PhilWindle enabled auto-merge June 10, 2026 07:23
@PhilWindle PhilWindle added this pull request to the merge queue Jun 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 10, 2026
@PhilWindle PhilWindle added this pull request to the merge queue Jun 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 10, 2026
PhilWindle and others added 3 commits June 10, 2026 13:29
Fixes A-1157. Addresses security advisory GHSA-h4vv-85x5-6hmh.

## Problem

Peer scores decay toward zero (~0.9/minute). A peer whose score crossed
the ban threshold (`MIN_SCORE_BEFORE_BAN = -100`) recovered to a healthy
score within approximately 1 hour.

## Fix

Record a ban when a peer's score drops below the ban threshold and hold
it for a configurable duration (default 24h). Bans are kept **in memory
only** and are cleared on restart — a restarted node re-learns bad peers
from their behaviour rather than carrying stale bans across runs.

- **`PeerScoring`** records `{ score, expiry }` in an in-memory
`bannedPeers` map, so `getScore`/`getScoreState` stay **synchronous**
(required by the peer-manager hot paths, including a `.sort()`
comparator).
- While banned, `getScore` returns the **ban score** regardless of
decay, so a peer cannot recover its way out of the ban early — even
after `decayAllScores` cleans up the decayed live-score entry. Once the
ban expires it is lifted and the live (decayed) score takes over,
letting the peer recover.
- Expired bans are lifted lazily on the next score query
(`getActiveBanScore`) and swept proactively each heartbeat via
`pruneExpiredBans()`, so a banned peer that disconnects and is never
queried again does not linger in the map.
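
The ban lifecycle above can be sketched in isolation. This is not the real `PeerScoring` class: the `BanBook` name, the explicit `now` parameter (used here so the example is deterministic), the decay cleanup threshold, and the choice not to extend an active ban on further penalties are all assumptions.

```typescript
interface Ban {
  score: number;  // score at the moment of the ban, reported while the ban is active
  expiry: number; // ms timestamp at which the ban lifts
}

class BanBook {
  private live = new Map<string, number>();
  private banned = new Map<string, Ban>();

  constructor(private banThreshold: number, private banDurationMs: number) {}

  /** Apply a score delta and record a ban when the score drops below the threshold. */
  update(peer: string, delta: number, now: number): void {
    const score = (this.live.get(peer) ?? 0) + delta;
    this.live.set(peer, score);
    if (score < this.banThreshold && this.activeBan(peer, now) === undefined) {
      this.banned.set(peer, { score, expiry: now + this.banDurationMs });
    }
  }

  /** Synchronous lookup: the ban score wins over the live (decayed) score. */
  getScore(peer: string, now: number): number {
    const ban = this.activeBan(peer, now);
    return ban !== undefined ? ban.score : this.live.get(peer) ?? 0;
  }

  /** Decay live scores toward zero and drop entries that are effectively zero. */
  decay(factor: number): void {
    this.live.forEach((score, peer) => {
      const next = score * factor;
      if (Math.abs(next) < 0.1) this.live.delete(peer);
      else this.live.set(peer, next);
    });
  }

  /** Proactive sweep, e.g. once per heartbeat. */
  pruneExpiredBans(now: number): void {
    this.banned.forEach((ban, peer) => {
      if (ban.expiry <= now) this.banned.delete(peer);
    });
  }

  /** Lazy expiry: an expired ban is lifted on the next query. */
  private activeBan(peer: string, now: number): Ban | undefined {
    const ban = this.banned.get(peer);
    if (ban !== undefined && ban.expiry <= now) {
      this.banned.delete(peer);
      return undefined;
    }
    return ban;
  }
}

const book = new BanBook(-100, 60000); // 60s ban window, as in the test case above
book.update('peerA', -150, 0);
book.decay(0); // live entry is cleaned up entirely
const scoreWhileBanned = book.getScore('peerA', 59999);
const scoreAfterExpiry = book.getScore('peerA', 60000);
```

Keeping the ban in a plain in-memory map is what lets `getScore` stay synchronous, which matters when it is called from a sort comparator.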

## Configuration

New `P2P_PEER_BAN_DURATION_SECONDS` (config field
`peerBanDurationSeconds`), default `86400` (24h). Registered in
`foundation` env vars and the P2P config mappings.

## Tests

`peer_scoring.test.ts` covers the full lifecycle, asserting both score
**values** and states:
- ban floor held through banned → recovered-live-score → expiry
transitions;
- `peerBanDurationSeconds` drives the window (60s case);
- the advisory regression: after decay cleans up the live-score entry,
`getScore` still returns the `-150` ban score (not `0`), keeping the
peer Banned;
- a peer whose previous ban has expired can be re-banned;
- `pruneExpiredBans` removes expired bans but keeps active ones.

Existing `peer_manager` and `peer_scoring` suites pass; the previously
existing "returns to Healthy after improving score" assertion was
updated to reflect the new intended behaviour (a banned peer stays
banned for the full window).

## Summary
- Rename JSON-RPC namespaces from `node_*` / `nodeAdmin_*` /
`nodeDebug_*` to `aztec_*`, `aztecAdmin_*`, and `aztecDebug_*` on both
client schemas and `aztec start` server registration.
- Stop mounting the standalone `p2p_*` namespace; add `getPeers` and
`getCheckpointAttestationsForSlot` to `AztecNode` and delegate from the
node server.
- Update node API reference generation, regenerated operator docs, e2e
forward-compatibility config, and a migration note under TBD.
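
For callers that build method names by hand, the prefix change can be sketched as a small mapping helper. `migrateMethodName` is a hypothetical helper, not part of this PR, and the method suffixes in the example are placeholders rather than a statement of which methods exist.

```typescript
// Longer prefixes are listed first so `nodeAdmin_` is not mistaken for `node_`.
const PREFIX_RENAMES: Array<[string, string]> = [
  ['nodeAdmin_', 'aztecAdmin_'],
  ['nodeDebug_', 'aztecDebug_'],
  ['node_', 'aztec_'],
];

/** Rewrite an old-style JSON-RPC method name to the new prefix; other names pass through. */
function migrateMethodName(method: string): string {
  for (const [from, to] of PREFIX_RENAMES) {
    if (method.indexOf(from) === 0) {
      return to + method.slice(from.length);
    }
  }
  return method;
}

// Placeholder method suffixes, for illustration only.
const renamed = ['node_someMethod', 'nodeAdmin_someMethod', 'nodeDebug_someMethod'].map(migrateMethodName);
```

Names under the former `p2p_*` namespace are deliberately left unchanged by this sketch, since that namespace is no longer mounted rather than renamed.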

Fixes
[A-1010](https://linear.app/aztec-labs/issue/A-1010/archiver-flag-silently-ignored-when-combined-with-node-archiver-rpc)
@spypsy spypsy requested a review from Thunkar as a code owner June 10, 2026 15:39
…all clock (#23978)

The prepare-for-slot loop in the p2p client was **not** synced with the
`L2BlockStream` events, meaning the `unprotect` call could trigger
before the `blocks-added` handler flagged the txs as mined.

One solution could've been to add a new event to the blockstream on
`slot-synced`, but it's easier to just remove the polling, and unprotect
slots when a block proposal that protected the txs fails. As a
safeguard, we still call unprotect based on slot numbers on mined
blocks.

## Problem

The tx pool **protects** txs referenced by an in-flight block proposal:
on gossip receipt, the proposal's txs are keyed to its slot and removed
from the pending indices, so the local builder cannot re-select them and
eviction cannot drop them while the proposal may still land.
`prepareForSlot(S)` releases protections from slots before `S`,
revalidates the txs, and returns them to pending.

Release was driven by a wall-clock slot monitor polling the epoch cache
every tick. Three problems:

- **Race against mined-marking.** The monitor can fire after a
proposal's checkpoint lands on L1 but before the block stream delivers
`blocks-added`. The just-landed txs are unprotected into pending, where
eviction or nullifier-conflict resolution can delete them; when the
block then syncs there is nothing left to mark mined, and after a later
reorg `handlePrunedBlocks` has nothing to restore — the tx is lost to
the pool.
- **Clock dependency.** The epoch cache is wall-clock derived and
explicitly depends on system clock sync; unprotection correctness should
depend on observed chain state instead.
- **Pipelining blind spot.** Gossiped proposals carry future target
slots during proposer pipelining, so wall-clock release frees them late.
(The old target-slot branch that tried to address this read
`proposedCheckpoint` from the local tips store, where it can never lead
the checkpointed tip — removed in #23968 as dead code.)

## Change

The protection lifecycle becomes fully event-driven and the slot monitor
is deleted:

- **Protect** on gossip receipt of a block proposal (unchanged).
- **Release on local validation failure**: a proposal that fails
validation immediately releases the protections it created — only
entries still keyed to that proposal's slot, so a tx also referenced by
a live proposal at another slot stays protected.
- **Resolve via chain events** (unchanged): `blocks-added` marks txs
mined, superseding protection; `chain-pruned` un-mines them back to
pending. Proposals that landed as proposed-but-unconfirmed checkpoints
and are later orphaned are fully handled by this existing lifecycle.
- **Collect silent deaths via synced block slots**: `prepareForSlot` now
runs inside the `blocks-added` handler with the slot of the last synced
block, after mined-marking. Any block landing at slot S releases
protections from all earlier slots — covering proposals that never
reached L1 at all (no quorum, proposer crash, dropped L1 tx), for which
no chain event can ever fire. Because it is ordered after mined-marking
in the same handler, the unprotect-before-mined race is impossible by
construction.
- **Proposers are unaffected**: the sequencer already calls
`prepareForSlot(targetSlot)` directly before building, which remains the
one legitimate ahead-of-chain preparation.
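
The event-driven lifecycle can be sketched with a toy pool. The `TxProtection` class and its method names are hypothetical (only `prepareForSlot` echoes a name from this PR), txs are plain strings, and revalidation on release is omitted.

```typescript
type TxState = 'mined' | 'protected' | 'pending' | 'unknown';

class TxProtection {
  private protectedBySlot = new Map<string, number>(); // tx hash -> protecting slot
  private pending = new Set<string>();
  private mined = new Set<string>();

  /** Protect on gossip receipt: key the txs to the proposal's slot, remove from pending. */
  protect(txs: string[], slot: number): void {
    txs.forEach(tx => {
      this.pending.delete(tx);
      this.protectedBySlot.set(tx, slot);
    });
  }

  /** Release on validation failure: only entries still keyed to that proposal's slot. */
  releaseFailedProposal(txs: string[], slot: number): void {
    txs.forEach(tx => {
      if (this.protectedBySlot.get(tx) === slot) this.release(tx);
    });
  }

  /** `blocks-added` handler: mark mined first, then release earlier slots. */
  onBlocksAdded(minedTxs: string[], lastSyncedSlot: number): void {
    minedTxs.forEach(tx => {
      this.protectedBySlot.delete(tx);
      this.pending.delete(tx);
      this.mined.add(tx);
    });
    // Runs after mined-marking, so a just-mined tx can never be unprotected into pending.
    this.prepareForSlot(lastSyncedSlot);
  }

  /** Release protections from slots strictly before `slot`. */
  prepareForSlot(slot: number): void {
    this.protectedBySlot.forEach((protectedSlot, tx) => {
      if (protectedSlot < slot) this.release(tx);
    });
  }

  state(tx: string): TxState {
    if (this.mined.has(tx)) return 'mined';
    if (this.protectedBySlot.has(tx)) return 'protected';
    return this.pending.has(tx) ? 'pending' : 'unknown';
  }

  private release(tx: string): void {
    this.protectedBySlot.delete(tx);
    this.pending.add(tx);
  }
}

const pool = new TxProtection();
pool.protect(['a', 'b'], 5); // proposal at slot 5
pool.protect(['c'], 6);      // proposal at slot 6
pool.onBlocksAdded(['a'], 6); // a block containing only `a` syncs at slot 6
// `a` is mined, `b` (silent death at slot 5) returns to pending, `c` stays protected.
```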

## Trade-off accepted

During a multi-slot stall with no blocks landing anywhere, non-proposer
pools retain protections until the first block lands (the wall clock
used to release them mid-stall). There is no user-visible cost — there
is no chain to include txs in during a stall — and the memory held is
bounded and self-healing on the first `blocks-added`. Proposers
self-serve via the direct sequencer call. Protections are in-memory
only, so a restart clears them.

Fixes A-1173