
Predictable ledger state snapshots #6526

Open
geo2a wants to merge 6 commits into master from geo2a/predictable-snapshots



geo2a commented Apr 13, 2026

Description

This PR brings in the Consensus feature of predictable ledger state snapshots:

  • all nodes will take snapshots at the same deterministic slot numbers, rather than at times that depend on each node's start time.
  • to avoid a thundering-herd effect, where all nodes take a snapshot at the same moment and stall the network, every node will wait for a randomised delay before taking a snapshot.
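The description does not spell out the exact slot-selection rule, so the following is only a minimal sketch under an assumption: snapshot boundaries lie at SlotOffset + k * SnapshotInterval, and the state snapshotted for a boundary is that of the last block strictly before it. This reading is consistent with the slots in the manual-testing table (for example, 4924780 lies just before the boundary 172800 + 11 * 432000 = 4924800), and with SlotOffset aligning the grid to mainnet's Shelley epoch boundaries, whose first Shelley slot is 4492800. The function names are hypothetical.

```python
# Hypothetical sketch: which blocks get snapshotted under a fixed slot grid.
# Assumes boundaries at slot_offset + k * interval (k = 0, 1, 2, ...) and that
# the snapshot is of the last block strictly before each boundary.


def boundary_index(slot: int, slot_offset: int, interval: int) -> int:
    """Count the boundaries at or below `slot` (0 if slot < slot_offset)."""
    if slot < slot_offset:
        return 0
    return (slot - slot_offset) // interval + 1


def snapshot_slots(block_slots, slot_offset=172800, interval=432000):
    """Return the slots of the blocks whose ledger state would be snapshotted.

    `block_slots` is an increasing list of block slots. A block is selected
    when the next block sits at or past a boundary that it has not reached.
    """
    selected = []
    for here, nxt in zip(block_slots, block_slots[1:]):
        if boundary_index(nxt, slot_offset, interval) > boundary_index(
            here, slot_offset, interval
        ):
            selected.append(here)
    return selected


# The grid lines up with mainnet's first Shelley slot.
print(172800 + 10 * 432000)  # 4492800
print(snapshot_slots([4492700, 4492799, 4492800, 4924780, 4924820]))  # [4492799, 4924780]
```

Comparing the boundary indices of consecutive blocks, rather than testing `slot % interval == 0`, matters because a block rarely lands exactly on a boundary slot; every node that sees the same chain therefore picks the same slots regardless of when it started.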

Changes to cardano-node configuration

The LedgerDB section of the config.yaml file is re-worked to have the following parameters:

LedgerDB:
  # remains as-is
  Backend: V2InMemory
  Snapshots:

    # start taking the snapshots at slot 172800, after Byron
    SlotOffset: 172800

    # take snapshots every 432000 slots, at the end of every Shelley epoch
    SnapshotInterval: 432000

    # A minimum duration between snapshots, in seconds (used to avoid excessive snapshots while syncing).
    RateLimit: 0 # default is 10 minutes

    # randomised snapshot delay range, in seconds.
    # Both Min and Max need to be specified, otherwise the default delay of (5min, 10min) will be used
    MinDelay: 60
    MaxDelay: 120
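A minimal sketch of how the delay settings could be resolved and applied. It follows the rule stated in the config comments (the configured range is used only when both MinDelay and MaxDelay are set, otherwise the (5min, 10min) default applies). The function names, the rejection of MinDelay > MaxDelay and the uniform whole-second draw are assumptions, not the node's actual code; whole seconds match the 73s, 64s and 108s delays in the log fragment further down. RateLimit is not modelled here.

```python
import random
from typing import Optional, Tuple

# Default delay range in seconds, per the config comments: (5min, 10min).
DEFAULT_DELAY_RANGE = (5 * 60, 10 * 60)


def resolve_delay_range(
    min_delay: Optional[int], max_delay: Optional[int]
) -> Tuple[int, int]:
    """Use MinDelay/MaxDelay only when both are set, else fall back to the default."""
    if min_delay is None or max_delay is None:
        return DEFAULT_DELAY_RANGE
    if min_delay > max_delay:  # assumption: an inverted range is rejected
        raise ValueError("MinDelay must not exceed MaxDelay")
    return (min_delay, max_delay)


def pick_delay(rng: random.Random, delay_range: Tuple[int, int]) -> int:
    """Draw a whole number of seconds uniformly from the inclusive range."""
    low, high = delay_range
    return rng.randint(low, high)


rng = random.Random()  # each node would use its own, independently seeded RNG
print(resolve_delay_range(60, 120))   # (60, 120)
print(resolve_delay_range(60, None))  # (300, 600)
print(pick_delay(rng, resolve_delay_range(60, 120)))
```

Because every node draws its own delay, nodes that schedule the same snapshot slot start writing at different moments spread across the range, which is what breaks up the thundering herd.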

Alternatively, instead of an object with individual options, Snapshots can name a predefined
snapshot policy, which is then used as a whole:

LedgerDB:
  Backend: V2InMemory
  Snapshots: Mithril

Currently the only named policy is Mithril, intended for nodes producing snapshots for Mithril.
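Since Snapshots accepts either a policy name or an object, a consumer of the config has to branch on the shape of the value. A minimal sketch, assuming the YAML has already been decoded into Python values (for example by a YAML library); the class and function names are hypothetical, the actual settings behind the Mithril policy are not modelled, and rejecting unknown keys is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Named policies; Mithril is the only one this PR mentions.
KNOWN_POLICIES = {"Mithril"}

# YAML keys of the object form, mapped to the field names below.
FIELDS = {
    "SlotOffset": "slot_offset",
    "SnapshotInterval": "snapshot_interval",
    "RateLimit": "rate_limit",
    "MinDelay": "min_delay",
    "MaxDelay": "max_delay",
}


@dataclass
class NamedPolicy:
    name: str


@dataclass
class CustomPolicy:
    # None means "not set"; defaults would be applied in a later step.
    slot_offset: Optional[int] = None
    snapshot_interval: Optional[int] = None
    rate_limit: Optional[int] = None
    min_delay: Optional[int] = None
    max_delay: Optional[int] = None


def parse_snapshots(value) -> Union[NamedPolicy, CustomPolicy]:
    """Interpret an already-decoded `Snapshots` value (a string or a mapping)."""
    if isinstance(value, str):
        if value not in KNOWN_POLICIES:
            raise ValueError(f"unknown snapshot policy: {value!r}")
        return NamedPolicy(value)
    if isinstance(value, dict):
        unknown = set(value) - set(FIELDS)  # assumption: unknown keys are errors
        if unknown:
            raise ValueError(f"unknown Snapshots keys: {sorted(unknown)}")
        return CustomPolicy(**{FIELDS[key]: v for key, v in value.items()})
    raise TypeError("Snapshots must be a policy name or an object")


print(parse_snapshots("Mithril"))
print(parse_snapshots({"SlotOffset": 172800, "SnapshotInterval": 432000}))
```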

Note that the snapshot-related options are now grouped under "LedgerDB -> Snapshots". The legacy format is still supported, but its meaning has changed slightly: the SnapshotInterval parameter is now interpreted as a number of slots rather than seconds.

New tracing events

  • SnapshotRequestDelayed snapshotRequestTime delayBeforeSnapshotting slots: traces that a snapshot was requested for slots, and that the request will be executed after delayBeforeSnapshotting.
  • SnapshotRequestCompleted signifies the completion of a delayed snapshot request.

Manual Testing

This feature is a little tricky to test automatically, and I have not found any end-to-end tests for the ledger state snapshot functionality. I've done some manual testing by analysing the logs. This process could be automated using cardano-testnet, but I'm afraid that the test could be flaky and very verbose.

To test the feature, I started a sync with mainnet and used Claude Code to grep the logs and construct a table verifying that the snapshots are indeed taken at the announced slots after the expected delay:

| # | SnapshotRequestDelayed at | Scheduled slots | TookSnapshot (start) at | Taken slot | Reported delay (s) | Actual delta (s) |
|---|---|---|---|---|---|---|
| 1 | 2026-04-10 12:21:49.5021 | 4492799 | 2026-04-10 12:23:02.5057 | 4492799 | 73 | 73.0036 |
| 2 | 2026-04-10 12:23:03.3550 | 4924780 | 2026-04-10 12:24:07.3566 | 4924780 | 64 | 64.0016 |
| 3 | 2026-04-10 12:24:08.6039 | 5356780, 5788780 | 2026-04-10 12:25:56.6049 | 5356780 | 108 | 108.0010 |
| | | | 2026-04-10 12:25:58.6811 | 5788780 | | 110.0772 |
| 4 | 2026-04-10 12:26:00.0986 | 6220777, 6652775, 7084774 | 2026-04-10 12:27:13.1009 | 6220777 | 73 | 73.0023 |
| | | | 2026-04-10 12:27:15.2028 | 6652775 | | 75.1042 |
| | | | 2026-04-10 12:27:16.2926 | 7084774 | | 76.1940 |
| 5 | 2026-04-10 12:27:17.9178 | 7516773, 7948772, 8380772 | 2026-04-10 12:28:40.9199 | 7516773 | 83 | 83.0021 |
| | | | 2026-04-10 12:28:42.1981 | 7948772 | | 84.2803 |
| | | | 2026-04-10 12:28:43.6356 | 8380772 | | 85.7178 |

The relevant fragment of the log file is attached:

[2026-04-10 12:21:49.5021Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 4492799] , with a randomised delay of 73s
[2026-04-10 12:23:02.5057Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 4492799, dsSuffix = Nothing} at f8084c61b6a238acec985b59310b6ecec49c0ab8352249afd7268da5cff2a457 at slot 4492799
[2026-04-10 12:23:03.2456Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 4492799, dsSuffix = Nothing} at f8084c61b6a238acec985b59310b6ecec49c0ab8352249afd7268da5cff2a457 at slot 4492799 , duration: 0.739851377s
[2026-04-10 12:23:03.2530Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:23:03.3550Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 4924780] , with a randomised delay of 64s
[2026-04-10 12:24:07.3566Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 4924780, dsSuffix = Nothing} at a0805ae8e52318f0e499be7f85d3f1d5c7dddeacdca0dab9e9d9a8ae6c49a22c at slot 4924780
[2026-04-10 12:24:08.2799Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 4924780, dsSuffix = Nothing} at a0805ae8e52318f0e499be7f85d3f1d5c7dddeacdca0dab9e9d9a8ae6c49a22c at slot 4924780 , duration: 0.923355707s
[2026-04-10 12:24:08.2872Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:24:08.6039Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 5356780,SlotNo 5788780] , with a randomised delay of 108s
[2026-04-10 12:25:56.6049Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 5356780, dsSuffix = Nothing} at 4ddf277b3aff32931843da9f7900f5ef2fffed15b124891c485be4b3a06fca72 at slot 5356780
[2026-04-10 12:25:58.6810Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 5356780, dsSuffix = Nothing} at 4ddf277b3aff32931843da9f7900f5ef2fffed15b124891c485be4b3a06fca72 at slot 5356780 , duration: 2.076082693s
[2026-04-10 12:25:58.6811Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 5788780, dsSuffix = Nothing} at 9e6fc811d9b09f7c8c6d7a23dc8b3360a9c4a3930ba640ce107e944d5e2750e2 at slot 5788780
[2026-04-10 12:25:59.7608Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 5788780, dsSuffix = Nothing} at 9e6fc811d9b09f7c8c6d7a23dc8b3360a9c4a3930ba640ce107e944d5e2750e2 at slot 5788780 , duration: 1.079557458s
[2026-04-10 12:25:59.7861Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:26:00.0986Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 6220777,SlotNo 6652775,SlotNo 7084774] , with a randomised delay of 73s
[2026-04-10 12:27:13.1009Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 6220777, dsSuffix = Nothing} at bc98eda36819d00f424e63aeb4eb43950bd5eacf37f2c35a2b8f807aa68cd895 at slot 6220777
[2026-04-10 12:27:15.2022Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 6220777, dsSuffix = Nothing} at bc98eda36819d00f424e63aeb4eb43950bd5eacf37f2c35a2b8f807aa68cd895 at slot 6220777 , duration: 2.101244366s
[2026-04-10 12:27:15.2028Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 6652775, dsSuffix = Nothing} at 6707ef3c2e885c25d5081a1aa0dd03e81492e21c5955208f23eee3d92ae28f9f at slot 6652775
[2026-04-10 12:27:16.2922Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 6652775, dsSuffix = Nothing} at 6707ef3c2e885c25d5081a1aa0dd03e81492e21c5955208f23eee3d92ae28f9f at slot 6652775 , duration: 1.089450966s
[2026-04-10 12:27:16.2926Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7084774, dsSuffix = Nothing} at 057c01d0a0f0b6c554589ac5baf6b72b63cd22b2d668ee86f7421199eab1c46c at slot 7084774
[2026-04-10 12:27:17.4298Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7084774, dsSuffix = Nothing} at 057c01d0a0f0b6c554589ac5baf6b72b63cd22b2d668ee86f7421199eab1c46c at slot 7084774 , duration: 1.137220622s
[2026-04-10 12:27:17.4573Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:27:17.9178Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 7516773,SlotNo 7948772,SlotNo 8380772] , with a randomised delay of 83s
[2026-04-10 12:28:40.9199Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7516773, dsSuffix = Nothing} at cd0dad9ea278cc82d9c3dbefa1769ddbfb9358dc800e4a70a4cc1e671489c493 at slot 7516773
[2026-04-10 12:28:42.1979Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7516773, dsSuffix = Nothing} at cd0dad9ea278cc82d9c3dbefa1769ddbfb9358dc800e4a70a4cc1e671489c493 at slot 7516773 , duration: 1.277999016s
[2026-04-10 12:28:42.1981Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7948772, dsSuffix = Nothing} at cff7c23b9f62ad48a2436b2270a10bb9286999a721f1da3bde35f6f1579d1464 at slot 7948772
[2026-04-10 12:28:43.6354Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7948772, dsSuffix = Nothing} at cff7c23b9f62ad48a2436b2270a10bb9286999a721f1da3bde35f6f1579d1464 at slot 7948772 , duration: 1.43729202s
[2026-04-10 12:28:43.6356Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 8380772, dsSuffix = Nothing} at 47fef957a7152647dacbcff13242b3ef3c416930e23cd55722c36c1fd126c721 at slot 8380772
[2026-04-10 12:28:45.5275Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 8380772, dsSuffix = Nothing} at 47fef957a7152647dacbcff13242b3ef3c416930e23cd55722c36c1fd126c721 at slot 8380772 , duration: 1.891909794s
[2026-04-10 12:28:45.5622Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot

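The "Actual delta" column of the table can be recomputed mechanically from log lines like the ones in this fragment, which is one small step towards automating the check. A minimal sketch, assuming one log entry per line in the format shown; the regular expressions and function names are hypothetical and only cover the SnapshotRequestDelayed and TookSnapshot messages seen here.

```python
import re
from datetime import datetime

# "[<timestamp>Z][<host>:<namespace>](<severity>,<thread>) <message>"
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)Z\]\[[^:\]]+:(?P<ns>[^\]]+)\]\([^)]*\) (?P<msg>.*)$"
)
DELAY_RE = re.compile(r"randomised delay of (\d+)s")
TAKING_RE = re.compile(r"^Taking ledger snapshot .* at slot (\d+)$")


def parse_ts(ts: str) -> datetime:
    # %f accepts the 4-digit fractional seconds used in the log.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def snapshot_deltas(lines):
    """Yield (slot, reported_delay_s, actual_delta_s) for each snapshot start.

    actual_delta_s is measured from the latest SnapshotRequestDelayed event
    to the "Taking ledger snapshot" message for that slot.
    """
    requested_at, reported = None, None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, ns, msg = parse_ts(m["ts"]), m["ns"], m["msg"]
        if ns.endswith(".SnapshotRequestDelayed"):
            requested_at = ts
            d = DELAY_RE.search(msg)
            reported = int(d.group(1)) if d else None
        elif ns.endswith(".TookSnapshot") and requested_at is not None:
            t = TAKING_RE.match(msg)
            if t:  # skip the "Took ledger snapshot ..." completion lines
                delta = (ts - requested_at).total_seconds()
                yield int(t.group(1)), reported, round(delta, 4)


# Usage (hypothetical file name):
# with open("node.log") as log:
#     for slot, reported, actual in snapshot_deltas(log):
#         print(slot, reported, actual)
```

Checking that each actual delta is at least the reported delay, and that the first snapshot of a batch starts within a fraction of a second of it, is then a simple assertion over the yielded rows.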
# Checklist
  • Commit sequence broadly makes sense and commits have useful messages
  • New tests are added if needed and existing tests are updated. These may include:
    • golden tests
    • property tests
    • roundtrip tests
    • integration tests
      See Running tests for more details
  • Any changes are noted in the CHANGELOG.md for affected package
  • The version bounds in .cabal files are updated
  • CI passes. See note on CI. The following CI checks are required:
    • Code is linted with hlint. See .github/workflows/check-hlint.yml to get the hlint version
    • Code is formatted with stylish-haskell. See .github/workflows/stylish-haskell.yml to get the stylish-haskell version
    • Code builds on Linux, MacOS and Windows for ghc-9.6 and ghc-9.12
  • Self-reviewed the diff

Note on CI

If your PR is from a fork, the necessary CI jobs won't trigger automatically for security reasons.
You will need someone with write privileges to trigger them; please contact the IOG node developers
to do this for you.

geo2a self-assigned this Apr 13, 2026
geo2a changed the title Predictable snapshots ledger state snapshots Predictable ledger state snapshots Apr 14, 2026
geo2a force-pushed the geo2a/predictable-snapshots branch 6 times, most recently from c9b2597 to 0062d94, April 15, 2026 13:09
geo2a force-pushed the geo2a/predictable-snapshots branch from 0062d94 to 0379807, April 22, 2026 07:40
geo2a moved this to 👀 In review in Consensus Team Backlog Apr 28, 2026
geo2a force-pushed the geo2a/predictable-snapshots branch 2 times, most recently from db260ed to 8e9fbae, April 28, 2026 15:09
geo2a force-pushed the geo2a/predictable-snapshots branch from 8e9fbae to 7bc0d2b, May 28, 2026 10:53
geo2a marked this pull request as ready for review May 28, 2026 10:56
geo2a requested review from a team as code owners May 28, 2026 10:56
geo2a force-pushed the geo2a/predictable-snapshots branch 3 times, most recently from fc301a4 to 7446f7b, June 10, 2026 13:05
Comment thread cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs Outdated
Comment thread cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs Outdated
Comment thread cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs Outdated
Comment thread cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs
Comment thread cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs
mgmeier requested a review from jutaro June 10, 2026 16:35
mgmeier force-pushed the geo2a/predictable-snapshots branch from 4abaae4 to d8493e8, June 10, 2026 16:37
Comment thread cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs Outdated
Comment thread cardano-node/src/Cardano/Node/Tracing/Tracers/ChainDB.hs
Comment thread cardano-node/src/Cardano/Node/Configuration/POM.hs Outdated
Comment thread configuration/cardano/mainnet-config.yaml Outdated
geo2a force-pushed the geo2a/predictable-snapshots branch from 2c593d8 to b7844a7, June 11, 2026 15:34
geo2a force-pushed the geo2a/predictable-snapshots branch from b7844a7 to 5d86d7c, June 12, 2026 07:09

mgmeier left a comment

LGTM


Labels

None yet

Projects

Status: 👀 In review


4 participants