fix: prevent second batch store reset wiping postage snapshot on --resync by martinconic · Pull Request #5499 · ethersphere/bee

martinconic · 2026-06-10T15:53:17Z

Checklist

I have read the coding guide.
My change requires a documentation update, and I have done it.
I have added tests to cover my changes.
I have filled out the description and linked the related issues.

Description

Fixes #5495: a node started with --resync while the postage snapshot is active shuts down during sync with:

get: get batch <id>: storage: not found

Root cause

The batch store is reset twice on startup in this configuration:

- snapshotBatchSvc.Start() resets the store (resync) and rebuilds it from the snapshot, then sets postageSyncStart = maxBlockHeight.
- The live batchSvc was constructed with the same o.Resync flag, so its Start() reset the store again, discarding the snapshot that was just loaded.

Live sync then started from the snapshot block height against an empty store, so the first top-up/dilute event referencing a batch created before the snapshot height failed with storage: not found, and the node shut down.

This matches the report exactly: two "resync requested, resetting batch store" log lines, and the fact that skip-postage-snapshot works around it (only one reset, and sync replays from contract genesis where every create precedes its top-up).

Note: this is independent of #5343/#5482. The flow originates in #5094 and is still present on master; the reporter's build (2.8.0-41d6efc6) predates the #5343 revert, but reverting #5343 does not address this.

Fix

The two reset sites are not redundant: one rebuilds from the snapshot, the other is the fallback rebuild for the no-snapshot, snapshot-failed, and non-mainnet cases. The problem is that both were armed at once.

Construct the live batch service after the snapshot path has run and pass resync only when the snapshot did not rebuild the store:

batchSvc, err = batchservice.New(..., o.Resync && !snapshotLoaded)

The reset logic stays in its single existing place (batchservice.Start()), exactly one service is armed to reset on a --resync start, and there is no runtime toggle. With --resync, the store is still reset exactly once when the snapshot is skipped, fails, or is not applicable; a non-resync start is unchanged and does not reset the store.
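The gating expression can be checked as a small truth table. This is a sketch under one stated assumption: that the snapshot service resets only when resync is requested and the snapshot actually loads. `liveResync` and `resets` are hypothetical helpers, not functions in bee.

```go
package main

import "fmt"

// liveResync is a hypothetical helper for the gating in the fix: the live
// service is armed to reset only if the snapshot did not rebuild the store.
func liveResync(resync, snapshotLoaded bool) bool {
	return resync && !snapshotLoaded
}

// resets counts batch store resets during startup, assuming the snapshot
// service resets only when resync is requested and the snapshot loads.
func resets(resync, snapshotLoaded bool) int {
	n := 0
	if resync && snapshotLoaded {
		n++ // snapshot service: reset + rebuild from the snapshot
	}
	if liveResync(resync, snapshotLoaded) {
		n++ // live service: reset + rebuild from chain genesis
	}
	return n
}

func main() {
	cases := []struct{ resync, loaded bool }{
		{true, true}, {true, false}, {false, true}, {false, false},
	}
	for _, c := range cases {
		fmt.Printf("resync=%t snapshotLoaded=%t resets=%d\n",
			c.resync, c.loaded, resets(c.resync, c.loaded))
	}
}
```

A resync start always yields one reset, and a non-resync start yields none.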

Testing

TestResyncControlsReset in batchservice asserts both directions of the contract node.go relies on:
- resync=true ⇒ store reset once
- resync=false ⇒ store not reset (the post-snapshot live-service case)
go test -race ./pkg/postage/batchservice/... passes
go build ./pkg/node/... passes; go vet, gofumpt, and golangci-lint clean
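The contract the test asserts can be sketched without bee's packages. This is an illustration built on assumptions, not the actual `TestResyncControlsReset`. `fakeStore`, `service`, `newService`, and `checkResyncControlsReset` are hypothetical, and the real `batchservice.New` takes more arguments than shown here.

```go
package main

import "fmt"

// fakeStore counts resets; a hypothetical stand-in for a batch store mock.
type fakeStore struct{ resets int }

func (f *fakeStore) Reset() error {
	f.resets++
	return nil
}

// service mimics only the part of the batch service that matters here:
// Start resets the store when constructed with resync=true.
type service struct {
	store  *fakeStore
	resync bool
}

func newService(store *fakeStore, resync bool) *service {
	return &service{store: store, resync: resync}
}

func (s *service) Start() error {
	if s.resync {
		if err := s.store.Reset(); err != nil {
			return err
		}
	}
	// Event replay and live sync would follow here.
	return nil
}

// checkResyncControlsReset checks both directions as a plain function
// returning an error instead of using *testing.T.
func checkResyncControlsReset() error {
	cases := []struct {
		name       string
		resync     bool
		wantResets int
	}{
		{"resync=true resets once", true, 1},
		{"resync=false does not reset", false, 0},
	}
	for _, tc := range cases {
		st := &fakeStore{}
		if err := newService(st, tc.resync).Start(); err != nil {
			return fmt.Errorf("%s: start: %w", tc.name, err)
		}
		if st.resets != tc.wantResets {
			return fmt.Errorf("%s: got %d resets, want %d", tc.name, st.resets, tc.wantResets)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkResyncControlsReset()) // <nil>
}
```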

Open API Spec Version Changes (if applicable)

None.

Related Issue

Closes #5495

AI Disclosure

This PR contains code that has been generated by an LLM.
I have reviewed the AI generated code thoroughly.
I possess the technical expertise to responsibly review the code generated in this PR.

acud · 2026-06-10T18:26:01Z

+// postage snapshot has already reset and rebuilt the batch store, so the live
+// sync continues from the snapshot instead of wiping it (see issue #5495).
+func (svc *batchService) SkipResync() {
+	svc.resync = false


To me, the fact that there are two places where the batch service can get reset already smells a bit. With this change we still have two places where a reset happens, plus a third path that can potentially disable one of them. So the question is: why have two places that do this in the first place?

The two reset sites aren't pure duplication. They serve different roles:

- snapshot service: resets and rebuilds from the snapshot.
- live service: resets and rebuilds from chain genesis. This is the fallback path, needed when there is no snapshot.

So you can't simply delete one site without merging the two services (they use different listeners: the snapshot filterer vs the live chain backend).
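A simplified model of why both sites exist is sketched below. It assumes, for illustration only, that both services share the same reset-then-replay shape and differ only in their event source. `Listener`, `snapshotListener`, `chainListener`, `start`, and the block heights are hypothetical.

```go
package main

import "fmt"

// Listener is a hypothetical event source: it rebuilds the store starting at
// block `from` and returns the next block to sync from.
type Listener func(store map[string]int, from uint64) uint64

// snapshotListener stands in for the snapshot filterer: it restores every
// batch up to the snapshot height at once.
func snapshotListener(store map[string]int, from uint64) uint64 {
	store["a"] = 10
	return 100 // hypothetical snapshot max block height
}

// chainListener stands in for the live chain backend: replaying from genesis
// it sees the create event for batch "a"; later blocks have no creates here.
func chainListener(store map[string]int, from uint64) uint64 {
	if from == 0 {
		store["a"] = 10
	}
	return 120 // hypothetical chain head
}

// start is the shared shape of both Start() methods: identical reset code,
// different rebuild source.
func start(store map[string]int, resync bool, l Listener, from uint64) uint64 {
	if resync {
		for id := range store {
			delete(store, id)
		}
	}
	return l(store, from)
}

func main() {
	// Fallback path (no snapshot): the live service must reset itself.
	s1 := map[string]int{"stale": 1}
	start(s1, true, chainListener, 0)
	fmt.Println(len(s1), s1["a"]) // 1 10

	// Snapshot path: the snapshot service resets; the live service must not.
	s2 := map[string]int{"stale": 1}
	next := start(s2, true, snapshotListener, 0)
	start(s2, false, chainListener, next)
	fmt.Println(len(s2), s2["a"]) // 1 10
}
```

Deleting the live-service reset would break the first case; keeping both armed reproduces the bug.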

I changed the implementation so that it does not modify batchservice.go at all.

…sync

When a node is started with --resync together with the postage snapshot, the batch store was reset twice. The snapshot batch service resets and rebuilds the store from the snapshot, but the live-chain batch service was constructed with the same resync flag and reset the store a second time in its Start(), discarding the freshly loaded snapshot. Live sync then began at the snapshot block height against an empty store, so the first top-up or dilute event for any batch created before the snapshot height failed with "get batch <id>: storage: not found" and the node shut down.

This is why disabling the snapshot (skip-postage-snapshot) worked around it: the store was reset only once and sync replayed from contract genesis, where every create precedes its top-up.

Construct the live batch service after the snapshot path has run and pass resync only when the snapshot did not rebuild the store (o.Resync && !snapshotLoaded). The store is still reset exactly once in every other case (snapshot skipped, failed, not applicable, or a non-resync start).

Closes #5495

gacevicljubisa · 2026-06-12T12:20:11Z

 		}
 	)

+	snapshotLoaded := false


Suggestion: instead of this guard, we could expose batchservice.Reset(stateStore, batchStore) and call it before Start when --resync is set.
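The suggestion can be sketched as a single reset performed by node startup before either service starts. The name `Reset` comes from the suggestion, but its parameter types, the `stateStore`/`batchStore` maps, and the `"last-block"` checkpoint key are all hypothetical simplifications.

```go
package main

import "fmt"

// Hypothetical minimal stand-ins for the state store and the batch store.
type stateStore map[string]string
type batchStore map[string]int

// Reset sketches the suggested exported reset: wipe all batches and the sync
// checkpoint ("last-block" is a hypothetical key) so sync starts over.
func Reset(ss stateStore, bs batchStore) {
	for id := range bs {
		delete(bs, id)
	}
	delete(ss, "last-block")
}

func main() {
	ss := stateStore{"last-block": "1234"}
	bs := batchStore{"a": 10}
	resync := true // as if --resync were set

	// Node startup resets exactly once, before any service starts, so
	// neither service needs a resync flag of its own.
	if resync {
		Reset(ss, bs)
	}
	// The snapshot service and then the live service would rebuild from here.
	fmt.Println(len(ss), len(bs)) // 0 0
}
```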

An alternative would be to change the Start interface:

Start(ctx context.Context, listener Listener, startBlock uint64) error

where we would pass snapshotEventListener and eventListener respectively, and then, once the resync is done, set it to false.

But this requires interface changes.

@acud wdyt?

I am not sure exposing Reset is needed. From my perspective, consider this:

If you want to reset (the flag was set), you start the batch service with the snapshot. If you don't want to reset, the snapshot is nil.

Inside the constructor, it checks whether the snapshot is non-nil; if so, it resets, replays the events, then continues startup normally and returns the instance.

If it is really necessary to reset without a snapshot, i.e. to force a reset that reads directly from the blockchain, pass both a flag and the (nullable) snapshot. I'm not sure I see the need for external callers to be able to reset the state of that component in that way.
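The constructor-driven design could look roughly like this. It is a sketch, not the bee API. `Snapshot`, `newBatchService`, `resetStore`, and the map-based store are hypothetical. A non-nil snapshot means "reset, replay the snapshot, continue from its height". `forceResync` without a snapshot means "reset and rebuild from the chain". Neither means "resume as-is". In every case the reset happens in exactly one place.

```go
package main

import "fmt"

// Snapshot is a hypothetical stand-in for the postage snapshot.
type Snapshot struct {
	Batches   map[string]int
	MaxHeight uint64
}

type batchService struct {
	store     map[string]int
	startFrom uint64 // block the live sync continues from
}

func resetStore(store map[string]int) {
	for id := range store {
		delete(store, id)
	}
}

// newBatchService decides about the reset once, at construction time.
func newBatchService(store map[string]int, lastBlock uint64, snap *Snapshot, forceResync bool) (*batchService, error) {
	svc := &batchService{store: store, startFrom: lastBlock}
	switch {
	case snap != nil:
		// Reset once, replay the snapshot, continue from its height.
		resetStore(store)
		for id, bal := range snap.Batches {
			store[id] = bal
		}
		svc.startFrom = snap.MaxHeight
	case forceResync:
		// No snapshot: reset and let live sync rebuild from genesis.
		resetStore(store)
		svc.startFrom = 0
	}
	return svc, nil
}

func main() {
	snap := &Snapshot{Batches: map[string]int{"a": 10}, MaxHeight: 100}
	svc, err := newBatchService(map[string]int{"stale": 1}, 50, snap, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(svc.startFrom, svc.store) // 100 map[a:10]
}
```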

martinconic requested review from acud, akrem-chabchoub, gacevicljubisa, janos and sbackend123 June 10, 2026 15:56

martinconic changed the title ~~fix: prevent second batch store reset wiping postage snapshot on --re…~~ fix: prevent second batch store reset wiping postage snapshot on --resync Jun 10, 2026

martinconic self-assigned this Jun 10, 2026

martinconic added this to the 2026 milestone Jun 10, 2026

acud reviewed Jun 10, 2026


martinconic force-pushed the fix/resync-snapshot-double-reset branch from c26bb91 to a398078 June 11, 2026 07:11

martinconic force-pushed the fix/resync-snapshot-double-reset branch from a398078 to 69df1aa June 11, 2026 07:16

martinconic requested a review from acud June 11, 2026 07:19

gacevicljubisa reviewed Jun 12, 2026


martinconic wants to merge 1 commit into master from fix/resync-snapshot-double-reset


3 participants
