index: move chunk index to index/ namespace, drop hash seed #9778

Merged
ThomasWaldmann merged 1 commit into borgbackup:master from mr-raj12:chunks-index-storage on Jun 15, 2026

@mr-raj12

Contributor

Description

The repo-side chunk index used to live under cache/chunks.<hash>, with the name computed as sha256(content + CHUNKINDEX_HASH_SEED). The seed was a manual invalidation knob: bump it and every old index file stops matching its own name, so clients drop it.

Two problems with that. First, the index is not really a cache. Rebuilding it means rescanning the whole repository, which is slow and expensive, so it is closer to permanent repository data than to a throwaway cache. Second, the seed meant the name was not the plain sha256 of the content, so borgstore could not verify it by hash the way it verifies every other object.
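
A minimal sketch of the second problem, not borg's actual code. The seed value and the helper names are made up for illustration; only the `sha256(content + seed)` shape comes from the description above.

```python
import hashlib

# Hypothetical seed value; the real CHUNKINDEX_HASH_SEED value is not shown here.
CHUNKINDEX_HASH_SEED = b"1"


def seeded_name(content: bytes, seed: bytes = CHUNKINDEX_HASH_SEED) -> str:
    """Old scheme: the name is sha256(content + seed)."""
    return hashlib.sha256(content + seed).hexdigest()


def store_can_verify(name: str, content: bytes) -> bool:
    """What a store that only knows plain sha256 is able to check."""
    return hashlib.sha256(content).hexdigest() == name


content = b"serialized chunk index"
old_name = seeded_name(content)

# The seeded name is not the plain content hash, so hash verification fails.
verifiable = store_can_verify(old_name, content)

# Bumping the seed changes the expected name, so the old file no longer matches.
name_after_bump = seeded_name(content, seed=b"2")
```

Here `verifiable` is false and `name_after_bump` differs from `old_name`, which is the invalidation behaviour the seed provided, at the cost of hash verification.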

So this gives the chunk index its own index/ namespace and names each object by the plain sha256 of its content:

  • cache/chunks.<hash> becomes index/<hash>, and the whole name is the content hash.
  • The hash seed is gone. An index written by an incompatible borg version now gets rejected when it is read back: borghash's serialized format already starts with its own MAGIC and VERSION header and refuses to load a version it does not understand, so that header does the invalidation job the seed used to do.
  • index/ is registered as a real namespace, flat nesting like cache/, with write and delete permissions in the no-delete and write-only modes. A backup writes an incremental index fragment and compact merges and removes old fragments, so both need write and delete to work.
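
The first two bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in cache.py or borghash: the header layout, the `MAGIC` and `VERSION` values, and every function name here are hypothetical.

```python
import hashlib
import struct

# Hypothetical header: 8-byte magic followed by a little-endian u32 version.
# borghash's real MAGIC/VERSION values and layout are not reproduced here.
MAGIC = b"DEMOHASH"
VERSION = 1
HEADER = struct.Struct("<8sI")


def serialize(payload: bytes, version: int = VERSION) -> bytes:
    """Prefix the payload with the magic and version header."""
    return HEADER.pack(MAGIC, version) + payload


def object_name(content: bytes) -> str:
    """New scheme: the whole name under index/ is the plain content hash."""
    return "index/" + hashlib.sha256(content).hexdigest()


def load(name: str, content: bytes) -> bytes:
    """Verify the name by hash, then let the header decide compatibility."""
    if name != object_name(content):
        raise ValueError("content does not match its name")
    if len(content) < HEADER.size:
        raise ValueError("truncated header")
    magic, version = HEADER.unpack_from(content)
    if magic != MAGIC or version != VERSION:
        raise ValueError("unsupported index format")
    return content[HEADER.size:]
```

In this sketch a blob written with a different version still hashes to a valid name, but `load` raises `ValueError` on the header check, which is the role the seed used to play.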

The write/read/delete_chunkindex_*_repo_cache helpers are renamed to *_repo to match.

Two things I left out on purpose:

  • No changelog entry.
  • No migration for existing beta repos. An upgraded repo finds an empty index/, rebuilds once from the repo, and leaves the old cache/chunks.* files sitting there as harmless orphans. Fine for beta with fresh test repos.
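
The upgrade behaviour described in the second bullet can be pictured with a toy in-memory store. Everything here is hypothetical (the dict-based store, `load_or_rebuild`, `fake_rebuild`, the example file name) and only mirrors the described behaviour, not borg's implementation.

```python
import hashlib
from typing import Callable, Dict


def load_or_rebuild(store: Dict[str, bytes], rebuild: Callable[[], bytes]) -> bytes:
    """Return the index content, rebuilding it once if index/ is empty."""
    names = sorted(n for n in store if n.startswith("index/"))
    if names:
        return store[names[0]]  # simplified: assume a single index object
    content = rebuild()  # slow path: stands in for rescanning the repository
    store["index/" + hashlib.sha256(content).hexdigest()] = content
    return content


# A pre-upgrade repo: only an old-style file, nothing under index/.
store = {"cache/chunks.0123abcd": b"old seeded index"}
calls = []


def fake_rebuild() -> bytes:
    calls.append(1)
    return b"rebuilt index"


first = load_or_rebuild(store, fake_rebuild)   # rebuilds and writes index/<hash>
second = load_or_rebuild(store, fake_rebuild)  # served from index/, no rebuild
```

After both calls the rebuild has run once, and the old `cache/chunks.*` entry is still in the store, untouched.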

Changes:

  • cache.py: drop the seed, hash the content directly, store and read under index/<hash>, list the index/ namespace, rename the helpers.
  • repository.py: add the index/ namespace config and its permissions, update the call sites.
  • archive.py, compact_cmd.py: rename call sites and fix comments.
  • tests: point the storage paths at index/ and rename the helper calls.

closes #9758

Checklist

  • PR is against master (or maintenance branch if only applicable there)
  • New code has tests and docs where appropriate
  • Tests pass (run tox or the relevant test subset)
  • Commit messages are clean and reference related issues

@codecov

codecov Bot commented Jun 15, 2026

Codecov Report

❌ Patch coverage is 90.69767% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.87%. Comparing base (9ba7241) to head (eb30123).
⚠️ Report is 12 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/borg/cache.py 87.09% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9778      +/-   ##
==========================================
- Coverage   84.87%   84.87%   -0.01%     
==========================================
  Files          92       92              
  Lines       15165    15162       -3     
  Branches     2271     2270       -1     
==========================================
- Hits        12872    12868       -4     
- Misses       1589     1590       +1     
  Partials      704      704              


@ThomasWaldmann ThomasWaldmann left a comment

Member

comment fixes

Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
Name objects by pure sha256 so borgstore can verify them. closes borgbackup#9758
@mr-raj12 mr-raj12 force-pushed the chunks-index-storage branch from fce680e to eb30123 Compare June 15, 2026 17:34
@mr-raj12 mr-raj12 marked this pull request as ready for review June 15, 2026 17:34
@ThomasWaldmann ThomasWaldmann merged commit ebfe0a1 into borgbackup:master Jun 15, 2026
18 of 20 checks passed

Development

Successfully merging this pull request may close these issues.

borg2: chunks index storage
