Skip to content

compact: copy-forward packs, retire Repository.delete#9777

Draft
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:pack-files-step7-compact-copy-forward
Draft

compact: copy-forward packs, retire Repository.delete#9777
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:pack-files-step7-compact-copy-forward

Conversation

@mr-raj12

Copy link
Copy Markdown
Contributor

Description

Reclaiming space in a borg2 repo used to go through Repository.delete(id), which removes one object at a time. That works today because every pack holds exactly one object (N=1): the pack id equals the chunk id, so deleting the object and deleting the pack are the same thing. It stops working once a pack bundles several objects (N>1), because you cannot cut a single object out of an append-only pack file.

This moves compaction to pack granularity:

  • A pack with nothing referenced anymore is dropped whole.
  • A pack with everything still referenced is left alone.
  • A mixed pack gets rewritten: surviving objects are copied into a fresh pack, the index is repointed, and the old pack is dropped.

The mixed-pack case can only show up once a pack holds more than one object, so it sits behind the max_count guard and is unreachable at N=1. Rather than ship that branch untested, there is a test that forces max_count > 1 to build a real mixed pack and run copy-forward on it, so the design is proven now instead of promised for later.

Repository.delete() is gone. In its place is Repository.delete_pack(pack_id). The two callers that still worked at object level now drop by pack id: check --repair when it hits a defective object, and debug delete-obj.

The crash-safety fix from #9748 still holds: cached chunk indexes are invalidated before the first store change, so an interrupted compact stays on the safe side and the next client rebuilds from what is actually in the repo.

Changes:

  • repository.py: replace delete(id) with delete_pack(pack_id); add a pack_max_count property so compact can size its writer.
  • compact_cmd.py: group surviving objects by pack id, drop the all-unused packs, and copy-forward the mixed ones with a PackWriter bound to the GC chunk index.
  • archive.py: check --repair drops the bad object's pack by pack id.
  • debug_cmd.py: delete-obj routes through delete_pack.
  • tests: test_compact_drops_whole_unused_packs covers the N=1 whole-pack drop, test_compact_copy_forward_mixed_pack forces max_count to build and compact a mixed pack, and the direct delete() callers in the suite switch to delete_pack().

refs #8572

Checklist

  • PR is against master (or maintenance branch if only applicable there)
  • New code has tests and docs where appropriate
  • Tests pass (run tox or the relevant test subset)
  • Commit messages are clean and reference related issues

…p#8572

Reclaim space at pack granularity: drop fully-unused packs whole and rewrite
mixed packs by copying referenced objects forward (N>1 path, max_count-guarded).
check --repair and debug delete-obj now drop by pack_id; Repository.delete() is gone.
@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.90%. Comparing base (d5b3ffa) to head (2175c9f).
⚠️ Report is 23 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9777      +/-   ##
==========================================
+ Coverage   84.87%   84.90%   +0.03%     
==========================================
  Files          92       92              
  Lines       15167    15194      +27     
  Branches     2271     2275       +4     
==========================================
+ Hits        12873    12901      +28     
+ Misses       1590     1589       -1     
  Partials      704      704              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

Comment thread src/borg/archive.py
pack_id = self.chunks[defect_chunk].pack_id
del self.chunks[defect_chunk]
self.repository.delete(defect_chunk)
self.repository.delete_pack(pack_id)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to get rid of the N=1 limitation, so I suggest you don't add code that makes no sense and is even very harmful when N!=1.

def compact_packs(self):
"""Reclaim space at the granularity the store supports: whole pack files.

The store is append-only and content-addressed, so we never delete a single object in

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append-only is a term from the borg 1.x past, don't use it if it makes no sense anymore, that just adds confusion.

at the store level, full packs are just added usually. removed only by special borg operations like compact or check.

there is also no ordering anymore as in 1.x segment files 0..N, so when looking at the whole store, one maybe can say add-only, but append-only implies an "end" of the store that does not exist anymore.

@ThomasWaldmann ThomasWaldmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs optimisation or RAM usage will explode.

Comment on lines +217 to +218
for id, entry in self.chunks.iteritems():
bucket = packs.setdefault(entry.pack_id, {"used": [], "unused": []})

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never do whole-chunkindex transformations, building new datastructures that contain most of the information from the chunks index. The chunks index is a highly optimized datastructure and even then, it can be many gigabytes in size. If you are transforming that into another arrangement using unoptimized python datastructures, you will likely use too much RAM.

Comment on lines +225 to +226
packs_to_drop = [pid for pid, b in packs.items() if b["unused"] and not b["used"]]
packs_to_rewrite = [pid for pid, b in packs.items() if b["unused"] and b["used"]]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the right idea, just needs a more efficient implementation.

del self.chunks[id]
pi.finish()

def _copy_forward_pack(self, pack_id, used_ids):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually this is named "compaction".

Comment on lines +274 to +280
writer = PackWriter(store, max_count=self.repository.pack_max_count, chunks=self.chunks)
for id in used_ids:
entry = self.chunks[id]
key = "packs/" + bin_to_hex(entry.pack_id)
cdata = store.load(key, offset=entry.obj_offset, size=entry.obj_size)
writer.add(id, cdata)
writer.flush()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use borgstore.Store.defrag method, that is way more efficient.

Comment thread src/borg/repository.py

def delete(self, id, wait=True):
"""delete a repo object
def delete_pack(self, pack_id, wait=True):

@ThomasWaldmann ThomasWaldmann Jun 15, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess you could just use .store_delete at the few places that need this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants