compact: copy-forward packs, retire Repository.delete by mr-raj12 · Pull Request #9777 · borgbackup/borg

mr-raj12 · 2026-06-15T05:56:49Z

Description

Reclaiming space in a borg2 repo used to go through Repository.delete(id), which removes one object at a time. That works today because every pack holds exactly one object (N=1): the pack id equals the chunk id, so deleting the object and deleting the pack are the same thing. It stops working once a pack bundles several objects (N>1), because you cannot cut a single object out of an append-only pack file.

This moves compaction to pack granularity:

A pack with nothing referenced anymore is dropped whole.
A pack with everything still referenced is left alone.
A mixed pack gets rewritten: surviving objects are copied into a fresh pack, the index is repointed, and the old pack is dropped.

The mixed-pack case can only show up once a pack holds more than one object, so it sits behind the max_count guard and is unreachable at N=1. Rather than ship that branch untested, there is a test that forces max_count > 1 to build a real mixed pack and run copy-forward on it, so the design is proven now instead of promised for later.

Repository.delete() is gone. In its place is Repository.delete_pack(pack_id). The two callers that still worked at object level now drop by pack id: check --repair when it hits a defective object, and debug delete-obj.

The crash-safety fix from #9748 still holds: cached chunk indexes are invalidated before the first store change, so an interrupted compact stays on the safe side and the next client rebuilds from what is actually in the repo.

Changes:

repository.py: replace delete(id) with delete_pack(pack_id); add a pack_max_count property so compact can size its writer.
compact_cmd.py: group surviving objects by pack id, drop the all-unused packs, and copy-forward the mixed ones with a PackWriter bound to the GC chunk index.
archive.py: check --repair drops the bad object's pack by pack id.
debug_cmd.py: delete-obj routes through delete_pack.
tests: test_compact_drops_whole_unused_packs covers the N=1 whole-pack drop, test_compact_copy_forward_mixed_pack forces max_count to build and compact a mixed pack, and the direct delete() callers in the suite switch to delete_pack().

refs #8572

Checklist

PR is against master (or maintenance branch if only applicable there)
New code has tests and docs where appropriate
Tests pass (run tox or the relevant test subset)
Commit messages are clean and reference related issues

…p#8572 Reclaim space at pack granularity: drop fully-unused packs whole and rewrite mixed packs by copying referenced objects forward (N>1 path, max_count-guarded). check --repair and debug delete-obj now drop by pack_id; Repository.delete() is gone.

codecov · 2026-06-15T06:19:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.90%. Comparing base (d5b3ffa) to head (2175c9f).
⚠️ Report is 23 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9777      +/-   ##
==========================================
+ Coverage   84.87%   84.90%   +0.03%     
==========================================
  Files          92       92              
  Lines       15167    15194      +27     
  Branches     2271     2275       +4     
==========================================
+ Hits        12873    12901      +28     
+ Misses       1590     1589       -1     
  Partials      704      704

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

ThomasWaldmann · 2026-06-15T07:19:17Z

+                        pack_id = self.chunks[defect_chunk].pack_id
                        del self.chunks[defect_chunk]
-                        self.repository.delete(defect_chunk)
+                        self.repository.delete_pack(pack_id)


The goal is to get rid of the N=1 limitation, so I suggest you don't add code that makes no sense and is even very harmful when N!=1.

ThomasWaldmann · 2026-06-15T07:25:50Z

+    def compact_packs(self):
+        """Reclaim space at the granularity the store supports: whole pack files.
+
+        The store is append-only and content-addressed, so we never delete a single object in


append-only is a term from the borg 1.x past, don't use it if it makes no sense anymore, that just adds confusion.

at the store level, full packs are just added usually. removed only by special borg operations like compact or check.

there is also no ordering anymore as in 1.x segment files 0..N, so when looking at the whole store, one maybe can say add-only, but append-only implies an "end" of the store that does not exist anymore.

ThomasWaldmann

needs optimisation or RAM usage will explode.

ThomasWaldmann · 2026-06-15T07:48:52Z

+        for id, entry in self.chunks.iteritems():
+            bucket = packs.setdefault(entry.pack_id, {"used": [], "unused": []})


Never do whole-chunkindex transformations, building new datastructures that contain most of the information from the chunks index. The chunks index is a highly optimized datastructure and even then, it can be many gigabytes in size. If you are transforming that into another arrangement using unoptimized python datastructures, you will likely use too much RAM.

ThomasWaldmann · 2026-06-15T07:52:34Z

+        packs_to_drop = [pid for pid, b in packs.items() if b["unused"] and not b["used"]]
+        packs_to_rewrite = [pid for pid, b in packs.items() if b["unused"] and b["used"]]


Basically the right idea, just needs a more efficient implementation.

ThomasWaldmann · 2026-06-15T07:55:51Z

+                del self.chunks[id]
+        pi.finish()
+
+    def _copy_forward_pack(self, pack_id, used_ids):


usually this is named "compaction".

ThomasWaldmann · 2026-06-15T07:59:21Z

+        writer = PackWriter(store, max_count=self.repository.pack_max_count, chunks=self.chunks)
+        for id in used_ids:
+            entry = self.chunks[id]
+            key = "packs/" + bin_to_hex(entry.pack_id)
+            cdata = store.load(key, offset=entry.obj_offset, size=entry.obj_size)
+            writer.add(id, cdata)
+        writer.flush()


Use borgstore.Store.defrag method, that is way more efficient.

ThomasWaldmann · 2026-06-15T08:01:03Z


-    def delete(self, id, wait=True):
-        """delete a repo object
+    def delete_pack(self, pack_id, wait=True):


guess you could just use .store_delete at the few places that need this.

ThomasWaldmann reviewed Jun 15, 2026

View reviewed changes

ThomasWaldmann requested changes Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

compact: copy-forward packs, retire Repository.delete#9777

compact: copy-forward packs, retire Repository.delete#9777
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:pack-files-step7-compact-copy-forward

mr-raj12 commented Jun 15, 2026

Uh oh!

codecov Bot commented Jun 15, 2026 •

edited

Loading

Uh oh!

ThomasWaldmann Jun 15, 2026

Uh oh!

ThomasWaldmann Jun 15, 2026

Uh oh!

ThomasWaldmann left a comment

Uh oh!

ThomasWaldmann Jun 15, 2026

Uh oh!

ThomasWaldmann Jun 15, 2026

Uh oh!

ThomasWaldmann Jun 15, 2026

Uh oh!

ThomasWaldmann Jun 15, 2026

Uh oh!

ThomasWaldmann Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		for id, entry in self.chunks.iteritems():
		bucket = packs.setdefault(entry.pack_id, {"used": [], "unused": []})

		packs_to_drop = [pid for pid, b in packs.items() if b["unused"] and not b["used"]]
		packs_to_rewrite = [pid for pid, b in packs.items() if b["unused"] and b["used"]]

Uh oh!

Conversation

mr-raj12 commented Jun 15, 2026

Description

Checklist

Uh oh!

codecov Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ThomasWaldmann Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann left a comment

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jun 15, 2026 •

edited

Loading

ThomasWaldmann Jun 15, 2026 •

edited

Loading