Skip to content

feat(node): pin the per-push object delta instead of re-enumerating the whole repo#90

Merged
kevincodex1 merged 4 commits into
mainfrom
fix/pin-push-delta-reenumeration
Jun 24, 2026
Merged

feat(node): pin the per-push object delta instead of re-enumerating the whole repo#90
kevincodex1 merged 4 commits into
mainfrom
fix/pin-push-delta-reenumeration

Conversation

@beardthelion

@beardthelion beardthelion commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Summary

Pin only the objects a push actually introduces, instead of re-enumerating the entire repository on every push. Pin cost now scales with push size rather than total repo size.

Motivation & context

After each push the IPFS and Pinata pin paths ran git cat-file --batch-all-objects over the whole repo and then a per-object DB lookup, so a one-commit push to a large repo walked and checked every object in history. The post-receive handler already has the push's ref updates (old/new tips) in scope, so the introduced objects are computable directly.

No tracked issue for this one; it came out of a performance pass on the replication path.

Kind of change

  • Bug fix
  • Feature
  • Security fix
  • Docs
  • Tests / CI
  • Refactor (no behavior change)
  • Breaking or protocol change (issue required first)

What changed

Crate touched: gitlawb-node.

  • New git::push_delta module. resolve_push_delta computes the per-push delta with git rev-list --objects --no-object-names <new tips> --not <old tips>. It fails closed to a full scan (returns FullScanRequired) on a non-commit tip, any git error, or the GITLAWB_PIN_FORCE_FULLSCAN kill-switch. The non-commit guard peels fully with git cat-file -t '<sha>^{}' (a bare cat-file -t returns tag for an annotated tag, and rev-list does not error on a non-commit tip, so neither can be the guard).
  • pin_new_objects in both ipfs_pin and pinata now take the candidate OID set from the caller instead of scanning the repo themselves. The two identical list_all_objects copies collapse into one in push_delta. The withheld filter and per-object DB dedup downstream are unchanged.
  • Extracted the anonymous replication withheld gate (caller = None) into a shared replication_withheld_set helper returning (announce, withheld), so the push path and a future reconciliation sweep cannot drift. The per-caller read-serve withheld computation in git_upload_pack is deliberately left separate (it filters per requester, not for anonymous replication).
  • resolve_candidates_for_push owns the dispatch and logs every degraded path (fallback taken, fallback failed, task panicked) rather than silently pinning nothing.

How a reviewer can verify

cargo test -p gitlawb-node push_delta       # delta + all degenerate states (blob/tree/tag/tag-of-tag tips, force-push, multi-ref, missing OID, fallback)
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
# Optional: set GITLAWB_PIN_FORCE_FULLSCAN=1 to force the old whole-repo behavior.

Before you request review

  • Scope is one logical change; no unrelated churn
  • cargo test --workspace passes locally
  • New behavior is covered by tests (required for fixes)
  • cargo fmt --all and cargo clippy --workspace --all-targets -- -D warnings are clean
  • Commit titles use Conventional Commits (feat(...), fix(...), docs(...))
  • Docs / .env.example updated if behavior or config changed (or N/A)
  • Checked existing PRs so this isn't a duplicate

Notes for reviewers

  • Security framing: narrowing the pin enumeration only shrinks the per-push exposure set, so it cannot cause an under-withholding leak. The withheld filter still runs on whatever candidate set arrives (delta, full-scan fallback, or empty). The only risk it introduces is bounded under-pinning (a durability gap), not a leak.
  • That durability gap is backstopped by a reconciliation sweep that is intentionally not in this PR. The sweep is a full-history caller of the withheld walk, so it depends on the history-deep blob_paths fix from fix(node): close under-withholding via full ref scope and full-history classification (#42) #84 landing on main first. Holding it as a follow-up keeps this change to the pin-enumeration side only.
  • Known edge: a bulk --mirror push spawns one cat-file per ref. Fine for ordinary pushes; batchable with a single cat-file --batch session if bulk pushes become common.

Summary by CodeRabbit

  • New Features

    • Added delta-based pinning for push operations, pinning only newly introduced objects instead of scanning the full repository each time.
  • Refactor

    • Centralized replication “fail-closed” gating for anonymous replication and updated push replication enforcement to use it.
    • Updated the IPFS and Pinata pinning flows to accept a per-push candidate object set derived from the push.
  • Tests

    • Added a comprehensive test suite for push delta candidate resolution, including fail-closed and kill-switch scenarios.

Compute the per-push object delta (git rev-list --objects <new> --not <old>)
for the pin path instead of enumerating the whole repo, with a fully-peeled
ref-type guard (cat-file -t '<sha>^{}') that fails closed to a full scan on a
non-commit tip, any git error, or the GITLAWB_PIN_FORCE_FULLSCAN kill-switch.
The whole-repo lister moves here for the fallback and the reconciliation sweep.

Pin-enumeration scope only (N16); the withheld filter is unchanged.
Wire the push-delta enumeration into the IPFS and Pinata pin paths so pin work
scales with push size, not repo size (N16).

- U2: both pin_new_objects twins take a candidate OID set instead of scanning
  the whole repo internally; repo_path is retained for reading object bytes.
- U4: extract the anonymous replication withheld gate (caller=None, no caller
  param) into a shared helper returning (announce, withheld), so the push path
  and the future reconciliation sweep cannot drift. The per-caller read-serve
  withheld call in git_upload_pack is deliberately left separate.
- U3: the post-receive handler resolves the push delta once (off-thread),
  falls back to a full scan on a non-commit tip / git error / kill-switch, and
  passes the candidate set to both pin paths. Logs delta size and fallbacks.

The reconciliation sweep (U5) is deferred: it depends on the history-deep
blob_paths fix (#84) landing on main first.
…_delta

Code review found the push candidate computation collapsed both a failed
full-scan fallback and a panicked spawn_blocking task to a silent empty set,
diverging from the sibling withheld path that logs and fails closed.

Extract resolve_candidates_for_push into git::push_delta (it owns the
PinCandidates dispatch and full-scan fallback), logging every degraded path
(fallback taken, fallback failed, task panicked) instead of failing silently.
Pinning nothing on a hard failure is still safe (the withheld filter always
runs; the reconciliation sweep backstops durability) but is now observable.
Moving the dispatch out of the HTTP handler also makes it unit-testable.

Add tests: tag-of-tag-of-commit peels to a Delta, and resolve_candidates_for_push
delta + non-commit-tip full-scan-fallback paths.
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89e7a335-669c-4046-9810-441a2dbb8a60

📥 Commits

Reviewing files that changed from the base of the PR and between 47e6903 and d4f2aac.

📒 Files selected for processing (1)
  • crates/gitlawb-node/src/api/repos.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/gitlawb-node/src/api/repos.rs

📝 Walkthrough

Walkthrough

Introduces a new push_delta module that enumerates only the Git objects introduced by a push (instead of full-repo scans) to produce a PinCandidates set. Both ipfs_pin::pin_new_objects and pinata::pin_new_objects are updated to accept an explicit candidate list. git_receive_pack is refactored to use a new replication_withheld_set helper for fail-closed replication gating and threads the computed per-push candidates into both pinning paths.

Changes

Push-delta pinning optimization

Layer / File(s) Summary
PinCandidates contract and push_delta resolver
crates/gitlawb-node/src/git/mod.rs, crates/gitlawb-node/src/git/push_delta.rs
Declares the push_delta submodule; defines PinCandidates enum (Delta(Vec<String>) / FullScanRequired) and FORCE_FULL_SCAN_ENV kill-switch; implements resolve_push_delta with deletion-only handling, commit-type guard via peeled_object_type, rev_list_delta, list_all_objects; adds resolve_candidates_for_push async wrapper with spawn_blocking and full-scan fallback; comprehensive test suite.
ipfs_pin and pinata candidate-set API
crates/gitlawb-node/src/ipfs_pin.rs, crates/gitlawb-node/src/pinata.rs
Updates pin_new_objects in both modules to accept candidates: Vec<String>, derives object_list via replicable_objects(candidates, withheld), and removes the private list_all_objects helpers that previously shelled out to git cat-file --batch-all-objects.
git_receive_pack replication gating and candidate threading
crates/gitlawb-node/src/api/repos.rs
Adds ZERO_SHA constant and replication_withheld_set private async helper (fail-closed announce/withheld computation); refactors Phase 2 enforcement to call this helper; derives per-push candidates via resolve_candidates_for_push when withheld is Some; passes candidates_ipfs/candidates_pinata into both IPFS and Pinata pinning call sites.

Sequence Diagram(s)

sequenceDiagram
  participant ReceivePack as git_receive_pack
  participant Helper as replication_withheld_set
  participant Delta as resolve_candidates_for_push
  participant IPFS as ipfs_pin::pin_new_objects
  participant Pinata as pinata::pin_new_objects

  ReceivePack->>Helper: visibility snapshot
  Helper-->>ReceivePack: (announce, withheld Option)

  alt withheld is Some
    ReceivePack->>Delta: new_tips / old_tips (excl. ZERO_SHA)
    Delta-->>ReceivePack: candidates Vec<String>
    ReceivePack->>IPFS: candidates_ipfs, withheld
    IPFS-->>ReceivePack: [(sha, cid)]
    ReceivePack->>Pinata: candidates_pinata, withheld
    Pinata-->>ReceivePack: [(sha, cid)]
  else withheld is None
    ReceivePack->>ReceivePack: skip pinning (fail-closed)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Gitlawb/node#34: Main PR extends PR #34's Phase-2 git_receive_pack replication enforcement by refactoring the withheld/announce computation into replication_withheld_set and threading per-push candidates into IPFS/Pinata pinning.
  • Gitlawb/node#36: Both PRs touch the same replication/pinning codepath in pin_new_objects/pinata::pin_new_objects; PR #36 added encrypted-withheld replication surfaces while this PR refactors receive-pack pinning to use push-delta candidate OID sets instead of repo-wide enumeration.

Suggested labels

sev:low, crate:node, kind:perf

Suggested reviewers

  • kevincodex1

Poem

🐇 Hippity-hop through the object store,
No more scanning every file like before!
Just the new commits, the delta so bright,
Pinned with precision — fail-closed and tight.
The withheld blobs stay hidden from sight,
Push deltas computed with speed and delight! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: optimizing pinning to use per-push object deltas instead of full repository enumeration.
Description check ✅ Passed The description follows the template structure with all required sections completed: Summary, Motivation & context, Kind of change, What changed, How a reviewer can verify, and comprehensive Notes for reviewers.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/pin-push-delta-reenumeration

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/gitlawb-node/src/api/repos.rs (1)

724-741: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Keep fallback candidate scans off the push response path.

resolve_candidates_for_push(...).await now runs before the handler returns, but the IPFS/Pinata pinning work is already backgrounded. If the helper takes the full-scan fallback or the kill switch is enabled, a large repo scan can add user-visible git push latency; consider moving candidate resolution into the spawned replication/pinning worker and sharing the resulting candidate set there.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/repos.rs` around lines 724 - 741, The
resolve_candidates_for_push function call is running synchronously in the
handler before returning, which can add user-visible latency to the git push
response if the fallback full-scan path is triggered. Move the candidate
resolution logic from its current location (where it's awaited as part of the
handler response) into the spawned background replication/pinning worker task,
and share the computed candidate set (new_tips, old_tips, and resulting
candidates) within that worker context instead of computing it on the push
response path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 52-77: When the withheld_blob_oids computation fails or panics and
withheld becomes None, the code still returns the original announce value,
allowing downstream replication to proceed when it should fail closed. Modify
the logic to clear or set announce to None in cases where withheld computation
fails (distinguished from the legitimate None case when rules themselves are
None). You can track whether a failure occurred in the Some(rules) branch and
conditionally return a tuple with announce set to None when the withheld
computation fails, ensuring that the final return statement at line 77 properly
handles the fail-closed scenario for both announce and withheld.

---

Nitpick comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 724-741: The resolve_candidates_for_push function call is running
synchronously in the handler before returning, which can add user-visible
latency to the git push response if the fallback full-scan path is triggered.
Move the candidate resolution logic from its current location (where it's
awaited as part of the handler response) into the spawned background
replication/pinning worker task, and share the computed candidate set (new_tips,
old_tips, and resulting candidates) within that worker context instead of
computing it on the push response path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8fc693f-d027-414a-8481-9abb62fa58c1

📥 Commits

Reviewing files that changed from the base of the PR and between 2153b0b and 47e6903.

📒 Files selected for processing (5)
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/push_delta.rs
  • crates/gitlawb-node/src/ipfs_pin.rs
  • crates/gitlawb-node/src/pinata.rs

Comment thread crates/gitlawb-node/src/api/repos.rs Outdated
A failed or panicked withheld_blob_oids walk left announce=true while
withheld=None, so a push we could not vet would still gossip ref updates,
notify peers, and write a permanent Arweave anchor. Force announce false
in that arm so the gate fails closed on both axes.
@beardthelion

Copy link
Copy Markdown
Collaborator Author

On the candidate-resolution latency nitpick (repos.rs:724-741): leaving it as-is. The happy path is a single bounded git rev-list <new> ^<old> on the push delta; only the rare degraded branch (non-commit tip or git error) takes the full-scan fallback. Moving resolution off the response path means handing a shared candidate set to two independent spawned workers (IPFS + Pinata), which is a spawn-topology change disproportionate to a rare fallback. Will revisit if the fallback shows up in push-latency telemetry.

@kevincodex1 kevincodex1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

@kevincodex1 kevincodex1 merged commit 1af4fdf into main Jun 24, 2026
14 checks passed
beardthelion added a commit that referenced this pull request Jun 24, 2026
Resolves conflicts in ipfs_pin.rs and pinata.rs against #90 (per-push delta
pin). #90 centralized object enumeration in push_delta::list_all_objects
(cat-file --batch-all-objects) and reworked the pin path to take a candidate
OID set, so this branch's local reachable-only list_all_objects helpers and
their tests are now dead duplicates — dropped both. The withholding fix
(full ref scope + full-history classification + explicit detached-HEAD walk)
in visibility_pack.rs is unaffected; updated its stale comments that still
described the pin set as 'git rev-list --objects --all'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants