indexer: handle missing parent objects in shallow clones and root commits#21

Open
Vui-Chee wants to merge 1 commit into facebookexperimental:main from Vui-Chee:fix/shallow-clone-object-not-found

Vui-Chee commented Mar 7, 2026

Problem

Running semcode-index --source /path/to/repo without --git fails with:

Error: An object with id 69695f5331d4d670eff76376faa3aeb5d68733e3 could not be found

This happens on:

  • Shallow clones (git clone --depth 1) — the HEAD commit has parent SHAs recorded in it, but those parent objects are not present in the local object store
  • Root commits — repos initialized from tarballs (e.g. git init && git add . && git commit) with no parent at all

Root Cause

When no --git flag is provided, run_pipeline auto-detects HEAD and constructs the range SHA^..SHA (parent-to-HEAD). list_shas_in_range then calls resolve_to_commit(repo, "SHA^"), which gix implements by looking up the parent object in the local store — failing with the above error when it doesn't exist.

Fix

In list_shas_in_range, catch the error from resolving from_spec and fall back to returning just [to_commit_sha]. This means indexing proceeds with the single HEAD commit, which is the correct behaviour for the auto-detect case.
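
The fallback can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: `Store`, `ResolveError`, and the signatures of `resolve_to_commit` and `list_shas_in_range` shown here are hypothetical stand-ins for the gix-backed code, and the history walk only follows a single recorded parent per commit.

```rust
use std::collections::HashMap;

/// Hypothetical error type; the real code surfaces gix errors instead.
#[derive(Debug, PartialEq)]
enum ResolveError {
    /// The object is referenced but absent from the local store (shallow clone).
    ObjectNotFound(String),
    /// The commit records no parent at all (root commit).
    NoParent(String),
}

/// Hypothetical stand-in for the local object store:
/// maps each locally present commit SHA to the parent SHA recorded in it.
struct Store {
    parents: HashMap<String, Option<String>>,
}

impl Store {
    fn new(commits: &[(&str, Option<&str>)]) -> Self {
        let parents = commits
            .iter()
            .map(|&(sha, parent)| (sha.to_string(), parent.map(|p| p.to_string())))
            .collect();
        Store { parents }
    }

    /// Resolves either `SHA` or `SHA^` to a commit that exists locally.
    fn resolve_to_commit(&self, spec: &str) -> Result<String, ResolveError> {
        match spec.strip_suffix('^') {
            Some(child) => {
                let recorded = self
                    .parents
                    .get(child)
                    .ok_or_else(|| ResolveError::ObjectNotFound(child.to_string()))?;
                let parent = recorded
                    .as_ref()
                    .ok_or_else(|| ResolveError::NoParent(child.to_string()))?;
                if self.parents.contains_key(parent) {
                    Ok(parent.clone())
                } else {
                    // The parent SHA is recorded, but the object is missing locally.
                    Err(ResolveError::ObjectNotFound(parent.clone()))
                }
            }
            None if self.parents.contains_key(spec) => Ok(spec.to_string()),
            None => Err(ResolveError::ObjectNotFound(spec.to_string())),
        }
    }
}

/// Lists commits in `from_spec..to_spec`, newest first.
/// If `from_spec` cannot be resolved, falls back to just the target commit.
fn list_shas_in_range(
    store: &Store,
    from_spec: &str,
    to_spec: &str,
) -> Result<Vec<String>, ResolveError> {
    // A target that cannot be resolved is still a hard error.
    let to_commit_sha = store.resolve_to_commit(to_spec)?;

    let from_sha = match store.resolve_to_commit(from_spec) {
        Ok(sha) => sha,
        // Shallow clone or root commit: index only the target commit.
        Err(_) => return Ok(vec![to_commit_sha]),
    };

    // Walk back from the target until the (excluded) start of the range.
    let mut shas = Vec::new();
    let mut current = Some(to_commit_sha);
    while let Some(sha) = current {
        if sha == from_sha {
            break;
        }
        current = store
            .parents
            .get(&sha)
            .and_then(|parent| parent.as_ref())
            .filter(|parent| store.parents.contains_key(*parent))
            .cloned();
        shas.push(sha);
    }
    Ok(shas)
}
```

In this sketch only a failure to resolve `from_spec` triggers the fallback; a failure to resolve the target commit still propagates as an error, so a genuinely bad `to_spec` is not silently masked.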

Test

# Shallow clone
git clone --depth 1 https://github.com/torvalds/linux linux-shallow
semcode-index --source linux-shallow --database ./linux-shallow.db

# Tarball-initialized repo
tar xf linux-6.18.6.tar.xz && cd linux-6.18.6
git init && git add . && git commit -m "initial"
semcode-index --source . --database ./code.db

…mits

When `semcode-index --source` is run without `--git`, it auto-detects
HEAD and constructs the range `HEAD^..HEAD`. On a shallow clone or a
repo initialized from a tarball (single root commit), `HEAD^` refers to
a parent object that doesn't exist in the local object store, causing
gix to error with "An object with id ... could not be found".

Fix `list_shas_in_range` to catch this error and fall back to returning
just the target commit SHA, so indexing proceeds normally with the
single HEAD commit.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

meta-cla bot commented Mar 7, 2026

Hi @Vui-Chee!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!


meta-cla bot commented Mar 7, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

meta-cla bot added the CLA Signed label Mar 7, 2026