feat(services/git): Add Git service with transparent LFS support #6836
siomporas wants to merge 1 commit into apache:main
// Use full clone instead of shallow to support arbitrary commit SHAs
// Shallow clone only gets the tip of the default branch
let (repo, _) = prepare
    .fetch_only(gix::progress::Discard, &gix::interrupt::IS_INTERRUPTED)
If we need to fully clone the repo, what value does this service bring to our users?
The clone is of the underlying git repository, and it goes into a temp folder. This is necessary because git stores and serves its objects and revisions in packs, so reading an individual file requires access to those packs. Hosts like GitHub serve individual files out of packs they already hold. This service is meant to use the lowest common denominator, which is git's own protocol, so it has to download the packs for an oid to get at the files, and that is what gix provides.
The value isn't in providing the contents of the core git repo (these contents are loaded into memory when the repo is cloned). It is in offering streams for the LFS objects, which are generally huge, live entirely outside of and on top of the git repository, and are directly accessible over HTTP.
Hopefully that makes sense. If it still doesn't, I suggest cloning the example project I linked in the description and using it to clone an AI model from HuggingFace while watching the process's resource utilisation.
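To make the split between the cloned repository and the LFS objects concrete, here is a minimal sketch of how a blob read from the clone could be recognised as an LFS pointer and turned into a download request. It follows the public Git LFS pointer format and Batch API as I understand them. It is not taken from this PR, and the names `LfsPointer`, `parse_lfs_pointer` and `batch_download_body` are hypothetical.

```rust
/// Parsed contents of a Git LFS pointer file (spec v1).
/// Hypothetical type for illustration, not part of the PR.
#[derive(Debug, PartialEq)]
pub struct LfsPointer {
    /// Hex-encoded SHA-256 of the real (large) object.
    pub oid: String,
    /// Size of the real object in bytes.
    pub size: u64,
}

/// First line that every v1 pointer file starts with.
const LFS_VERSION_LINE: &str = "version https://git-lfs.github.com/spec/v1";

/// Returns `Some` if `blob` looks like an LFS pointer, or `None` if it is
/// an ordinary file whose bytes already live in the cloned repository.
pub fn parse_lfs_pointer(blob: &str) -> Option<LfsPointer> {
    let mut lines = blob.lines();
    // The version key must be the first line of a pointer file.
    if lines.next()? != LFS_VERSION_LINE {
        return None;
    }

    let mut oid = None;
    let mut size = None;
    for line in lines {
        // Every remaining line is "<key> <value>".
        let (key, value) = line.split_once(' ')?;
        match key {
            "oid" => oid = value.strip_prefix("sha256:").map(str::to_owned),
            "size" => size = value.parse::<u64>().ok(),
            _ => {} // Unknown keys are ignored in this sketch.
        }
    }

    let oid = oid?;
    // A SHA-256 digest is 64 hex characters.
    if oid.len() != 64 || !oid.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(LfsPointer { oid, size: size? })
}

/// Builds the JSON body for a Batch API "download" request. The body is
/// POSTed to `<remote-url>/info/lfs/objects/batch`, and the response carries
/// an HTTP URL from which the object can be streamed.
pub fn batch_download_body(pointer: &LfsPointer) -> String {
    format!(
        r#"{{"operation":"download","transfers":["basic"],"objects":[{{"oid":"{}","size":{}}}]}}"#,
        pointer.oid, pointer.size
    )
}
```

Only the small pointer blob comes out of the cloned packs. The object it names is fetched separately over HTTP, which is why it can be streamed rather than held in memory.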
lockfile Co-authored-by: Matthew Hambrecht <matthew.hambrecht@patapsco.ai>
0471131 to 12c290b
Which issue does this PR close?
Closes #6831.
Rationale for this change
OpenDAL has support for specific git service providers like HuggingFace, but not a generic git provider with LFS support. These changes provide generic git + LFS file streaming support using an OpenDAL service `git`.
What changes are included in this PR?
A new service `git`, documentation, and crate features for the new service.
Are there any user-facing changes?
A new service back end!
NOTE: I tested these changes pretty comprehensively on LFS repositories in my private GitLab instance and on HuggingFace, with and without credentials, on both private and public repositories. I also tested non-LFS repos, including on GitHub.
I created a companion demo project here that bootstraps this particular version of OpenDAL using a git submodule, and provides a simple CLI tool to clone git repository states including LFS to the local file system to demonstrate the new service.