From 4d6e76434fcdccf016e8855a8e201f607236d447 Mon Sep 17 00:00:00 2001
From: Kaartic Sivaraam
Date: Sun, 8 Feb 2026 00:53:49 +0530
Subject: [PATCH 1/2] SoC-2026: add more ideas based on Christian's suggestions

Ref: https://lore.kernel.org/git/CAP8UFD29LtG2dRRB4f6mZAHNGqDmDxUV4ULYw3w3OYg15ZBBYg@mail.gmail.com/
---
 SoC-2026-Ideas.md | 189 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 187 insertions(+), 2 deletions(-)

diff --git a/SoC-2026-Ideas.md b/SoC-2026-Ideas.md
index 28e2d94a2..221fc75a3 100644
--- a/SoC-2026-Ideas.md
+++ b/SoC-2026-Ideas.md
@@ -143,6 +143,140 @@ _Possible mentors_:
 * Siddharth Asthana < >
 * Lucas Seiki Oshiro < >
 
+### Improve disk space recovery for partial clones
+
+Git's partial clone feature allows users to clone repositories without downloading
+all objects immediately, which is particularly useful for very large repositories.
+Objects are fetched on-demand from "promisor remotes" as needed. However, over time,
+clients may accumulate large local blobs that are no longer needed but remain on disk,
+and currently there's no easy way to reclaim this space.
+
+This project aims to improve `git-backfill` (or create a new command) to allow
+clients to remove large local blobs when they are available on a promisor remote.
+This would help users who want to get back disk space while maintaining the ability
+to re-fetch objects when needed.
+
+The project involves:
+- Designing a safe mechanism to identify which blobs can be removed
+- Implementing the removal process while maintaining repository integrity
+- Ensuring removed objects can be transparently re-fetched when needed
+- Adding appropriate safeguards and user controls
+
+**Getting started:** Build Git from source, set up a partial clone and experiment
+with promisor remotes, study the existing `git-backfill` command (if available)
+or related functionality, understand how Git tracks and fetches objects from
+promisor remotes, review documentation on partial clones in
+`Documentation/technical/partial-clone.txt`, and submit a micro-patch to
+demonstrate familiarity with the codebase.
+
+**Resources:**
+- [Partial clone documentation](https://git-scm.com/docs/partial-clone)
+- [Git Protocol v2 documentation](https://git-scm.com/docs/gitprotocol-v2)
+
+_Expected Project Size_: 175 hours or 350 hours
+
+_Difficulty_: Medium to Hard
+
+_Languages_: C, shell(bash)
+
+_Possible mentors_:
+
+* Christian Couder < >
+* Karthik Nayak < >
+* Justin Tobler < >
+* Siddharth Asthana < >
+* Ayush Chandekar < >
+* Lucas Seiki Oshiro < >
+
+### Implement promisor remote fetch ordering
+
+When a Git repository is configured with multiple promisor remotes, there's
+currently no mechanism to specify or optimize the order in which these remotes
+should be queried when fetching missing objects. Different remotes may have
+different performance characteristics, costs, or reliability, making fetch
+order an important consideration.
+
+This project aims to implement a fetch ordering mechanism for multiple promisor
+remotes. The order could be:
+- Configured locally by the client
+- Advertised by servers through the promisor-remote protocol
+- Determined dynamically based on network conditions or other heuristics
+
+The key challenge is designing a flexible system that allows servers to
+communicate their preferred fetch order to clients (to ensure optimal
+performance and cost management) while still allowing client-side overrides
+when appropriate.
+
+**Getting started:** Build Git from source, set up a repository with multiple
+promisor remotes and experiment with object fetching, study how Git currently
+handles multiple remotes, review the promisor-remote protocol in
+`Documentation/gitprotocol-v2.txt`, understand partial clone implementation,
+and submit a micro-patch to demonstrate familiarity with the codebase.
+
+**Resources:**
+- [Partial clone documentation](https://git-scm.com/docs/partial-clone)
+- [Git Protocol v2 documentation](https://git-scm.com/docs/gitprotocol-v2)
+
+_Expected Project Size_: 175 hours or 350 hours
+
+_Difficulty_: Medium to Hard
+
+_Languages_: C, shell(bash)
+
+_Possible mentors_:
+
+* Christian Couder < >
+* Karthik Nayak < >
+* Justin Tobler < >
+* Siddharth Asthana < >
+* Ayush Chandekar < >
+* Lucas Seiki Oshiro < >
+
+### Enhance promisor-remote protocol for better-connected remotes
+
+Currently, the promisor-remote protocol allows servers to advertise remotes
+that the server itself uses as promisor remotes. However, as suggested by
+Junio Hamano, it would be more useful if servers could advertise
+"better-connected" remotes - remotes that might not be promisor remotes
+for the server but would be good choices for the client.
+
+This enhancement would allow servers to guide clients toward optimal remote
+configurations, potentially improving performance and reducing load on
+individual servers by distributing requests across a network of remotes.
+
+This project involves:
+- Extending the promisor-remote protocol to support advertising
+  better-connected remotes
+- Implementing server-side logic to determine and advertise appropriate remotes
+- Implementing client-side handling of these advertisements
+- Designing the protocol extension with backward compatibility in mind
+- Testing with various network topologies
+
+**Getting started:** Build Git from source, study the current promisor-remote
+protocol implementation, read Junio's suggestion in `Documentation/gitprotocol-v2.txt`,
+understand how Git currently advertises and uses promisor remotes, set up test
+scenarios with multiple interconnected remotes, and submit a micro-patch to
+demonstrate familiarity with the codebase.
+
+**Resources:**
+- [Partial clone documentation](https://git-scm.com/docs/partial-clone)
+- [Git Protocol v2 documentation - promisor remote section](https://git-scm.com/docs/gitprotocol-v2#_promisor_remotepr_info)
+
+_Expected Project Size_: 175 hours or 350 hours
+
+_Difficulty_: Hard
+
+_Languages_: C, shell(bash)
+
+_Possible mentors_:
+
+* Christian Couder < >
+* Karthik Nayak < >
+* Justin Tobler < >
+* Siddharth Asthana < >
+* Ayush Chandekar < >
+* Lucas Seiki Oshiro < >
+
 ### Complete and extend the `remote-object-info` command for `git cat-file`
 
 From around June 2024 to March 2025, work was undertaken by Eric Ju to add a
@@ -188,10 +322,61 @@ _Languages_: C, shell(bash)
 
 _Possible mentors_:
 
+* Christian Couder < christian.couder@gmail.com >
+* Karthik Nayak < karthik.188@gmail.com >
+* Justin Tobler < jltobler@gmail.com >
+* Ayush Chandekar < ayu.chandekar@gmail.com >
+* Siddharth Asthana < siddharthasthana31@gmail.com >
+* Lucas Seiki Oshiro < lucasseikioshiro@gmail.com >
+* Chandra Pratap < chandrapratap3519@gmail.com >
+
+### Improve signature handling in fast-export/fast-import and git-filter-repo
+
+Git's `fast-export` and `fast-import` commands are powerful tools for
+repository manipulation and migration, and `git-filter-repo` builds on
+these to provide advanced repository filtering capabilities. However,
+handling of commit and tag signatures during these operations could
+be significantly improved.
+
+Currently, signatures may be lost or become invalid when objects are
+exported and imported, which can be problematic for repositories that
+rely on signed commits or tags for security and verification purposes.
+
+This project aims to improve how these tools handle signatures by:
+- Preserving signature information during export/import operations
+- Providing options for signature handling (preserve, strip, re-sign, etc.)
+- Ensuring signature validity is maintained or appropriately flagged
+- Extending `git-filter-repo` to handle signatures correctly
+- Adding tests and documentation for signature-related workflows
+
+**Note:** This project may potentially conflict with ongoing work by GitLab
+developers (including Christian Couder) on signature handling. Applicants
+should coordinate with mentors before proposing this project to ensure the
+work would not duplicate ongoing efforts.
+
+**Getting started:** Build Git from source, experiment with `git fast-export`
+and `git fast-import` on repositories with signed commits and tags, study
+the current signature handling code, review `git-filter-repo` functionality,
+understand GPG signature verification in Git, and submit a micro-patch to
+demonstrate familiarity with the codebase.
+
+**Resources:**
+- [git-fast-export documentation](https://git-scm.com/docs/git-fast-export)
+- [git-fast-import documentation](https://git-scm.com/docs/git-fast-import)
+- [git-filter-repo project](https://github.com/newren/git-filter-repo)
+- [Git signature verification documentation](https://git-scm.com/docs/git-verify-commit)
+
+_Expected Project Size_: 175 hours or 350 hours
+
+_Difficulty_: Medium to Hard
+
+_Languages_: C, Python (for git-filter-repo), shell(bash)
+
+_Possible mentors_:
+
 * Christian Couder < >
 * Karthik Nayak < >
 * Justin Tobler < >
-* Ayush Chandekar < >
 * Siddharth Asthana < >
+* Ayush Chandekar < >
 * Lucas Seiki Oshiro < >
-* Chandra Pratap < >

From 79cb13d212b88d52731d786d3334f35fbb31e1e7 Mon Sep 17 00:00:00 2001
From: Kaartic Sivaraam
Date: Wed, 11 Feb 2026 02:24:25 +0530
Subject: [PATCH 2/2] SoC-2026-Ideas: remove already taken idea and further tweaks

---
 SoC-2026-Ideas.md | 62 ++++++-----------------------------------------
 1 file changed, 8 insertions(+), 54 deletions(-)

diff --git a/SoC-2026-Ideas.md b/SoC-2026-Ideas.md
index 221fc75a3..9534de009 100644
--- a/SoC-2026-Ideas.md
+++ b/SoC-2026-Ideas.md
@@ -151,7 +151,7 @@ Objects are fetched on-demand from "promisor remotes" as needed. However, over t
 clients may accumulate large local blobs that are no longer needed but remain on disk,
 and currently there's no easy way to reclaim this space.
 
-This project aims to improve `git-backfill` (or create a new command) to allow
+This project aims to improve `git backfill` (or create a new command) to allow
 clients to remove large local blobs when they are available on a promisor remote.
 This would help users who want to get back disk space while maintaining the ability
 to re-fetch objects when needed.
@@ -162,6 +162,12 @@ The project involves:
 - Ensuring removed objects can be transparently re-fetched when needed
 - Adding appropriate safeguards and user controls
 
+**Important note:** While the project mentions `git backfill`, it is not yet
+decided that it is the right place to have this command. Other potential candidates
+for placement are `git gc` / `git repack` / `git maintenance`. A design discussion
+with the community is imminent as part of this project to finalize the most
+appropriate placement for this command.
+
 **Getting started:** Build Git from source, set up a partial clone and experiment
 with promisor remotes, study the existing `git-backfill` command (if available)
 or related functionality, understand how Git tracks and fetches objects from
@@ -200,12 +206,10 @@ This project aims to implement a fetch ordering mechanism for multiple promisor
 remotes. The order could be:
 - Configured locally by the client
 - Advertised by servers through the promisor-remote protocol
-- Determined dynamically based on network conditions or other heuristics
 
 The key challenge is designing a flexible system that allows servers to
 communicate their preferred fetch order to clients (to ensure optimal
-performance and cost management) while still allowing client-side overrides
-when appropriate.
+performance and cost management).
 
 **Getting started:** Build Git from source, set up a repository with multiple
 promisor remotes and experiment with object fetching, study how Git currently
@@ -330,53 +334,3 @@ _Possible mentors_:
 * Lucas Seiki Oshiro < lucasseikioshiro@gmail.com >
 * Chandra Pratap < chandrapratap3519@gmail.com >
 
-### Improve signature handling in fast-export/fast-import and git-filter-repo
-
-Git's `fast-export` and `fast-import` commands are powerful tools for
-repository manipulation and migration, and `git-filter-repo` builds on
-these to provide advanced repository filtering capabilities. However,
-handling of commit and tag signatures during these operations could
-be significantly improved.
-
-Currently, signatures may be lost or become invalid when objects are
-exported and imported, which can be problematic for repositories that
-rely on signed commits or tags for security and verification purposes.
-
-This project aims to improve how these tools handle signatures by:
-- Preserving signature information during export/import operations
-- Providing options for signature handling (preserve, strip, re-sign, etc.)
-- Ensuring signature validity is maintained or appropriately flagged
-- Extending `git-filter-repo` to handle signatures correctly
-- Adding tests and documentation for signature-related workflows
-
-**Note:** This project may potentially conflict with ongoing work by GitLab
-developers (including Christian Couder) on signature handling. Applicants
-should coordinate with mentors before proposing this project to ensure the
-work would not duplicate ongoing efforts.
-
-**Getting started:** Build Git from source, experiment with `git fast-export`
-and `git fast-import` on repositories with signed commits and tags, study
-the current signature handling code, review `git-filter-repo` functionality,
-understand GPG signature verification in Git, and submit a micro-patch to
-demonstrate familiarity with the codebase.
-
-**Resources:**
-- [git-fast-export documentation](https://git-scm.com/docs/git-fast-export)
-- [git-fast-import documentation](https://git-scm.com/docs/git-fast-import)
-- [git-filter-repo project](https://github.com/newren/git-filter-repo)
-- [Git signature verification documentation](https://git-scm.com/docs/git-verify-commit)
-
-_Expected Project Size_: 175 hours or 350 hours
-
-_Difficulty_: Medium to Hard
-
-_Languages_: C, Python (for git-filter-repo), shell(bash)
-
-_Possible mentors_:
-
-* Christian Couder < >
-* Karthik Nayak < >
-* Justin Tobler < >
-* Siddharth Asthana < >
-* Ayush Chandekar < >
-* Lucas Seiki Oshiro < >