Support efficient indexing for Git worktree-based agent development workflows #351

@pcristin

Summary

Agent-driven development workflows increasingly rely on git worktree instead of a single local checkout with manually switched branches.

Coding agents often create isolated worktrees for issue implementation, PR review, parallel tasks, verification runs, and short-lived experiments. This makes worktrees a core part of the development model, not just an occasional advanced Git feature.

Codebase Memory MCP would be more useful in these environments if it could understand Git worktrees as related contexts of the same repository, without requiring a full independent index for every temporary worktree.

Problem

A repository may have:

  • one canonical checkout
  • multiple issue worktrees
  • PR review worktrees
  • temporary agent-created worktrees
  • detached or short-lived verification worktrees

If each worktree is treated as a fully separate project/index, several issues appear:

  • duplicated indexing of mostly identical repository content
  • unnecessary disk growth
  • stale project entries after worktrees are deleted
  • ambiguous project selection when the MCP is used from inside a worktree
  • missing branch-specific context if only the canonical checkout is indexed
  • extra cleanup logic outside the MCP

This is especially visible in agentic workflows, where multiple short-lived worktrees may be created and removed as part of normal development.

Desired Outcome

Codebase Memory MCP should support worktree-heavy workflows efficiently.

Ideally:

  • indexing from a worktree should recognize the canonical repository
  • mostly identical repository content should not require a full duplicate index
  • worktree-specific changes should remain queryable
  • deleted or stale worktrees should be detectable
  • queries from inside a worktree should use the relevant worktree context by default
  • project/status output should make the relationship between repository and worktrees clear
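As a rough sketch of how stale worktrees could be detected, the snippet below parses the text produced by `git worktree list --porcelain` and flags entries that Git marks as prunable or whose directory no longer exists. The `WorktreeInfo` type and both function names are hypothetical and are not part of Codebase Memory MCP; the sketch assumes Python 3.9+ and that the caller has already captured the command output as a string.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class WorktreeInfo:
    """Hypothetical record for one entry of `git worktree list --porcelain`."""
    path: str
    head: Optional[str] = None
    branch: Optional[str] = None  # stays None for detached or bare entries
    detached: bool = False
    prunable: bool = False


def parse_worktree_porcelain(text: str) -> list[WorktreeInfo]:
    """Parse porcelain output: one record per worktree, separated by blank lines."""
    entries: list[WorktreeInfo] = []
    current: Optional[WorktreeInfo] = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current record
            continue
        key, _, value = line.partition(" ")
        if key == "worktree":
            current = WorktreeInfo(path=value)
            entries.append(current)
        elif current is None:
            continue  # ignore stray attribute lines outside a record
        elif key == "HEAD":
            current.head = value
        elif key == "branch":
            current.branch = value.removeprefix("refs/heads/")
        elif key == "detached":
            current.detached = True
        elif key == "prunable":
            current.prunable = True
    return entries


def stale_worktrees(entries: list[WorktreeInfo]) -> list[WorktreeInfo]:
    """Return entries Git reports as prunable or whose directory is gone."""
    return [e for e in entries if e.prunable or not Path(e.path).exists()]
```

An index could run a check like this before listing projects and drop or mark entries whose worktree has disappeared.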

Possible Design Directions

Option 1: Canonical Project + Worktree Diff Overlay

Maintain one full canonical repository index, then store only per-worktree overlays for changed files.

Queries from a worktree could resolve against:

  1. the worktree overlay
  2. the canonical repository index

This is likely disk-efficient and well aligned with temporary agent worktrees.

Tradeoff: query resolution needs to merge canonical and overlay graph data.
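A minimal sketch of the overlay-first lookup, assuming the index can be modeled as a mapping from file path to an indexed record. `WorktreeOverlay`, `resolve`, and `visible_paths` are hypothetical names used only for illustration. The sketch also shows one detail the merge has to handle: files deleted in the worktree need a tombstone, otherwise the canonical entry would leak through.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorktreeOverlay:
    """Hypothetical per-worktree overlay: only what differs from canonical."""
    changed: dict[str, dict] = field(default_factory=dict)  # path -> record
    deleted: set[str] = field(default_factory=set)          # tombstones


def resolve(path: str, overlay: WorktreeOverlay,
            canonical: dict[str, dict]) -> Optional[dict]:
    """Look up a file record: overlay first, then the canonical index."""
    if path in overlay.deleted:
        return None  # removed in this worktree, so hide the canonical entry
    if path in overlay.changed:
        return overlay.changed[path]
    return canonical.get(path)


def visible_paths(overlay: WorktreeOverlay,
                  canonical: dict[str, dict]) -> set[str]:
    """All paths a query from this worktree should be able to see."""
    return (set(canonical) | set(overlay.changed)) - overlay.deleted
```

A real graph index would also need to merge edges that cross between overlay and canonical files, which this per-file sketch does not attempt.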

Option 2: Canonical Project + Branch/Commit Contexts

Represent one repository as a single project with multiple branch, commit, or worktree contexts underneath it.

This would keep project lists clean and make worktree relationships explicit.

Tradeoff: storage still needs deduplication, otherwise branch snapshots could become full duplicate indexes.

Option 3: Separate Worktree Projects With Shared Content Storage

Keep worktrees as separate projects externally, but deduplicate indexed file content and graph fragments internally.

This may be less disruptive to existing APIs.

Tradeoff: project-list clutter and context ambiguity may remain unless grouped views are added.
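One way to picture the internal deduplication is a content-addressed store: each project keeps only a path-to-hash mapping, and the indexed fragment for a given file content is stored once. The class below is a hypothetical in-memory sketch; `SharedFragmentStore` and its `_index` placeholder are assumptions for illustration, not existing MCP APIs.

```python
import hashlib
from typing import Optional


class SharedFragmentStore:
    """Hypothetical store that indexes identical file content only once."""

    def __init__(self) -> None:
        self._fragments: dict[str, dict] = {}           # content hash -> fragment
        self._projects: dict[str, dict[str, str]] = {}  # project -> path -> hash
        self.index_calls = 0                            # counts real indexing work

    def _index(self, content: bytes) -> dict:
        # Placeholder for the real (expensive) parsing and graph extraction.
        self.index_calls += 1
        return {"lines": content.count(b"\n")}

    def add_file(self, project: str, path: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._fragments:
            self._fragments[digest] = self._index(content)
        self._projects.setdefault(project, {})[path] = digest
        return digest

    def get(self, project: str, path: str) -> Optional[dict]:
        digest = self._projects.get(project, {}).get(path)
        return None if digest is None else self._fragments[digest]

    def drop_project(self, project: str) -> None:
        """Remove a project and garbage-collect fragments nobody references."""
        self._projects.pop(project, None)
        live = {d for files in self._projects.values() for d in files.values()}
        for digest in list(self._fragments):
            if digest not in live:
                del self._fragments[digest]
```

With this shape, deleting a temporary worktree project frees only the fragments that were unique to it.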

Option 4: On-Demand Worktree Diff Indexing

Persist only the canonical index, then compute and index the current worktree diff on demand.

This could work well for short-lived review or verification worktrees.

Tradeoff: first query may be slower, and long-lived branches may need a persistent overlay.
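As an illustration of the on-demand step, the sketch below turns the output of `git diff --name-status <base>` into two sets: paths to (re)index for the worktree and paths whose canonical entries should be hidden. The function name `plan_overlay_update` is hypothetical. The sketch assumes tab-separated output without `-z`, so unusual paths that Git quotes are not handled, and untracked files would need a separate source because `git diff` does not report them.

```python
def plan_overlay_update(name_status: str) -> tuple[set[str], set[str]]:
    """Split `git diff --name-status` output into (to_index, to_drop) paths."""
    to_index: set[str] = set()
    to_drop: set[str] = set()
    for line in name_status.splitlines():
        if not line:
            continue
        parts = line.split("\t")
        status = parts[0]
        if status.startswith("R") and len(parts) == 3:
            # Rename: the old path disappears, the new path needs indexing.
            to_drop.add(parts[1])
            to_index.add(parts[2])
        elif status.startswith("C") and len(parts) == 3:
            # Copy: the source is untouched, only the new path is indexed.
            to_index.add(parts[2])
        elif status == "D" and len(parts) == 2:
            to_drop.add(parts[1])
        elif len(parts) == 2:
            # Added, modified, type-changed, and similar single-path statuses.
            to_index.add(parts[1])
    return to_index, to_drop
```

For a short-lived review worktree the resulting sets are usually small, which is what makes computing the overlay at query time plausible.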

Suggested Direction

A good incremental path might be:

  1. detect when an indexed path is a Git worktree;
  2. resolve the canonical repository identity;
  3. expose repository/worktree grouping in project/status output;
  4. store worktree metadata such as path, branch, HEAD SHA, and base commit;
  5. later add diff-overlay indexing so worktrees do not require full duplicate indexes.

This would let the MCP work from inside agent-created worktrees without manual project selection or external cleanup logic, while avoiding index bloat.
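Steps 1 and 2 of the path above can be sketched without invoking Git, relying on Git's documented on-disk layout: in a linked worktree, `.git` is a file containing a `gitdir:` pointer to an administrative directory, and that directory holds a `commondir` file pointing back to the shared repository. The function name `identify_checkout` is hypothetical, and using the resolved common Git directory as the canonical repository identity is an assumption of this sketch. Running `git rev-parse --git-common-dir` would be an alternative.

```python
from pathlib import Path
from typing import Optional


def identify_checkout(checkout: Path) -> Optional[tuple[Path, bool]]:
    """Return (common_git_dir, is_linked_worktree), or None if not a checkout."""
    dot_git = checkout / ".git"
    if dot_git.is_dir():
        # Main checkout: the .git directory itself is the common directory.
        return dot_git.resolve(), False
    if not dot_git.is_file():
        return None

    lines = dot_git.read_text().splitlines()
    if not lines or not lines[0].startswith("gitdir:"):
        return None
    git_dir = Path(lines[0][len("gitdir:"):].strip())
    if not git_dir.is_absolute():
        git_dir = checkout / git_dir

    commondir_file = git_dir / "commondir"
    if not commondir_file.is_file():
        # A .git file without commondir, for example a submodule checkout.
        return git_dir.resolve(), False

    common = Path(commondir_file.read_text().strip())
    if not common.is_absolute():
        common = git_dir / common  # usually "../.." relative to the admin dir
    return common.resolve(), True
```

Two checkouts that resolve to the same common directory can then be grouped under one repository in project/status output.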

Contribution Note

If maintainers agree on the direction, I can help with implementation. A reasonable first PR could focus only on worktree detection, canonical repository identity, and grouped status output before changing deeper indexing behavior.
