MCP T18 — Incremental indexing (file-hash based) #665

## Context

Today, every call to `Project.analyze_sources()` re-analyzes the entire repo from scratch. On a 5,000-file codebase that turns an interactive workflow into a coffee break, and once the MCP server lands, agents will call `index_repo` repeatedly during a session as they edit code.

This ticket adds **file-hash-based incremental indexing**: track per-file content hashes in Redis, diff against the current state on each index call, and only re-analyze changed files.

Builds on T17's per-branch storage (each branch tracks its own hash map).

## Scope (in)

1. **Hash storage** — track per-file content hashes in Redis under `{repo}:{branch}_files` (a Redis hash, field per file path → SHA256). Persisted at the end of every full or incremental index.
2. **`Project.analyze_sources(incremental=True)`** — walk the file tree, compute current hashes, diff against stored hashes:
   - **Unchanged files** → skip the analyzer entirely
   - **Modified files** → call existing `delete_files()` to remove old graph entities for this file, then re-run the analyzer (first pass) on just these files
   - **Deleted files** → call `delete_files()` only
   - **New files** → analyze normally
3. **Second pass (LSP symbol resolution)** — for v1, **safe correctness wins**: if any file changed, run the second pass over the entire branch graph. Per-file second-pass optimization is deferred.
4. **Persist** the new hash map to Redis at the end (atomic — old map stays until new one is fully written).
5. **`Project` API** — expose `was_incremental: bool` and `files_changed: list[str]` for callers.
6. **CLI** — `cgraph index .` defaults to incremental when a graph already exists for `(project, branch)`; new `--full` flag forces a full re-index.
7. **MCP tool** — `index_repo(..., incremental=True)` is the default (consumed by #652 T4); response includes `mode: "full"|"incremental"` and `files_changed: list[str]`.
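
The four-way classification in item 2 can be sketched as a pure function over two path → digest maps. This is a minimal illustration, not the planned implementation: `HashDiff` and `diff_hashes` are hypothetical names, and the real orchestration would still need to call `delete_files()` and the analyzer on the resulting sets.

```python
from dataclasses import dataclass, field


@dataclass
class HashDiff:
    """Result of comparing stored hashes against the current file tree."""
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)

    @property
    def files_changed(self) -> list[str]:
        # Everything that needs graph work: new, modified, or deleted paths.
        return sorted(self.added + self.modified + self.deleted)


def diff_hashes(stored: dict[str, str], current: dict[str, str]) -> HashDiff:
    """Classify every path by comparing stored and current digests."""
    diff = HashDiff()
    for path in sorted(current):
        if path not in stored:
            diff.added.append(path)        # new file: analyze normally
        elif stored[path] != current[path]:
            diff.modified.append(path)     # delete old entities, then re-analyze
        else:
            diff.unchanged.append(path)    # skip the analyzer entirely
    # Paths that were indexed before but no longer exist on disk.
    diff.deleted = sorted(p for p in stored if p not in current)
    return diff
```

Under this sketch a rename shows up as one entry in `deleted` and one in `added`, which matches the delete + add treatment planned for v1.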

## Edge cases handled

- First-time indexing of a branch → falls back to full
- Hash store missing or corrupted → falls back to full with a warning logged to stderr
- File renames → treated as delete + add (rename detection deferred to Phase 2)
- Aborted previous run leaving stale hashes → next full run overwrites
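
One way the fallback rules above could be expressed is a loader that returns `None` whenever a full index is required. The sketch below is an assumption-laden illustration: `load_stored_hashes` and the `graph_exists` parameter are hypothetical, and "corrupted" is interpreted narrowly as "a value is not a 64-character lowercase hex SHA-256 digest".

```python
import re
import sys
from typing import Optional

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def _to_text(value):
    # redis-py returns bytes unless the client uses decode_responses=True.
    return value.decode("utf-8", "replace") if isinstance(value, bytes) else value


def load_stored_hashes(raw, graph_exists: bool) -> Optional[dict]:
    """Return a validated path -> digest map, or None to request a full index."""
    if not raw:
        # No hash store. On a fresh branch this is expected; if a graph
        # already exists, the store went missing and the user should know.
        if graph_exists:
            print("warning: hash store missing; running a full index",
                  file=sys.stderr)
        return None
    if not isinstance(raw, dict):
        print("warning: hash store corrupted; running a full index",
              file=sys.stderr)
        return None
    validated = {}
    for path, digest in raw.items():
        path, digest = _to_text(path), _to_text(digest)
        if (not isinstance(path, str) or not isinstance(digest, str)
                or not _SHA256_HEX.fullmatch(digest)):
            print(f"warning: hash store corrupted at {path!r}; "
                  "running a full index", file=sys.stderr)
            return None
        validated[path] = digest
    return validated
```

Rejecting the whole map on the first bad entry keeps the decision binary: either the stored state is trusted in full, or the run is a full index.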

## Scope (out)

- Per-file second-pass / LSP optimization (Phase 2).
- Rename detection (Phase 2).
- Cross-branch incremental (each branch has its own hash store).
- Watching the filesystem for changes (this is pull-based; user/agent calls `index_repo`).

## Files

- modified `api/project.py` (new `incremental` flag, hash diff orchestration, `was_incremental` / `files_changed` attributes)
- modified `api/info.py` (new file-hash get/set helpers under `{repo}:{branch}_files`)
- modified `api/analyzers/source_analyzer.py` (incremental orchestration over the changed-file set)
- modified `api/cli.py` (`--full` flag on `index` and `index-repo`)
- modified `api/mcp/tools/structural.py` (consume incremental flag; report mode + files_changed)
- new `tests/test_incremental_indexing.py`
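
For the hash-store setter in `api/info.py`, the "old map stays until the new one is fully written" requirement could be met by writing to a staging key and then renaming it over the live key. This is only a sketch under assumptions: `persist_hashes` and the `:staging` suffix are hypothetical, and `client` is assumed to be a redis-py style client exposing `delete`, `hset(name, mapping=...)`, and `rename`.

```python
def persist_hashes(client, key: str, hashes: dict) -> None:
    """Replace the hash map stored at `key` without exposing a partial write."""
    if not hashes:
        # Nothing left to track (for example, every file was deleted).
        # HSET with an empty mapping is rejected by redis-py, so drop the key.
        client.delete(key)
        return
    staging = f"{key}:staging"             # hypothetical staging key name
    client.delete(staging)                 # clear leftovers from an aborted run
    client.hset(staging, mapping=hashes)   # the map under `key` is still intact
    client.rename(staging, key)            # RENAME overwrites `key` in one step
```

If the process dies before the rename, the live key still holds the previous map, so the next run diffs against a consistent (if older) state.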

## Acceptance criteria

- [ ] Index fixture → re-index with no changes → second run reports `mode=incremental, files_changed=[]` and is significantly faster (assert via analyzer-call-count, not wall clock).
- [ ] Modify one file → re-index → only that file's entities are deleted+re-added; other entities untouched (verify by node-id snapshot diff).
- [ ] Delete a file → re-index → its entities are removed from the graph.
- [ ] Add a new file → re-index → its entities appear.
- [ ] First run on a fresh branch automatically falls back to full (no hash store yet).
- [ ] `--full` CLI flag forces full re-index even when graph exists.
- [ ] Corrupted hash store → falls back to full with a warning logged.
- [ ] MCP `index_repo` integration test exercises an unchanged → modified → deleted → added sequence end-to-end.
- [ ] Existing full-index tests still pass (incremental is opt-in at the API level, even if CLI defaults to it).

## Dependencies

- #651 (T17 — per-branch graph identity) — hard dep; needs branch-scoped Redis keys for the hash store.

## Notes for the implementer

- Use SHA256 over file bytes (not mtime) — mtime is unreliable across git checkouts and CI environments.
- The hash diff should be the *only* place that decides what to re-analyze. Don't sprinkle incremental logic deep into individual analyzers; orchestrate it in `source_analyzer.py`.
- Be careful with `delete_files()` — it must remove **all** graph entities tied to a file (Functions, Classes, edges) without leaving orphans. Verify with a node-count assertion in the test.
- The second-pass-over-everything decision is intentional for v1. Don't try to be clever here; the goal is correctness, and the first pass is where most of the win is.
- When the hash store is missing/corrupted, log clearly to stderr so users notice and aren't surprised by a slow "incremental" run.
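
As a concrete reference for the first note, hashing file bytes with the standard library might look like the sketch below. `sha256_of_file` and `hash_tree` are hypothetical helper names, and the tree walk is deliberately naive: it does not skip `.git`, virtualenvs, or unsupported file types, which the real walker would need to do.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in 1 MiB chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: Path) -> dict:
    """Map each file's root-relative POSIX path to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Because only the bytes are hashed, touching a file or checking it out again with a new mtime leaves its digest unchanged.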
