
feat: streaming transcript/checkpoint parsing to reduce memory amplification #956

Open

Krishnachaitanyakc wants to merge 1 commit into git-ai-project:main from Krishnachaitanyakc:feat/streaming-transcript-parsing-v2

Conversation

@Krishnachaitanyakc (Contributor) commented Apr 5, 2026

Summary

  • Convert 4 JSONL transcript parsers (Claude, Codex, Windsurf, Droid) from read_to_string to streaming BufReader + .lines(), eliminating full-file memory materialization
  • Convert 2 JSON parsers (Gemini, Continue) from read_to_string + from_str to from_reader, avoiding intermediate String allocation
  • Stream checkpoint JSONL reads via BufReader instead of read_to_string
  • Eliminate 2 redundant read_all_checkpoints() calls in get_all_tracked_files() by reading once upfront and threading as &[Checkpoint]
  • Perform hash migration in-place (&mut iteration) instead of allocating a second Vec
  • Add configurable size caps (max_checkpoint_jsonl_bytes=64MB, max_transcript_bytes=32MB) via env vars or file config; checkpoint cap is advisory-only, transcript cap returns Err to preserve existing data
  • Handle --reset with eager reset on read failure, propagating write errors independently
  • Harden generated CI workflows with action version pinning and integrity checks
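The core conversion in the first bullet — replacing `read_to_string` with `BufReader` + `.lines()` — can be sketched as below. This is a minimal illustration, not the PR's actual code: `count_entries` is a hypothetical stand-in for the real per-line serde deserialization, and `Cursor` stands in for a `File`. The point is that each line is processed as it arrives, so peak memory stays near one line rather than the whole file.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

// Stream a JSONL transcript line by line instead of materializing the
// whole file in one String. `count_entries` is a placeholder for the
// real per-entry parsing logic.
fn count_entries<R: Read>(reader: R) -> std::io::Result<usize> {
    let buf = BufReader::new(reader);
    let mut entries = 0;
    for line in buf.lines() {
        let line = line?; // propagate per-line I/O errors
        if line.trim().is_empty() {
            continue; // tolerate blank lines between JSONL records
        }
        entries += 1;
    }
    Ok(entries)
}

fn main() {
    // In the real parsers this would be BufReader::new(File::open(path)?).
    let jsonl = "{\"event\":\"start\"}\n\n{\"event\":\"stop\"}\n";
    assert_eq!(count_entries(Cursor::new(jsonl)).unwrap(), 2);
    println!("ok");
}
```

The same shape applies to the checkpoint JSONL reads; only the per-line deserialization target differs.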

Motivation

git-ai's transcript/checkpoint parsers cause ~5-8x memory amplification: a 187 MB transcript produces ~1.2 GB peak RSS, and a 307 MB checkpoint file produces ~1.78 GB peak RSS with ~33s runtime. This causes OOM kills and poor UX on long AI coding sessions.

Targets

  • Transcript peak RSS down ≥40%
  • Checkpoint peak RSS down ≥30%, wall-clock down ≥25%

Test plan

  • cargo fmt -- --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test --lib — 1205 passed, 0 failed
  • Run scripts/repro_runaway_memory.py for before/after RSS measurements
  • Manual testing with large transcript files (>100MB)
  • Verify --reset recovers from corrupt checkpoints.jsonl


@Krishnachaitanyakc Krishnachaitanyakc marked this pull request as ready for review April 7, 2026 20:59
@devin-ai-integration (bot) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.


@Krishnachaitanyakc Krishnachaitanyakc force-pushed the feat/streaming-transcript-parsing-v2 branch from 66b6a9c to 910a20c on April 7, 2026 21:14
…ication

Convert JSONL transcript parsers (Claude, Codex, Windsurf, Droid) from
read_to_string to BufReader + line-by-line streaming. Use from_reader
for JSON parsers (Gemini, Continue). Stream checkpoint JSONL reads via
BufReader. Eliminate double read_all_checkpoints() calls in
get_all_tracked_files() by threading checkpoints as a parameter.

Add configurable size caps (max_checkpoint_jsonl_bytes=64MB,
max_transcript_bytes=32MB) via env vars or file config. Checkpoint cap
is advisory-only (warns but still parses). Transcript cap returns Err
to preserve existing data rather than silently replacing with empty.

Perform hash migration in-place instead of allocating a second Vec.
Handle --reset with eager reset on read failure so corrupt checkpoint
files can be recovered without swallowing non-corruption I/O errors.

Harden generated CI workflows with version pinning and integrity checks.

Targets: transcript RSS down >=40%, checkpoint RSS down >=30%.
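The two cap behaviors described in the commit message (hard transcript cap, advisory checkpoint cap) can be sketched roughly as follows. The env var name and function names here are illustrative assumptions, not necessarily the PR's real identifiers.

```rust
use std::env;

// Illustrative defaults matching the values in the PR description.
const DEFAULT_MAX_TRANSCRIPT_BYTES: u64 = 32 * 1024 * 1024; // 32 MB
const DEFAULT_MAX_CHECKPOINT_BYTES: u64 = 64 * 1024 * 1024; // 64 MB

// Env var override; name is hypothetical.
fn max_transcript_bytes() -> u64 {
    env::var("GIT_AI_MAX_TRANSCRIPT_BYTES")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_MAX_TRANSCRIPT_BYTES)
}

// Transcript cap is hard: returning Err leaves existing data untouched
// rather than silently replacing it with an empty parse result.
fn check_transcript_size(len: u64) -> Result<(), String> {
    let cap = max_transcript_bytes();
    if len > cap {
        Err(format!("transcript is {len} bytes, exceeds cap of {cap}"))
    } else {
        Ok(())
    }
}

// Checkpoint cap is advisory: the caller logs a warning and parses anyway.
fn checkpoint_over_cap(len: u64) -> bool {
    len > DEFAULT_MAX_CHECKPOINT_BYTES
}

fn main() {
    assert!(check_transcript_size(1024).is_ok());
    assert!(check_transcript_size(33 * 1024 * 1024).is_err());
    assert!(checkpoint_over_cap(65 * 1024 * 1024));
    println!("ok");
}
```

The asymmetry is the design point: a transcript over cap fails loudly so prior data survives, while an oversized checkpoint file still parses so no history is lost.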
@Krishnachaitanyakc Krishnachaitanyakc force-pushed the feat/streaming-transcript-parsing-v2 branch from 324eb68 to 37a47a5 on April 10, 2026 05:52

1 participant