lossless-agent-context

Find, view, and losslessly convert recorded agent sessions across Claude Code, Codex, and Pi.

Each of those tools stores its session context in its own verbose JSONL shape. This repo imports any of them into one canonical event model, which powers two things: human/LLM-readable views for mining what an agent did, and lossless conversion between providers so you can hot-swap a session from one tool to another.

The primary use is finding and viewing sessions. Conversion is secondary.

Install

git clone https://github.com/sshkeda/lossless-agent-context
cd lossless-agent-context
bun install
ln -sf "$PWD/bin/lac" ~/.local/bin/lac   # put `lac` on PATH

lac <args> is equivalent to bun src/cli/index.ts <args> from the checkout.

Find and view sessions

lac <session.jsonl>                     # digest: timeline, tool ledger, token growth
cat session.jsonl | lac -               # stdin; provider auto-detected

lac find fix-bridge-dialog              # find by id, branch, cwd, or opening task
lac find --since 2026-06-01             # filter by recency (--since / --before)
lac find keychain --deep --since 2026-06-01   # full-text search inside sessions
lac find --sort tokens                  # rank recent sessions by context blowup

lac view session.jsonl --event <id>     # show one omitted span in full
lac view session.jsonl --full           # digest without omission truncation
lac view session.jsonl --stats          # header + token growth + tool ledger only (no timeline)
lac find <query> --digest --limit 3     # find the newest N and digest each (compact)

The digest is a lossy filter over the canonical events: a session header, token growth (input → peak, for both Claude and Codex), a user/assistant/reasoning/tool timeline, and a per-tool call/error ledger. Long spans are truncated and tagged with their source event id (evt:<id>). Nothing is lost: recover any span in full with --event.

find scans the Codex (~/.codex/sessions), Claude Code (~/.claude/projects), Pi (~/.pi/agent/sessions), cursor-agent (~/.cursor/projects), and agy / Antigravity (~/.gemini/antigravity-cli/conversations) stores, and groups Claude subagent matches under their parent session.

To discover sessions that a provider writes to a non-standard location (e.g. an agent fleet pointed at a custom dir), add extra dirs per provider via LAC_PI_DIRS / LAC_CODEX_DIRS / LAC_CLAUDE_DIRS / LAC_CURSOR_DIRS / LAC_AGY_DIRS (colon-separated).
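As a rough illustration of the colon-separated overrides, the sketch below merges a provider's default store with any extra directories from its LAC_*_DIRS variable. Only the variable names and default paths come from this README; the function name resolveStoreDirs, the STORES table, and the de-duplication behaviour are assumptions made for the example, not lac's actual discovery code.

```typescript
// Hypothetical sketch: not lac's actual discovery code.
type Provider = "pi" | "codex" | "claude" | "cursor" | "agy";

// Default stores and variable names as listed in this README.
const STORES: Record<Provider, { defaultDir: string; envVar: string }> = {
  codex: { defaultDir: "~/.codex/sessions", envVar: "LAC_CODEX_DIRS" },
  claude: { defaultDir: "~/.claude/projects", envVar: "LAC_CLAUDE_DIRS" },
  pi: { defaultDir: "~/.pi/agent/sessions", envVar: "LAC_PI_DIRS" },
  cursor: { defaultDir: "~/.cursor/projects", envVar: "LAC_CURSOR_DIRS" },
  agy: { defaultDir: "~/.gemini/antigravity-cli/conversations", envVar: "LAC_AGY_DIRS" },
};

// Returns the default store followed by any extra dirs, without duplicates.
// Paths are returned as written; "~" is not expanded here.
function resolveStoreDirs(
  provider: Provider,
  env: Record<string, string | undefined>,
): string[] {
  const { defaultDir, envVar } = STORES[provider];
  const extras = (env[envVar] ?? "")
    .split(":")
    .map((dir) => dir.trim())
    .filter((dir) => dir.length > 0); // tolerate "a::b" and trailing colons
  const dirs: string[] = [];
  for (const dir of [defaultDir, ...extras]) {
    if (dirs.indexOf(dir) === -1) dirs.push(dir);
  }
  return dirs;
}
```

In a real CLI the second argument would typically be process.env, e.g. resolveStoreDirs("pi", process.env).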

A lac skill (skills/lac/SKILL.md) documents this for agents.
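The digest's truncate-and-tag behaviour (long spans cut short and tagged evt:<id>, recoverable with --event) can be pictured roughly as follows. This is a minimal sketch under stated assumptions: only the evt:<id> tag comes from this README, while the SpanEvent shape, the character-based limit, and the exact marker text are hypothetical.

```typescript
// Hypothetical sketch; only the "evt:<id>" tag comes from the README.
interface SpanEvent {
  id: string;
  text: string;
}

// Cut a long span and append a marker that names the source event.
function truncateSpan(event: SpanEvent, maxChars: number): string {
  if (event.text.length <= maxChars) return event.text;
  const omitted = event.text.length - maxChars;
  return `${event.text.slice(0, maxChars)} [+${omitted} chars omitted, evt:${event.id}]`;
}

// Conceptually what `--event <id>` does: look the untruncated span up by id.
function recoverSpan(events: SpanEvent[], id: string): string | undefined {
  for (const event of events) {
    if (event.id === id) return event.text;
  }
  return undefined;
}
```

Because the view only truncates its own output and the canonical events stay untouched, recovery is a lookup rather than a reconstruction.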

Convert between providers

lac convert session.jsonl --to claude-code -o claude.jsonl   # provider -> provider
lac prepare-claude-code-resume session.jsonl --from pi -o claude-seed.jsonl
cat session.jsonl | lac convert - --to codex                 # stdin, source auto-detected
lac convert session.jsonl --from pi --to codex               # explicit source
lac convert conversation.db --to agy -o copy.db              # agy/cursor: byte-identical .db (binary, needs -o)
lac convert session.jsonl --to agy -o new.db                 # foreign → agy: clone a local agy .db as a template (lossless via carry)
lac convert session.jsonl --to agy --install                 # construct + drop into agy's store, print `agy --conversation <id>`

--install (agy only) writes the constructed .db straight into agy's conversation store (~/.gemini/antigravity-cli/conversations/<cascade>.db) under its cascade id — the id agy opens a conversation by — and prints the ready-to-run agy --conversation <id> … command, so testing a constructed session is a one-liner (agy must be logged in to actually continue it).

Providers: pi | claude-code | codex. --from is auto-detected when omitted. cursor and agy are read via the sqlite3 binary (--from cursor / --from agy, or auto-detected).

cursor has two stores: the lossy JSONL transcript (~/.cursor/projects/.../agent-transcripts/, assistant text + tool_use only) and the authoritative store.db (~/.cursor/chats/<hash>/<id>/store.db). The store.db is a content-addressed blob DAG holding the full conversation, including tool results; read it by pointing lac at the .db.

agy stores each conversation as a binary SQLite database (~/.gemini/antigravity-cli/conversations/<id>.db) with the timeline in protobuf.

Both cursor and agy sessions can also be written back byte-identically (--to agy / --to cursor, binary, requires -o). agy's per-step blobs are crypto signatures/embeddings and cursor's store is content-addressed, so neither can be reconstructed from semantics. Instead, the importer preserves the exact original .db and the exporter replays it, the same preserve-and-replay the JSONL providers use via __lac_foreign. This survives a round-trip through another provider: agy → claude → agy and cursor → claude → cursor are byte-identical.
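A minimal sketch of the preserve-and-replay idea, assuming an in-memory session object: the importer keeps a private copy of the exact .db bytes next to whatever it could parse, and the exporter hands those bytes back unchanged. The names ImportedDbSession, importDb, and exportDb are hypothetical and do not describe lac's real adapter API.

```typescript
// Hypothetical sketch of preserve-and-replay; names are illustrative only.
type DbProvider = "agy" | "cursor";

interface ImportedDbSession {
  source: DbProvider;
  // Best-effort semantic view, used for digests and cross-provider export.
  events: { kind: string; text: string }[];
  // Exact bytes of the original .db, never modified after import.
  originalDb: Uint8Array;
}

// Import: keep a private copy of the raw bytes alongside the parsed events.
function importDb(
  source: DbProvider,
  dbBytes: Uint8Array,
  events: { kind: string; text: string }[],
): ImportedDbSession {
  return { source, events, originalDb: dbBytes.slice() };
}

// Export back to the source provider by replaying the preserved bytes.
function exportDb(session: ImportedDbSession, target: DbProvider): Uint8Array {
  if (session.source !== target) {
    throw new Error(`no preserved ${target} .db to replay (source was ${session.source})`);
  }
  return session.originalDb.slice();
}
```

The copy on import means later mutation of the caller's buffer cannot affect the replayed output, which is what makes the byte-identical claim hold in this sketch.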

Both --to cursor and --to agy additionally accept a foreign session (one that didn't come from that provider):

  • cursor's transcript is open JSON, so lac constructs a valid agent-transcripts JSONL from any canonical stream. The visible transcript is the same lossy view cursor itself writes (text + tool_use, no results), but the full canonical session rides along in a __lac_session field cursor ignores.
  • agy has no open format, so lac instead clones the largest local agy .db as a structural skeleton and transplants the foreign session into the plaintext step fields the agy model actually reads. agy reconstructs the model's history from those plaintext fields (not the encrypted per-step blob, which it ignores), so the foreign session is rendered as ordered narration ("The user asked: …", "I used the bash tool: ", "The result was: …") and written into each skeleton step's prose — every duplicated copy — with the cascade/trajectory/executor ids rewritten to a fresh set (shared cross-conversation constants preserved). The result is model-resumable: agy loads it, the model inherits the foreign conversation as real context and can continue it (verified end-to-end against the authenticated CLI — asked to recall the prior commands, the model lists them accurately). Constructing needs at least one real agy conversation on the machine to clone (agy can't be written from a schema); without one, --to agy errors clearly. The visible narration is best-effort; the full canonical still rides in a lac_carry table agy ignores, so lac re-import is exact regardless. (agy occasionally rejects a constructed conversation when its derived ids collide with another local conversation's — re-converting a different session avoids it.)

Both carries make the round-trip lossless: pi → cursor → pi, codex → cursor → codex, pi → agy → pi, and agy → cursor → agy all come back byte-identical, because re-import reconstructs the exact canonical from the carry rather than the lossy visible view.
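To make the carry mechanism concrete, here is a minimal sketch of a lossy visible transcript with the full canonical stream riding along in a __lac_session field. The field name comes from this README; the CanonicalEvent shape, the per-line JSON layout, and placing the carry on the first line are assumptions for illustration and may differ from what lac actually writes.

```typescript
// Hypothetical sketch of a carry field; line shapes and carry placement are assumed.
interface CanonicalEvent {
  kind: "user" | "assistant" | "tool_use" | "tool_result";
  text: string;
}

// Export: visible lines drop tool results (lossy); the first line carries everything.
function toTranscript(events: CanonicalEvent[]): string {
  const visible = events.filter((e) => e.kind !== "tool_result");
  const lines: Record<string, unknown>[] = visible.map((e) => ({ kind: e.kind, text: e.text }));
  if (lines.length === 0) lines.push({}); // always have a line to hold the carry
  lines[0]["__lac_session"] = events;
  return lines.map((line) => JSON.stringify(line)).join("\n");
}

// Import: prefer the carry; fall back to the lossy visible view if it is absent.
function fromTranscript(jsonl: string): CanonicalEvent[] {
  const lines = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  for (const line of lines) {
    if (Array.isArray(line["__lac_session"])) {
      return line["__lac_session"] as CanonicalEvent[];
    }
  }
  return lines.map((line) => ({ kind: line.kind, text: line.text }));
}
```

A reader that ignores unknown fields sees only the lossy view, while re-import recovers the exact canonical events from the carry.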

How it works

raw native JSONL  ↔  canonical event model  ↔  views (lossy)
  (pi/claude/codex)            │                convert (lossless)
                               └─────────────→  raw native JSONL

The canonical model is append-only and provider-agnostic. Views are lossy filters over it; exporters take canonical events back to any of the three native shapes losslessly.

When going cross-provider (e.g. Pi → Claude Code), exporters embed foreign native line envelopes as __lac_foreign / __lac_canonical fields when the target provider can safely carry them. Claude Code resume seeds are stricter than generic conversion JSONL, so prepare-claude-code-resume writes an adjacent recovery sidecar at <file>.jsonl.lossless.json. Keep that file next to the JSONL when converting back; the CLI reads it automatically for file-based Claude Code imports.

Layout

  • bin/lac — launcher (symlink onto PATH)
  • src/cli/ — the lac CLI: find, view, convert, prepare-claude-code-resume
  • src/core/ — canonical event schema
  • src/adapters/ — Pi / Claude Code / Codex JSONL importers + exporters
  • src/find/ — session discovery across provider stores
  • src/views/ — the digest view + omitted-span recovery
  • src/tokens.ts, src/json.ts — shared token-usage and JSON-narrowing helpers
  • skills/lac/ — agent-facing skill
  • test/e2e/ — fixture-driven integration tests

Conversion coverage

  • Pi JSONL ↔ canonical
  • Claude Code JSONL ↔ canonical
  • Codex JSONL ↔ canonical
  • cursor store.db (content-addressed DAG) and agy .db (protobuf-in-SQLite) ↔ canonical, byte-identical via preserve-and-replay of the original .db
  • cursor-agent transcript JSONL → canonical (lossy subset of the store.db; no .db to replay)
  • Foreign → cursor (constructed agent-transcripts JSONL) and foreign → agy (a real agy .db cloned as a template, ids rewritten, agy-loadable), each carrying the full canonical losslessly in a field/table the provider ignores (__lac_session / lac_carry), so pi → agy → pi and pi → cursor → pi are byte-identical
  • Cross-provider export (e.g. Pi → Claude Code) with lossless __lac_foreign / __lac_canonical carry-through
  • Deterministic recovery sidecars (*.lossless.json) for transforms that provider JSONL cannot safely carry directly, such as demoted reasoning markers
  • Native Codex response items including messages, reasoning, function/custom tool calls, web search calls, and image generation calls
  • Semantic Pi → Claude Code export validated by the real @anthropic-ai/claude-agent-sdk: getSessionMessages parses the converted output and returns the original Pi user/assistant chain

Structured omission

lac omit is the machine-first omission surface: a structured, token-budgeted projection of a session into compact conversation steps, for agent handoffs and for tools (e.g. gsty) that want the gist without re-learning each provider's JSONL. Provider-injected preamble is dropped; --budget keeps the opening task plus the most recent steps that fit. Like every view, it builds on the canonical model and never replaces the lossless convert path.

lac omit session.jsonl --budget 800                    # json (default)
lac omit session.jsonl --format markdown --budget 800

{
  "schema": "lac.omission.steps.v1",
  "steps": [
    { "kind": "user", "text": "asked to list screen sessions" },
    { "kind": "tool", "name": "gsty panes", "text": "listed 7 panes" },
    { "kind": "assistant", "text": "summarized visible sessions" }
  ],
  "omitted": { "events": 18, "bytes": 42193 }
}
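The --budget rule (keep the opening task plus the most recent steps that fit) could be sketched like this. It is an illustration under stated assumptions rather than lac's implementation: the four-characters-per-token estimate, the omitToBudget name, and treating the first step as the opening task are all hypothetical.

```typescript
// Hypothetical sketch of budgeted omission; not lac's actual algorithm.
interface Step {
  kind: "user" | "assistant" | "tool";
  name?: string;
  text: string;
}

// Crude estimate (about 4 characters per token); an assumption, not lac's tokenizer.
const estimateTokens = (step: Step): number => Math.ceil(step.text.length / 4);

function omitToBudget(steps: Step[], budget: number): { steps: Step[]; omitted: number } {
  if (steps.length === 0) return { steps: [], omitted: 0 };
  const opening = steps[0]; // assumed to be the opening task; always kept
  let remaining = budget - estimateTokens(opening);
  const tail: Step[] = [];
  // Walk backwards from the newest step, keeping steps until one no longer fits.
  for (let i = steps.length - 1; i >= 1; i--) {
    const cost = estimateTokens(steps[i]);
    if (cost > remaining) break;
    tail.unshift(steps[i]);
    remaining -= cost;
  }
  return { steps: [opening, ...tail], omitted: steps.length - 1 - tail.length };
}
```

Stopping at the first step that does not fit keeps the retained tail contiguous, so the reader sees an unbroken run of the latest activity instead of a scattered sample.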

Provider versions

Each adapter is written against a specific shape of a provider's on-disk session format. The JSONL providers (Claude Code / Codex / Pi) are relatively stable; the reverse-engineered binary formats (cursor's SQLite store, agy's protobuf-in-SQLite) can change shape with no notice. lac doctor pins the CLI version each adapter was verified against and flags drift, so a CLI update is a visible "re-check the format" signal instead of a silent breakage:

lac doctor
# ok    Claude Code        2.1.170
# ok    agy (Antigravity)  1.0.7  [reverse-engineered]
# DRIFT cursor-agent       verified 2026.06.04-…, installed 2026.07.… — re-check: …

The pinned versions live in src/provider-versions.ts; bump them (and re-run test:real-logs) when you re-verify against a new CLI release.
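The drift check itself is conceptually small. The sketch below assumes an exact string comparison between the pinned and installed versions; the VersionPin and DoctorLine types, the checkPin function, and the "missing" state are hypothetical and may not match what src/provider-versions.ts or lac doctor actually do.

```typescript
// Hypothetical sketch; the real pins live in src/provider-versions.ts.
interface VersionPin {
  provider: string;
  verified: string; // CLI version the adapter was last verified against
}

type DoctorLine =
  | { state: "ok"; provider: string; installed: string }
  | { state: "DRIFT"; provider: string; verified: string; installed: string }
  | { state: "missing"; provider: string };

// Compare a pin against the installed CLI version (undefined = CLI not found).
function checkPin(pin: VersionPin, installed: string | undefined): DoctorLine {
  if (installed === undefined) return { state: "missing", provider: pin.provider };
  if (installed === pin.verified) {
    return { state: "ok", provider: pin.provider, installed };
  }
  return { state: "DRIFT", provider: pin.provider, verified: pin.verified, installed };
}
```

Exact matching is deliberately strict: for reverse-engineered formats, any version change is worth a re-check, even a patch release.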

Development

bun run check         # typecheck + eslint + prettier
bun run test          # portable fixture-driven test suite
bun run verify:portable  # check + test in series (what CI proves)

Before cutting a release from a machine that has the local CLIs and session stores available, also run the real-log gate:

bun run test:real-logs

test:real-logs reads recent sessions from ~/.pi/agent/sessions, ~/.claude/projects, and ~/.codex/archived_sessions, validates target-native output at each hop, and checks byte-identical same-provider round-trips. Override picked files with LAC_REAL_PI_SESSION / LAC_REAL_CLAUDE_SESSION / LAC_REAL_CODEX_SESSION.

Design standard

Strict native fidelity:

  • provider -> lac -> other format -> lac -> provider must round-trip back to the original native session bytes.
  • This applies uniformly to Claude Code, Codex, and Pi.
  • If a rebuilt provider session is not byte-for-byte identical to the original native session, that is a bug in lac.
  • Provider-specific workarounds that rely on replaying preserved raw files instead of reconstructing them are not the intended end state — reconstruction itself must be lossless.

Tool-use concurrency: Claude's API requires every tool_use in an assistant message to be answered in the single immediately-following user message. A cross-provider seed used to split a parallel tool_use batch's results across consecutive one-result user messages, which the docs flagged as a source of API Error: 400 due to tool use concurrency issues. prepareClaudeCodeResumeSeed now merges those split results back into one user message (offline-verified: a real pi→claude seed drops from 16 ungrouped batches to 0; a no-op for same-provider seeds, so byte fidelity holds). Note: a live A/B claude --resume of the split vs. merged seed on the current CLI accepted both, so today's resume path appears to tolerate or internally regroup the split — the merge now stands as defensive conformance to the documented API rule rather than a fix for a reproducible 400.
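The merge described above can be sketched as follows. This is a simplified illustration, not the code inside prepareClaudeCodeResumeSeed: the Message and Block types are a reduced version of the Claude message shape, and the sketch merges any run of consecutive tool_result-only user messages without checking ids against the preceding tool_use batch.

```typescript
// Simplified sketch of merging split tool results; not lac's actual implementation.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// True for a user message that carries nothing but tool results.
const isToolResultOnly = (m: Message): boolean =>
  m.role === "user" && m.content.length > 0 && m.content.every((b) => b.type === "tool_result");

// Fold consecutive tool_result-only user messages into a single user message.
function mergeSplitToolResults(messages: Message[]): Message[] {
  const out: Message[] = [];
  for (const msg of messages) {
    const prev = out[out.length - 1];
    if (prev && isToolResultOnly(prev) && isToolResultOnly(msg)) {
      prev.content.push(...msg.content); // prev is our own copy, safe to extend
    } else {
      out.push({ role: msg.role, content: [...msg.content] });
    }
  }
  return out;
}
```

On input that is already grouped the function returns an equivalent message list, which matches the no-op behaviour for same-provider seeds described above.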

License

MIT © 2026 Stephen Shkeda
