Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
3b8014f
Add HTTP stream MCP server
bigsmartben Jun 15, 2026
b553675
docs: design cloud native codegraph crd
bigsmartben Jun 16, 2026
7c878e7
docs: plan cloud native codegraph crd implementation
bigsmartben Jun 16, 2026
cd4d42d
chore: ignore local worktrees
bigsmartben Jun 16, 2026
96997f5
feat: scaffold codegraph operator module
bigsmartben Jun 16, 2026
fff65c5
feat: add codegraph repository crd types
bigsmartben Jun 16, 2026
3a0fffa
fix: tighten codegraph repository api schema
bigsmartben Jun 16, 2026
ab72697
test: harden codegraph repository schema checks
bigsmartben Jun 16, 2026
feaec22
fix: bound operator resource names
bigsmartben Jun 16, 2026
538ef5f
feat: build codegraph repository workloads
bigsmartben Jun 16, 2026
bb56f10
docs: require runtime rollout after indexing
bigsmartben Jun 16, 2026
40eb0fc
fix: prepare runtime rollout resources
bigsmartben Jun 16, 2026
f823da3
fix: make ingress route exact
bigsmartben Jun 16, 2026
dbddec2
feat: build codegraph repository routes
bigsmartben Jun 16, 2026
84422fa
feat: reconcile codegraph repository resources
bigsmartben Jun 16, 2026
7074745
fix: harden repository reconciliation updates
bigsmartben Jun 16, 2026
693fb39
feat: report codegraph repository readiness
bigsmartben Jun 16, 2026
5d5bfd2
fix: classify terminal sync job failures
bigsmartben Jun 16, 2026
d64e2fc
fix: harden runtime readiness status
bigsmartben Jun 16, 2026
3afd851
docs: add codegraph operator samples
bigsmartben Jun 16, 2026
e6ba556
docs: remove local operator path from readme
bigsmartben Jun 16, 2026
cb214c0
test: add operator validation script
bigsmartben Jun 16, 2026
40b1868
fix: use concrete operator api generation path
bigsmartben Jun 16, 2026
a09774e
fix: harden repository refresh lifecycle
bigsmartben Jun 16, 2026
52cceff
fix: wait for runtime pods before sync
bigsmartben Jun 16, 2026
e154c27
fix: distinguish runtime pods during refresh
bigsmartben Jun 16, 2026
673e2ee
fix: require explicit operator runtime image
bigsmartben Jun 16, 2026
ef5bb38
fix: serialize repository sync jobs
bigsmartben Jun 16, 2026
a7f8d4f
docs: sync cloud native operator plan
bigsmartben Jun 16, 2026
e3b17af
fix: avoid repeated operator resource updates
bigsmartben Jun 16, 2026
87b25cd
fix: preserve controller owned annotations
bigsmartben Jun 16, 2026
8d3d407
docs: design single mcp gateway
bigsmartben Jun 16, 2026
dbeb076
feat: add single mcp gateway
bigsmartben Jun 16, 2026
f83db54
docs: add cloud native gateway quickstart
bigsmartben Jun 16, 2026
d99d583
docs: sync gateway and agent guidance
bigsmartben Jun 17, 2026
34b8af7
feat: add repo-routed HTTP MCP gateway
bigsmartben Jun 18, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
219 changes: 219 additions & 0 deletions .agents/skills/add-lang/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,219 @@
---
name: add-lang
description: Add tree-sitter language support to codegraph end-to-end — wire the grammar + extractor, write tests, then benchmark extraction quality and retrieval value on 3 popular real-world repos. Use when the user runs /add-lang <language> or asks to add/support a new language (e.g. Lua, Elixir, Zig, OCaml) in codegraph.
---

# Add a language to CodeGraph

Wire a new tree-sitter language into codegraph's extraction pipeline, prove it
extracts real symbols on popular repos, and prove it beats no-codegraph for an
agent. Runs **fully autonomously** — pick repos, benchmark, update docs, then
report. **Never commit, push, publish, or tag** (house rule); leave all changes
for the user to review.

The argument is the language token used throughout the `Language` union, e.g.
`lua`, `elixir`, `zig`. If none was given, ask which language. Use the lowercase
single-token form everywhere (`csharp`, not `c#`).

## Prerequisites
- Run from the codegraph repo root. `node`, `git`, `gh`, and a logged-in
`Codex` CLI (the benchmark spawns real `Codex -p` runs).
- The benchmark uses the local dev build — Step 8 builds + links it on PATH.

## Workflow

Copy this checklist and work through it in order:
```
- [ ] 1. Resolve language; bail early if already supported (just benchmark)
- [ ] 2. Find a grammar + health-check it (ABI / heap corruption)
- [ ] 3. Discover the grammar's AST node types (dump-ast.mjs)
- [ ] 4. Wire the language (4 files; sometimes a 5th core touch)
- [ ] 5. Build + verify-extraction loop until PASS
- [ ] 6. Add extraction tests; make them green
- [ ] 7. Auto-pick 3 popular repos by size tier; add to corpus.json
- [ ] 8. Benchmark all 3: extraction + with/without A/B
- [ ] 9. Update README + CHANGELOG
- [ ] 10. Report; do NOT commit
```

### Step 1 — Resolve + short-circuit

Check whether the language is already wired: look for the token in the
`LANGUAGES` const (`src/types.ts`) and the `EXTRACTORS` map
(`src/extraction/languages/index.ts`). If it is already supported (e.g.
`typescript`, `rust`), **skip Steps 2–6** and go straight to benchmarking
(Steps 7–8) to validate/measure it — note in the report that no code changed.

### Step 2 — Find a grammar, then health-check it

```bash
ls node_modules/tree-sitter-wasms/out/ | grep -i <lang> # csharp -> c_sharp
```
- **Present** → likely off-the-shelf; `grammars.ts` resolves it from
`tree-sitter-wasms` automatically. (Many languages: elixir, zig, ocaml,
solidity, toml, yaml, …)
- **Absent** → vendor a `.wasm` into `src/extraction/wasm/` (like `pascal` /
`scala` / `lua`) and add the token to the vendored branch in Step 4.

**Always health-check before writing an extractor — a *present* grammar can
still be unusable:**
```bash
node scripts/add-lang/check-grammar.mjs <lang> path/to/valid-sample.<ext>
```
It prints the grammar's ABI version and parses a valid sample many times in a
multi-grammar runtime. If it **FAILs** (ERROR trees on valid code — an old ABI
corrupting the shared WASM heap, which silently drops nested calls/imports on
every file after the first; e.g. the tree-sitter-wasms **Lua** grammar is ABI 13
and fails), do NOT use that wasm. **Vendor a newer (ABI 14/15) build instead:**
```bash
npm pack @tree-sitter-grammars/tree-sitter-<lang> # often ships a prebuilt *.wasm
# or build one: npx tree-sitter build --wasm (needs Docker/emscripten)
cp <the>.wasm src/extraction/wasm/tree-sitter-<lang>.wasm
```
then add the token to the vendored branch in Step 4 and re-run check-grammar on
the vendored path until it PASSes. **If you cannot obtain a healthy wasm, STOP
and tell the user.**

### Step 3 — Discover AST node types

Get a representative source file (write a small sample covering functions,
classes/structs, imports, enums; or `curl` a raw file from a known repo), then:
```bash
node scripts/add-lang/dump-ast.mjs <lang> path/to/sample.<ext>
# vendored grammar: pass the wasm path instead of the token
node scripts/add-lang/dump-ast.mjs src/extraction/wasm/tree-sitter-<lang>.wasm sample.<ext>
```
The frequency table + field names (`name:`, `parameters:`, `body:`,
`return_type:`) tell you what to map. Open the existing extractor closest to the
language's paradigm as a model: `rust.ts`/`scala.ts` (functional, traits),
`java.ts`/`csharp.ts` (OO), `python.ts`/`ruby.ts` (scripting), `go.ts`
(top-level methods + receivers).

### Step 4 — Wire the language (4 files)

These are exact, fragile wiring — match the existing style precisely:

1. **`src/types.ts`** — TWO edits:
- add `'<lang>',` to the `LANGUAGES` const (before `'unknown'`);
- add `'**/*.<ext>',` to `DEFAULT_CONFIG.include`. **Don't skip this** — it's
the file-scan allowlist; without the glob, `codegraph init` finds **0
files** even though detection/extraction are wired.
2. **`src/extraction/grammars.ts`** — three maps:
- `WASM_GRAMMAR_FILES`: `<lang>: 'tree-sitter-<lang>.wasm',`
- `EXTENSION_MAP`: each file extension → `'<lang>'` (e.g. `'.lua': 'lua',`)
- `getLanguageDisplayName`: `<lang>: '<Display Name>',`
- **vendored only**: add `<lang>` to the
`(lang === 'pascal' || lang === 'scala' || …)` wasm-path branch.
3. **`src/extraction/languages/<lang>.ts`** — new file exporting
`export const <lang>Extractor: LanguageExtractor = { … }`. Map the node types
from Step 3. Required fields: `functionTypes`, `classTypes`, `methodTypes`,
`interfaceTypes`, `structTypes`, `enumTypes`, `typeAliasTypes`,
`importTypes`, `callTypes`, `variableTypes`, `nameField`, `bodyField`,
`paramsField`. Add hooks as the grammar needs them (`getSignature`,
`getVisibility`, `isExported`, `extractImport`, `visitNode`, `getReceiverType`,
`interfaceKind`, `enumMemberTypes`, etc. — see
`src/extraction/tree-sitter-types.ts`).
4. **`src/extraction/languages/index.ts`** — `import { <lang>Extractor } from
'./<lang>';` and add `<lang>: <lang>Extractor,` to `EXTRACTORS`.

**Sometimes a 5th, core touch in `src/extraction/tree-sitter.ts`** — variable
extraction has per-language branches in `extractVariable` (the generic fallback
only finds direct `identifier`/`variable_declarator` children). If the grammar
nests declared names (e.g. Lua's `variable_declaration → variable_list`), add a
`} else if (this.language === '<lang>')` branch there, mirroring the existing
ts/python/go ones. Import forms that aren't a distinct node (Lua/Ruby `require`
is a *call*) are handled in the extractor's `visitNode` hook instead.

### Step 5 — Build + verify loop

```bash
npm run build # tsc + copy-assets (copies any vendored *.wasm into dist/)
```
Index a small sample repo and check extraction:
```bash
( cd <sample-repo> && codegraph init -i )
node scripts/add-lang/verify-extraction.mjs <sample-repo> <lang>
```
`verify-extraction.mjs` fails (exit 1) if the language isn't detected or only
`file`/`import` nodes were produced — the classic symptom of wrong node-type
names. On FAIL or a thin WARN: re-run `dump-ast.mjs` on a richer file, fix the
mappings in `<lang>.ts`, `npm run build`, re-index, re-verify. **Repeat until
PASS.**

### Step 6 — Tests

Add to `__tests__/extraction.test.ts`, modeled on the `Rust Extraction` block:
- a `detectLanguage` assertion in `describe('Language Detection')`
- a `describe('<Lang> Extraction')` block asserting functions/classes/imports
are extracted from an inline source string.
```bash
npx vitest run __tests__/extraction.test.ts
```
Green before continuing.

### Step 7 — Auto-pick 3 repos + corpus

Pick **without asking**. Find candidates, then curate 3 that are genuinely
`<lang>`-dominant, one per size tier:
```bash
gh search repos --language=<lang> --sort=stars --limit 40 \
--json fullName,stargazerCount,description
```
Tiers (match `corpus.json`): **Small** <~150 files · **Medium** ~150–1500 ·
**Large** >~1500. Skip repos that are tagged `<lang>` but mostly another
language. Write one cross-file architecture **question** per repo (the kind that
needs tracing across files). Add a `"<Language>"` block to
`.Codex/skills/agent-eval/corpus.json` (fields: `name`, `repo`, `size`,
`files`, `question`) so `/agent-eval` can reuse them.

### Step 8 — Benchmark all 3 (extraction + A/B)

Make the dev build the codegraph on PATH **once**, then loop:
```bash
npm run build && ./scripts/local-install.sh
scripts/add-lang/bench.sh <lang> <name> <url> "<question>" headless # ×3
```
`bench.sh` clones (shared `/tmp/codegraph-corpus`), wipes + indexes, runs
`verify-extraction.mjs`, then the with/without retrieval A/B via
`scripts/agent-eval/run-all.sh` (skips the paid A/B if extraction is broken).
Read each `parse-run.mjs` summary printed by `run-all.sh`: tool calls, file
`Read`s, Grep/Bash, codegraph-tool calls, duration, and **cost** — for both the
`with` and `without` arms. After the loop, restore the dev link if needed:
`./scripts/local-install.sh`.

### Step 9 — Docs + CHANGELOG

- **README.md**: add `<Lang>` to the "19+ Languages" feature bullet, and add a
row to the **Supported Languages** table:
`| <Lang> | \`.ext\` | Full support (classes, methods, …) |`.
- **CHANGELOG.md**: add an `## [Unreleased]` section at the top (above the
latest version) with `### Added` → a user-perspective bullet, e.g.
*"CodeGraph now indexes **<Lang>** (`.ext`) — functions, classes, imports, and
call edges."* If `## [Unreleased]` already exists, append under it. (It's
folded into the next versioned block at release time.)

### Step 10 — Report (do NOT commit)

Summarize for review:
- **Files changed**: the 4 wiring edits + new extractor + tests + README +
CHANGELOG + corpus.json (+ any vendored `.wasm`).
- **Extraction** per repo: files / nodes / edges / `verify-extraction` result.
- **A/B** per repo: `with` vs `without` (tool calls, file Reads, cost) and a
one-line verdict — did codegraph reduce effort, and did both arms reach a
correct answer?
- **Gaps / follow-ups** (node types not yet mapped, resolution edges missing,
framework routes, etc.).

Hand the changes to the user. **Do not** run `git commit`/`push` or publish —
releases go through the GitHub Actions Release workflow.

## Notes
- The A/B spawns real **paid** `Codex -p` runs (opus, `--max-budget-usd`),
2 arms × 3 repos. The corpus dir `/tmp/codegraph-corpus` is shared with
`/agent-eval`, so clones are reused across runs.
- Any new `*.wasm` must live in `src/extraction/wasm/` — `copy-assets` (run by
`npm run build`) ships it; otherwise it won't be in `dist/`.
- An index must be served by the **same** binary that built it. Step 8 builds +
links the dev build first, so this holds.
- If a grammar can't be obtained, or extraction can't reach PASS, **STOP and
report** — don't ship a half-wired language.
74 changes: 74 additions & 0 deletions .agents/skills/agent-eval/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
---
name: agent-eval
description: Benchmark CodeGraph retrieval quality on a real codebase by comparing agent behavior with vs without CodeGraph. Use when the user runs /agent-eval or asks to test, benchmark, audit, or validate a codegraph version (the local dev build or a published npm version) against a language's repo.
---

# CodeGraph Quality Audit

Measures how much CodeGraph helps an agent versus plain grep/read, for a chosen
codegraph version on a chosen real-world repo. Drives the harness in
`scripts/agent-eval/`.

## Prerequisites
- `tmux` 3+, a logged-in `Codex` CLI, `node`, `git` (macOS/Linux).
- Run from the codegraph repo root.

## Workflow

Copy this checklist:
```
- [ ] 1. Pick version (local or npm)
- [ ] 2. Pick language
- [ ] 3. Pick repo by size
- [ ] 4. Pick harness (headless / tmux / both)
- [ ] 5. Run audit.sh in the background
- [ ] 6. Report results
```

**Step 1 — version.** Ask with `AskUserQuestion`: which codegraph version to test.
Offer "Local dev build" and "Latest published"; the free-text "Other" lets the
user type a specific version (e.g. `0.7.10`). Map the answer to a VERSION token:
- "Local dev build" → `local`
- "Latest published" → `latest`
- a typed version → that string (e.g. `0.7.10`)

**Step 2 — language.** Read `.Codex/skills/agent-eval/corpus.json`. Ask with
`AskUserQuestion` which language to test, listing the languages that have entries.

**Step 3 — repo.** From the chosen language's entries, ask which repo. Label each
option with its size and file count, e.g. `excalidraw — Medium (~600 files)`.
Each entry carries the `repo` URL and a representative `question`.

**Step 4 — harness.** Ask with `AskUserQuestion` which harness to run, and map
the answer to a MODE token:
- "Headless" → `headless` — `Codex -p` with stream-json: exact tokens/cost and a
clean tool sequence (2 runs, fast, no TTY).
- "Interactive (tmux)" → `tmux` — drives the real Codex TUI in tmux: faithful
Explore-subagent behavior, metrics from session logs (2 runs, slower).
- "Both" → `all` — headless + interactive (4 runs).

**Step 5 — run.** Launch in the background (sets the version, clones if missing,
wipes + re-indexes, runs the chosen arms — several minutes):
```bash
scripts/agent-eval/audit.sh <VERSION> <repo-name> <repo-url> "<question>" <MODE>
```

**Step 6 — report.** When the job finishes, read the log and report per arm:
- Headless (`parse-run.mjs`): total tool calls, file `Read`s, Grep/Bash,
codegraph-tool calls, duration, **total cost**.
- Interactive (`parse-session.mjs`): the `VERDICT: codegraph_explore used Nx |
Read N | Grep/Bash N` and `TOKENS:` lines.

Lead with cost + tool/Read counts — they are the reliable signals; raw token
in/out are confounded by subagent delegation and prompt caching. State whether
codegraph reduced effort and whether both arms reached a correct answer.

## Notes
- The index is rebuilt every run (`audit.sh` wipes `.codegraph`) — different
versions extract differently, so an index must be served by the same binary
that built it.
- `audit.sh` temporarily mutates the global `codegraph` install for the test,
then restores your dev link via `local-install.sh`.
- Corpus repos are cloned to `/tmp/codegraph-corpus` (reused if already present).
- Add or edit repos in `corpus.json` (fields: `name`, `repo`, `size`, `files`,
`question`).
Loading