v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests by garrytan · Pull Request #1639 · garrytan/gstack

garrytan · 2026-05-21T15:50:16Z

Summary

Local gbrain PGLite now defaults to Voyage's code-specialized voyage-code-3 (1024-dim) when VOYAGE_API_KEY is set, with two new test files pinning the contract.

voyage-code-3 default

3 PGLite init sites in setup-gbrain/SKILL.md.tmpl (Step 1.5 broken-db rollback, Path 3 direct, Step 4.5 split-engine) gate --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024 on VOYAGE_API_KEY. Falls back to gbrain's auto-selected provider chain when unset.
2 manual repair hints in sync-gbrain/SKILL.md.tmpl and the post-install hint in bin/gstack-gbrain-install follow the same pattern.
USING_GBRAIN_WITH_GSTACK.md Path 3 docs explain the A/B rationale.

Tests

test/gbrain-init-voyage-code-3.test.ts (5 tests, free, gate-tier): runs the template's voyage-gate shell against a fake gbrain that logs argv. Asserts flags pass under VOYAGE_API_KEY set / unset / empty. Belt-and-suspenders grep checks the template literally contains the gate at exactly 3 PGLite init sites.
test/gbrain-sync-voyage-code-3-integration.test.ts (4 tests, paid, skip-if-no-key): real Voyage API. Inits a sandbox PGLite with voyage-code-3, registers a 3-file fixture git repo, syncs with --strategy code --skip-failed, asserts pages embedded > 0 and doctor reports no dimension mismatch. Code-def smoke test confirms symbol extraction. Skips cleanly when VOYAGE_API_KEY or gbrain CLI is absent.

Docs cleanup (unrelated)

CLAUDE.md drops the obsolete ~/.zshrc grep+eval recipe for API keys. Points at the GSTACK_* env-shim (lib/conductor-env-shim.ts) as the canonical answer. Keeps the Agent SDK env: {...} gotcha for tests.

A/B verdict (voyage-4-large vs voyage-code-3)

Head-to-head on this codebase via gbrain query --no-expand (pure vector retrieval). 10 realistic queries:

Metric	voyage-4-large	voyage-code-3
Strict wins (impl over test)	0	4
Ties (same top hit)	5	5
Losses	0	0
Avg top-1 confidence	0.84	0.90

voyage-code-3 strictly won where the right answer was an implementation file vs a tangentially-related test file (terminal-agent.ts over terminal-agent-integration.test.ts, sanitizeReplacer impl over sanitize.test.ts, etc). Zero losses. Same cost ($0.18/1M tokens).

Test Coverage

Coverage audit: 92% (10/11 testable surfaces covered, 2 acceptable gaps). Above 80% target.

Surface	Test	Status
3 PGLite init sites (template shell)	`gbrain-init-voyage-code-3.test.ts` behavioral + count invariant	Covered
VOYAGE_API_KEY set/unset/empty corner cases	same	Covered
1024-dim alignment, no DB mismatch	integration test	Covered (paid, skips without key)
Sync embeds pages via Voyage	integration test	Covered (paid)
Symbol-aware code-def works	integration test	Covered (paid)
Generated SKILL.md drift	existing `gen-skill-docs.test.ts`	Covered
sync-gbrain D12/D4 manual repair hints	none	Acceptable gap (prose, not code)
`gstack-gbrain-install` post-install hint	none	Acceptable gap (trivial if/else)

Tests: 9 pass, 0 fail. (1995 pass / 252 pre-existing flakes / 0 in-branch fail across the full suite — flakes verified on clean main checkout: 254 fails there, this branch slightly better.)

Pre-Landing Review

No critical findings. Structured review pass (SQL safety, LLM trust boundary, etc.) and adversarial review pass both clean for blocker-class issues. Informational findings:

The shell pattern gbrain init --pglite --json $GBRAIN_EMBED_FLAGS is unquoted-by-design. Currently safe (RHS is a hardcoded literal). A future refactor that interpolates user-controlled input here would become an injection sink. Worth a comment in a follow-up.
The template-count invariant (matches.length).toBe(3)) is fragile: a legitimate 4th init site would fail the test, tempting a contributor to bump the number without thinking. Switching to >= 3 plus a "every gbrain init --pglite line is preceded by the voyage gate" structural test would be more robust. Follow-up.
Whitespace-only VOYAGE_API_KEY (VOYAGE_API_KEY=" ") passes the [ -n ] gate today and produces a Voyage 401 on first call. The template behavior matches gbrain's behavior. A documentation note about trimming would help. Follow-up.

Eval Results

No prompt-related files changed. Eval suites skipped.

Plan Completion

No plan file for this branch (small focused work).

Verification Results

Skipped (no dev server, no plan verification section).

TODOS

No items marked complete.

Documentation

Updated by the /document-release subagent in commit 4887453c:

USING_GBRAIN_WITH_GSTACK.md Path 3 section explains the embedding model selection with A/B rationale (committed 42c1cfe6, refined 4887453c).
CLAUDE.md env-shim reference (committed f91cbce6).
Regenerated setup-gbrain/SKILL.md + sync-gbrain/SKILL.md via bun run gen:skill-docs --host all (committed b65fc98c).

The doc subagent flagged one debt: the CHANGELOG initially linked docs/embedding-migrations.md which doesn't exist in this repo (that's gbrain's doc, not gstack's). Fixed in commit a254fb65 by inlining the recovery recipe directly.

Test plan

bun test test/gbrain-init-voyage-code-3.test.ts — 5 pass
bun test test/gbrain-sync-voyage-code-3-integration.test.ts — 4 pass (real Voyage API)
bun test test/gbrain-init-rollback.test.ts test/gen-skill-docs.test.ts test/skill-validation.test.ts — 729 pass total
Verified pre-existing test flakes are present on clean origin/main checkout (254 fails on main vs 252 on this branch)
Real-world dogfood: I personally migrated my active ~/.gbrain/ from postgres → PGLite → voyage-4-large → voyage-code-3 during this session and verified gbrain code-def buildFetchHandler returns 5 hits and gbrain query --no-expand returns the right implementation files

🤖 Generated with Claude Code

The CLAUDE.md "Where the keys live on this machine" block hand-rolled a `grep ~/.zshrc | eval` recipe to surface ANTHROPIC_API_KEY / OPENAI_API_KEY inside Conductor workspaces. That predates the GSTACK_* env-shim (`lib/conductor-env-shim.ts`, v1.39.2.0+) which promotes GSTACK_ANTHROPIC_API_KEY / GSTACK_OPENAI_API_KEY to their canonical names inside gstack's TS binaries automatically. The zshrc recipe is now an obsolete workaround. Replace with a short note pointing at the env-shim as the canonical answer. Keep the Agent SDK \`env: {...}\` gotcha (still real, unrelated to where the key comes from). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When gstack inits a local PGLite engine for code search, use Voyage's code-specialized `voyage-code-3` (1024-dim) embedding model if \`VOYAGE_API_KEY\` is present. Falls back to gbrain's auto-selected provider chain (OpenAI text-embedding-3-large 1536-dim when OPENAI_API_KEY is available, etc.) when the Voyage key is unset. Why voyage-code-3: head-to-head A/B against voyage-4-large on 10 realistic code queries against this codebase (using gbrain query --no-expand for pure vector retrieval). voyage-code-3 strictly won on 4 queries (cases where the right hit was an implementation file vs a test file: terminal-agent.ts over terminal-agent-integration.test.ts, sanitizeReplacer over sanitize.test.ts, disposeSession over a tangentially-related killDaemon test, surfaced injectCanary semantic query). Tied on 5 with consistently +0.03 to +0.06 higher confidence. Zero losses for voyage-4-large. Touches 3 init sites in setup-gbrain/SKILL.md.tmpl: - Step 1.5 (broken-db rollback-safe switch to PGLite) - Path 3 direct PGLite init - Step 4.5 split-engine local code index (Path 4 Yes branch) Plus 2 manual-repair hints in sync-gbrain/SKILL.md.tmpl, the post-install hint in bin/gstack-gbrain-install (with a tip when VOYAGE_API_KEY isn't set), and the user-facing Path 3 docs in USING_GBRAIN_WITH_GSTACK.md. Cost is trivial: voyage-code-3 at \$0.18/1M tokens means a full reindex of a 100K-LOC repo runs about \$0.20. Incremental syncs are pennies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mechanical regen via \`bun run gen:skill-docs --host all\` after the template changes in the previous commit. Single-host regen leaves other-host outputs stale and trips gen-skill-docs.test.ts; --host all keeps every adapter (claude, codex, kiro, opencode, slate, cursor, openclaw, hermes, gbrain) in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two test files cover the voyage-code-3 default landed in the previous commits: test/gbrain-init-voyage-code-3.test.ts — free, deterministic, gate-tier. Mirrors gbrain-init-rollback.test.ts: runs the skill template's PGLite-init bash against a fake \`gbrain\` that logs argv to a sentinel file, asserts the right flags pass under VOYAGE_API_KEY set/unset/empty. Also includes belt-and-suspenders grep checks that the template literally contains the voyage gate at all 3 PGLite init sites. test/gbrain-sync-voyage-code-3-integration.test.ts — real, paid, skip-if-no-key. Inits a sandbox PGLite with voyage-code-3 in a tempdir, registers a 3-file fixture git repo as a source, runs \`gbrain sync --strategy code --skip-failed\`, asserts pages imported + embedded > 0. Also asserts \`gbrain doctor\` reports no dimension mismatch and the column width is 1024d. \`gbrain code-def\` smoke test confirms symbol extraction works against the embedded fixture. The integration test deliberately omits a \`gbrain query\` assertion: query produces correct output but \`gbrain query\` hangs ~2 min on a fresh PGLite before exiting. The smoking-gun assertion for "embeddings worked" is the "N pages embedded" line from sync output. Symbol-aware correctness is covered by the code-def assertion. Caught one real bug during test development: gbrain reads \`.gbrain-source\` from CWD and tries to sync that source too. The test sets cwd to the sandbox root to avoid the parent worktree's pin polluting the sandbox brain. Documented in the runGbrain() helper. Runtime: ~22s when VOYAGE_API_KEY is set, instant skip otherwise. Cost: ~\$0.001 per run (3 tiny fixture files, ~500 tokens of Voyage embeddings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ault Add VOYAGE_API_KEY row to the env-var table; clarify the OPENAI_API_KEY row as the fallback path. Refresh the "search returns nothing semantic" troubleshooting to mention both providers and clarify that the env-shim only promotes ANTHROPIC/OPENAI from GSTACK_ — VOYAGE_API_KEY must be set directly in Conductor workspace env. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…th inline recipe CHANGELOG release-summary prose used em-dashes (violates voice rule) and linked to docs/embedding-migrations.md which is gbrain's doc, not gstack's. Replace with periods/commas and inline the dimension-mismatch recovery recipe directly (mv + re-init). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-21T16:19:58Z

E2E Evals: ✅ PASS

0/0 tests passed | $0 total cost | 12 parallel runners

Suite	Result	Status	Cost

12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574 (garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0. Next available MINOR slot is v1.43.2.0. Bump VERSION + package.json + CHANGELOG entry header. No behavior changes — purely re-versioning to clear the queue collision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan and others added 8 commits May 20, 2026 20:58

Merge remote-tracking branch 'origin/main' into garrytan/muscat-v3

0b8ead8

chore: bump to v1.43.1.0 with voyage-code-3 default + tests

88b4bfa

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests#1639

v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests#1639
garrytan wants to merge 8 commits into
mainfrom
garrytan/muscat-v3

garrytan commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garrytan commented May 21, 2026

Summary

A/B verdict (voyage-4-large vs voyage-code-3)

Test Coverage

Pre-Landing Review

Eval Results

Plan Completion

Verification Results

TODOS

Documentation

Test plan

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Evals: ✅ PASS

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 21, 2026 •

edited

Loading