v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests #1639
Open
garrytan wants to merge 8 commits into
The CLAUDE.md "Where the keys live on this machine" block hand-rolled a
`grep ~/.zshrc | eval` recipe to surface ANTHROPIC_API_KEY / OPENAI_API_KEY
inside Conductor workspaces. That predates the GSTACK_* env-shim
(`lib/conductor-env-shim.ts`, v1.39.2.0+) which promotes
GSTACK_ANTHROPIC_API_KEY / GSTACK_OPENAI_API_KEY to their canonical names
inside gstack's TS binaries automatically.
The zshrc recipe is now an obsolete workaround. Replace with a short note
pointing at the env-shim as the canonical answer. Keep the Agent SDK
`env: {...}` gotcha (still real, unrelated to where the key comes from).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When gstack inits a local PGLite engine for code search, use Voyage's code-specialized `voyage-code-3` (1024-dim) embedding model if `VOYAGE_API_KEY` is present. Fall back to gbrain's auto-selected provider chain (OpenAI text-embedding-3-large 1536-dim when OPENAI_API_KEY is available, etc.) when the Voyage key is unset.
Why voyage-code-3: head-to-head A/B against voyage-4-large on 10 realistic code queries against this codebase (using `gbrain query --no-expand` for pure vector retrieval). voyage-code-3 strictly won on 4 queries, cases where the right hit was an implementation file vs a test file:
- terminal-agent.ts over terminal-agent-integration.test.ts
- sanitizeReplacer over sanitize.test.ts
- disposeSession over a tangentially-related killDaemon test
- surfaced injectCanary on a semantic query
Tied on 5 with consistently +0.03 to +0.06 higher confidence. Zero losses to voyage-4-large.
Touches 3 init sites in setup-gbrain/SKILL.md.tmpl:
- Step 1.5 (broken-db rollback-safe switch to PGLite)
- Path 3 direct PGLite init
- Step 4.5 split-engine local code index (Path 4 Yes branch)
Plus 2 manual-repair hints in sync-gbrain/SKILL.md.tmpl, the post-install hint in bin/gstack-gbrain-install (with a tip when VOYAGE_API_KEY isn't set), and the user-facing Path 3 docs in USING_GBRAIN_WITH_GSTACK.md.
Cost is trivial: voyage-code-3 at $0.18/1M tokens means a full reindex of a 100K-LOC repo runs about $0.20. Incremental syncs are pennies.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
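The gate described in this commit can be sketched in a few lines of shell. This is an illustration under assumptions, not the template's actual text: `build_embed_flags` is a hypothetical helper name, and `echo` stands in for the real `gbrain` invocation so the sketch has no side effects. The flag values and the `GBRAIN_EMBED_FLAGS` variable name are the ones quoted elsewhere in this PR.

```shell
#!/usr/bin/env bash
# Sketch of the voyage gate. build_embed_flags is a hypothetical helper;
# only the flag values and GBRAIN_EMBED_FLAGS come from the PR text.
build_embed_flags() {
  if [ -n "${VOYAGE_API_KEY:-}" ]; then
    # Key present: pin the code-specialized model and its vector width.
    printf '%s' '--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024'
  fi
  # No output means no flags, so gbrain picks a provider on its own.
}

GBRAIN_EMBED_FLAGS="$(build_embed_flags)"

# Left unquoted on purpose so the literal splits into separate argv words.
# echo replaces the real call here; drop it to actually run gbrain.
echo gbrain init --pglite --json $GBRAIN_EMBED_FLAGS
```

Because the expansion is unquoted, an empty `GBRAIN_EMBED_FLAGS` contributes no arguments at all, which is what lets the fallback path share the same command line.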
Mechanical regen via `bun run gen:skill-docs --host all` after the template changes in the previous commit. Single-host regen leaves other-host outputs stale and trips gen-skill-docs.test.ts; --host all keeps every adapter (claude, codex, kiro, opencode, slate, cursor, openclaw, hermes, gbrain) in sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two test files cover the voyage-code-3 default landed in the previous commits:
- test/gbrain-init-voyage-code-3.test.ts: free, deterministic, gate-tier. Mirrors gbrain-init-rollback.test.ts: runs the skill template's PGLite-init bash against a fake `gbrain` that logs argv to a sentinel file, and asserts the right flags pass under VOYAGE_API_KEY set/unset/empty. Also includes belt-and-suspenders grep checks that the template literally contains the voyage gate at all 3 PGLite init sites.
- test/gbrain-sync-voyage-code-3-integration.test.ts: real, paid, skip-if-no-key. Inits a sandbox PGLite with voyage-code-3 in a tempdir, registers a 3-file fixture git repo as a source, runs `gbrain sync --strategy code --skip-failed`, and asserts pages imported + embedded > 0. Also asserts `gbrain doctor` reports no dimension mismatch and the column width is 1024d. A `gbrain code-def` smoke test confirms symbol extraction works against the embedded fixture.
The integration test deliberately omits a `gbrain query` assertion: query produces correct output, but `gbrain query` hangs ~2 min on a fresh PGLite before exiting. The smoking-gun assertion for "embeddings worked" is the "N pages embedded" line from sync output. Symbol-aware correctness is covered by the code-def assertion.
Caught one real bug during test development: gbrain reads `.gbrain-source` from CWD and tries to sync that source too. The test sets cwd to the sandbox root to avoid the parent worktree's pin polluting the sandbox brain. Documented in the runGbrain() helper.
Runtime: ~22s when VOYAGE_API_KEY is set, instant skip otherwise. Cost: ~$0.001 per run (3 tiny fixture files, ~500 tokens of Voyage embeddings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
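The fake-binary technique used by the gate-tier test can be sketched as follows. This is a shell approximation under assumptions, not the TypeScript test itself: `run_gate_with_fake_gbrain`, the `GBRAIN_ARGV_LOG` variable, and the sandbox layout are all hypothetical names chosen for illustration. A stub `gbrain` is placed first on `PATH`, records its arguments, and the recorded line is what gets asserted on.

```shell
#!/usr/bin/env bash
# Sketch: run the voyage gate against a stub gbrain that only logs argv.
# Function name, log variable, and sandbox paths are hypothetical.
run_gate_with_fake_gbrain() {
  local key="$1" sandbox
  sandbox="$(mktemp -d)"
  mkdir -p "$sandbox/bin"

  # The stub never touches a database; it appends its arguments to a file.
  cat > "$sandbox/bin/gbrain" <<'EOF'
#!/bin/sh
printf '%s\n' "$*" >> "$GBRAIN_ARGV_LOG"
EOF
  chmod +x "$sandbox/bin/gbrain"

  (
    # Subshell keeps PATH and key changes away from the caller.
    export PATH="$sandbox/bin:$PATH"
    export GBRAIN_ARGV_LOG="$sandbox/argv.log"
    export VOYAGE_API_KEY="$key"
    GBRAIN_EMBED_FLAGS=""
    if [ -n "$VOYAGE_API_KEY" ]; then
      GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
    fi
    gbrain init --pglite --json $GBRAIN_EMBED_FLAGS
  )

  cat "$sandbox/argv.log"   # the argv line the stub recorded
  rm -rf "$sandbox"
}

run_gate_with_fake_gbrain "dummy-key"
run_gate_with_fake_gbrain ""
```

The sketch covers the set and empty cases; a truly unset variable behaves the same under `[ -n ]`, which is presumably why one gate handles all three states the test enumerates.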
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ault
Add VOYAGE_API_KEY row to the env-var table; clarify the OPENAI_API_KEY row as the fallback path. Refresh the "search returns nothing semantic" troubleshooting to mention both providers and clarify that the env-shim only promotes ANTHROPIC/OPENAI from GSTACK_; VOYAGE_API_KEY must be set directly in Conductor workspace env.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…th inline recipe
CHANGELOG release-summary prose used em-dashes (violates voice rule) and linked to docs/embedding-migrations.md, which is gbrain's doc, not gstack's. Replace with periods/commas and inline the dimension-mismatch recovery recipe directly (mv + re-init).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E2E Evals: ✅ PASS
0/0 tests passed | $0 total cost | 12 parallel runners
12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
garrytan added a commit that referenced this pull request on May 21, 2026
CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574 (garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0. Next available MINOR slot is v1.43.2.0. Bump VERSION + package.json + CHANGELOG entry header. No behavior changes, purely re-versioning to clear the queue collision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Local gbrain PGLite now defaults to Voyage's code-specialized `voyage-code-3` (1024-dim) when `VOYAGE_API_KEY` is set, with two new test files pinning the contract.
voyage-code-3 default
- `setup-gbrain/SKILL.md.tmpl` (Step 1.5 broken-db rollback, Path 3 direct, Step 4.5 split-engine) gates `--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024` on `VOYAGE_API_KEY`. Falls back to gbrain's auto-selected provider chain when unset.
- `sync-gbrain/SKILL.md.tmpl` and the post-install hint in `bin/gstack-gbrain-install` follow the same pattern.
- `USING_GBRAIN_WITH_GSTACK.md` Path 3 docs explain the A/B rationale.
Tests
- `test/gbrain-init-voyage-code-3.test.ts` (5 tests, free, gate-tier): runs the template's voyage-gate shell against a fake gbrain that logs argv. Asserts flags pass under `VOYAGE_API_KEY` set / unset / empty. Belt-and-suspenders grep checks the template literally contains the gate at exactly 3 PGLite init sites.
- `test/gbrain-sync-voyage-code-3-integration.test.ts` (4 tests, paid, skip-if-no-key): real Voyage API. Inits a sandbox PGLite with voyage-code-3, registers a 3-file fixture git repo, syncs with `--strategy code --skip-failed`, asserts pages embedded > 0 and doctor reports no dimension mismatch. Code-def smoke test confirms symbol extraction. Skips cleanly when `VOYAGE_API_KEY` or the `gbrain` CLI is absent.
Docs cleanup (unrelated)
- `CLAUDE.md` drops the obsolete `~/.zshrc grep+eval` recipe for API keys. Points at the `GSTACK_*` env-shim (`lib/conductor-env-shim.ts`) as the canonical answer. Keeps the Agent SDK `env: {...}` gotcha for tests.
A/B verdict (voyage-4-large vs voyage-code-3)
Head-to-head on this codebase via `gbrain query --no-expand` (pure vector retrieval). 10 realistic queries:
voyage-code-3 strictly won where the right answer was an implementation file vs a tangentially-related test file (`terminal-agent.ts` over `terminal-agent-integration.test.ts`, `sanitizeReplacer` impl over `sanitize.test.ts`, etc). Zero losses. Same cost ($0.18/1M tokens).
Test Coverage
Coverage audit: 92% (10/11 testable surfaces covered, 2 acceptable gaps). Above 80% target.
- `gbrain-init-voyage-code-3.test.ts`: behavioral + count invariant
- `gen-skill-docs.test.ts`
- `gstack-gbrain-install` post-install hint
Tests: 9 pass, 0 fail. (1995 pass / 252 pre-existing flakes / 0 in-branch fail across the full suite. Flakes verified on clean main checkout: 254 fails there, this branch slightly better.)
Pre-Landing Review
No critical findings. Structured review pass (SQL safety, LLM trust boundary, etc.) and adversarial review pass both clean for blocker-class issues. Informational findings:
- `gbrain init --pglite --json $GBRAIN_EMBED_FLAGS` is unquoted by design. Currently safe (RHS is a hardcoded literal). A future refactor that interpolates user-controlled input here would become an injection sink. Worth a comment in a follow-up.
- The count invariant (`expect(matches.length).toBe(3)`) is fragile: a legitimate 4th init site would fail the test, tempting a contributor to bump the number without thinking. Switching to `>= 3` plus an "every `gbrain init --pglite` line is preceded by the voyage gate" structural test would be more robust. Follow-up.
- A whitespace-only `VOYAGE_API_KEY` (`VOYAGE_API_KEY=" "`) passes the `[ -n ]` gate today and produces a Voyage 401 on first call. The template behavior matches gbrain's behavior. A documentation note about trimming would help. Follow-up.
Eval Results
No prompt-related files changed. Eval suites skipped.
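Returning to the whitespace-only key finding in the Pre-Landing Review: one possible hardening is sketched below. It is not part of this PR, which only proposes a documentation note, and it would make the template diverge from gbrain's own behavior. `voyage_key_is_usable` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical hardening, not in this PR: treat a whitespace-only
# VOYAGE_API_KEY the same as an unset one before choosing embed flags.
voyage_key_is_usable() {
  local stripped
  # Delete every whitespace character; a key with nothing left is unusable.
  stripped="$(printf '%s' "${1:-}" | tr -d '[:space:]')"
  [ -n "$stripped" ]
}

if voyage_key_is_usable "${VOYAGE_API_KEY:-}"; then
  GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
else
  GBRAIN_EMBED_FLAGS=""
fi

echo "flags: ${GBRAIN_EMBED_FLAGS:-<none, gbrain auto-selects>}"
```

With this check, `VOYAGE_API_KEY=" "` takes the fallback path instead of reaching the Voyage API and failing with a 401 on the first call.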
Plan Completion
No plan file for this branch (small focused work).
Verification Results
Skipped (no dev server, no plan verification section).
TODOS
No items marked complete.
Documentation
Updated by the /document-release subagent in commit `4887453c`:
- `USING_GBRAIN_WITH_GSTACK.md` Path 3 section explains the embedding model selection with A/B rationale (committed `42c1cfe6`, refined `4887453c`).
- `CLAUDE.md` env-shim reference (committed `f91cbce6`).
- `setup-gbrain/SKILL.md` + `sync-gbrain/SKILL.md` via `bun run gen:skill-docs --host all` (committed `b65fc98c`).
The doc subagent flagged one debt: the CHANGELOG initially linked `docs/embedding-migrations.md`, which doesn't exist in this repo (that's gbrain's doc, not gstack's). Fixed in commit `a254fb65` by inlining the recovery recipe directly.
Test plan
- `bun test test/gbrain-init-voyage-code-3.test.ts`: 5 pass
- `bun test test/gbrain-sync-voyage-code-3-integration.test.ts`: 4 pass (real Voyage API)
- `bun test test/gbrain-init-rollback.test.ts test/gen-skill-docs.test.ts test/skill-validation.test.ts`: 729 pass total
- Flakes verified on clean `origin/main` checkout (254 fails on main vs 252 on this branch)
- … `~/.gbrain/` from postgres → PGLite → voyage-4-large → voyage-code-3 during this session and verified `gbrain code-def buildFetchHandler` returns 5 hits and `gbrain query --no-expand` returns the right implementation files
🤖 Generated with Claude Code