feat(outside-voice): add llm CLI fallback between Codex and Claude subagent by diogolealassis · Pull Request #1631 · garrytan/gstack

diogolealassis · 2026-05-21T00:34:08Z

What

When Codex is unavailable or errors, plan-review and second-opinion skills currently fall back to the Claude subagent. That fallback works, but loses cross-model independence — the subagent is the same model family as the primary review. This PR adds a middle step: try the llm CLI (datasette/llm) with a configured model before falling back to Claude.

New chain: Codex → llm → Claude subagent

When llm is installed and the configured model is registered (e.g. via llm install llm-xai for Grok), the outside voice goes through a genuinely different model family before falling through to Claude. When llm isn't available, behavior is unchanged.

Why this matters today

Codex CLI currently fails at startup for users whose ~/.codex/config.toml has an xai provider with wire_api = "chat" — which is the default per xAI's setup docs but deprecated per openai/codex#7782:

Error loading config.toml: `wire_api = "chat"` is no longer supported.
in `model_providers.xai.wire_api`

The error fires at CLI startup BEFORE the provider is selected, so calls targeted at OpenAI also fail. Users in this state get the Claude subagent fallback on every outside-voice step. They lose the cross-model value the skill is designed to deliver.

Hit twice in a single session on 2026-05-20, on /plan-eng-review and /plan-ceo-review. The user-side fix is one line (wire_api = "responses"), but for users who never had OpenAI auth in the first place (they use Codex or llm because they prefer Grok or Anthropic models), restoring Codex doesn't help: they need a non-Codex cross-model path.

How

Two functions in scripts/resolvers/review.ts produce the outside-voice section that gets templated into 4 skill files:

generateCodexPlanReview — used by /plan-*-review skills
generateCodexSecondOpinion — used by /office-hours

Both now produce a Codex → llm → Claude subagent chain. The llm step runs only when all three checks pass:

1. The new outside_voice_llm_model config key (default grok-4-fast) is non-empty
2. command -v llm succeeds
3. llm models lists the configured model

If any check fails, the chain falls through cleanly to the existing Claude subagent path. No behavior change for users who don't have llm installed.
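
The three gate checks above can be sketched as a single shell function. This is a minimal illustration, not the bash the resolvers actually generate: the function name `llm_step_available` is hypothetical, and matching the model with a fixed-string `grep` over `llm models` output is an assumption (it is a substring match, so `grok-4` would also match `grok-4-fast`).

```shell
# Hypothetical helper (not from the PR): returns 0 only when the llm step
# should run, mirroring the three gate checks described above.
llm_step_available() {
  model="$1"

  # Gate 1: an empty model means the llm step is disabled.
  [ -n "$model" ] || return 1

  # Gate 2: the llm CLI must be resolvable.
  command -v llm >/dev/null 2>&1 || return 1

  # Gate 3: the configured model must appear in `llm models` output.
  # Assumption: a fixed-string substring match is close enough for a sketch.
  llm models 2>/dev/null | grep -Fq -- "$model"
}
```

A caller would pass the configured value, for example `llm_step_available "$(gstack-config get outside_voice_llm_model)"`, and fall through to the Claude subagent on any non-zero status.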

The default is grok-4-fast rather than grok-4-latest on purpose: the latter's reasoning mode can stall for 2-5 minutes on big prompts (the llm stream is empty during hidden reasoning, so it looks like a hang). grok-4-fast is the right shape for routine outside-voice work.

Configuration

gstack-config set outside_voice_llm_model grok-4-fast    # default
gstack-config set outside_voice_llm_model ""             # disable, skip llm step
gstack-config set outside_voice_llm_model gpt-4o         # any model `llm models` lists

Empty string skips the llm step entirely (Codex → subagent direct).

Files changed

bin/gstack-config                |  12 ++++
office-hours/SKILL.md            |  64 +++++++++++++++++++++--
plan-ceo-review/SKILL.md         |  47 +++++++++++++++++-
plan-devex-review/SKILL.md       |  47 +++++++++++++++++-
plan-eng-review/SKILL.md         |  47 +++++++++++++++++-
scripts/resolvers/review.ts      | 111 ++++++++++++++++++++++++++++++++++++++--
6 files changed, 314 insertions(+), 14 deletions(-)

The SKILL.md changes are mechanically regenerated by bun run gen:skill-docs — the only hand-edits are in bin/gstack-config and scripts/resolvers/review.ts.

Verification

bun run scripts/skill-check.ts passes, all freshness checks green
Spot-checked generated SKILL.md files — llm fallback section present, bash blocks syntactically correct, outside_voice_llm_model referenced consistently
Manually traced the 4-state matrix (codex/llm available × works/errors) through the generated bash to confirm fallthrough behavior
Sequencing fix in generateCodexSecondOpinion: $CODEX_PROMPT_FILE deletion moved from "after Codex" to "after chain end" since llm step reuses the prompt file
Did NOT test end-to-end against a real llm invocation with my Grok key — the resolver functions produce templated prose instructions for the agent, not executable code per se. Happy to do this if you want a smoke-test screenshot.
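
The fallthrough order and the deferred prompt-file cleanup described above can be sketched as follows. This is an illustration under assumptions, not the generated bash: `run_outside_voice` and the step commands passed to it are hypothetical names, and each step is assumed to take the prompt file path and return non-zero when it fails or is unavailable. In the real skills the final Claude subagent step is presumably an instruction to the agent rather than a shell command; it is modelled as a command here only to keep the sketch self-contained.

```shell
# Hypothetical sketch (not the PR's generated bash).
# Tries each step in order; the first one that succeeds wins.
# The prompt file is removed only once, after the chain has ended, because
# a later step (llm) reads the same file the Codex step was given.
run_outside_voice() {
  prompt_file="$1"; shift
  status=1
  for step in "$@"; do
    if "$step" "$prompt_file"; then
      status=0
      break
    fi
  done
  rm -f "$prompt_file"   # cleanup after chain end, not right after Codex
  return "$status"
}
```

With hypothetical step functions `try_codex`, `try_llm`, and `try_subagent`, the call would be `run_outside_voice "$CODEX_PROMPT_FILE" try_codex try_llm try_subagent`. Deleting the file inside the Codex step instead would leave the llm step with nothing to read, which is the sequencing problem the PR fixes.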

What this doesn't do

Doesn't change generateAdversarialStep (the /ship + /review adversarial flow). Different shape (always-on, not configurable). Could benefit from the same fallback but out of scope for this PR; happy to do as a follow-up.
Doesn't bump VERSION or add a CHANGELOG.md entry — leaving those to your release workflow.
Doesn't add a first-run prompt for outside_voice_llm_model via AskUserQuestion. The default grok-4-fast is sensible enough that most users won't need to touch it. If you want a first-run prompt I can add it.
Doesn't add tests. The resolver functions are pure string-returning, and the generated bash is exercised at agent-runtime not at gstack-build-time. Open to suggestions on how you'd want this tested.

Generated by

A real /plan-eng-review + outside-voice flow on a downstream user project (diogolealassis/gws-platform) hit the Codex failure twice in one session and surfaced the gap. Existing project memory codex-cli-broken-for-xai and the newly logged codex-cli-global-config-blocks-all-providers document the trigger. This PR closes the loop on the gstack side.

Happy to break this into smaller commits or split out the generateCodexSecondOpinion cleanup-sequencing fix if you'd prefer.

🤖 Generated with Claude Code

…bagent

When Codex is unavailable or errors, plan-review and second-opinion skills fall back to the Claude subagent. That fallback works, but loses cross-model independence: the subagent is the same model family as the primary review.

This adds a middle step: try the `llm` CLI (datasette/llm) with a configured model. When `llm` is installed and the model is registered (e.g. `llm-xai` plugin for Grok), the outside voice goes through a genuinely different model family before falling back to Claude.

## Why this matters today

Codex CLI currently fails at startup for any user whose `~/.codex/config.toml` has an xAI provider with `wire_api = "chat"`, which is the default per xAI's setup docs but deprecated per github.com/openai/codex#7782:

    Error loading config.toml: `wire_api = "chat"` is no longer supported.
    in `model_providers.xai.wire_api`

The error fires at CLI startup BEFORE the provider is selected, so calls targeted at OpenAI also fail. Users in this state currently get the Claude subagent fallback on every outside-voice step. They lose the cross-model value the skill is designed to deliver.

Hit twice in a single session 2026-05-20 on /plan-eng-review and /plan-ceo-review. Existing memory `codex-cli-broken-for-xai` documents the upstream Codex side; this PR adds the gstack-side fallback path.

## What changed

Two functions in scripts/resolvers/review.ts produce the outside-voice section that gets templated into 4 skill files (office-hours, plan-ceo-review, plan-devex-review, plan-eng-review):

- `generateCodexPlanReview`: used by /plan-*-review skills
- `generateCodexSecondOpinion`: used by /office-hours

Both now produce: Codex → llm → Claude subagent chain. The llm step is gated on:

1. `outside_voice_llm_model` config key (new, default `grok-4-fast`)
2. `command -v llm` succeeds
3. `llm models` lists the configured model

If any check fails, the chain falls through cleanly to the existing Claude subagent path. No behavior change for users who don't have llm installed.

The default model is `grok-4-fast` rather than `grok-4-latest` because the latter's reasoning mode can hang 2-5 minutes on big prompts (observed 2026-05-18: the llm stream is empty during hidden reasoning, looks like a hang). `grok-4-fast` is the right shape for routine outside-voice work.

## Config key

```
gstack-config set outside_voice_llm_model grok-4-fast    # default
gstack-config set outside_voice_llm_model ""             # disable, skip llm step
gstack-config set outside_voice_llm_model gpt-4o         # any model `llm models` lists
```

Empty string skips the llm step entirely (back to Codex → subagent direct).

## File-level diff

- `bin/gstack-config`: new key registration in `lookup_default` case statement + CONFIG_HEADER docs section
- `scripts/resolvers/review.ts`: llm fallback block inserted between Codex error handling and the "If CODEX_NOT_AVAILABLE" section in both generator functions. Codex cleanup in `generateCodexSecondOpinion` changed to defer `$CODEX_PROMPT_FILE` deletion until end of chain (the llm step reads it).
- Regenerated SKILL.md outputs for the 4 affected skills via `bun run gen:skill-docs`. `.gbrain/` outputs also regenerated locally (gitignored, not committed).

## Verification

- `bun run scripts/skill-check.ts`: passes, all freshness checks green
- Spot-checked generated SKILL.md files: llm fallback section present, bash blocks syntactically correct, `outside_voice_llm_model` referenced consistently
- Manually traced the 4-state matrix (codex/llm available × works/errors) through the generated bash to confirm fallthrough behavior

## What this doesn't do

- Doesn't change `generateAdversarialStep` (the /ship + /review adversarial flow); that's a different shape (always-on, not configurable). Could benefit from the same fallback but out of scope for this PR; happy to do as a follow-up if you want it.
- Doesn't bump VERSION or add a CHANGELOG entry, leaving that to the maintainer's release workflow.
- Doesn't update the `outside_voice_llm_model` to be settable via AskUserQuestion during a first-run flow. The default is sensible enough that most users won't need to touch it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jbetala7 · 2026-05-21T05:38:09Z

One config-surface gap: this adds outside_voice_llm_model to lookup_default() and the header, but not to the list / defaults key loops in bin/gstack-config.

That means gstack-config get outside_voice_llm_model returns the new default, while gstack-config defaults and the active-values section of gstack-config list omit the key. Users then cannot discover or audit the new fallback model through the normal config surface, and the generated docs drift from the CLI output.

Please add outside_voice_llm_model to both loops and pin it with a focused config regression, similar to the explain_level default/list/defaults issue tracked by #1607/#1608.
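
A focused regression along these lines might look like the sketch below. The helper name `check_key_surfaces` is hypothetical, and because the exact output format of `gstack-config defaults` and `gstack-config list` isn't shown in this thread, the check only assumes the key name appears somewhere in each command's output.

```shell
# Hypothetical regression helper (not part of the PR).
# Usage: check_key_surfaces KEY CONFIG_CMD
# Runs "CONFIG_CMD defaults" and "CONFIG_CMD list" and fails if either
# output omits KEY.
check_key_surfaces() {
  key="$1"
  config_cmd="$2"
  for sub in defaults list; do
    if ! "$config_cmd" "$sub" | grep -Fq -- "$key"; then
      echo "FAIL: '$config_cmd $sub' does not mention $key" >&2
      return 1
    fi
  done
  echo "OK: $key appears in defaults and list"
}
```

Run against the real CLI as `check_key_surfaces outside_voice_llm_model gstack-config`; per the gap described above, the current branch would be expected to fail until the key is added to both loops.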

diogolealassis wants to merge 1 commit into garrytan:main from diogolealassis:feat/outside-voice-llm-grok-fallback
