Skip to content

v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits#1642

Open
garrytan wants to merge 28 commits into
mainfrom
garrytan/cairo-v3
Open

v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits#1642
garrytan wants to merge 28 commits into
mainfrom
garrytan/cairo-v3

Conversation

@garrytan
Copy link
Copy Markdown
Owner

Summary

Post-Daegu paper-cut wave. v1.42.0.0 closed 23 user-filed bugs three days ago; this wave closes 18 more (15 community PRs + 3 self-filed silent-failure issues) in the same one-PR bisect-commit pattern.

The headline change is what stops happening: /retro no longer renders a confidently-wrong narrative when the date window is wrong, /sync-gbrain --full no longer SIGTERMs at exactly 35 minutes with no resume path on big brains, and /review no longer ships finding lists where half the items are framework FPs the reviewer never grep'd to confirm.

Tier 1 — Silent-failure trio (self-filed):

Tier 2 — CLI paper-cuts (8 community PRs):

Tier 3 — gbrain hardening (4 community PRs):

Tier 4 — browse + design (4 community PRs, two rebased onto v1.42.2.0's shouldEnableChromiumSandbox helper):

Closed as superseded by v1.42.2.0 (will close after merge): #1617, #1626 (Garry's own Chromium sandbox PRs already shipped in v1.42.2.0).

Test Coverage

7 new test files + 1 schema update + golden fixture refresh + 1 existing test alignment:

Test file Tests Pins
test/regression-1624-retro-stale-base.test.ts 13 All 4 pre-check branches + ordering + disclosure-to-narrative
test/regression-1611-gbrain-sync-resume.test.ts 19 parseTimeoutEnv (10), decideResume (6), SIGTERM preservation (3)
test/regression-1539-review-self-verify.test.ts 12 Gate text + 4 named FP classes + framework-meta nudge + 4 SKILL.md consumers
test/gbrain-lib-validate-varname.test.ts 8 LC_ALL=C macOS-locale invariant + local-scope preservation
test/land-and-deploy-postfail.test.ts 12 §4a-postfail + 3 state branches + non-destructive cleanup + "never retry"
browse/test/cli-setsid-daemonize.test.ts 4 nodeSpawn import + non-Windows uses setsid + Bun.spawn excluded
browse/test/browser-manager-unit.test.ts (extended) +3 GSTACK_CHROMIUM_NO_SANDBOX opt-out on linux/darwin
test/gstack-gbrain-detect-mcp-mode.test.ts (updated) Schema regression includes gbrain_pooler_mode
test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md Regenerated post-resolver
test/learnings-injection.test.ts (updated) Aligned with PR #1619 tagged-line shape

Coverage gate: PASS. Every wave-touched test file passes in isolation (247/247 on the final sweep across 16 files, 0 fail). Cross-file pollution in bun test full-suite mode remains pre-existing and is documented in v1.42.0.0 CHANGELOG. Tests classified gate tier so CI catches regressions on retro/, review/, bin/gstack-gbrain-sync.ts.

Pre-Landing Review

Already done via /plan-eng-review (13 decisions locked: D1-D3, A1-A3, C1-C2, T1-T2, T-Codex-1/2/3/4) + codex outside-voice (4 cross-model tensions resolved interactively, 8 wording/scope fixes folded into the plan).

Implementation matches the reviewed plan: gbrain checkpoint reuse (D1), atomic .tmpl+regen squash (T-Codex-3 reversal of D2), quoted-line gate (D3), file-collision ordering (A1-A3), bounds-checked env vars (C2), detect-then-fallback resume (C1), 4-branch /retro guard (T-Codex-1), light gate + deferred ORM doc (T-Codex-2 split), intermediate verification checkpoints (T-Codex-4).

Plan Completion

Item Status
Tier 1 silent-failure trio DONE — 3 commits + 3 regression test files (44 tests)
Tier 2 CLI paper-cuts (8 PRs) DONE — all cherry-picked, attribution preserved
Tier 3 gbrain hardening (4 PRs) DONE — schema-regression key added for #1591
Tier 4 browse + design (4 PRs) DONE — #1562 rebased onto v1.42.2.0 helper
Test-gap fills (#1606, #1612, #1620) DONE — 24 new tests
Golden baseline refresh DONE — ship-SKILL.md fixtures regenerated
Defer-doc artifact (T-Codex-2 split) DONE — ~/.gstack-dev/plans/1539-framework-aware-review.md written (local-only by design)

TODOS

No TODO items completed in this PR (the wave is bug-fix consolidation, not roadmap work). The deferred /freeze enforcement chain (#1459) remains open for a future wave; the framework-aware ORM verification design doc is filed locally for the next wave to pick up.

Test plan

  • All wave-touched test files pass in isolation (16 files, 247 tests, 0 fail)
  • Pre-existing baseline failures verified on origin/main (design feedback-roundtrip — not introduced by this wave)
  • Cross-file pollution behavior matches v1.42.0.0 CHANGELOG note (pre-existing, documented)
  • CHANGELOG entry in v1.42.0.0 release-summary format with bump-level justification + Linux --no-sandbox threat-model note
  • VERSION + package.json both at 1.43.0.0

🤖 Generated with Claude Code

garrytan and others added 27 commits May 21, 2026 09:34
… a new repo

`gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks
the filesystem. On a freshly-registered source (0 pages), a --full run that
called reindex-code alone found nothing ("No code pages to reindex"), finished
in ~1s, and left the code index permanently empty while still reporting OK.

Fix: --full now runs `sync --strategy code` FIRST to create pages via the file
walk, then runs `reindex-code` to honor the documented "full walk + reindex"
contract for both fresh and populated sources.

Contributed by @jetsetterfl via #1584.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… repos with their own DATABASE_URL

The freshClassify probe ran `gbrain sources list --json` with the inherited
process env. When the probe ran from inside a repo with its own .env (an app
DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain
connected to the wrong database, and the classifier reported broken-db on
otherwise-healthy brains.

Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the
same helper the sync orchestrator uses. DATABASE_URL is seeded from
~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no
longer propagate a poisoned negative to clean directories.

Contributed by @jetsetterfl via #1583.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/retro silently produced confidently-wrong output when "today" drifted (model
session-context error) or when origin/<default> was materially behind the
actual remote — git log --since returned zero or near-zero commits and the
narrative was fabricated from nothing.

Adds Step 0.5 with four ordered pre-check branches before any window analysis:

  A. No 'origin' remote → skip with "base freshness not verified" note
  B. Detached HEAD → skip with "base freshness not verified" note
  C. `git fetch origin <default>` fails (offline) → warn, proceed against
     last-known origin/<default>
  D. Fetch succeeded → compare today vs latest origin/<default> commit; if
     gap > window-days, BLOCK with explicit citation of latest-commit date.

Skip paths still proceed to Step 1, but the disclosure is carried into the
retro narrative ("offline run, window not freshness-verified") so the output
is never silently confidently-wrong.

Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 static-invariant tests pinning the four ordered pre-check branches in
retro/SKILL.md.tmpl:Step 0.5:

  A. no-remote skip            — must check origin presence + set verdict
  B. detached-HEAD skip        — must gate behind prior verdict (ordering)
  C. fetch-fail warn           — must match `if !` or `||` shape, gate by verdict
  D. stale-base BLOCK          — must read latest-commit ISO date, cite remediation

Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must be
named in prose so the retro output carries the cited reason rather than
silently misreporting.

Failing build if Step 0.5 is removed, branches re-ordered (no-remote no longer
wins), or the BLOCK message stops citing today/latest-commit/remediation
path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt (#1611)

The memory and code stages hardcoded a 35-min spawn timeout. On brains with
~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at
exactly 35 minutes with exit 143. gbrain left ~/.gbrain/import-checkpoint.json
pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler
unconditionally cleaned the dir up — so the next run found a checkpoint
pointing at nothing and restaged from scratch, repeating the SIGTERM forever.

Three changes:

1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default
   2_100_000ms = 35min unchanged):
     GSTACK_SYNC_MEMORY_TIMEOUT_MS
     GSTACK_SYNC_CODE_TIMEOUT_MS
   Out-of-range or non-numeric values warn and fall back to the default.

2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging
   dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at
   the active staging dir, the dir is PRESERVED for next-run resume.
   Otherwise (no checkpoint pointing here, crash before gbrain ever
   touched it) it's cleaned up as before.

3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in
   gstack-gbrain-sync.ts:
     - no checkpoint               → fresh ingest pass
     - checkpoint + staging ok     → set GSTACK_INGEST_RESUME_DIR; child
                                      reuses staging dir and skips
                                      writeStaged; gbrain import resumes
                                      from processedIndex+1
     - checkpoint + staging gone   → warn "previous checkpoint stale
                                      (staging dir gone), restaging from
                                      scratch" and proceed

Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store
state). Detect-then-fallback semantics per C1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
19 tests across three surfaces:

  - resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric,
    zero, negative, below-floor, above-ceiling → warn + default; at-floor,
    at-ceiling, valid mid-range → accepted as-is.

  - decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging
    ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with
    empty dir.

  - SIGTERM staging preservation (3 static invariants): memory-ingest signal
    handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve
    branch must come before cleanup branch (ordering); orchestrator must
    pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume.

Also threads process.env.HOME through readGbrainCheckpoint and
stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches
at process start and ignores later mutation, so the env override is the
only reliable test injection point.

Failing build if the timeout bounds are removed, the resume detection
short-circuits incorrectly, or the SIGTERM handler regresses to
unconditional cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1539)

External user filed 4/8 false positives on a /review run against a Django +
DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape:
"resolvable in <5 minutes by viewing the actual code or running a simple
grep" — fields that don't exist on the model, dict.get()-might-be-None on a
form that returns {}-initialized cleaned_data, standard ORM save behavior
called out as data loss.

Extends the Confidence Calibration resolver (consumed by review, cso,
plan-eng-review, ship) with a Pre-emit verification gate:

  Every finding MUST quote the specific code line that motivates it
  (file:line + verbatim text). If the reviewer cannot produce the quote,
  the finding is unverified — its confidence is forced to 4-5 so the
  existing "Suppress from main report" rule fires automatically. The
  finding still goes to the appendix for calibration audit, but the user
  does not see it in the critical-pass output.

Reuses the existing suppression mechanism — no new code path. The FP
classes the gate kills are enumerated in the resolver text so reviewers
see the named patterns.

Framework-meta nudge included for Django Meta, Rails associations,
SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma
generated client — the reviewer must quote the meta-construct that
generates the symbol, not just grep for the literal name. Deeper
framework-aware ORM verification (model introspection, migration-history-
aware checks) is deliberately deferred to a future wave per T-Codex-2.

Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit
per T-Codex-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
12 tests pinning the gate behavior:

  - Resolver emits the gate header + #1539 reference
  - Gate requires quoting file:line + verbatim text
  - Unverified findings forced to confidence 4-5 (auto-suppress via
    existing <7-rule, no new mechanism)
  - Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM,
    Sequelize, Prisma
  - Deferred design doc reference present (1539-framework-aware-review.md)
  - Four named FP classes from #1539 enumerated:
      * field doesn't exist on model
      * dict.get() might be None
      * save() might lose fields
      * update_fields might miss X
  - All four downstream SKILL.md consumers (review, cso, plan-eng-review,
    ship) carry the gate text after gen:skill-docs
  - Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows
    unchanged (regression on existing behavior)

Failing build if the gate is removed, the suppression mechanism is
re-invented separately, the framework-meta nudge drops a framework, or
gen:skill-docs stops propagating the gate to consumers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash
glob brackets like `[A-Z]` match lowercase letters too, so the existing
`case "$name" in [A-Z_][A-Z0-9_]*)` branch lets names like `lower-case`
through validation. The function then trips `printf -v "$varname"` and
`export "$varname"` with `not a valid identifier` errors that surface
mid-prompt, which is exactly what the validator was supposed to prevent.

Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics
on both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]*`
contract. Declared `local` so it doesn't leak to the calling shell —
`gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare
assignment would mutate the caller's locale for the rest of the process
(silently affecting downstream `sort`, `tr`, locale-aware globs in the
same shell, etc.).

The existing regression test
`test/gbrain-lib-verify.test.ts:'rejects invalid var names'`
already covers the macOS repro shape (passes `lower-case` and expects
the validator to reject + emit `invalid var name`). On Linux CI the
test silently passed because `LC_ALL=C` is the typical default; on
macOS dev boxes it fails.

Verified:
- `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS).
- `_gstack_gbrain_validate_varname lower-case; echo $?` → 2.
- `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0.
- Caller's LC_ALL preserved across calls (confirmed via sourced bash).
After `gh pr merge` exits non-zero, the PR may already be MERGED server-side
(concurrent merge landed, or local cleanup phase failed AFTER the merge
succeeded). Calling `gh pr merge` a second time then errors with a confusing
"already merged" — and worse, the deploy workflow never runs because we
stopped on the first failure.

Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY
non-zero exit from `gh pr merge`:

  - state == MERGED  → record MERGE_PATH=direct, OFFER (don't force)
                       stale-worktree cleanup on the base branch with
                       uncommitted-work guard, proceed to §4a CI watch
  - state == OPEN    → check autoMergeRequest; if non-null treat as
                       merge-queue wait; if null surface both errors and STOP
  - state == CLOSED  → STOP

Hard invariant: never retry `gh pr merge` after a non-zero exit. Server
state is authoritative.

Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of
truth) instead of the generated SKILL.md, so the next gen:skill-docs run
preserves the change. Original diff by @davidfoy via #1620.

Related: cli/cli#3442, cli/cli#13380.

Contributed by @davidfoy via #1620.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…true (#1435)

When gbrain connects through a PgBouncer transaction-mode pooler (port
6543), it auto-disables prepared statements. This breaks `gbrain search`
silently — the /sync-gbrain capability check fails and the GBrain Search
Guidance block never gets written to CLAUDE.md.

Three-layer fix:

1. **lib/gbrain-exec.ts** — `buildGbrainEnv()` now detects port 6543 in
   the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env
   passed to every gbrain spawn. This is the single chokepoint — all
   gstack gbrain invocations inherit the fix. Caller can opt out with
   `GBRAIN_PREPARE=false`.

2. **sync-gbrain/SKILL.md{,.tmpl}** — capability check now exports
   `GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s
   delay for async index propagation under connection pooling.

3. **bin/gstack-gbrain-detect** — surfaces `gbrain_pooler_mode` field
   ("transaction" | "session" | null) in the preamble probe JSON so
   /setup-gbrain and /sync-gbrain can advise users about pooler state.

Closes #1435

Built with [ClosedLoop.AI](https://closedloop.ai) | [GitHub](https://github.com/closedloop-ai/claude-plugins)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… new projects

- Single-object pooler API responses default to transaction-mode at 6543,
  but the shared pooler tenant on new projects only listens on session/5432
- Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note
- Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat
- 5 new tests covering rewrite, no-op shapes, env opt-out, array path

Fixes #1301.
…1562)

Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing
for headless agent sessions even for normal users — /qa hangs without
--no-sandbox. The kernel policy denies the unprivileged user namespaces
Chromium needs.

Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces
the sandbox off without changing the default for everyone else. Re-authored
from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper —
purely additive, preserves the headed-launch sandbox-on-by-default behavior
that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar.

Three new regression tests cover:
  - linux + override=1 → false (the named use case)
  - darwin + override=1 → false (env wins on any platform)
  - override=0 → does NOT trigger (must be exactly "1")

Original diff by @techcenter68 via #1562.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing
at a baked-extension build (GBrowser / GStack Browser), the headless launch()
path was unconditionally adding --disable-extensions-except / --load-extension.
This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that
launchHeaded() already guards against via isCustomChromium().

Mirror the existing guard: skip --load-extension flags when isCustomChromium()
returns true; always push the off-screen window geometry args.
`Bun.spawn().unref()` only releases the child from Bun's event loop —
it does NOT call setsid(). The spawned bun server inherits the spawning
shell's process session. When the CLI runs inside a session-managed shell
that exits shortly after the CLI returns (Claude Code's per-command Bash
sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit
sends SIGHUP to every PID in the session — killing the bun server and
its Chromium grandchildren within seconds of a successful `connect`.

Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and
pair-agent) disables the parent-process watchdog but does NOT save the
server here: SIGHUP from session teardown still reaps it.

Replace the macOS/Linux `Bun.spawn().unref()` with Node's
`child_process.spawn({ detached: true })`, which calls setsid() and
gives the server its own session leader role (PPID=1, STAT=Ss). This
mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root
cause, different OS surface.

Verified on macOS in Conductor: pre-fix the server dies ~10–15s after
connect across separate Bash invocations; post-fix the same PID stays
alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/
`snapshot` across many separate shell calls.

The `proc?.stderr` startup-error branch is removed since both platforms
now spawn with `stdio: 'ignore'`; both fall through to the on-disk
`browse-startup-error.log` written by `server.ts`'s start().catch.
The design binary calls /v1/responses (gpt-4o + image_generation tool,
quality:high, 1536x1024) but aborted the request after a hardcoded 120s.
That class of request consistently takes ~140-160s end-to-end, so every
generate/variants/evolve/iterate call aborted before the image returned.

In /design-shotgun this cascades: Step 3c launches N parallel agents,
each calling `$D generate`, each aborts at 120s and retries, all fail,
the comparison board never opens — the skill appears to hang indefinitely.

Reproduced the exact API call with a longer budget: HTTP 200, valid
image, 143.5s. A real /design-shotgun run after the patch generated 3
variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the
161s case, which a naive 150s bump would still have failed.

- Bump AbortController timeout 120_000 -> 240_000 in generate.ts,
  variants.ts, evolve.ts, iterate.ts (both call sites)
- Pin the image_generation tool to model "gpt-image-2"

design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The
feedback-roundtrip.test.ts failures are a pre-existing browse-module
breakage (session.clearLoadedHtml undefined), unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three cherry-picked PRs in this wave landed without unit-test coverage for
the specific invariant they protect:

  #1606 (@andrey-esipov) — LC_ALL=C pin in _gstack_gbrain_validate_varname
    8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator
    directly. Asserts uppercase/digit/underscore accepted, lowercase
    REJECTED (the macOS-locale regression case), mixed-case rejected,
    LC_ALL=C scoping is local (doesn't leak to caller).

  #1612 (@bharat2913) — setsid daemonize via Node child_process.spawn
    4 static-invariant tests on browse/src/cli.ts. The actual setsid
    syscall is hard to assert without a real spawn, so we pin the source
    shape: nodeSpawn imported from child_process; non-Windows branch uses
    nodeSpawn(...) with detached:true and .unref(); comment documents
    setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux.

  #1620 (@davidfoy, re-authored into .tmpl per A3) — §4a-postfail
    12 static invariants on land-and-deploy/SKILL.md.tmpl + generated
    SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the
    authoritative state query, the merge-SHA capture, non-destructive
    worktree cleanup with uncommitted-work guard, autoMergeRequest probe
    on OPEN, hard "never retry gh pr merge" rule, and atomic regen
    propagation.

Failing build if any of the three invariants regresses.

Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing
glob-pattern overpermissiveness (hyphens + dots accepted) — not in
#1606's scope; documented inline as a separate cleanup target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d-line shape

PR #1619 (preserve current entries in cross-project search) refactored
gstack-learnings-search to tag rows inline (`current\t<json>` vs
`cross\t<json>`) instead of filtering inside the bun block via
process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or
CROSS env vars — it parses the per-line tag and sets a per-entry
_crossProject flag.

The pre-existing test/learnings-injection.test.ts still asserted on the
old SLUG + CROSS env var shape. Updates:

  - Remove the SLUG env var assertion (no longer set on bash command line)
  - Remove the bun-block CROSS env var assertion (block reads the tag now,
    not the env)
  - Add a new positive assertion that the bun block parses the tag
    (sourceTag | tabIndex | crossProject)
  - Keep the shell-interpolation safety assertion unchanged — that's
    independent of the SLUG refactor

The CROSS env var is still SET on the bash command line (it controls
whether the cross-project find runs at all), but the bun child no longer
reads it. The existing "env vars set on bash command line" test continues
to pin that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ship/SKILL.md consumes the Confidence Calibration resolver via the
preamble pipeline. This wave's #1539 pre-emit verification gate extends
the resolver text, which propagated to ship/SKILL.md via gen:skill-docs.
The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape
and failed the host-config regression check.

Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md
to match the current generated output. Matches the Daegu wave's bisect
commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(PR #1591)

PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added
gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not
update the schema regression check in
test/gstack-gbrain-detect-mcp-mode.test.ts. Adding the key in alphabetical
order matching the rest of the schema array. Downstream sync-gbrain ignores
unknown keys, so this is forward-compat.

Without this, the test fails with a diff:
  + "gbrain_pooler_mode"
because keys is the actual set returned and the expected array was
pre-#1591.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new
env-var surface GSTACK_SYNC_*_TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX,
behavior expansion in browse/src/browser-manager.ts headless launch,
three skill-template prompt changes affecting /retro, /review,
/sync-gbrain).

CHANGELOG entry leads with what stopped happening: /retro stops
fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping
35-min restarts on big brains, /review stops shipping framework FPs the
reviewer never grep'd.

18 fixes total — 15 community PRs + 3 self-filed silent-failure issues
(#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7
new regression test files. Every wave-touched test file passes in
isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

E2E Evals: ✅ PASS

31/31 tests passed | $5.55 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 2/2 $0.13
e2e-deploy 5/5 $1.16
e2e-plan 2/2 $0.71
e2e-qa-workflow 2/2 $0.58
e2e-review 6/6 $1.46
e2e-workflow 2/2 $0.21
llm-judge 7/7 $0.14
e2e-deploy 5/5 $1.16

12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574
(garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0.
Next available MINOR slot is v1.43.2.0.

Bump VERSION + package.json + CHANGELOG entry header. No behavior
changes — purely re-versioning to clear the queue collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title v1.43.0.0 fix wave: post-Daegu paper-cut — 18 fixes, 27 bisect commits v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment