fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) by garrytan · Pull Request #817 · garrytan/gstack

garrytan · 2026-04-05T08:54:22Z

Summary

Community PR triage + error handling polish.

PR Triage (GitHub ops):

Merged PR fix(discover): parse Codex sessions with large session_meta #798 (Codex session discovery, 128KB buffer fix)
Merged PR fix: user-friendly error when OpenAI org is not verified #776 (friendly OpenAI org error for generate.ts)
Closed 12 redundant PRs: 6 Gonzih security PRs (shipped in v0.15.7.0), 4 stedfn duplicates, itstimwhite fix: untrack binary, Docker chmod, os.tmpdir (#779, #709, #708) #799, stedfn fix: resolve symlinks in validateOutputPath (#665) #712/fix: check IPv6 AAAA records in DNS rebinding protection (#668) #713
Kept Gonzih fix: path traversal in design serve /api/reload — arbitrary file read #752 open (symlink gap in design serve not fully covered by fix(design): bind server to localhost and validate reload paths #803)

Code Changes:

Friendly OpenAI org error expanded to evolve.ts, iterate.ts, variants.ts, check.ts. Previously only generate.ts showed a user-friendly message when the org wasn't verified.
>128KB regression test for Codex session discovery. Documents the buffer limitation so future Codex versions surface cleanly.

Test Coverage

All new code paths have test coverage (100%). Tests: 19 pass, 0 fail.

Pre-Landing Review

No issues found. Mechanical copy-paste of existing error handling pattern.

Plan Completion

All 8 plan items DONE. 2 intentionally deferred (Issue #778 browse auth, Issue #766 sub-agent context).

Reviews

Review	Status
CEO Review	CLEAR (supply chain trust audit)
Eng Review	CLEAR (3 issues found, 3 resolved)
Codex Review	CLEAR (7 findings, 5 tension points resolved)

Test plan

All bun tests pass (19 tests, 884 assertions)
Regression test for >128KB session_meta
Friendly error pattern verified across all 5 design files

🤖 Generated with Claude Code

Previously only generate.ts showed a user-friendly message when the OpenAI org wasn't verified. Now evolve, iterate, variants, and check all detect the 403 + "organization must be verified" pattern and show a clear message with the correct verification URL.

Documents the current 128KB buffer limitation. When Codex embeds session_meta beyond 128KB, this test will fail, signaling the need for a streaming parse or larger buffer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-05T08:56:30Z

E2E Evals: ✅ PASS

0/0 tests passed | $0 total cost | 12 parallel runners

Suite	Result	Status	Cost

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

* fix: self-healing skill prefix consistency in setup (#805) * fix: self-healing gstack-relink after setup to prevent skill prefix drift Setup now runs gstack-relink as a final consistency check after linking Claude skills. This independently reads skill_prefix from config and ensures name: fields and directory names match, catching cases where interrupted setups or stale state left skills incorrectly prefixed. * chore: bump version and changelog (v0.15.6.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: anti-skip rule for all review skills (v0.15.6.1) (#804) * feat: anti-skip rule for all review skills Review skills sometimes skip sections when reviewing strategy or spec plans. This adds an explicit anti-skip rule to CEO (1-11), eng (1-4), design (1-7), and DX (1-8) review skills. Also fixes CEO header from "10 sections" to "11 sections" to match actual count. * chore: bump version and changelog (v0.15.6.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix: security wave 1 — 14 fixes for audit #783 (v0.15.7.0) (#810) * fix: DNS rebinding protection checks AAAA (IPv6) records too Cherry-pick PR #744 by @Gonzih. Closes the IPv6-only DNS rebinding gap by checking both A and AAAA records independently. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validateOutputPath symlink bypass — resolve real path before safe-dir check Cherry-pick PR #745 by @Gonzih. Adds a second pass using fs.realpathSync() to resolve symlinks after lexical path validation. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validate saved URLs before navigation in restoreState Cherry-pick PR #751 by @Gonzih. Prevents navigation to cloud metadata endpoints or file:// URIs embedded in user-writable state files. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: telemetry-ingest uses anon key instead of service role key Cherry-pick PR #750 by @Gonzih. The service role key bypasses RLS and grants unrestricted database access — anon key + RLS is the right model for a public telemetry endpoint. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: killAgent() actually kills the sidebar claude subprocess Cherry-pick PR #743 by @Gonzih. Implements cross-process kill signaling via kill-file + polling pattern, tracks active processes per-tab. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(design): bind server to localhost and validate reload paths Cherry-pick PR #803 by @garagon. Adds hostname: '127.0.0.1' to Bun.serve() and validates /api/reload paths are within cwd() or tmpdir(). Closes C1+C2 from security audit #783. Co-Authored-By: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add auth gate to /inspector/events SSE endpoint (C3) The /inspector/events endpoint had no authentication, unlike /activity/stream which validates tokens. Now requires the same Bearer header or ?token= query param check. Closes C3 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sanitize design feedback with trust boundary markers (C4+H5) Wrap user feedback in <user-feedback> XML markers with tag escaping to prevent prompt injection via malicious feedback text. Cap accumulated feedback to last 5 iterations to limit incremental poisoning. Closes C4 and H5 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden file/directory permissions to owner-only (C5+H9+M9+M10) Add mode 0o700 to all mkdirSync calls for state/session directories. Add mode 0o600 to all writeFileSync calls for session.json, chat.jsonl, and log files. Add umask 077 to setup script. Prevents auth tokens, chat history, and browser logs from being world-readable on multi-user systems. Closes C5, H9, M9, M10 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: TOCTOU race in setup symlink creation (C6) Remove the existence check before mkdir -p (it's idempotent) and validate the target isn't already a symlink before creating the link. Prevents a local attacker from racing between the check and mkdir to redirect SKILL.md writes. Closes C6 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove CORS wildcard, restrict to localhost (H1) Replace Access-Control-Allow-Origin: * with http://127.0.0.1 on sidebar tab/chat endpoints. The Chrome extension uses manifest host_permissions to bypass CORS entirely, so this only blocks malicious websites from making cross-origin requests. Closes H1 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: make cookie picker auth mandatory (H2) Remove the conditional if(authToken) guard that skipped auth when authToken was undefined. Now all cookie picker data/action routes reject unauthenticated requests. Closes H2 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gate /health token on chrome-extension Origin header Only return the auth token in /health response when the request Origin starts with chrome-extension://. The Chrome extension always sends this origin via manifest host_permissions. Regular HTTP requests (including tunneled ones from ngrok/SSH) won't get the token. The extension also has a fallback path through background.js that reads the token from the state file directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: update server-auth test for chrome-extension Origin gating The test previously checked for 'localhost-only' comment. Now checks for 'chrome-extension://' since the token is gated on Origin header. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.7.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Gonzih <gonzih@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: garagon <garagon@users.noreply.github.com> * feat: adaptive gating + cross-review dedup for review army (v0.15.2.0) (#760) * feat: add test_stub optional field to specialist finding schema All specialist prompts now document test_stub as an optional output field, enabling specialists to suggest test code alongside findings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: adaptive gating + test framework detection for review army Adds gstack-specialist-stats binary for tracking specialist hit rates. Resolver now detects test framework for test_stub generation, applies adaptive gating to skip silent specialists, and compiles per-specialist stats for the review-log entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cross-review finding dedup + test stub override + enriched review-log Step 5.0 suppresses findings previously skipped by the user when the relevant code hasn't changed. Test stub findings force ASK classification so users approve test creation. Review-log now includes quality_score, per-specialist stats, and per-finding action records. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.2.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: bash operator precedence in test framework detection [ -f a ] || [ -f b ] && X="y" evaluates as A || (B && C), so the assignment only runs when the second test passes. Wrap the OR group in braces: { [ -f a ] || [ -f b ]; } && X="y". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: user-friendly error when OpenAI org is not verified (#776) Merged via PR triage plan. Friendly error for unverified OpenAI org. Follow-up: expand to evolve.ts, iterate.ts, variants.ts, check.ts. * fix(discover): parse Codex sessions with large session_meta (>4KB) (#798) Merged via PR triage plan. Fixes Codex session discovery for v0.117+ with 15KB+ session_meta. Follow-up: add >128KB regression test. * fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817) * fix: friendly OpenAI org error on all design commands Previously only generate.ts showed a user-friendly message when the OpenAI org wasn't verified. Now evolve, iterate, variants, and check all detect the 403 + "organization must be verified" pattern and show a clear message with the correct verification URL. * test: regression test for >128KB Codex session_meta Documents the current 128KB buffer limitation. When Codex embeds session_meta beyond 128KB, this test will fail, signaling the need for a streaming parse or larger buffer. * chore: bump version and changelog (v0.15.8.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: OpenClaw integration v2 — prompt is the bridge (v0.15.9.0) (#816) * feat: add includeSkills to HostConfig + update OpenClaw config Add includeSkills allowlist field with union logic (include minus skip). Update OpenClaw to generate only 4 native methodology skills (office-hours, plan-ceo-review, investigate, retro). Remove staticFiles.SOUL.md reference (pointed to non-existent file). * feat: OpenClaw integration — gstack-lite/full generation + spawned session detection Add includeSkills filter to gen-skill-docs pipeline. Generate gstack-lite (planning discipline for spawned coding sessions) and gstack-full (complete feature pipeline) for OpenClaw host. Add OPENCLAW_SESSION env var detection in preamble for spawned session auto-detect. Update setup --host openclaw to print redirect message. * docs: OpenClaw architecture doc + regenerate all SKILL.md with spawned session detection Add docs/OPENCLAW.md with 4-tier dispatch routing and integration architecture. Generate gstack-lite and gstack-full prompt templates. Regenerate all SKILL.md files with OPENCLAW_SESSION env var check in preamble. * test: update golden baselines + OpenClaw includeSkills tests Update golden SKILL.md baselines for preamble SPAWNED_SESSION change. Replace staticFiles SOUL.md test with includeSkills validation. * chore: bump version and changelog (v0.15.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove all Wintermute references from source files Replace with generic "orchestrator" or "OpenClaw" as appropriate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Plan dispatch tier — full review gauntlet for Claude Code project planning New gstack-plan template chains /office-hours → /autoplan (CEO + eng + design + DX + codex adversarial), saves the reviewed plan, and reports back to the orchestrator. The orchestrator persists the plan link to its own memory store. 5 tiers now: Simple, Medium, Heavy, Full, Plan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite README OpenClaw install — one paste, real instructions (#818) OpenClaw section promoted to peer of Claude Code install. Single copy-paste prompt that installs gstack for Claude Code AND wires up AGENTS.md dispatch. Usage table shows what happens when you talk naturally. Other agents collapsed from repeated git clones into one auto-detect command + table. Voice input moved after 10-15 parallel sprints section. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: native OpenClaw skills + ClaHub publishing (v0.15.10.0) (#832) * feat: add 4 native OpenClaw skills for ClaHub publishing Hand-crafted methodology skills for the OpenClaw wintermute workspace: - gstack-openclaw-office-hours (375 lines) — 6 forcing questions, startup + builder modes - gstack-openclaw-ceo-review (193 lines) — 4 scope modes, 18 cognitive patterns - gstack-openclaw-investigate (136 lines) — Iron Law, 4-phase debugging - gstack-openclaw-retro (301 lines) — git analytics, per-person praise/growth Pure methodology, no gstack infrastructure. All frontmatter uses single-line inline JSON for OpenClaw parser compatibility. * feat: add AGENTS.md dispatch section with behavioral rules Ready-to-paste section for OpenClaw AGENTS.md with 3 iron-clad rules: 1. Always spawn sessions, never redirect user to Claude Code 2. Resolve repo path or ask, don't punt 3. Autoplan runs end-to-end, reports back in chat Includes full dispatch routing (Simple/Medium/Heavy/Full/Plan tiers). * chore: clear OpenClaw includeSkills — native skills replace generated Native ClaHub skills replace the gen-skill-docs pipeline output for these 4 skills. Updated test to validate empty includeSkills array. * docs: ClaHub install instructions + dispatch routing rules - README: add Native OpenClaw Skills section with clawhub install command - OPENCLAW.md: update dispatch routing with behavioral rules, update native skills section to reference ClaHub * chore: bump version and changelog (v0.15.10.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add gstack-upgrade to OpenClaw dispatch routing Ensures "upgrade gstack" routes to a Claude Code session with /gstack-upgrade instead of Wintermute trying to handle it conversationally. * fix: stop tracking 58MB compiled binary bin/gstack-global-discover Already in .gitignore but was tracked due to historical mistake. Same issue as browse/dist/ and design/dist/. The .ts source is right next to it and ./setup builds from source for every platform. * test: detect compiled binaries and large files tracked by git Two new tests in skill-validation: - No Mach-O or ELF binaries tracked (catches accidental git add of compiled output) - No files over 2MB tracked (catches bloated binaries sneaking in) Both print the exact git rm --cached command to fix the issue. * fix: ClaHub → ClawHub (correct spelling) * docs: add ClawHub publishing instructions to CLAUDE.md Documents the clawhub publish command (not clawhub skill publish), auth flow, version bumping, and verification. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ship re-run executes all verification checks (v0.15.10.0) (#833) * feat: review army idempotency + cross-review dedup resolver Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ship re-run executes all checks, adds review army + dedup Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: regression guards for ship specialist dispatch + dedup + idempotency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.10.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: snapshot -i auto-detects dropdown/popover interactive elements (#844) - Auto-enable cursor-interactive scan (-C) when -i flag is used - Add floating container detection (portals, popovers, dropdowns) - Detects position:fixed/absolute with high z-index - Recognizes data-floating-ui-portal, data-radix-* attributes - Recognizes role=listbox, role=menu containers - Elements inside floating containers bypass the hasRole skip - Catches dropdown items missed by the accessibility tree - Role=option/menuitem elements in floating containers captured even without cursor:pointer/onclick - Tag floating container items with 'popover-child' reason - Include role name in @c ref reasons when present - Add dropdown.html test fixture - Add dropdown/popover detection test suite (6 tests) - Add test: -i alone includes cursor-interactive elements Fixes: Bookface autocomplete, Radix UI combobox, React portals, and similar dynamic dropdown patterns where ariaSnapshot() misses the floating content. Co-authored-by: root <root@localhost> * Revert "fix: snapshot -i auto-detects dropdown/popover interactive elements (#844)" This reverts commit 542e7836d09c8bae7e0266b39750938a54298f59. * fix: snapshot -i auto-detects dropdown/popover interactive elements (#845) * fix: snapshot -i auto-detects dropdown/popover interactive elements - Auto-enable cursor-interactive scan (-C) when -i flag is used - Add floating container detection (portals, popovers, dropdowns) - Detects position:fixed/absolute with high z-index - Recognizes data-floating-ui-portal, data-radix-* attributes - Recognizes role=listbox, role=menu containers - Elements inside floating containers bypass the hasRole skip - Catches dropdown items missed by the accessibility tree - Role=option/menuitem elements in floating containers captured even without cursor:pointer/onclick - Tag floating container items with 'popover-child' reason - Include role name in @c ref reasons when present - Add dropdown.html test fixture - Add dropdown/popover detection test suite (6 tests) - Add test: -i alone includes cursor-interactive elements Fixes: Bookface autocomplete, Radix UI combobox, React portals, and similar dynamic dropdown patterns where ariaSnapshot() misses the floating content. * chore: bump version and changelog (v0.15.12.0) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: update snapshot -i/-C flag descriptions to mention auto-enable behavior * test: strengthen clickability test guard assertions The @c ref clickability test previously used if-guards that would silently pass when no Alice line was found in the snapshot output. Both Claude and Codex adversarial review flagged this as a test that could regress without CI noticing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: regenerate top-level SKILL.md with updated flag descriptions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: root <root@localhost> Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: team-friendly gstack install mode (v0.15.7.0) (#809) * feat: add gstack-settings-hook for atomic Claude Code hook management DRY helper for adding/removing SessionStart hooks in ~/.claude/settings.json. Handles missing files, deduplication, malformed JSON, and atomic writes (.tmp + rename) to prevent corruption on crash or disk-full. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add gstack-session-update for automatic team updates SessionStart hook target that auto-updates gstack at session start. Background fork (zero latency), throttled to once/hour, with lockfile (mkdir + PID), stale lock recovery, GIT_TERMINAL_PROMPT=0, and debug logging to ~/.gstack/analytics/session-update.log. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add --team, --no-team, -q flags to setup --team enables auto_upgrade and registers SessionStart hook via gstack-settings-hook. --no-team reverses it. -q/--quiet suppresses all informational output (for hook-triggered setup runs). --local now prints a deprecation warning. Replaces ~20 echo calls with log() helper for quiet mode support. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add gstack-team-init for repo-level team bootstrapping Two modes: 'optional' (gentle CLAUDE.md suggestion) and 'required' (CLAUDE.md enforcement + .claude/hooks/check-gstack.sh PreToolUse hook that blocks work without gstack installed). Atomic JSON writes, idempotent, prints git add instructions. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: deprecate vendoring, document team mode, clean up uninstall - README: replace "Step 2: Add to your repo" vendoring instructions with team mode (./setup --team + gstack-team-init) - CLAUDE.md: rename "Vendored symlink awareness" to "Dev symlink awareness", add deprecation note - CONTRIBUTING.md: remove vendoring language from prefix section - bin/gstack-uninstall: clean up SessionStart hook on uninstall Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add vendoring deprecation detection to skill preamble Detects vendored gstack in CWD (.claude/skills/gstack/ that's not a symlink and has VERSION or .git). Outputs VENDORED_GSTACK: yes/no. Adds generateVendoringDeprecation() section that offers one-time migration to team mode via AskUserQuestion. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files with vendoring deprecation preamble Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: team mode (v0.15.7.0) — credit Jared Friedman Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add integration tests for team mode (20 tests) Covers gstack-settings-hook (add, remove, dedup, preserve existing, atomic write), gstack-session-update (guards, throttle, non-fatal), gstack-team-init (optional, required, enforcement hook, idempotent), and setup flags (-q, --local deprecation). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gstack-team-init detects and removes vendored copies (#848) * fix: gstack-team-init detects and removes vendored copies in team mode When running gstack-team-init inside a repo with a vendored .claude/skills/gstack/, the script now auto-detects and removes it: git rm --cached, add to .gitignore, rm -rf. Also adds team_mode config key to setup --team/--no-team, and makes gstack-upgrade Step 4.5 team-mode aware (remove instead of sync). Includes 5 new integration tests for the vendored copy migration. * chore: bump version and changelog (v0.15.14.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847) * fix(bin): pass search params via env vars (RCE fix) (#819) Replace shell string interpolation with process.env in gstack-learnings-search to prevent arbitrary code execution via crafted learnings entries. Also fixes the CROSS_PROJECT interpolation that the original PR missed. Adds 3 regression tests verifying no shell interpolation remains in the bun -e block. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): add path validation to upload command (#821) Add isPathWithin() and path traversal checks to the upload command, blocking file exfiltration via crafted upload paths. Uses existing SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): symlink resolution in meta-commands validateOutputPath (#820) Add realpathSync to validateOutputPath in meta-commands.ts to catch symlink-based directory escapes in screenshot, pdf, and responsive commands. Resolves SAFE_DIRECTORIES through realpathSync to handle macOS /tmp -> /private/tmp symlinks. Existing path validation tests pass with the hardened implementation. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add uninstall instructions to README (#812) Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall script (handles everything) and manual removal steps for when the repo isn't cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance. Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): Windows launcher extraEnv + headed-mode token (#822) Community PR #822 by @pieterklue. Three fixes: 1. Windows launcher now merges extraEnv into spawned server env (was only passing BROWSE_STATE_FILE, dropping all other env vars) 2. Welcome page fallback serves inline HTML instead of about:blank redirect (avoids ERR_UNSAFE_REDIRECT on Windows) 3. /health returns auth token in headed mode even without Origin header (fixes Playwright Chromium extensions that don't send it) Also adds HOME/USERPROFILE fallback for cross-platform compatibility. Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): terminate orphan server when parent process exits (#808) Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned server process. The server polls every 15s with signal 0 and calls shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell processes when Claude Code sessions exit abnormally. Co-Authored-By: mmporong <mmporong@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664) Community PR #664 by @mr-k-man (security audit round 1, new parts only). - IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive guard for hostnames like fd.example.com - Cookie value redaction for tokens, API keys, JWTs in browse cookies command - Per-tab cancel files in killAgent() replacing broken global kill-signal - design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload - extension: targeted getToken handler replaces token-in-health-broadcast - Supabase migration 003: column-level GRANT restricts anon UPDATE scope - Telemetry sync: upsert error logging - 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): CSS injection guard, timeout clamping, session validation, tests (#806) Community PR #806 by @mr-k-man (security audit round 2, new parts only). - CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector - Queue file permissions (0o700/0o600) in cli, server, sidebar-agent - escapeRegExp for frame --url ReDoS fix - Responsive screenshot path validation with validateOutputPath - State load cookie filtering (reject localhost/.internal/metadata cookies) - Session ID format validation in loadSession - /health endpoint: remove currentUrl and currentMessage fields - QueueEntry interface + isValidQueueEntry validator for sidebar-agent - SIGTERM->SIGKILL escalation in timeout handler - Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s) - Cookie domain validation in cookie-import and cookie-import-browser - DocumentFragment-based tab switching (XSS fix in sidepanel) - pollInProgress reentrancy guard for pollChat - toggleClass/injectCSS input validation in extension inspector - Snapshot annotated path validation with realpathSync - 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.13.0) Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man, @mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction, per-tab cancel signaling, CSS injection guards, timeout clamping, session validation, DocumentFragment XSS fix, parent process watchdog, uninstall docs, Windows fixes, and 750+ lines of security regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garagon <garagon@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com> Co-authored-by: pieterklue <pieterklue@users.noreply.github.com> Co-authored-by: mmporong <mmporong@users.noreply.github.com> Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com> * feat: content security — 4-layer prompt injection defense for pair-agent (#815) * feat: token registry for multi-agent browser access Per-agent scoped tokens with read/write/admin/meta command categories, domain glob restrictions, rate limiting, expiry, and revocation. Setup key exchange for the /pair-agent ceremony (5-min one-time key → 24h session token). Idempotent exchange handles tunnel drops. 39 tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate token registry + scoped auth into browse server Server changes for multi-agent browser access: - /connect endpoint: setup key exchange for /pair-agent ceremony - /token endpoint: root-only minting of scoped sub-tokens - /token/:clientId DELETE: revoke agent tokens - /agents endpoint: list connected agents (root-only) - /health: strips root token when tunnel is active (P0 security fix) - /command: scope/rate/domain checks via token registry before dispatch - Idle timer skips shutdown when tunnel is active Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ngrok tunnel integration + @ngrok/ngrok dependency BROWSE_TUNNEL=1 env var starts an ngrok tunnel after Bun.serve(). Reads NGROK_AUTHTOKEN from env or ~/.gstack/ngrok.env. Reads NGROK_DOMAIN for dedicated domain (stable URL). Updates state file with tunnel URL. Feasibility spike confirmed: SDK works in compiled Bun binary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: tab isolation for multi-agent browser access Add per-tab ownership tracking to BrowserManager. Scoped agents must create their own tab via newtab before writing. Unowned tabs (pre-existing, user-opened) are root-only for writes. Read access always allowed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: tab enforcement + POST /pair endpoint + activity attribution Server-side tab ownership check blocks scoped agents from writing to unowned tabs. Special-case newtab records ownership for scoped tokens. POST /pair endpoint creates setup keys for the pairing ceremony. Activity events now include clientId for attribution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: pair-agent CLI command + instruction block generator One command to pair a remote agent: $B pair-agent. Creates a setup key via POST /pair, prints a copy-pasteable instruction block with curl commands. Smart tunnel fallback (tunnel URL > auto-start > localhost). Flags: --for HOST, --local HOST, --admin, --client NAME. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: tab isolation + instruction block generator tests 14 tests covering tab ownership lifecycle (access checks, unowned tabs, transferTab) and instruction block generator (scopes, URLs, admin flag, troubleshooting section). Fix server-auth test that used fragile sliceBetween boundaries broken by new endpoints. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: CSO security fixes — token leak, domain bypass, input validation 1. Remove root token from /health endpoint entirely (CSO #1 CRITICAL). Origin header is spoofable. Extension reads from ~/.gstack/.auth.json. 2. Add domain check for newtab URL (CSO #5). Previously only goto was checked, allowing domain-restricted agents to bypass via newtab. 3. Validate scope values, rateLimit, expiresSeconds in createToken() (CSO #4). Rejects invalid scopes and negative values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /pair-agent skill — syntactic sugar for browser sharing Users remember /pair-agent, not $B pair-agent. The skill walks through agent selection (OpenClaw, Hermes, Codex, Cursor, generic), local vs remote setup, tunnel configuration, and includes platform-specific notes for each agent type. Wraps the CLI command with context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remote browser access reference for paired agents Full API reference, snapshot→@ref pattern, scopes, tab isolation, error codes, ngrok setup, and same-machine shortcuts. The instruction block points here for deeper reading. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: improved instruction block with snapshot→@ref pattern The paste-into-agent instruction block now teaches the snapshot→@ref workflow (the most powerful browsing pattern), shows the server URL prominently, and uses clearer formatting. Tests updated to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: smart ngrok detection + auto-tunnel in pair-agent The pair-agent command now checks ngrok's native config (not just ~/.gstack/ngrok.env) and auto-starts the tunnel when ngrok is available. The skill template walks users through ngrok install and auth if not set up, instead of just printing a dead localhost URL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: on-demand tunnel start via POST /tunnel/start pair-agent now auto-starts the ngrok tunnel without restarting the server. New POST /tunnel/start endpoint reads authtoken from env, ~/.gstack/ngrok.env, or ngrok's native config. CLI detects ngrok availability and calls the endpoint automatically. Zero manual steps when ngrok is installed and authed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pair-agent skill must output the instruction block verbatim Added CRITICAL instruction: the agent MUST output the full instruction block so the user can copy it. Previously the agent could summarize over it, leaving the user with nothing to paste. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: scoped tokens rejected on /command — auth gate ordering bug The blanket validateAuth() gate (root-only) sat above the /command endpoint, rejecting all scoped tokens with 401 before they reached getTokenInfo(). Moved /command above the gate so both root and scoped tokens are accepted. This was the bug Wintermute hit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: pair-agent auto-launches headed mode before pairing When pair-agent detects headless mode, it auto-switches to headed (visible Chromium window) so the user can watch what the remote agent does. Use --headless to skip this. Fixed compiled binary path resolution (process.execPath, not process.argv[1] which is virtual /$bunfs/ in Bun compiled binaries). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: comprehensive tests for auth ordering, tunnel, ngrok, headed mode 16 new tests covering: - /command sits above blanket auth gate (Wintermute bug) - /command uses getTokenInfo not validateAuth - /tunnel/start requires root, checks native ngrok config, returns already_active - /pair creates setup keys not session tokens - Tab ownership checked before command dispatch - Activity events include clientId - Instruction block teaches snapshot→@ref pattern - pair-agent auto-headed mode, process.execPath, --headless skip - isNgrokAvailable checks all 3 sources (gstack env, env var, native config) - handlePairAgent calls /tunnel/start not server restart Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: chain scope bypass + /health info leak when tunneled 1. Chain command now pre-validates ALL subcommand scopes before executing any. A read+meta token can no longer escalate to admin via chain (eval, js, cookies were dispatched without scope checks). tokenInfo flows through handleMetaCommand into the chain handler. Rejects entire chain if any subcommand fails. 2. /health strips sensitive fields (currentUrl, agent.currentMessage, session) when tunnel is active. Only operational metadata (status, mode, uptime, tabs) exposed to the internet. Previously anyone reaching the ngrok URL could surveil browsing activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: tout /pair-agent as headline feature in CHANGELOG + README Lead with what it does for the user: type /pair-agent, paste into your other agent, done. First time AI agents from different companies can coordinate through a shared browser with real security boundaries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: expand /pair-agent, /design-shotgun, /design-html in README Each skill gets a real narrative paragraph explaining the workflow, not just a table cell. design-shotgun: visual exploration with taste memory. design-html: production HTML with Pretext computed layout. pair-agent: cross-vendor AI agent coordination through shared browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: split handleCommand into handleCommandInternal + HTTP wrapper Chain subcommands now route through handleCommandInternal for full security enforcement (scope, domain, tab ownership, rate limiting, content wrapping). Adds recursion guard for nested chains, rate-limit exemption for chain subcommands, and activity event suppression (1 event per chain, not per sub). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add content-security.ts with datamarking, envelope, and filter hooks Four-layer prompt injection defense for pair-agent browser sharing: - Datamarking: session-scoped watermark for text exfiltration detection - Content envelope: trust boundary wrapping with ZWSP marker escaping - Content filter hooks: extensible filter pipeline with warn/block modes - Built-in URL blocklist: requestbin, pipedream, webhook.site, etc. BROWSE_CONTENT_FILTER env var controls mode: off|warn|block (default: warn) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: centralize content wrapping in handleCommandInternal response path Single wrapping location replaces fragmented per-handler wrapping: - Scoped tokens: content filters + datamarking + enhanced envelope - Root tokens: existing basic wrapping (backward compat) - Chain subcommands exempt from top-level wrapping (wrapped individually) - Adds 'attrs' to PAGE_CONTENT_COMMANDS (ARIA value exposure defense) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: hidden element stripping for scoped token text extraction Detects CSS-hidden elements (opacity, font-size, off-screen, same-color, clip-path) and ARIA label injection patterns. Marks elements with data-gstack-hidden, extracts text from a clean clone (no DOM mutation), then removes markers. Only active for scoped tokens on text command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: snapshot split output format for scoped tokens Scoped tokens get a split snapshot: trusted @refs section (for click/fill) separated from untrusted web content in an envelope. Ref names truncated to 50 chars in trusted section. Root tokens unchanged (backward compat). Resume command also uses split format for scoped tokens. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add SECURITY section to pair-agent instruction block Instructs remote agents to treat content inside untrusted envelopes as potentially malicious. Lists common injection phrases to watch for. Directs agents to only use @refs from the trusted INTERACTIVE ELEMENTS section, not from page content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add 4 prompt injection test fixtures - injection-visible.html: visible injection in product review text - injection-hidden.html: 7 CSS hiding techniques + ARIA injection + false positive - injection-social.html: social engineering in legitimate-looking content - injection-combined.html: all attack types + envelope escape attempt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: comprehensive content security tests (47 tests) Covers all 4 defense layers: - Datamarking: marker format, session consistency, text-only application - Content envelope: wrapping, ZWSP marker escaping, filter warnings - Content filter hooks: URL blocklist, custom filters, warn/block modes - Instruction block: SECURITY section content, ordering, generation - Centralized wrapping: source-level verification of integration - Chain security: recursion guard, rate-limit exemption, activity suppression - Hidden element stripping: 7 CSS techniques, ARIA injection, false positives - Snapshot split format: scoped vs root output, resume integration Also fixes: visibility:hidden detection, case-insensitive ARIA pattern matching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pair-agent skill compliance + fix all 16 pre-existing test failures Root cause: pair-agent was added without completing the gen-skill-docs compliance checklist. All 16 failures traced back to this. Fixes: - Sync package.json version to VERSION (0.15.9.0) - Add "(gstack)" to pair-agent description for discoverability - Add pair-agent to Codex path exception (legitimately documents ~/.codex/) - Add CLI_COMMANDS (status, pair-agent, tunnel) to skill parser allowlist - Regenerate SKILL.md for all hosts (claude, codex, factory, kiro, etc.) - Update golden file baselines for ship skill - Fix relink tests: pass GSTACK_INSTALL_DIR to auto-relink calls so they use the fast mock install instead of scanning real ~/.claude/skills/gstack Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.12.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: E2E exit reason precedence + worktree prune race condition Two fixes for E2E test reliability: 1. session-runner.ts: error_max_turns was misclassified as error_api because is_error flag was checked before subtype. Now known subtypes like error_max_turns are preserved even when is_error is set. The is_error override only applies when subtype=success (API failure). 2. worktree.ts: pruneStale() now skips worktrees < 1 hour old to avoid deleting worktrees from concurrent test runs still in progress. Previously any second test execution would kill the first's worktrees. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: restore token in /health for localhost extension auth The CSO security fix stripped the token from /health to prevent leaking when tunneled. But the extension needs it to authenticate on localhost. Now returns token only when not tunneled (safe: localhost-only path). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: verify /health token is localhost-only, never served through tunnel Updated tests to match the restored token behavior: - Test 1: token assignment exists AND is inside the !tunnelActive guard - Test 1b: tunnel branch (else block) does not contain AUTH_TOKEN Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add security rationale for token in /health on localhost Explains why this is an accepted risk (no escalation over file-based token access), CORS protection, and tunnel guard. Prevents future CSO scans from stripping it without providing an alternative auth path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: verify tunnel is alive before returning URL to pair-agent Root cause: when ngrok dies externally (pkill, crash, timeout), the server still reports tunnelActive=true with a dead URL. pair-agent prints an instruction block pointing at a dead tunnel. The remote agent gets "endpoint offline" and the user has to manually restart everything. Three-layer fix: - Server /pair endpoint: probes tunnel URL before returning it. If dead, resets tunnelActive/tunnelUrl and returns null (triggers CLI restart). - Server /tunnel/start: probes cached tunnel before returning already_active. If dead, falls through to restart ngrok automatically. - CLI pair-agent: double-checks tunnel URL from server before printing instruction block. Falls through to auto-start on failure. 4 regression tests verify all three probe points + CLI verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add POST /batch endpoint for multi-command batching Remote agents controlling GStack Browser through a tunnel pay 2-5s of latency per HTTP round-trip. A typical "navigate and read" takes 4 sequential commands = 10-20 seconds. The /batch endpoint collapses N commands into a single HTTP round-trip, cutting a 20-tab crawl from ~60s to ~5s. Sequential execution through the full security pipeline (scope, domain, tab ownership, content wrapping). Rate limiting counts the batch as 1 request. Activity events emitted at batch level, not per-command. Max 50 commands per batch. Nested batches rejected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add source-level security tests for /batch endpoint 8 tests verifying: auth gate placement, scoped token support, max command limit, nested batch rejection, rate limiting bypass, batch-level activity events, command field validation, and tabId passthrough. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: correct CHANGELOG date from 2026-04-06 to 2026-04-05 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: consolidate Hermes into generic HTTP option in pair-agent Hermes doesn't have a host-specific config — it uses the same generic curl instructions as any other agent. Removing the dedicated option simplifies the menu and eliminates a misleading distinction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump VERSION to 0.15.14.0, add CHANGELOG entry for batch endpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate pair-agent/SKILL.md after main merge Vendoring deprecation section from main's template wasn't reflected in the generated file. Fixes check-freshness CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: checkTabAccess uses options object, add own-only tab policy Refactors checkTabAccess(tabId, clientId, isWrite) to use an options object { isWrite?, ownOnly? }. Adds tabPolicy === 'own-only' support in the server command dispatch — scoped tokens with this policy are restricted to their own tabs for all commands, not just writes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add --domain flag to pair-agent CLI for domain restrictions Allows passing --domain to pair-agent to restrict the remote agent's navigation to specific domains (comma-separated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove batch commands CHANGELOG entry and VERSION bump The batch endpoint work belongs on the browser-batch-multitab branch (port-louis), not this branch. Reverting VERSION to 0.15.14.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: adopt main's headed-mode /health token serving Our merge kept the old !tunnelActive guard which conflicted with main's security-audit-r2 tests that require no currentUrl/currentMessage in /health. Adopts main's approach: serve token conditionally based on headed mode or chrome-extension origin. Updates server-auth tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: improve snapshot flags docs completeness for LLM judge Adds $B placeholder explanation, explicit syntax line, and detailed flag behavior (-d depth values, -s CSS selector syntax, -D unified diff format and baseline persistence, -a screenshot vs text output relationship). Fixes snapshot flags reference LLM eval scoring completeness < 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: auto-symlink into ~/.claude/skills/ when cloned elsewhere (#865) setup would silently skip skill symlinks when run from outside a skills/ directory, leaving the install broken despite reporting success. Now it creates the symlink automatically and proceeds with the full install. Co-authored-by: Evan Solomon <evan@ycombinator.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pair-agent tunnel drops after 15s (v0.15.15.1) (#868) * fix: remove stray `domains` reference crashing connect command The connect command's status fetch had an undefined `domains` variable in the JSON body, causing "Connect failed: domains is not defined" and preventing headed mode from initializing properly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pair-agent server dies 15s after CLI exits The server monitors BROWSE_PARENT_PID and self-terminates when the parent exits. For pair-agent, the connect subprocess is the parent, so the server dies 15s after connect finishes. Disable parent-PID monitoring for pair-agent sessions so the server outlives the CLI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.15.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: newtab blocked by tab ownership check for scoped tokens The tab ownership check ran before the newtab handler, checking the active tab (owned by root) against the scoped token. Since the scoped token doesn't own the root tab, newtab returned 403. Skip the ownership check for newtab since it creates a new tab. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: regression tests for pair-agent tunnel fixes Three source-level tests covering the bugs fixed on this branch: - connect status fetch has no undefined variable references (domains) - pair-agent disables parent PID monitoring (BROWSE_PARENT_PID=0) - newtab excluded from tab ownership check for scoped tokens Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873) * plan: batch command endpoint + multi-tab parallel execution for GStack Browser * refactor: extract TabSession from BrowserManager for per-tab state Move per-tab state (refMap, lastSnapshot, frame) into a new TabSession class. BrowserManager delegates to the active TabSession via getActiveSession(). Zero behavior change — all existing tests pass. This is the foundation for the /batch endpoint: both /command and /batch will use the same handler functions with TabSession, eliminating shared state races during parallel tab execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: update handler signatures to use TabSession Change handleReadCommand and handleSnapshot to take TabSession instead of BrowserManager. Change handleWriteCommand to take both TabSession (per-tab ops) and BrowserManager (global ops like viewport, headers, dialog). handleMetaCommand keeps BrowserManager for tab management. Tests use thin wrapper functions that bridge the old 3-arg call pattern to the new signatures via bm.getActiveSession(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add POST /batch endpoint for parallel multi-tab execution Execute multiple commands across tabs in a single HTTP request. Commands targeting different tabs run concurrently via Promise.allSettled. Commands targeting the same tab run sequentially within that group. Features: - Batch-safe command subset (text, goto, click, snapshot, screenshot, etc.) - newtab/closetab as special commands within batch - SSE streaming mode (stream: true) for partial results - Per-command error isolation (one tab failing doesn't abort the batch) - Max 50 commands per batch, soft batch-level timeout A 143-page crawl drops from ~45 min (serial HTTP) to ~5 min (20 tabs in parallel, batched commands). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add batch endpoint integration tests 10 tests covering: - Multi-tab parallel execution (goto + text on different tabs) - Same-tab sequential ordering - Per-command error isolation (one tab fails, others succeed) - Page-scoped refs (snapshot refs are per-session, not global) - Per-tab lastSnapshot (snapshot -D with independent baselines) - getSession/getActiveSession API - Batch-safe command subset validation - closeTab via page.close preserves at-least-one-page invariant - Parallel goto on 3 tabs simultaneously Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden codex-review E2E — extract SKILL.md section, bump maxTurns to 25 The test was copying the full 55KB/1075-line codex SKILL.md into the fixture, requiring 8 Read calls just to consume it and exhausting the 15-turn budget before reaching the actual codex review command. Now extracts only the review-relevant section (~6KB/148 lines), reducing Read calls from 8 to 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move batch endpoint plan into BROWSER.md as feature documentation The batch endpoint is implemented — document it as an actual feature in BROWSER.md (architecture, API shape, design decisions, usage pattern) and remove the standalone plan file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.16.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gstack-slug produces deterministic slugs across sessions (#897) When `git remote get-url origin` was piped directly into sed/tr, pipefail caused the entire pipeline to fail silently (via `|| true`), producing an empty RAW_SLUG. The basename fallback then generated a different slug, making per-project data (checkpoints, learnings, reviews) invisible. Two fixes: 1. Separate git command from pipeline so failures are handled explicitly 2. Cache computed slugs in ~/.gstack/slug-cache/ so subsequent sessions always resolve to the same slug regardless of transient git state Co-authored-by: Jared Friedman <jared@ycombinator.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browser data platform for AI agents (v0.16.0.0) (#907) * refactor: extract path-security.ts shared module validateOutputPath, validateReadPath, and SAFE_DIRECTORIES were duplicated across write-commands.ts, meta-commands.ts, and read-commands.ts. Extract to a single shared module with re-exports for backward compatibility. Also adds validateTempPath() for the upcoming GET /file endpoint (TEMP_DIR only, not cwd, to prevent remote agents from reading project files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: default paired agents to full access, split SCOPE_CONTROL The trust boundary for paired agents is the pairing ceremony itself, not the scope. An agent with write scope can already click anything and navigate anywhere. Gating js/cookies behind --admin was security theater. Changes: - Default pair scopes: read+write+admin+meta (was read+write) - New SCOPE_CONTROL for browser-wide destructive ops (stop, restart, disconnect, state, handoff, resume, connect) - --admin flag now grants control scope (backward compat) - New --restrict flag for limited access (e.g., --restrict read) - Updated hint text: "re-pair with --control" instead of "--admin" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add media and data commands for page content extraction media command: discovers all img/video/audio/background-image elements on the page. Returns JSON with URLs, dimensions, srcset, loading state, HLS/DASH detection. Supports --images/--videos/--audio filters and optional CSS selector scoping. data command: extracts structured data embedded in pages (JSON-LD, Open Graph, Twitter Cards, meta tags). One command returns product prices, article metadata, social share info without DOM scraping. Both are READ scope with untrusted content wrapping. Shared media-extract.ts helper for reuse by the upcoming scrape command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add download, scrape, and archive commands download: fetch any URL or @ref element to disk using browser session cookies via page.request.fetch(). Supports blob: URLs via in-page base64 conversion. --base64 flag returns inline data URI (cap 10MB). Detects HLS/DASH and rejects with yt-dlp hint. scrape: bulk media download composing media discovery + download loop. Sequential with 100ms delay, URL deduplication, configurable --limit. Writes manifest.json with per-file metadata for machine consumption. archive: saves complete page as MHTML via CDP Page.captureSnapshot. No silent fallback -- errors clearly if CDP unavailable. All three are WRITE scope (write to disk, blocked in watch mode). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add GET /file endpoint for remote agent file retrieval Remote paired agents can now retrieve downloaded files over HTTP. TEMP_DIR only (not cwd) to prevent project file exfiltration. - Bearer token auth (root or scoped with read scope) - Path validation via validateTempPath() (symlink-aware) - 200MB size cap - Extension-based MIME detection - Zero-copy streaming via Bun.file() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scroll --times N for automated repeated scrolling Extends the scroll command with --times N flag for infinite feed scraping. Scrolls N times with configurable --wait delay (default 1000ms) between each scroll for content loading. Usage: scroll --times 10 scroll --times 5 --wait 2000 scroll --times 3 .feed-container Composable with scrape: scroll to load content, then scrape images. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add network response body capture (--capture/--export/--bodies) The killer feature for social media scraping. Extends the existing network command to intercept API response bodies: network --capture [--filter graphql] # start capturing network --capture stop # stop network --export /tmp/api.jsonl # export as JSONL network --bodies # show summary Uses page.on('response') listener with URL pattern filtering. SizeCappedBuffer (50MB total, 5MB per-entry cap) evicts oldest entries when full. Binary responses stored as base64, text as-is. This lets agents tap Instagram's GraphQL API, TikTok's hydration data, and any SPA's internal API responses instead of fragile DOM scraping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add screenshot --base64 for inline image return Returns data:image/png;base64,... instead of writing to disk. Cap at 10MB. Works with all screenshot modes (element, clip, viewport). Eliminates the two-step screenshot+file-serve dance for remote agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add data platform tests and media fixture Tests for SizeCappedBuffer (eviction, export, summary), validateTempPath (TEMP_DIR only, rejects cwd), command registration (all new commands in correct scope sets), and MIME mapping source checks. Rich HTML fixture with: standard images, lazy-loaded images, srcset, video with sources + HLS, audio, CSS background-images, JSON-LD, Open Graph, Twitter Cards, and meta tags. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: regenerate SKILL.md with Extraction category Add Extraction category to browse command table ordering. Regenerate SKILL.md files to include media, data, download, scrape, archive commands in the generated documentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.0.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: cookie picker auth token leak (v0.15.17.0) (#904) * fix: cookie picker auth token leak (CVE — CVSS 7.8) GET /cookie-picker served HTML that inlined the master bearer token without authentication. Any local process could extract it and use it to call /command, executing arbitrary JS in the browser context. Fix: Jupyter-style one-time code exchange. The picker URL now includes a one-time code that is consumed via 302 redirect, setting an HttpOnly session cookie. The master AUTH_TOKEN never appears in HTML. The session cookie is isolated from the scoped token system (not valid for /command). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.15.17.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: browse-snapshot E2E turn budget too tight (7 → 9) The agent consistently uses 8 turns for 5 snapshot commands because it reads the saved annotated PNG to verify it was created. All 3 CI attempts hit error_max_turns at exactly 8. Bumping to 9 gives headroom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937) * fix: sync package.json version with VERSION file package.json was 0.15.15.0 while VERSION was 0.15.16.0, causing gen-skill-docs freshness check test failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add builder profile helper for office-hours relationship closing New bin/gstack-builder-profile reads ~/.gstack/builder-profile.jsonl and outputs structured summary (tier, signals, resources, topics). Single source of truth for all closing state — no separate config keys or logs. Uses bun-based JSONL parsing pattern from gstack-learnings-search. Graceful fallback to introduction tier if bun unavailable or file missing. 26 unit tests covering tier computation, signal accumulation, cross-project detection, nudge eligibility, resource dedup, and malformed JSONL handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: relationship closing — office-hours adapts to repeat users The office-hours closing now deepens over time instead of repeating the same YC plea every session. Four tiers based on session count: - Introduction (session 1): full YC plea + founder resources - Welcome Back (sessions 2-3): lead with recognition, skip plea - Regular (sessions 4-7): arc-level callbacks, signal visibility, builder-to-founder nudge, auto-generated journey summary - Inner Circle (sessions 8+): the data speaks Key design decisions (from CEO + Eng + Codex + DX reviews): - Single source of truth: one builder-profile.jsonl, no split-brain state - Lead with recognition on repeat visits (DX: magical moment hits immediately) - Narrative arc journey summary, not data tables - Tone examples per tier to prevent generic AI voice - Global resource dedup (low-sensitivity video watch history) - Migration merges per-project resource logs into builder profile Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.2.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: rename remaining gstack references to nstack after upstream merge Upstream auto-merged files still had gstack naming. Fixed: - Source files (browse, scripts, tests) - Bin scripts (gstack-* -> nstack-*) - SKILL.md files from upstream - OpenClaw files and skill directories Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove orphaned supabase migration from upstream merge This migration (003_installations_upsert_policy.sql) was added upstream for the telemetry system we already removed. Orphaned after merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix:…

garrytan and others added 3 commits April 5, 2026 01:53

test: regression test for >128KB Codex session_meta

675a975

Documents the current 128KB buffer limitation. When Codex embeds session_meta beyond 128KB, this test will fail, signaling the need for a streaming parse or larger buffer.

chore: bump version and changelog (v0.15.8.1)

9eae53f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

garrytan merged commit 2b08cfe into main Apr 5, 2026
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: close redundant PRs + friendly error on all design commands (v0.15.8.1)#817

fix: close redundant PRs + friendly error on all design commands (v0.15.8.1)#817
garrytan merged 3 commits intomainfrom
garrytan/close-redundant-prs

garrytan commented Apr 5, 2026

Uh oh!

github-actions bot commented Apr 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garrytan commented Apr 5, 2026

Summary

Test Coverage

Pre-Landing Review

Plan Completion

Reviews

Test plan

Uh oh!

github-actions bot commented Apr 5, 2026

E2E Evals: ✅ PASS

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant