Fix/agent improvements 2 by NisargIO · Pull Request #65 · millionco/expect

NisargIO · 2026-03-31T04:14:03Z

Summary by cubic

Expands cross‑browser testing and hardens the agent/CLI: installs all Playwright engines in CI/init/Action, adds health checks and screenshot evidence, simplifies the TUI to keyboard‑only, and improves Action setup with optional Claude token automation and version checks. Drops the iOS simulator and the deprecated security audit tool.

New Features
- Browser MCP: open accepts browser (chromium | webkit | firefox); CDP limited to Chromium; Playwright exec gains snapshotAfter and viewport‑aware ARIA snapshots; screenshots use CSS scaling; CI reporter prints screenshot paths.
- CI/init/GitHub Action: install Chromium/WebKit/Firefox with streamed output and a 5‑minute timeout; Action installs deps and @anthropic-ai/claude-code, checks out full history (fetch-depth: 0), skips the dev server when EXPECT_BASE_URL is set; fixed 8‑step website test plan; Action helper can generate a Claude token and set ANTHROPIC_API_KEY via gh.
- CLI: detect nearby projects and show them in the port picker; pass base URLs via --url/-u and feed dev‑server hints into execution; website adds /api/version, and the CLI pings VERSION_API_URL on startup.
- Tools: network_requests flags failed requests, near‑duplicate requests within 500ms, and mixed content; console_logs summarizes error/warning counts.
Refactors
- Removed iOS Simulator/Appium and the security_audit MCP tool.
- Split browser runtime into overlay, performance, and rrweb; added tracing with named services in expect-cli and expect-mcp; MCP tools resolve in dev/test; CI stability: fail fast on auth/fatal adapter errors, emit run start immediately, treat any session update as activity, and print recent debug logs on failures.
- TUI: removed mouse/click support in favor of keyboard‑only; verbose logging is reactive.
- Skill install: download the expect skill tarball into .agents/skills/expect and symlink into selected agents.

^{Written for commit ef736a9. Summary will update on new commits.}

Note

Medium Risk
Moderate risk because it changes installation/setup flows, CI workflow behavior, and CLI navigation/port selection logic, which can break onboarding or automated runs across environments.

Overview
Streamlines and hardens Expect setup and CI runs. CI workflows now install all Playwright browsers (Chromium/WebKit/Firefox), and the generated expect.yml workflow is updated to fetch full git history, support overriding EXPECT_BASE_URL via repo variables (skipping local dev-server startup), and always wait on the resolved base URL.

Refactors the CLI onboarding and TUI behavior. init installs Playwright browsers with a timeout, add-github-action can optionally generate a Claude token and set ANTHROPIC_API_KEY via the GitHub CLI, and skill installation is reworked to download the skill tarball into .agents/skills/expect and symlink it into selected agent directories.

Improves test targeting and port picking. The watch command can accept a base URL, the main menu can pass CLI-provided base URLs into testing, and the port picker now surfaces detected (not-running) projects and passes devServerHints into the testing screen; the interactive UI is simplified by removing mouse/clickable support in favor of keyboard-driven selection.

^{Written by Cursor Bugbot for commit ef736a9. This will update automatically on new commits. Configure here.}

.agents/skills/expect/SKILL.md is the source of truth. The two copies had drifted apart: packages/ had Session Replay + Telemetry docs that .agents/ lacked, while .agents/ had been rewritten with accessibility (WCAG), performance (Web Vitals/LoAF), --cookies, and run_in_background that packages/ never received. This commit: - Adds the missing Session Replay and Telemetry sections to .agents/ - Overwrites packages/ from .agents/ (preserving its extra frontmatter) - Adds scripts/sync-skill.sh to copy .agents/ → packages/ or --check that they haven't drifted - Wires the --check into CI so future divergence fails the build Closes #63

The .agents/skills/ directory contains domain expertise (accessibility rules, SEO checklists, design guidelines) that was only used for developing expect itself. This wires them into the test agent so it applies those criteria when evaluating UIs during browser tests. Implementation: - scripts/generate-evaluation-skills.sh reads fixing-accessibility, fixing-seo, and web-design-guidelines skills and emits a TypeScript module with their content inlined as string constants - packages/shared/src/evaluation-skills.generated.ts is the output, committed so the CLI works without running the script first - buildExecutionSystemPrompt() appends the skill content to the system prompt as additional evaluation criteria - prebuild hook runs the codegen so the generated file stays current - 3 new tests verify the skills appear in the system prompt

This reverts commit 0366072.

- Added snapshotAfter parameter to the Playwright execution context, allowing automatic ARIA snapshot capture after code execution. - Updated input schema and documentation to reflect the new feature. - Enhanced error handling to return structured results, including success status and error messages. - Added tests to verify functionality of snapshotAfter, ensuring fresh snapshots are returned alongside execution results.

vercel · 2026-03-31T04:14:08Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
expect	Ready	Preview, Comment	Apr 1, 2026 11:35pm

pkg-pr-new · 2026-03-31T04:15:13Z

Open in StackBlitz

npm i https://pkg.pr.new/expect-cli@65

commit: 1cb67e8

cubic-dev-ai

16 issues found across 110 files

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/expect-skill/fixing-animation/SKILL.md">

<violation number="1" location="packages/expect-skill/fixing-animation/SKILL.md:45">
P2: Avoid documenting `will-change` as a permanent style; this conflicts with your own guidance and can cause unnecessary layer promotion. Scope it to active animation only.</violation>
</file>

<file name="packages/expect-skill/fixing-animation/rules/gpu-performance.md">

<violation number="1" location="packages/expect-skill/fixing-animation/rules/gpu-performance.md:78">
P2: The example applies `will-change: transform` permanently, which contradicts the file’s own guidance to scope it only to active animations.</violation>
</file>

<file name="apps/cli/src/commands/add-skill.ts">

<violation number="1" location="apps/cli/src/commands/add-skill.ts:63">
P2: Add an empty-agent guard for non-interactive runs so `-y` does not proceed as if installation were valid when no supported agents are detected.</violation>

<violation number="2" location="apps/cli/src/commands/add-skill.ts:101">
P2: Do not mark installation as successful when no selected agent symlink was created.</violation>
</file>

<file name="packages/expect-skill/fixing-animation/rules/consistency.md">

<violation number="1" location="packages/expect-skill/fixing-animation/rules/consistency.md:81">
P2: `duration-fast` is defined with conflicting values (100ms vs 125ms) in the same rule, which can cause inconsistent implementation across components.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/SKILL.md">

<violation number="1" location="packages/expect-skill/react-best-practices/SKILL.md:12">
P3: The summary count is inconsistent with the rules listed below (8 categories/58 rules vs a 9th category with an additional rule). Update the summary text to avoid contradictory guidance.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/rules/bundle-conditional.md">

<violation number="1" location="packages/expect-skill/react-best-practices/rules/bundle-conditional.md:22">
P2: The example calls `setEnabled(false)` but no `setEnabled` is defined, so the snippet does not compile as written.</violation>
</file>

<file name="apps/cli/src/commands/skill-content.ts">

<violation number="1" location="apps/cli/src/commands/skill-content.ts:38">
P2: The skill content references `--base-url`, but this CLI flag is not supported. Users following this instruction will hit an unknown option error; keep the guidance to `EXPECT_BASE_URL` only.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/rules/rendering-animate-svg-wrapper.md">

<violation number="1" location="packages/expect-skill/react-best-practices/rules/rendering-animate-svg-wrapper.md:38">
P3: The guidance line mixes CSS properties/functions/transition terminology, which makes the rule technically inaccurate and potentially confusing.</violation>
</file>

<file name="packages/expect-skill/fixing-animation/rules/accessibility.md">

<violation number="1" location="packages/expect-skill/fixing-animation/rules/accessibility.md:96">
P2: The example uses `motion.div` without importing `motion`, so the snippet is not runnable as written.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/rules/js-set-map-lookups.md">

<violation number="1" location="packages/expect-skill/react-best-practices/rules/js-set-map-lookups.md:15">
P3: The example snippets are syntactically invalid because they use a bare `...` inside array literals.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/rules/async-parallel.md">

<violation number="1" location="packages/expect-skill/react-best-practices/rules/async-parallel.md:20">
P2: The phrase "1 round trip" is inaccurate for `Promise.all`; this is still multiple requests executed concurrently, not a single round trip.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/rules/js-cache-storage.md">

<violation number="1" location="packages/expect-skill/react-best-practices/rules/js-cache-storage.md:48">
P2: Cookie parsing splits on every `=`, which can truncate valid cookie values that contain `=`.</violation>

<violation number="2" location="packages/expect-skill/react-best-practices/rules/js-cache-storage.md:65">
P2: The invalidation example clears only `storageCache` and leaves `cookieCache` stale, despite mentioning external cookie changes.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/rules/js-cache-function-results.md">

<violation number="1" location="packages/expect-skill/react-best-practices/rules/js-cache-function-results.md:33">
P2: The recommended module-level `Map` cache is unbounded and never invalidated, which can cause unbounded memory growth.</violation>
</file>

<file name="packages/expect-skill/react-best-practices/rules/client-swr-dedup.md">

<violation number="1" location="packages/expect-skill/react-best-practices/rules/client-swr-dedup.md:48">
P2: Use a default import for `useSWRMutation`; the current named import is incorrect for `swr/mutation`.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

- Enhanced the CI reporter to include screenshot paths in the artifacts output. - Updated the artifact extraction logic to capture screenshot paths from execution events. - Modified the run-test utility to pass screenshot paths to the CI reporter. - Updated the MCP session to save and return screenshot paths as part of the close result. - Adjusted the server logic to handle and display screenshot information in the results.

github-actions · 2026-03-31T04:20:35Z

Test Results

❌ Website Test: failed

10 passed, 1 failed out of 12 steps — 535s

Step	Status	Duration
Homepage loads with hero section and install commands	✅ passed	26s
View demo navigates to /replay?demo=true and loads player	✅ passed	60s
Replay controls — play/pause, speed selector, step list	✅ passed	181s
Copy button — install command copied to clipboard	✅ passed	37s
Theme toggle — dark mode switch and back to light	✅ passed	35s
Footer links — GitHub and X have correct URLs and target="_blank"	✅ passed	11s
Legal pages — /terms, /privacy, /security each load with text content	✅ passed	31s
Mobile viewport — 375x812 renders without horizontal scrollbar, key content visible	✅ passed	61s
Accessibility audit (WCAG 2.1 AA)	⬜ not-run	—
Performance metrics	✅ passed	14s
Project healthcheck — pnpm check	❌ failed	23s
Cross-browser (WebKit/Safari) — primary flow recheck	✅ passed	50s

Session Recording

https://github.com/millionco/expect/releases/download/ci-pr-65/e3a44d78-742d-40b3-a444-a7bb5bee5902.webm

Workflow run #118 | 📎 Download all recordings

cubic-dev-ai

2 issues found across 6 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/browser/src/mcp/mcp-session.ts">

<violation number="1" location="packages/browser/src/mcp/mcp-session.ts:148">
P2: Screenshot paths accumulate across session lifetimes, so `close()` may return stale paths from earlier runs.</violation>

<violation number="2" location="packages/browser/src/mcp/mcp-session.ts:168">
P2: `saveScreenshot` records a screenshot path even when writing the file fails, which can return non-existent screenshot artifacts.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

- Revised the description to clarify usage for fixing accessibility violations and compliance with WCAG 2.1 AA. - Enhanced rules section with detailed accessibility requirements for interactive elements, keyboard access, focus management, and semantics. - Added examples for common accessibility issues and their corrections. - Expanded the guidelines for forms, announcements, contrast, and content handling to improve overall clarity and usability.

cubic-dev-ai

3 issues found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/expect-skill/fixing-accessibility/SKILL.md">

<violation number="1" location="packages/expect-skill/fixing-accessibility/SKILL.md:27">
P2: This line turns 44x44 touch targets into a strict WCAG 2.1 AA requirement, which conflicts with the repo’s own guidance and can produce false-positive accessibility findings. Reword it as a recommendation instead of a mandatory rule.</violation>
</file>

<file name="packages/expect-skill/web-design-guidelines/SKILL.md">

<violation number="1" location="packages/expect-skill/web-design-guidelines/SKILL.md:20">
P2: This rule is too broad: native interactive elements should not require manual keyboard handlers. Limit this requirement to custom non-native interactive elements.</violation>

<violation number="2" location="packages/expect-skill/web-design-guidelines/SKILL.md:62">
P3: The quote examples are incorrect: the line says to use curly quotes but only shows straight quotes, making the guidance ambiguous.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-03-31T04:36:23Z

+### Typography
+
+- `…` not `...`
+- Curly quotes `"` `"` not straight `"`


P3: The quote examples are incorrect: the line says to use curly quotes but only shows straight quotes, making the guidance ambiguous.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/expect-skill/web-design-guidelines/SKILL.md, line 62: <comment>The quote examples are incorrect: the line says to use curly quotes but only shows straight quotes, making the guidance ambiguous.</comment> <file context> @@ -9,31 +9,173 @@ metadata: +### Typography + +- `…` not `...` +- Curly quotes `"` `"` not straight `"` +- Non-breaking spaces: `10 MB`, `⌘ K`, brand names +- Loading states end with `…`: `"Loading…"`, `"Saving…"` </file context>

Suggested change

- Curly quotes `"` `"` not straight `"`

- Curly quotes `“` `”` not straight `"`

- performance_metrics: add page health section with DOM node count, viewport overflow detection, broken images, images without dimensions, and duplicate ID detection - console_logs: add error/warning count summary to response - network_requests: detect failed requests (4xx/5xx), duplicate requests within 500ms window, and mixed content (HTTP on HTTPS)

…kflow

cubic-dev-ai

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/browser/src/mcp/server.ts">

<violation number="1" location="packages/browser/src/mcp/server.ts:368">
P2: Mixed-content detection uses any historical HTTPS document in the filtered entries, which can produce false positives across multi-navigation sessions.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

vercel · 2026-03-31T04:58:38Z

+          const isHttps = entries.some(
+            (entry) => entry.resourceType === "document" && entry.url.startsWith("https://"),
+          );
+          const mixedContent = isHttps
+            ? entries.filter(
+                (entry) => entry.resourceType !== "document" && entry.url.startsWith("http://"),
+              )
+            : [];


Mixed-content detection incorrectly flags HTTP resources as mixed content when ANY historical document was HTTPS, causing false positives in multi-navigation sessions

- Added @expect/shared as a workspace dependency in pnpm-lock.yaml and package.json. - Updated MCP runtime to include Tracing layer for improved observability. - Enhanced various MCP server methods with tracing spans for better performance monitoring.

Added logic to the ACP client to monitor stderr for fatal errors reported by adapters, ensuring immediate failure of active session queues upon detection. This prevents consumers from hanging indefinitely and improves overall error handling in the system.

cubic-dev-ai

3 issues found across 17 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/cli/src/commands/init-utils.ts">

<violation number="1" location="apps/cli/src/commands/init-utils.ts:119">
P2: `generateClaudeToken` now buffers all stdout and prints it only after process completion, which can hide interactive prompts and stall user-driven token setup.</violation>
</file>

<file name="packages/shared/src/observability/Tracing.ts">

<violation number="1" location="packages/shared/src/observability/Tracing.ts:12">
P1: Do not hard-code the Axiom API token in source. Read it from configuration/environment so credentials aren’t committed to the repo.</violation>
</file>

<file name="packages/agent/src/acp-client.ts">

<violation number="1" location="packages/agent/src/acp-client.ts:573">
P1: `Stream.take(1)` stops draining adapter stderr after the first fatal match, which can block the child process if stderr continues to be written.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-04-01T09:00:18Z

-    const dataset = "expect-cli";
-    const token = "xaat-a6ce2fdb-d378-444e-9d72-bb458867187a";
+const AXIOM_DATASET = "expect-cli";
+const AXIOM_TOKEN = "xaat-a6ce2fdb-d378-444e-9d72-bb458867187a";


P1: Do not hard-code the Axiom API token in source. Read it from configuration/environment so credentials aren’t committed to the repo.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/shared/src/observability/Tracing.ts, line 12: <comment>Do not hard-code the Axiom API token in source. Read it from configuration/environment so credentials aren’t committed to the repo.</comment> <file context> @@ -1,25 +1,22 @@ - const dataset = "expect-cli"; - const token = "xaat-a6ce2fdb-d378-444e-9d72-bb458867187a"; +const AXIOM_DATASET = "expect-cli"; +const AXIOM_TOKEN = "xaat-a6ce2fdb-d378-444e-9d72-bb458867187a"; - return Otlp.layerJson({ </file context>

Suggested change

const AXIOM_TOKEN = "xaat-a6ce2fdb-d378-444e-9d72-bb458867187a";

const AXIOM_TOKEN = process.env.AXIOM_TOKEN ?? "";

cubic-dev-ai · 2026-04-01T09:00:18Z

      Stream.tap((line) => Effect.logDebug("ACP adapter stderr", { line })),
+      Stream.map(getAdapterSessionError),
+      Stream.filter((error): error is SessionQueueError => error !== undefined),
+      Stream.take(1),


P1: Stream.take(1) stops draining adapter stderr after the first fatal match, which can block the child process if stderr continues to be written.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/agent/src/acp-client.ts, line 573: <comment>`Stream.take(1)` stops draining adapter stderr after the first fatal match, which can block the child process if stderr continues to be written.</comment> <file context> @@ -560,18 +559,20 @@ export class AcpClient extends ServiceMap.Service<AcpClient>()("@expect/AcpClien - Stream.filterEffect(() => - Ref.get(adapterSessionErrorRef).pipe(Effect.map((existing) => !existing)), - ), + Stream.take(1), Stream.tap((adapterSessionError) => Effect.gen(function* () { </file context>

Got it. Can you share what’s incorrect or the intended behavior here so I can refine future reviews?

cubic-dev-ai · 2026-04-01T09:00:18Z

+  const [stdoutText, exitCode] = yield* Effect.all(
+    [Stream.mkString(Stream.decodeText(handle.stdout)), handle.exitCode],
+    { concurrency: 2 },
+  );
+
+  process.stdout.write(stdoutText);


P2: generateClaudeToken now buffers all stdout and prints it only after process completion, which can hide interactive prompts and stall user-driven token setup.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At apps/cli/src/commands/init-utils.ts, line 119: <comment>`generateClaudeToken` now buffers all stdout and prints it only after process completion, which can hide interactive prompts and stall user-driven token setup.</comment> <file context> @@ -115,33 +116,36 @@ export const generateClaudeToken = Effect.gen(function* () { - process.stdout.write(text); - }), - ), + const [stdoutText, exitCode] = yield* Effect.all( + [Stream.mkString(Stream.decodeText(handle.stdout)), handle.exitCode], + { concurrency: 2 }, </file context>

Suggested change

const [stdoutText, exitCode] = yield* Effect.all(

[Stream.mkString(Stream.decodeText(handle.stdout)), handle.exitCode],

{ concurrency: 2 },

);

process.stdout.write(stdoutText);

let stdoutText = "";

yield* handle.stdout.pipe(

Stream.decodeText(),

Stream.runForEach((text) =>

Effect.sync(() => {

stdoutText += text;

process.stdout.write(text);

}),

),

);

const exitCode = yield* handle.exitCode;

…sion errors Refactored the error handling logic in the ACP client to return an Option type for session errors, enhancing type safety and clarity. This change allows for better handling of potential errors reported by adapters, ensuring that only valid errors are processed and logged.

Reverted all test body changes back to main. Only changed TEST_LAYERS to add availability detection via isCommandAvailable so tests skip when the agent binary isn't installed. Un-skipped session resume test per review feedback. Made-with: Cursor

Cleaned up the init-utils.ts and init-utils.test.ts files by removing trailing whitespace, improving code readability without altering functionality.

skoshx

round2

skoshx · 2026-04-01T11:22:33Z

 .testie-agent-traces/
-packages/video/build
 .cursor/plans
+.repos


this shouldn't be gitignored because then the agent cant get results from effect so it doesn't improve perf and become Effect god

skoshx · 2026-04-01T11:38:49Z

+          }),
+          Effect.forEach(
+            Array.from(sessionUpdatesMap.values()),
+            (updatesQueue) => Queue.fail(updatesQueue, adapterSessionError),


a Stream is only failed once, so this sessionUpdatesMap isn't needed..

Just listen to stderr, parse the error, then Stream.tap(() => Queue.fail(updatesQueue, error),

We are adding multiple sessions per test to improve testing perf in the future

skoshx · 2026-04-01T11:39:20Z

      Stream.tap((line) => Effect.logDebug("ACP adapter stderr", { line })),
+      Stream.map(getAdapterSessionError),
+      Stream.filter((error): error is SessionQueueError => error !== undefined),
+      Stream.take(1),


skoshx · 2026-04-01T11:57:15Z

-    });
-  }),
-).pipe(Layer.provide(NodeHttpClient.layerUndici), Layer.orDie);
+export const layerAxiom = (serviceName = AXIOM_DATASET) =>


this looks sus?? serviceName isn't the same as dataset name AFAIK ? the dataset should always be "expect-cli", but service name could be either "expect-mcp" or like "expect-clI"

skoshx · 2026-04-01T11:59:25Z

-    }).pipe(Effect.orDie);
+    }).pipe(
+      Effect.catchTag("UnknownError", (cause) =>
+        Effect.logWarning("Failed to get machine ID, using fallback", { cause }).pipe(


this is bad because if machineId fails, now well have noisy ahh analytics because same user will get a different UUID for each run

machineId is unlikely to fail only in extreme edge cases

skoshx · 2026-04-01T12:01:01Z

+            Effect.logWarning("Git failed to get diff preview", {
+              error: error.message,
+              scope: "WorkingTree",
+            }).pipe(Effect.map(() => "")),


don't swallow errors just let them bubble up.

this Git error should not happen, so swallowing it just means masking a potential bug

These can throw an error i.e for example if the user doesn't add fetch-depth: 0 to GHA. I think it's fine instead of bubbling up. We don't want this to crash the session

stepStatuses was creating Map entries for step events whose stepId didn't match any step in the steps array, causing status to return "failed" even when 0 visible steps failed.

The AI agent was inventing inconsistent test plans each run, leading to flaky steps like cross-browser WebKit, accessibility audits, and font loading checks. A fixed checklist gives deterministic step coverage.

Restored the .cursor/plans entry in the .gitignore file to ensure that the plans directory is ignored during version control, maintaining a clean repository.

Modified the layerAxiom function to use a default service name constant, AXIOM_DEFAULT_SERVICE_NAME, improving code readability and ensuring consistency in service naming.

Eliminated unnecessary imports from Tracing.ts, including NodeSocket and DevTools, to streamline the code and enhance readability. This change focuses on maintaining a cleaner codebase without altering existing functionality.

…ll container support Updated the snapshot process to include viewport-aware options, allowing for more accurate representation of elements within scroll containers. Introduced new parameters in SnapshotOptions and adjusted the computeSnapshotStats function to account for hidden nodes. Added tests to validate the new behavior and ensure correct statistics are computed when scroll containers are present.

Refactored the scroll detection logic to improve readability and maintainability by introducing helper functions for hiding child elements and inserting markers. Updated the snapshot process to utilize a constant for no scroll containers, enhancing the handling of viewport-aware options. This change simplifies the code while preserving existing functionality.

…oration Improved error handling in the viewport snapshot process by adding debug logging for failures during both preparation and restoration. This change provides better insights into issues that may arise, ensuring more robust handling of scroll container scenarios.

…etching Introduced a new VERSION_API_URL constant to facilitate fetching the current version from the API. Updated the CLI entry point to make a request to this API, enhancing the update check functionality. Refactored the version fetching logic to utilize the new API endpoint, improving the overall update mechanism.

cubic-dev-ai

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/cli/src/index.tsx">

<violation number="1" location="apps/cli/src/index.tsx:25">
P2: Guard this startup fetch behind the telemetry opt-out flags; currently it still sends a network request even when `NO_TELEMETRY=1`.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

…CP tool The agent could only launch Chromium — cross-browser testing in the ui_quality_rules always failed. Now the `open` tool accepts a `browser` parameter (chromium | webkit | firefox), threading through McpSession and Browser.createPage to Playwright's engine-specific launchers. - Install all three Playwright browsers in CI, init, and generated GH Action - CDP connections correctly restricted to chromium only - Chromium-specific launch args skipped for non-chromium engines - Agent prompt updated to document browser switching workflow

…l in init The previous approach used tryRun (60s timeout, stdio ignored) which would silently hang on slow connections downloading ~170MB of browsers. Now streams download progress to the terminal and allows 5 minutes.

- Move PLAYWRIGHT_INSTALL_TIMEOUT_MS to constants.ts per magic number rule - Use BrowserEngine type in McpSession OpenOptions instead of inline union - Replace switch with record lookup in resolveBrowserType

cursor · 2026-04-01T23:30:07Z

 export const UPDATE_CHECK_TIMEOUT_MS = 5_000;
+export const PLAYWRIGHT_INSTALL_TIMEOUT_MS = 300_000;
+
+export type PackageManager = "npm" | "pnpm" | "yarn" | "bun" | "deno";


Duplicate PackageManager types cause silent type divergence

Medium Severity

Two different PackageManager types with the same name are now exported from different modules. The one in constants.ts is "npm" | "pnpm" | "yarn" | "bun" | "deno", while the one in init-utils.ts is "npm" | "pnpm" | "yarn" | "bun" | "vp". The detect-projects.ts utility re-exports the constants version and uses it in DetectedProject, which can return "deno". Any consumer expecting the init-utils variant won't handle "deno", and vice versa for "vp". This creates a subtle type-safety gap that TypeScript won't flag when both types are imported under the same name.

Additional Locations (1)

apps/cli/src/commands/init-utils.ts#L13-L14

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 12 total unresolved issues (including 11 from previous reviews).

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-04-01T23:38:55Z

  pnpm: "pnpm dlx",
  yarn: "npx",
  bun: "bunx",
+  deno: "deno run -A npm:",


Deno DLX command produces invalid package specifier

Low Severity

The DLX_COMMANDS entry for deno is "deno run -A npm:". When interpolated into the workflow template as ${dlx} expect-cli@latest --ci, this produces deno run -A npm: expect-cli@latest --ci with a space between npm: and the package name. Deno requires npm: to be directly attached to the package name (e.g., npm:expect-cli@latest). This would produce an invalid command if deno is ever detected as the package manager.

cubic-dev-ai

4 issues found across 21 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/browser/src/mcp/server.ts">

<violation number="1" location="packages/browser/src/mcp/server.ts:102">
P2: Validate or normalize `cdp` when a non-Chromium browser is requested; current behavior can falsely report CDP connection even though CDP is ignored.</violation>
</file>

<file name="packages/browser/tests/browser-e2e.test.ts">

<violation number="1" location="packages/browser/tests/browser-e2e.test.ts:378">
P2: Close `chromiumSession` in a `finally` block so failures in snapshot/assertion do not leak a browser instance.</violation>
</file>

<file name="apps/cli/src/commands/add-github-action.ts">

<violation number="1" location="apps/cli/src/commands/add-github-action.ts:39">
P3: The Deno dlx command prefix leaves a space between `npm:` and the package name when interpolated, which makes the generated `deno run` command invalid. Generate the command as `npm:expect-cli@latest` without a separating space.</violation>

<violation number="2" location="apps/cli/src/commands/add-github-action.ts:48">
P3: The new Deno workflow command mapping is dead code because package-manager detection never returns `deno` here.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-04-01T23:42:43Z

+          const result = yield* session.open(url, {
+            headed,
+            cookies,
+            waitUntil,
+            cdpUrl,
+            browserType,
+          });
+          const engineSuffix = browserType && browserType !== "chromium" ? ` [${browserType}]` : "";
          const cdpSuffix = cdpUrl ? ` (connected via CDP: ${cdpUrl})` : "";


P2: Validate or normalize cdp when a non-Chromium browser is requested; current behavior can falsely report CDP connection even though CDP is ignored.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/browser/src/mcp/server.ts, line 102: <comment>Validate or normalize `cdp` when a non-Chromium browser is requested; current behavior can falsely report CDP connection even though CDP is ignored.</comment> <file context> @@ -93,10 +99,17 @@ export const createBrowserMcpServer = <E>( } - const result = yield* session.open(url, { headed, cookies, waitUntil, cdpUrl }); + const result = yield* session.open(url, { + headed, + cookies, </file context>

Suggested change

const result = yield* session.open(url, {

headed,

cookies,

waitUntil,

cdpUrl,

browserType,

});

const engineSuffix = browserType && browserType !== "chromium" ? ` [${browserType}]` : "";

const cdpSuffix = cdpUrl ? ` (connected via CDP: ${cdpUrl})` : "";

const effectiveCdpUrl =

!browserType || browserType === "chromium" ? cdpUrl : undefined;

const result = yield* session.open(url, {

headed,

cookies,

waitUntil,

cdpUrl: effectiveCdpUrl,

browserType,

});

const engineSuffix = browserType && browserType !== "chromium" ? ` [${browserType}]` : "";

const cdpSuffix = effectiveCdpUrl ? ` (connected via CDP: ${effectiveCdpUrl})` : "";

cubic-dev-ai · 2026-04-01T23:42:43Z

+  });
+
+  it("can switch between chromium and webkit sessions", async () => {
+    const chromiumSession = await runBrowser((browser) =>


P2: Close chromiumSession in a finally block so failures in snapshot/assertion do not leak a browser instance.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/browser/tests/browser-e2e.test.ts, line 378: <comment>Close `chromiumSession` in a `finally` block so failures in snapshot/assertion do not leak a browser instance.</comment> <file context> @@ -288,3 +289,107 @@ describe("browser e2e", () => { + }); + + it("can switch between chromium and webkit sessions", async () => { + const chromiumSession = await runBrowser((browser) => + browser.createPage(origin, { waitUntil: "domcontentloaded" }), + ); </file context>

cubic-dev-ai · 2026-04-01T23:42:44Z

  pnpm: "pnpm install",
  yarn: "yarn install --frozen-lockfile",
  bun: "bun install",
+  deno: "deno install",


P3: The new Deno workflow command mapping is dead code because package-manager detection never returns deno here.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At apps/cli/src/commands/add-github-action.ts, line 48: <comment>The new Deno workflow command mapping is dead code because package-manager detection never returns `deno` here.</comment> <file context> @@ -43,6 +45,7 @@ const INSTALL_COMMANDS: Record<PackageManager, string> = { pnpm: "pnpm install", yarn: "yarn install --frozen-lockfile", bun: "bun install", + deno: "deno install", vp: "npm ci", }; </file context>

cubic-dev-ai · 2026-04-01T23:42:44Z

  pnpm: "pnpm dlx",
  yarn: "npx",
  bun: "bunx",
+  deno: "deno run -A npm:",


P3: The Deno dlx command prefix leaves a space between npm: and the package name when interpolated, which makes the generated deno run command invalid. Generate the command as npm:expect-cli@latest without a separating space.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At apps/cli/src/commands/add-github-action.ts, line 39: <comment>The Deno dlx command prefix leaves a space between `npm:` and the package name when interpolated, which makes the generated `deno run` command invalid. Generate the command as `npm:expect-cli@latest` without a separating space.</comment> <file context> @@ -35,6 +36,7 @@ const DLX_COMMANDS: Record<PackageManager, string> = { pnpm: "pnpm dlx", yarn: "npx", bun: "bunx", + deno: "deno run -A npm:", vp: "npx", }; </file context>

ben-million and others added 8 commits March 30, 2026 20:10

fix: layered interactions prompt

4058f3b

Revert "feat: inject evaluation skills into test agent system prompt"

d8a3c1a

This reverts commit 0366072.

yo

56d85f8

yo

695e876

yo

76d984f

cubic-dev-ai bot reviewed Mar 31, 2026

View reviewed changes

vercel bot deployed to Preview March 31, 2026 04:18 View deployment

cubic-dev-ai bot reviewed Mar 31, 2026

View reviewed changes

Comment thread packages/browser/src/mcp/mcp-session.ts Outdated

Comment thread packages/browser/src/mcp/mcp-session.ts Outdated

vercel bot deployed to Preview March 31, 2026 04:32 View deployment

cursor bot reviewed Mar 31, 2026

View reviewed changes

Comment thread packages/expect-skill/SKILL.md

Comment thread packages/browser/src/mcp/mcp-session.ts Outdated

cubic-dev-ai bot reviewed Mar 31, 2026

View reviewed changes

vercel bot deployed to Preview March 31, 2026 04:40 View deployment

aidenybai force-pushed the fix/agent-improvements-2 branch from f7377fa to 40f62ce Compare March 31, 2026 04:42

vercel bot deployed to Preview March 31, 2026 04:42 View deployment

fix: add Playwright and Claude Code install steps to generated CI wor…

d40d092

…kflow

vercel bot deployed to Preview March 31, 2026 04:46 View deployment

cursor bot reviewed Mar 31, 2026

View reviewed changes

Comment thread .github/workflows/code-quality.yml Outdated

cubic-dev-ai bot reviewed Mar 31, 2026

View reviewed changes

Comment thread packages/browser/src/mcp/server.ts

vercel bot reviewed Mar 31, 2026

View reviewed changes

vercel bot deployed to Preview March 31, 2026 05:17 View deployment

vercel bot deployed to Preview April 1, 2026 08:55 View deployment

vercel bot deployed to Preview April 1, 2026 08:56 View deployment

cubic-dev-ai bot reviewed Apr 1, 2026

View reviewed changes

vercel bot deployed to Preview April 1, 2026 09:02 View deployment

NisargIO added 2 commits April 1, 2026 02:11

refactor: remove unnecessary whitespace in init-utils and test files

3c1a5b2

Cleaned up the init-utils.ts and init-utils.test.ts files by removing trailing whitespace, improving code readability without altering functionality.

skoshx reviewed Apr 1, 2026

View reviewed changes

aidenybai added 2 commits April 1, 2026 14:26

fix

9d671f8

fix

02ea1c6

cursor bot reviewed Apr 1, 2026

View reviewed changes

Comment thread apps/cli/src/components/screens/main-menu-screen.tsx Outdated

aidenybai and others added 9 commits April 1, 2026 14:36

fix: ignore phantom step events in TestReport status computation

59d4f1f

stepStatuses was creating Map entries for step events whose stepId didn't match any step in the steps array, causing status to return "failed" even when 0 visible steps failed.

fix: replace vague CI test prompt with structured 8-step checklist

38b258d

The AI agent was inventing inconsistent test plans each run, leading to flaky steps like cross-browser WebKit, accessibility audits, and font loading checks. A fixed checklist gives deterministic step coverage.

fix: correct .gitignore entry for cursor plans

ba5de95

Restored the .cursor/plans entry in the .gitignore file to ensure that the plans directory is ignored during version control, maintaining a clean repository.

fix: update default service name in layerAxiom for clarity

aba759f

Modified the layerAxiom function to use a default service name constant, AXIOM_DEFAULT_SERVICE_NAME, improving code readability and ensuring consistency in service naming.

refactor: remove unused imports and clean up Tracing.ts

f1b4bd8

Eliminated unnecessary imports from Tracing.ts, including NodeSocket and DevTools, to streamline the code and enhance readability. This change focuses on maintaining a cleaner codebase without altering existing functionality.

cubic-dev-ai bot reviewed Apr 1, 2026

View reviewed changes

Comment thread apps/cli/src/index.tsx

aidenybai added 3 commits April 1, 2026 16:17

refactor: address code review findings

f5d8ca2

- Move PLAYWRIGHT_INSTALL_TIMEOUT_MS to constants.ts per magic number rule - Use BrowserEngine type in McpSession OpenOptions instead of inline union - Replace switch with record lookup in resolveBrowserType

cursor bot reviewed Apr 1, 2026

View reviewed changes

aidenybai added 2 commits April 1, 2026 16:32

fix CI failure

1cb67e8

fix

ef736a9

cursor bot reviewed Apr 1, 2026

View reviewed changes

cubic-dev-ai bot reviewed Apr 1, 2026

View reviewed changes

	- Curly quotes `"` `"` not straight `"`
	- Curly quotes `“` `”` not straight `"`

	const AXIOM_TOKEN = "xaat-a6ce2fdb-d378-444e-9d72-bb458867187a";
	const AXIOM_TOKEN = process.env.AXIOM_TOKEN ?? "";

-  const [stdoutText, exitCode] = yield* Effect.all(
-    [Stream.mkString(Stream.decodeText(handle.stdout)), handle.exitCode],
-    { concurrency: 2 },
-  );
-  process.stdout.write(stdoutText);
+  let stdoutText = "";
+  yield* handle.stdout.pipe(
+    Stream.decodeText(),
+    Stream.runForEach((text) =>
+      Effect.sync(() => {
+        stdoutText += text;
+        process.stdout.write(text);
+      }),
+    ),
+  );
+  const exitCode = yield* handle.exitCode;

Conversation

NisargIO commented Mar 31, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by cubic

Uh oh!

vercel bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkg-pr-new bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

❌ Website Test: failed

Session Recording

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai bot Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vercel bot Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

NisargIO commented Mar 31, 2026 •

edited by cubic-dev-ai bot

Loading

vercel bot commented Mar 31, 2026 •

edited

Loading

pkg-pr-new bot commented Mar 31, 2026 •

edited

Loading

github-actions bot commented Mar 31, 2026 •

edited

Loading

cubic-dev-ai bot Mar 31, 2026 •

edited

Loading

vercel bot Mar 31, 2026 •

edited

Loading

cubic-dev-ai bot Apr 1, 2026 •

edited

Loading

cubic-dev-ai bot Apr 1, 2026 •

edited

Loading

cubic-dev-ai bot Apr 1, 2026 •

edited

Loading

Duplicate `PackageManager` types cause silent type divergence

cubic-dev-ai bot Apr 1, 2026 •

edited

Loading