e2e cursor #577

Draft
squishykid wants to merge 9 commits into main from rwr/e2e-cursor

@squishykid
Member

No description provided.

squishykid and others added 4 commits March 2, 2026 13:46
Research findings for the `agent` binary (Cursor CLI):
- AGENT.md one-pager with hook payloads, transcript format, CLI flags
- Probe script for capturing hook payloads from the CLI
- Key finding: beforeSubmitPrompt and stop hooks don't fire in -p mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 44abbd4f67c4

Registers the `agent` binary (Cursor CLI) with the E2E framework as
"cursor-cli". Maps to Entire agent "cursor" so existing hooks/strategy
code is reused.

- RunPrompt: `agent -p <prompt> --force --trust --workspace <dir>`
- StartSession: tmux-based interactive mode with `agent --force`
- Bootstrap: validates CURSOR_API_KEY on CI
- TimeoutMultiplier: 1.5x (slightly conservative for API variability)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 44f3de857440

Handle workspace trust dialog (--trust only works in headless mode) and
update prompt pattern to match Cursor's actual TUI input indicator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 9093105ebe04

RunPrompt now uses tmux interactive sessions instead of headless -p mode,
which ensures beforeSubmitPrompt and stop hooks fire correctly. Extracts
shared helpers for session startup and workspace trust dialog handling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 9a7ed0c8e36e
Copilot AI review requested due to automatic review settings March 2, 2026 14:21
@cursor
cursor bot commented Mar 2, 2026

PR Summary

Medium Risk
Adds a new E2E agent that auto-registers when the agent binary is present, which can expand the default E2E matrix and introduce new tmux/prompt-matching and auth-related flake points. Changes are otherwise additive and limited to test tooling/docs.

Overview
Adds a new cursor-cli E2E agent that drives Cursor’s agent binary via an interactive tmux session (including workspace trust handling) to ensure full hook lifecycle events fire, and includes transient-error detection plus a CI guard requiring CURSOR_API_KEY.

Also adds a Cursor CLI integration one-pager (AGENT.md) documenting hook behavior differences between headless -p and interactive modes and transcript location, plus a scripts/test-cursor-cli-agent-integration.sh probe script that installs capture hooks into .cursor/hooks.json and collects hook payloads for debugging.

Written by Cursor Bugbot for commit cadb34a.

Contributor

Copilot AI left a comment

Pull request overview

Adds initial end-to-end (E2E) support and exploratory tooling for running Entire against the Cursor CLI (agent) and documenting observed hook/transcript behavior.

Changes:

  • Introduces an E2E agent implementation for Cursor CLI using tmux-based interactive mode.
  • Adds a probe script to install Cursor hooks and capture hook payloads to disk for manual/automated investigation.
  • Adds a Cursor CLI integration one-pager documenting hook semantics, transcript location/layout, and limitations.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| scripts/test-cursor-cli-agent-integration.sh | New probe script that wires .cursor/hooks.json, captures stdin payloads per hook event, and prints collected JSON. |
| e2e/agents/cursor_cli.go | New E2E agent adapter that runs Cursor CLI interactively via tmux and drives prompts based on a TUI prompt regex. |
| cmd/entire/cli/agent/cursor/AGENT.md | New documentation summarizing Cursor CLI hook behavior and transcript layout expectations. |

Comment on lines +82 to +123
	timeout := 90 * time.Second
	if cfg.PromptTimeout > 0 {
		timeout = cfg.PromptTimeout
	}

	displayCmd := a.Binary() + " --force --workspace " + dir + " (interactive prompt: " + prompt + ")"

	// Start an interactive tmux session so all hooks fire
	// (beforeSubmitPrompt and stop don't fire in headless -p mode).
	s, err := a.startInteractiveSession(dir)
	if err != nil {
		return Output{Command: displayCmd, ExitCode: -1},
			fmt.Errorf("start interactive session: %w", err)
	}
	defer s.Close()

	// Wait for trust dialog and accept it.
	if err := a.acceptTrustDialogIfNeeded(s); err != nil {
		return Output{Command: displayCmd, Stdout: s.Capture(), ExitCode: -1}, err
	}

	// Wait for the TUI to be ready.
	if _, err := s.WaitFor(a.PromptPattern(), 30*time.Second); err != nil {
		return Output{Command: displayCmd, Stdout: s.Capture(), ExitCode: -1},
			fmt.Errorf("waiting for startup prompt: %w", err)
	}

	// Send the prompt.
	if err := s.Send(prompt); err != nil {
		return Output{Command: displayCmd, Stdout: s.Capture(), ExitCode: -1},
			fmt.Errorf("sending prompt: %w", err)
	}

	// Wait for the prompt pattern to reappear (agent finished processing).
	content, waitErr := s.WaitFor(a.PromptPattern(), timeout)
	if waitErr != nil {
		// Check for deadline exceeded to allow transient error detection.
		if ctx.Err() == context.DeadlineExceeded {
			waitErr = fmt.Errorf("%w: %w", waitErr, context.DeadlineExceeded)
		}
		return Output{Command: displayCmd, Stdout: content, ExitCode: -1}, waitErr
	}
Copilot AI Mar 2, 2026

RunPrompt doesn’t respect ctx cancellation/deadline: the tmux WaitFor calls only use local timeouts, so a test context expiring won’t stop the wait and can run past the scenario budget. Bound the per-prompt timeout by ctx’s deadline (when set) and/or actively abort/close the session when ctx.Done() is triggered.

// in interactive mode (no -p flag) so all hooks fire.
func (a *CursorCLI) startInteractiveSession(dir string) (*TmuxSession, error) {
	name := fmt.Sprintf("cursor-cli-test-%d", time.Now().UnixNano())
	return NewTmuxSession(name, dir, nil, a.Binary(), "--force", "--workspace", dir)
Copilot AI Mar 2, 2026

startInteractiveSession launches the agent without the E2E env vars used elsewhere (notably ENTIRE_TEST_TTY=0, and typically ACCESSIBLE=1). Since Cursor runs hook commands as child processes, missing these env vars can make hook/test behavior diverge from the rest of the E2E suite (see e.g. e2e/agents/gemini.go:128, e2e/agents/claude.go:178-193). Consider wrapping the command with env ... like the other tmux-based agents.

Suggested change
	return NewTmuxSession(name, dir, nil, a.Binary(), "--force", "--workspace", dir)
	env := map[string]string{
		"ENTIRE_TEST_TTY": "0",
	}
	if accessible := os.Getenv("ACCESSIBLE"); accessible != "" {
		env["ACCESSIBLE"] = accessible
	}
	return NewTmuxSession(name, dir, env, a.Binary(), "--force", "--workspace", dir)

- `--model <model>`: Model override (e.g., `sonnet-4`, `gpt-5`)
- `--output-format <fmt>`: `text` (default), `json`, `stream-json`
- Interactive mode: `agent --force` (launches TUI)
- Prompt pattern for TUI ready: TBD (needs interactive probe)
Copilot AI Mar 2, 2026

This doc says the interactive-mode prompt pattern is “TBD”, but the new E2E agent implementation depends on a concrete PromptPattern() for readiness detection. Please update this section to reflect the actual prompt pattern being used (or document how it’s derived/validated) so the one-pager stays consistent with the code.

Suggested change
- Prompt pattern for TUI ready: TBD (needs interactive probe)
- Prompt pattern for TUI ready: Defined by `PromptPattern()` in the Cursor agent integration; derived from the `agent` startup banner and validated via interactive E2E tests.

2 participants