feat: auto-pause agent during Browserbase captcha solving (WIP) #1752
derekmeegan wants to merge 9 commits into main
Automatically pause agent execution when Browserbase's captcha solver is active. Listens for browserbase-solving-started/finished/errored console messages and blocks the agent's prepareStep (DOM/hybrid) or action handler (CUA) until solving completes, errors, or hits a 90s timeout. Also updates the system prompt to tell the agent not to interact with captchas since they are handled transparently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
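The event handling described above can be sketched as a small state transition. This is only an illustration: the three console message names come from the description, while substring matching, the `SolverState` shape, and the function names are assumptions, not the PR's actual code.

```typescript
// Console message names are taken from the PR description; substring
// matching and the state shape are assumptions made for this sketch.
type SolverEvent = "started" | "finished" | "errored";

interface SolverState {
  solving: boolean;
  lastSolveErrored: boolean;
}

// Map a console message's text to a solver event, or null if unrelated.
function parseSolverEvent(text: string): SolverEvent | null {
  if (text.includes("browserbase-solving-started")) return "started";
  if (text.includes("browserbase-solving-finished")) return "finished";
  if (text.includes("browserbase-solving-errored")) return "errored";
  return null;
}

// Pure transition: "started" begins a pause; "finished" and "errored" end it.
function nextState(state: SolverState, event: SolverEvent): SolverState {
  switch (event) {
    case "started":
      return { solving: true, lastSolveErrored: false };
    case "finished":
      return { ...state, solving: false };
    case "errored":
      return { solving: false, lastSolveErrored: true };
  }
}
```

Keeping the transition pure makes the pause logic easy to unit test separately from the page listener and the 90s timeout.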
Greptile Summary
Integrates an automatic pause mechanism for captcha solving in Browserbase environments. Agent execution (both DOM/hybrid and CUA modes) now pauses while Browserbase's captcha solver is active; the solver's state is tracked by listening for console messages.
Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Agent as Agent (DOM/CUA)
participant Handler as Agent Handler
participant CS as CaptchaSolver
participant Page as Browser Page
participant BB as Browserbase
Agent->>Handler: execute() / stream()
Handler->>CS: new CaptchaSolver()
Handler->>CS: attach(page)
CS->>Page: on("console", listener)
loop Each Step/Action
Agent->>Handler: prepareStep() / actionHandler()
Handler->>CS: waitIfSolving()
alt No captcha solving
CS-->>Handler: resolve immediately
Handler->>Agent: continue
else Captcha solving
BB->>Page: console: "browserbase-solving-started"
Page->>CS: listener triggered
CS->>CS: solving = true
Note over CS,Handler: Agent blocked
BB->>Page: console: "browserbase-solving-finished"
Page->>CS: listener triggered
CS->>CS: solving = false, resolveWait()
CS-->>Handler: resolve
Handler->>Agent: continue
else Timeout (90s)
CS->>CS: timeout triggers
CS->>CS: solving = false, lastSolveErrored = true
CS-->>Handler: resolve
Handler->>Handler: log error
Handler->>Agent: continue
end
end
Agent->>Handler: complete/error/abort
Handler->>CS: dispose()
CS->>Page: off("console", listener)
CS->>CS: resolve pending wait
Last reviewed commit: 6d8d7f3
2 issues found across 5 files
Confidence score: 3/5
- Concurrent `waitIfSolving()` calls in `packages/core/lib/v3/agent/utils/captchaSolver.ts` can orphan earlier waiters due to a single `resolveWait` slot, causing hangs until timeout for some callers.
- `packages/core/lib/v3/handlers/v3AgentHandler.ts` may leak `captchaSolver` if `streamText()` throws before callbacks run, since disposal only happens inside stream handlers in `stream()`.
- These are concrete runtime risks (waiting hangs and resource leaks), so there is some user-impacting risk despite the rest of the change looking contained.
- Pay close attention to `packages/core/lib/v3/agent/utils/captchaSolver.ts` and `packages/core/lib/v3/handlers/v3AgentHandler.ts`: concurrency waiting and disposal paths.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/agent/utils/captchaSolver.ts">
<violation number="1" location="packages/core/lib/v3/agent/utils/captchaSolver.ts:72">
P1: Bug: `resolveWait` is a single slot, so concurrent `waitIfSolving()` calls orphan earlier waiters. The second call overwrites the resolver, leaving the first caller's promise unresolved until its 90s timeout. Consider sharing a single deferred promise across all waiters, e.g.:
```ts
private waitPromise: Promise<void> | null = null;

waitIfSolving(): Promise<void> {
  if (!this.solving) return Promise.resolve();
  if (!this.waitPromise) {
    this.waitPromise = new Promise<void>((resolve) => {
      const timer = setTimeout(() => { /* ... */ }, SOLVE_TIMEOUT_MS);
      this.resolveWait = () => { clearTimeout(timer); resolve(); this.waitPromise = null; };
    });
  }
  return this.waitPromise;
}
```</violation>
</file>
<file name="packages/core/lib/v3/handlers/v3AgentHandler.ts">
<violation number="1" location="packages/core/lib/v3/handlers/v3AgentHandler.ts:293">
P2: Resource leak: if `streamText()` throws synchronously, `captchaSolver` is never disposed. The `execute()` method properly wraps this in a `try/finally`, but `stream()` only disposes inside stream callbacks (`onError`/`onFinish`/`onAbort`), which won't fire if `streamText` itself throws. Wrap the `streamText` call and subsequent setup in a try/catch that calls `captchaSolver?.dispose()` on failure.</violation>
</file>
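One way to close the disposal gap is sketched here. Everything in it is hypothetical: `startWithCleanup`, `DisposableSketch`, and `StreamCallbacksSketch` are illustrative names, and `start` merely stands in for the `streamText()` call and its setup rather than reproducing its real signature.

```typescript
// Hypothetical shapes for illustration; not the handler's or the SDK's real types.
interface DisposableSketch {
  dispose(): void;
}

interface StreamCallbacksSketch {
  onFinish(): void;
  onError(error: unknown): void;
  onAbort(): void;
}

// Runs `start` (a stand-in for the streamText() call plus setup) and makes
// sure the solver is disposed exactly once, whether `start` throws
// synchronously or one of the stream callbacks fires later.
function startWithCleanup<T>(
  solver: DisposableSketch | undefined,
  start: (callbacks: StreamCallbacksSketch) => T,
): T {
  let disposed = false;
  const disposeOnce = (): void => {
    if (disposed) return;
    disposed = true;
    solver?.dispose();
  };

  try {
    return start({ onFinish: disposeOnce, onError: disposeOnce, onAbort: disposeOnce });
  } catch (error) {
    // The callbacks will never run on this path, so clean up here.
    disposeOnce();
    throw error;
  }
}
```

The `disposed` guard is a design choice in this sketch: it keeps cleanup idempotent if more than one callback fires.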
Architecture diagram
sequenceDiagram
participant User as Client/V3
participant Handler as Agent Handler (DOM/CUA)
participant Solver as NEW: CaptchaSolver
participant Page as Browser Page (Browserbase)
participant LLM as LLM Provider
User->>Handler: execute(instruction)
opt NEW: isBrowserbase && captchaSolverEnabled
Handler->>Solver: attach(page)
Solver->>Page: on("console", listener)
end
Note over Handler, LLM: CHANGED: System prompt tells LLM to ignore captchas
loop Agent Step Loop
Page-->>Solver: NEW: console("browserbase-solving-started")
Note right of Solver: solving = true
Handler->>Handler: prepareStep() / actionHandler()
Handler->>Solver: NEW: waitIfSolving()
alt Solver is active
Note over Solver: NEW: Block execution (max 90s timeout)
alt Solve Success
Page-->>Solver: NEW: console("browserbase-solving-finished")
Solver-->>Handler: Resolve (Resume)
else Solve Error
Page-->>Solver: NEW: console("browserbase-solving-errored")
Solver-->>Handler: Resolve (Resume with error flag)
Handler->>Handler: Log solver error & resetError()
else Timeout (90s)
Solver->>Solver: Internal timeout
Solver-->>Handler: Resolve (Resume)
end
else Solver is idle
Solver-->>Handler: Resolve immediately
end
Handler->>LLM: Request next action
LLM-->>Handler: Action response
Handler->>Page: Execute action
end
Handler->>Solver: NEW: dispose()
Solver->>Page: off("console", listener)
Handler-->>User: Result
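The per-step gate in the diagram can be sketched roughly as follows. The `SolverLikeSketch` interface and the `waitForCaptcha` helper are assumptions for illustration; only the names `waitIfSolving`, `lastSolveErrored`, and `resetError` appear in this conversation, and their real signatures are not shown.

```typescript
// Assumed minimal view of the solver; the real class may differ.
interface SolverLikeSketch {
  waitIfSolving(): Promise<void>;
  readonly lastSolveErrored: boolean;
  resetError(): void;
}

// Called before each agent step or action: blocks while a solve is in
// progress, then logs and clears any solver error before continuing.
async function waitForCaptcha(
  solver: SolverLikeSketch | undefined,
  log: (message: string) => void,
): Promise<void> {
  if (!solver) return; // solver disabled or not on Browserbase
  await solver.waitIfSolving();
  if (solver.lastSolveErrored) {
    log("Captcha solver reported an error or timed out; continuing.");
    solver.resetError();
  }
}
```

In this sketch the agent continues after a solver error instead of aborting, which matches the "Resume with error flag" branch in the diagram.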
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
- Fix race condition: concurrent waitIfSolving() callers now share a single deferred promise so no waiter is orphaned.
- Extract settle() helper for consistent timeout/resolve/cleanup paths.
- Fix stream() disposal: wrap streamText() in try/catch so captchaSolver is disposed if streamText throws synchronously.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
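A minimal sketch of the shared-deferred approach this commit describes, assuming a single `settle()` path for finish, error, and timeout. The class name, `markStarted`/`markDone`, and the constructor parameter are hypothetical; the committed code is not shown in this conversation.

```typescript
const SOLVE_TIMEOUT_MS = 90_000;

// Sketch only: all concurrent waiters share one promise, and one settle()
// path handles finish, error, and timeout alike.
class SharedWaitSketch {
  private solving = false;
  private waitPromise: Promise<void> | null = null;
  private resolveWait: (() => void) | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number = SOLVE_TIMEOUT_MS) {
    this.timeoutMs = timeoutMs;
  }

  // Hypothetical hooks standing in for the console-event listener.
  markStarted(): void {
    this.solving = true;
  }

  markDone(): void {
    this.solving = false;
    this.settle();
  }

  waitIfSolving(): Promise<void> {
    if (!this.solving) return Promise.resolve();
    if (!this.waitPromise) {
      this.waitPromise = new Promise<void>((resolve) => {
        this.resolveWait = () => resolve();
        this.timer = setTimeout(() => {
          this.solving = false;
          this.settle();
        }, this.timeoutMs);
      });
    }
    return this.waitPromise; // every caller gets the same promise
  }

  // Clears the timer, resets the shared promise, and releases all waiters.
  private settle(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    const resolve = this.resolveWait;
    this.resolveWait = null;
    this.waitPromise = null;
    resolve?.();
  }
}
```

Because every caller receives the same promise, a second `waitIfSolving()` call cannot overwrite the first caller's resolver.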
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc13b5a25a
const page = await this.v3.context.awaitActivePage();
captchaSolver.attach(page);
Reattach captcha listener when active page changes
In execute() the solver is attached once to the initial page, but agent actions/tools later resolve the current page dynamically via awaitActivePage(), which can switch to a popup/new tab during the run. In that case captcha console events from the new active page are never observed, so waitIfSolving() will not block and the agent can keep interacting while Browserbase is solving. Please track active-page changes (or attach listeners per active page) instead of binding only once.
  const roadblocksSection = isBrowserbase
    ? `<roadblocks>
    <note>captchas, popups, etc.</note>
-   <captcha>If you see a captcha, use the wait tool. It will automatically be solved by our internal solver.</captcha>
+   <captcha>Captchas are automatically detected and solved by the Browserbase captcha solver. Do NOT attempt to solve or interact with captchas yourself — they will be handled transparently without any action from you. Your execution will be paused automatically while a captcha is being solved and will resume once it is complete.</captcha>
Gate captcha auto-pause prompt on solveCaptchas setting
The prompt now always claims execution is automatically paused for captchas whenever isBrowserbase is true, but runtime only enables this behavior when isCaptchaSolverEnabled (browserSettings.solveCaptchas !== false). For Browserbase sessions with solveCaptchas: false, the model is told not to act because pause is automatic even though no solver/pause is active, which can break captcha handling. This roadblocks text should be conditioned on the same enabled flag as the handlers.
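A sketch of the gating suggested here, under assumptions: `BrowserSettingsSketch` and `buildCaptchaNote` are hypothetical names, the prompt text is a shortened placeholder, and the wording used when the solver is off is not shown in this conversation, so the sketch returns an empty string for that path.

```typescript
// Assumed subset of the browser settings; only solveCaptchas matters here.
interface BrowserSettingsSketch {
  solveCaptchas?: boolean;
}

// The solver is treated as enabled unless explicitly turned off,
// mirroring the `solveCaptchas !== false` check quoted in the review.
function isCaptchaSolverEnabled(settings?: BrowserSettingsSketch): boolean {
  return settings?.solveCaptchas !== false;
}

// Only promise an automatic pause when the handlers will actually provide one.
function buildCaptchaNote(isBrowserbase: boolean, captchaSolverEnabled: boolean): string {
  if (isBrowserbase && captchaSolverEnabled) {
    // Placeholder wording; the real prompt text is longer.
    return "<captcha>Captchas are solved automatically. Do not interact with them.</captcha>";
  }
  return ""; // solver-off wording is not shown in the PR excerpt
}
```

Deriving the prompt flag and the handler flag from the same helper keeps the two from drifting apart.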
…aptchas
- CaptchaSolver now accepts a page-provider callback and re-attaches the console listener whenever the active page changes (popups, new tabs). This ensures captcha events are observed on whichever page is currently active.
- System prompt captcha messaging is now gated on captchaSolverEnabled (solveCaptchas !== false) rather than just isBrowserbase, so sessions with solveCaptchas: false don't incorrectly tell the agent that captchas are auto-solved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
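The page-provider idea might look roughly like this. `PageSketch`, `ReattachingListenerSketch`, and the listener signature are assumptions for illustration; the sketch only tracks which page currently holds the console listener and leaves the solving state out.

```typescript
// Hypothetical minimal page shape; the real page type is not shown here.
type ConsoleListenerSketch = (text: string) => void;

interface PageSketch {
  on(event: "console", listener: ConsoleListenerSketch): void;
  off(event: "console", listener: ConsoleListenerSketch): void;
}

class ReattachingListenerSketch {
  private attachedPage: PageSketch | null = null;
  private readonly getActivePage: () => Promise<PageSketch>;
  private readonly listener: ConsoleListenerSketch;

  constructor(getActivePage: () => Promise<PageSketch>, listener: ConsoleListenerSketch) {
    this.getActivePage = getActivePage;
    this.listener = listener;
  }

  // Call before each wait: moves the listener if the active page changed.
  async ensureAttached(): Promise<void> {
    const page = await this.getActivePage();
    if (page === this.attachedPage) return;
    this.detachListener();
    this.attachedPage = page;
    page.on("console", this.listener);
  }

  dispose(): void {
    this.detachListener();
  }

  private detachListener(): void {
    if (this.attachedPage) {
      this.attachedPage.off("console", this.listener);
      this.attachedPage = null;
    }
  }
}
```

Note that this sketch does not guard against overlapping `ensureAttached()` calls; a real implementation would likely need to serialize them.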
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/agent/utils/captchaSolver.ts">
<violation number="1" location="packages/core/lib/v3/agent/utils/captchaSolver.ts:54">
P1: Stale `solving` state after page change: when `ensureAttached()` detects a page switch and detaches the old listener, `this.solving` is not reset. If a solve was in progress on the old page, the agent will block for up to 90 seconds waiting for a finish/error event that can never arrive (since the old page's listener was removed). Reset `solving` and settle pending waiters when detaching from a changed page.</violation>
</file>
if (page === this.attachedPage) return;

// Detach from the old page
this.detachListener();
P1: Stale solving state after page change: when ensureAttached() detects a page switch and detaches the old listener, this.solving is not reset. If a solve was in progress on the old page, the agent will block for up to 90 seconds waiting for a finish/error event that can never arrive (since the old page's listener was removed). Reset solving and settle pending waiters when detaching from a changed page.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/utils/captchaSolver.ts, line 54:
<comment>Stale `solving` state after page change: when `ensureAttached()` detects a page switch and detaches the old listener, `this.solving` is not reset. If a solve was in progress on the old page, the agent will block for up to 90 seconds waiting for a finish/error event that can never arrive (since the old page's listener was removed). Reset `solving` and settle pending waiters when detaching from a changed page.</comment>
<file context>
@@ -29,13 +33,27 @@ export class CaptchaSolver {
+ if (page === this.attachedPage) return;
+ // Detach from the old page
+ this.detachListener();
+
+ this.attachedPage = page;
</file context>
- this.detachListener();
+ this.detachListener();
+ // Reset solving state — we can no longer receive events from the old page,
+ // so any in-progress solve must be treated as resolved to avoid a 90s hang.
+ if (this.solving) {
+   this.solving = false;
+   this._lastSolveErrored = false;
+   this.settle();
+ }
The test was passing isBrowserbase: true but not captchaSolverEnabled, so the auto-solve captcha messaging wasn't included in the prompt. Updated test to pass captchaSolverEnabled: true and match new wording. Added a new test case for isBrowserbase: true with captchaSolverEnabled: false to verify the solver-off path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
why
Automatically pause agent execution when Browserbase's captcha solver is active. Listens for browserbase-solving-started/finished/errored console messages and blocks the agent's prepareStep (DOM/hybrid) or action handler (CUA) until solving completes, errors, or hits a 90s timeout.
Also updates the system prompt to tell the agent not to interact with captchas since they are handled transparently.
what changed
added callbacks
test plan
Will review and test the code tomorrow.
Summary by cubic
Automatically pauses agent actions while Browserbase’s captcha solver runs to prevent accidental interactions and keep execution stable. We listen for solver console events and resume on finish, error, or a 90s timeout.
Written for commit 6d8d7f3. Summary will update on new commits.