feat(browser): Reusable browser sessions in codemode #1656

Open. cjol wants to merge 6 commits into main from feat/browser-codemode-tools

Conversation

@cjol (Contributor) commented Jun 2, 2026

This PR refactors browser tooling to run inside codemode and introduces the primitives needed to make those sessions reusable across tool calls. It supersedes #1492 and closes #1501.

Three packages change together because they share a hand-off: codemode grows the runtime contract, agents/browser implements a provider that satisfies it, and think switches from its old hand-rolled browser tools to that provider.

Demo

The clip shows opening a persistent browser session, saving a screenshot to the workspace, displaying the browser session in the UI, interactively manipulating the live browser preview, and saving another screenshot. No audio.

GMT20260603-092551_Clip_Christopher.Little-Savage.s.Clip.06_03_2026.mp4

Why

The old browser_search / browser_execute tools in @cloudflare/think were AI SDK tools that:

  1. Duplicated logic that already exists in @cloudflare/codemode (sandboxed code execution, JSON schema generation, truncation).
  2. Opened a fresh CDP session for every tool call. That is fine for one-shot queries but blocks any workflow that needs to keep a page open across turns (login, multi-step navigation, form fills).
  3. Had no way to share a browser session across multiple tools or across multiple LLM steps without re-implementing session storage in the agent.

Codemode already had the right shape for (1), so the work was to make codemode expressive enough that a single provider can own a stateful resource for the lifetime of one execution, and then build the browser session manager on top of it.

Architectural shift

Before:

LLM -> browser_search / browser_execute (AI SDK tool) -> CDP -> Browser Rendering
LLM -> code_execute (codemode)           -> codemode sandbox -> workspace

After:

LLM -> browser_execute (codemode tool with cdp provider) -> codemode sandbox -> CDP -> Browser Rendering
                                                                     ^
                                                                     |
                                                       sessions managed via SessionManager
                                                       (one-shot | reuse | dynamic)

The model now writes JS against a cdp namespace inside the sandbox instead of calling two separate AI SDK tools. Browser session lifecycle moves into a SessionManager that the provider creates per code-mode runtime.

What changed, package by package

@cloudflare/codemode (patch)

The executor contract grew two concepts so providers can own resources for the lifetime of one execution:

  • ProviderRuntime — a { fns, dispose? } pair returned by a new optional createRuntime() on ResolvedProvider. The executor calls createRuntime() at the start of execute() and dispose() when the run finishes (success or failure). Existing providers that only expose fns keep working unchanged.
  • transformResult on createCodeTool — lets a tool transform the final value before it is serialized back to the model. We need this so the browser tool can apply structured truncation without forcing every codemode caller to do it.
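The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the executor's actual code: the type shapes and the `executeWithRuntimes` helper are assumptions made for the example, and the real definitions live in packages/codemode/src/executor-types.ts and executor.ts.

```typescript
// Hypothetical shapes for illustration only; see executor-types.ts for the real ones.
type ProviderFns = Record<string, (...args: unknown[]) => Promise<unknown>>;

interface ProviderRuntime {
  fns: ProviderFns;
  dispose?: () => Promise<void> | void;
}

interface ResolvedProvider {
  name: string;
  fns?: ProviderFns;
  createRuntime?: () => Promise<ProviderRuntime> | ProviderRuntime;
}

// Hypothetical helper: builds one runtime per provider, runs `body`,
// and disposes every runtime whether `body` resolved or threw.
async function executeWithRuntimes<T>(
  providers: ResolvedProvider[],
  body: (namespaces: Record<string, ProviderFns>) => Promise<T>
): Promise<T> {
  const runtimes: ProviderRuntime[] = [];
  const namespaces: Record<string, ProviderFns> = {};
  try {
    for (const provider of providers) {
      // Providers that only expose static fns keep working unchanged.
      const runtime: ProviderRuntime = provider.createRuntime
        ? await provider.createRuntime()
        : { fns: provider.fns ?? {} };
      runtimes.push(runtime);
      // Each provider keeps its own namespace.
      namespaces[provider.name] = runtime.fns;
    }
    return await body(namespaces);
  } finally {
    // Dispose in reverse creation order on success and on failure.
    for (const runtime of runtimes.reverse()) {
      await runtime.dispose?.();
    }
  }
}
```

Placing disposal in `finally` is what makes the "success or failure" guarantee hold in this sketch; a provider with a stateful resource (such as a browser session) releases it even when the sandboxed code throws.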

Truncation utilities (truncateResponse, truncateResult, sandboxResponseText) move into packages/codemode/src/truncate.ts and are re-exported from the package root. agents/browser had its own copy; it now imports the shared one.

Tests for both new behaviors live in packages/codemode/src/tests/executor.test.ts and tool.test.ts.

agents (minor — agents/browser)

New SessionManager (packages/agents/src/browser/session-manager.ts, ~430 lines) implements the three policies documented in design/browser-tools.md:

| Mode | Behavior |
| --- | --- |
| one-shot | Fresh Browser Rendering session per code-mode runtime; closed on dispose() |
| reuse | Always-on session keyed by a user-supplied key; reconnected per runtime |
| dynamic | One-shot until the model calls cdp.startSession(), then promoted to reusable |
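As a rough mental model of the three policies, the decision "should this runtime close the browser session on dispose?" can be written as a small state function. The names below (`RuntimeSessionState`, `promote`, `shouldCloseOnDispose`) are hypothetical and not taken from session-manager.ts, and rejecting promotion in one-shot mode is an assumption of this sketch.

```typescript
type SessionMode = "one-shot" | "reuse" | "dynamic";

// Hypothetical per-runtime state for illustration.
interface RuntimeSessionState {
  mode: SessionMode;
  // Becomes true once sandboxed code calls cdp.startSession() in dynamic mode.
  promoted: boolean;
}

// A session is reusable if the mode is always-on, or if a dynamic session was promoted.
function isReusable(state: RuntimeSessionState): boolean {
  return state.mode === "reuse" || (state.mode === "dynamic" && state.promoted);
}

// Models the effect of cdp.startSession(). Assumption: one-shot mode rejects promotion.
function promote(state: RuntimeSessionState): RuntimeSessionState {
  if (state.mode === "one-shot") {
    throw new Error("session reuse is not enabled in one-shot mode");
  }
  return { ...state, promoted: true };
}

// One-shot sessions (and unpromoted dynamic ones) are closed when the runtime is disposed.
function shouldCloseOnDispose(state: RuntimeSessionState): boolean {
  return !isReusable(state);
}
```

Under this model, dynamic mode behaves exactly like one-shot until the promotion happens, which matches the table above.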

Reuse and dynamic modes require a BrowserSessionStore (interface in the same file) so the sessionId survives DO hibernation. DurableBrowserSessionStore is the SQLite-backed default.

The store must hold a per-key lock for the full browser lease. This prevents two provider runtimes with the same key from racing to create competing sessions or one closing a session another is using. The lock contract is documented inline on BrowserSessionStore and tested across the new test cases in session-manager.test.ts.
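To illustrate the lock contract, here is a minimal in-memory per-key lock. It is only a sketch of the idea: the `KeyedLock` class is hypothetical, and the real `BrowserSessionStore` contract is documented inline in session-manager.ts, where a durable store may implement locking quite differently.

```typescript
// Hypothetical in-memory lock: callers with the same key run one at a time,
// in arrival order; callers with different keys do not block each other.
class KeyedLock {
  private tails = new Map<string, Promise<void>>();

  async withLock<T>(key: string, lease: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    // The new tail settles only after every earlier holder has released.
    const tail = previous.then(() => current);
    this.tails.set(key, tail);
    await previous;
    try {
      // The lock is held for the full duration of the lease.
      return await lease();
    } finally {
      release();
      // Drop the map entry if nobody queued behind this holder.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

Holding the lock across the whole lease, rather than only around session creation, is what prevents one runtime from closing a session that another runtime with the same key is still using.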

Supporting Browser Rendering helpers (createBrowserSession, connectBrowserSession, deleteBrowserSession, etc.) move into a dedicated browser-run.ts so they can be reused outside the provider.

createBrowserProvider() now contributes both the existing cdp.spec() / cdp.send() / cdp.attachToTarget() and new session-management RPCs: cdp.sessionInfo(), cdp.startSession(), cdp.closeSession(), cdp.resetSession(). The provider returns a ProviderRuntime so codemode disposes the session correctly.
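For orientation, the kind of code a model might write against the `cdp` namespace could look like the snippet below. The RPC names come from this PR, but their signatures and return shapes are assumptions here (in particular `send(method, params)` and the `{ reusable }` result of `sessionInfo()`), and `createFakeCdp` is a hypothetical stand-in so the example runs without a browser. `Page.navigate` and `Page.captureScreenshot` are standard Chrome DevTools Protocol methods.

```typescript
// Assumed signatures for illustration; check agents/browser for the real ones.
interface CdpNamespace {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
  startSession(): Promise<void>;
  sessionInfo(): Promise<{ reusable: boolean }>;
}

// Hypothetical fake that records calls instead of talking to a browser.
function createFakeCdp(): CdpNamespace & { calls: string[] } {
  let reusable = false;
  const calls: string[] = [];
  return {
    calls,
    async send(method) {
      calls.push(method);
      return method === "Page.captureScreenshot" ? { data: "<base64 png>" } : {};
    },
    async startSession() {
      reusable = true;
    },
    async sessionInfo() {
      return { reusable };
    }
  };
}

// What sandboxed, model-written code might do in dynamic mode:
// opt into persistence, navigate, then capture a screenshot.
async function modelWrittenSnippet(cdp: CdpNamespace) {
  await cdp.startSession();
  await cdp.send("Page.navigate", { url: "https://example.com" });
  const shot = await cdp.send("Page.captureScreenshot", { format: "png" });
  return { info: await cdp.sessionInfo(), shot };
}
```

Because the sandbox only receives RPC helpers, the same snippet works against the fake and, assuming the signatures match, against a provider-backed namespace.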

@cloudflare/think (patch)

packages/think/src/tools/browser.ts shrinks from ~140 lines to ~80. It is now a thin wrapper that registers one codemode tool with createBrowserProvider() as the provider and truncateResult as the result transform. The old browser_search AI SDK tool is gone — protocol discovery happens via cdp.spec() inside browser_execute. Public types are re-exported from agents/browser so consumers do not need a second import.

Examples

examples/assistant gets a working browser session demo (client UI + worker config + tests) showing the dynamic session mode end to end. examples/ai-chat is updated to match the new tool shape.

What to look at first

Suggested review order:

  1. design/browser-tools.md — the updated design doc explains the three session modes, the lock contract, and the tradeoffs. Start here.
  2. packages/codemode/src/executor-types.ts and packages/codemode/src/executor.ts — the runtime contract change. Small and self-contained.
  3. packages/agents/src/browser/session-manager.ts — the bulk of the new logic. Tests in session-manager.test.ts cover each mode and the lock behavior.
  4. packages/agents/src/browser/browser-run.ts — Browser Rendering REST helpers. Mostly mechanical.
  5. packages/agents/src/browser/index.ts and shared.ts — provider wiring.
  6. packages/think/src/tools/browser.ts — the consumer side; mostly deletions.
  7. examples/assistant/src/client.tsx + server.ts — the end-to-end shape from a user's perspective.

Compatibility

  • Codemode changes are additive: createRuntime and transformResult are optional. No published API was removed.
  • @cloudflare/think/tools/browser removes the browser_search tool. Any agent that referenced it by name in a system prompt needs to switch to telling the model to call cdp.spec() inside browser_execute. This is called out in the changeset and in the updated docs/think/tools.md.
  • The cdpUrl escape hatch for custom CDP endpoints is preserved.

Verification

  • npm run check passes locally.
  • npm run test passes, including the new executor.test.ts, tool.test.ts, and session-manager.test.ts suites.
  • Browser end-to-end tests (packages/agents/src/browser-tests/) run against a real wrangler dev instance with the Browser Rendering binding and cover all three session modes.
  • examples/assistant has been exercised manually against wrangler dev.

Changesets

| Package | Bump | Note |
| --- | --- | --- |
| @cloudflare/codemode | patch | Provider runtimes, transformResult, shared truncation exports |
| agents | minor | Browser Run session primitives, structured truncation default |
| @cloudflare/think | patch | Browser tools moved to codemode-backed browser_execute |

changeset-bot Bot commented Jun 2, 2026

🦋 Changeset detected

Latest commit: 8784547

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| agents | Minor |
| @cloudflare/codemode | Patch |
| @cloudflare/think | Patch |

@cjol cjol force-pushed the feat/browser-codemode-tools branch from f52ceea to 5fa18a9 Compare June 2, 2026 15:24
Problem:
Codemode tools could run against a fixed set of resolved functions, but browser automation needs provider-owned runtime state. A browser provider has to acquire a CDP connection for a single execution, expose RPC helpers to sandboxed code, and reliably release the connection afterward. Browser output also exposed a second issue: large CDP or sandbox results could overwhelm tool responses unless truncation was handled consistently outside the browser package.

Solution:
This adds provider runtimes to the codemode executor. A resolved provider can now create per-execution runtime functions and an optional disposer, while the executor still keeps provider namespaces isolated in the sandbox. Code tools also get an explicit final result transform option, used by callers that need to shape or truncate results before returning them to model-facing tool surfaces.

This commit also moves truncation into codemode as a general utility. `truncateResponse` is for text-only responses, `truncateResult` preserves structured values unless they are oversized, and `sandboxResponseText` avoids double-truncating sandbox output that already has truncation metadata.
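The distinction between text truncation and structure-preserving truncation can be sketched as below. These helpers are hypothetical stand-ins, not the exported `truncateResponse` / `truncateResult` functions; the character budget, the marker text, and the shape of the truncated value are all assumptions of the example.

```typescript
// Assumed budget for illustration; the real limits are defined in codemode.
const MAX_CHARS = 200;

// Text-only truncation: cut the string and append a note about what was dropped.
function truncateText(text: string, maxChars: number = MAX_CHARS): string {
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return `${text.slice(0, maxChars)}\n[truncated ${omitted} characters]`;
}

// Hypothetical shape returned when a structured value is too large.
type TruncatedValue = { truncated: true; originalLength: number; preview: string };

// Structured truncation: return the value untouched unless its serialized
// form exceeds the budget, in which case return a small descriptor instead.
function truncateStructured(value: unknown, maxChars: number = MAX_CHARS): unknown {
  const serialized = String(JSON.stringify(value));
  if (serialized.length <= maxChars) return value;
  const result: TruncatedValue = {
    truncated: true,
    originalLength: serialized.length,
    preview: serialized.slice(0, maxChars)
  };
  return result;
}
```

Keeping small values intact means the model still receives real objects for typical CDP responses, and only oversized payloads such as large DOM dumps are replaced by a preview.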

Decisions:
The result transform is an explicit `createCodeTool` option, not a provider-level hook. Provider-level transforms would be easy to compose incorrectly and could affect unrelated provider output. Truncation is exported only from the root `@cloudflare/codemode` entry point because it is a codemode concern, not a browser-specific API surface.

Review focus:
Please focus on executor lifecycle semantics, especially runtime disposal on success and failure, preservation of provider namespace isolation, and whether the truncation utilities have the right boundaries for callers that return either text or structured values.
@cjol cjol force-pushed the feat/browser-codemode-tools branch from 5fa18a9 to b87231b Compare June 2, 2026 15:53
cjol added 3 commits June 2, 2026 17:03
Problem:
The existing browser tools opened a fresh browser session for each execute call. That kept cleanup simple, but it made multi-step browser workflows awkward because state such as tabs, cookies, local storage, and navigation context could not survive across tool calls. We also needed a cleaner bridge from `agents/browser` into codemode provider runtimes rather than keeping browser-specific execution logic outside codemode.

Solution:
This adds Browser Run session primitives and a codemode browser provider in `agents/browser`. The provider exposes a `cdp` namespace to sandboxed code, including `cdp.spec()`, `cdp.send()`, `cdp.attachToTarget()`, debug log helpers, and reusable-session helpers when session reuse is enabled.

The session manager supports three policies: `one-shot`, `reuse`, and `dynamic`. One-shot remains the default. Reuse always connects through a stored Browser Run session. Dynamic starts as one-shot and lets the model opt into persistence by calling `cdp.startSession()`. Durable Object storage can back reusable session ids through `DurableBrowserSessionStore`, and per-key locking prevents competing executions from racing over the same Browser Run session.

Decisions:
Reusable and dynamic sessions are only supported with the Browser Rendering binding. Custom `cdpUrl` endpoints are treated as externally managed and stay one-shot. Browser state remains host-side; sandboxed code only receives RPC helpers. Browser helper tools opt into codemode's structured truncation, but truncation stays in codemode rather than in `agents/browser`.

Review focus:
Please focus on Browser Run lifecycle management, lock handling, cleanup behavior, and whether the public `agents/browser` exports are the right shape for an experimental provider API. The browser tests cover one-shot, reusable, and dynamic session behavior against the local Browser Rendering binding.
Problem:
Think's experimental browser tools still carried the older two-tool shape: `browser_search` for CDP spec discovery and `browser_execute` for live browser commands. That duplicated concepts now handled by the codemode browser provider, and it split discovery and execution even though both are part of the same CDP helper surface.

Solution:
This moves Think browser tools onto the `agents/browser` codemode provider. `browser_execute` now exposes both CDP spec discovery and browser commands through the `cdp` namespace, so models can call `cdp.spec()` inside the same execution environment they use for CDP commands. The Think browser tests and docs are updated for the new surface, and Think's dependency range now requires the codemode version that includes provider runtimes and result truncation.

Decisions:
This removes the experimental `browser_search` tool instead of preserving a compatibility wrapper. The preferred path is a single code-mode browser tool because it gives the model one place to discover protocol metadata and act on it. The agents commit briefly kept compatibility exports so the stack stayed green at that point in history, and this commit removes them once Think no longer imports the old helpers.

Review focus:
Please focus on the prompt and API shape for `browser_execute`, the migration from `browser_search` to `cdp.spec()`, and whether the tests cover the important Think-level behavior without duplicating lower-level browser provider tests.
Problem:
The package changes add reusable browser sessions, dynamic session control, and richer browser result rendering, but the examples did not show how an app should wire those pieces together. Reviewers and users need a concrete reference for exposing Browser Run sessions through agents and surfacing live browser metadata in a UI.

Solution:
This updates the assistant and ai-chat examples to demonstrate browser session flows on top of the new provider-backed tools. The assistant example wires Browser Rendering and Worker Loader bindings, adds shared workspace/browser-session test coverage, and updates generated env types and config. The ai-chat example updates client rendering so button-only browser rich results can be displayed cleanly.

Decisions:
This commit intentionally stays in examples only and does not add a changeset. The package-level behavior and release notes live in the earlier commits. The examples are kept as integration references rather than adding another abstraction layer over the browser provider APIs.

Review focus:
Please focus on whether the examples make the new browser session behavior understandable, whether the UI handles browser result metadata clearly, and whether the config/env changes are limited to what the demos need.
@cjol cjol force-pushed the feat/browser-codemode-tools branch from b87231b to be24ace Compare June 2, 2026 16:06
pkg-pr-new Bot commented Jun 2, 2026

agents

npm i https://pkg.pr.new/agents@1656

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1656

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1656

hono-agents

npm i https://pkg.pr.new/hono-agents@1656

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1656

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1656

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1656

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1656

commit: 8784547

@cjol cjol marked this pull request as ready for review June 3, 2026 09:34
@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

View 8 additional findings in Devin Review.

Comment thread packages/agents/src/browser/browser-run.ts
@threepointone (Contributor) commented:

is this related to #1492?

Development

Successfully merging this pull request may close these issues.

Use browser inside codemode

2 participants