feat(browser): Reusable browser sessions in codemode #1656
Open
cjol wants to merge 6 commits into
🦋 Changeset detected
Latest commit: 8784547
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
Problem: Codemode tools could run against a fixed set of resolved functions, but browser automation needs provider-owned runtime state. A browser provider has to acquire a CDP connection for a single execution, expose RPC helpers to sandboxed code, and reliably release the connection afterward. Browser output also exposed a second issue: large CDP or sandbox results could overwhelm tool responses unless truncation was handled consistently outside the browser package.

Solution: This adds provider runtimes to the codemode executor. A resolved provider can now create per-execution runtime functions and an optional disposer, while the executor still keeps provider namespaces isolated in the sandbox. Code tools also get an explicit final result transform option, used by callers that need to shape or truncate results before returning them to model-facing tool surfaces. This commit also moves truncation into codemode as a general utility. `truncateResponse` is for text-only responses, `truncateResult` preserves structured values unless they are oversized, and `sandboxResponseText` avoids double-truncating sandbox output that already has truncation metadata.

Decisions: The result transform is an explicit `createCodeTool` option, not a provider-level hook. Provider-level transforms would be easy to compose incorrectly and could affect unrelated provider output. Truncation is exported only from the root `@cloudflare/codemode` entry point because it is a codemode concern, not a browser-specific API surface.

Review focus: Please focus on executor lifecycle semantics, especially runtime disposal on success and failure, preservation of provider namespace isolation, and whether the truncation utilities have the right boundaries for callers that return either text or structured values.
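The runtime lifecycle described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the codemode implementation: `ProviderRuntime`, `ResolvedProvider`, and `createRuntime` are names used in this PR, but the exact type shapes below, the `runWithProviders` helper, and the way `transformResult` is passed in are hypothetical.

```typescript
// Hypothetical shapes; the real types live in @cloudflare/codemode and may differ.
type ProviderFns = Record<string, (...args: unknown[]) => unknown>;

interface ProviderRuntime {
  fns: ProviderFns;
  dispose?: () => void | Promise<void>;
}

interface ResolvedProvider {
  name: string;
  fns?: ProviderFns; // providers without runtime state only expose fns
  createRuntime?: () => ProviderRuntime | Promise<ProviderRuntime>;
}

// Stand-in for sandboxed code: it only sees one namespace object per provider.
type SandboxCode = (namespaces: Record<string, ProviderFns>) => unknown;

// Hypothetical executor loop: create runtimes, run the code, always dispose.
async function runWithProviders(
  providers: ResolvedProvider[],
  code: SandboxCode,
  transformResult: (value: unknown) => unknown = (value) => value
): Promise<unknown> {
  const runtimes: ProviderRuntime[] = [];
  const namespaces: Record<string, ProviderFns> = {};
  try {
    for (const provider of providers) {
      const runtime: ProviderRuntime = provider.createRuntime
        ? await provider.createRuntime()
        : { fns: provider.fns ?? {} };
      runtimes.push(runtime);
      // Each provider's functions stay under its own namespace key.
      namespaces[provider.name] = runtime.fns;
    }
    // The transform is applied once, to the final value only.
    return transformResult(await code(namespaces));
  } finally {
    // Runs on success and on failure, in reverse creation order.
    for (const runtime of runtimes.reverse()) {
      await runtime.dispose?.();
    }
  }
}
```

Putting disposal in `finally` is what gives the "success or failure" guarantee in this sketch. A production executor would likely also guard against a `dispose()` that itself throws, which this version does not.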
Problem: The existing browser tools opened a fresh browser session for each execute call. That kept cleanup simple, but it made multi-step browser workflows awkward because state such as tabs, cookies, local storage, and navigation context could not survive across tool calls. We also needed a cleaner bridge from `agents/browser` into codemode provider runtimes rather than keeping browser-specific execution logic outside codemode.

Solution: This adds Browser Run session primitives and a codemode browser provider in `agents/browser`. The provider exposes a `cdp` namespace to sandboxed code, including `cdp.spec()`, `cdp.send()`, `cdp.attachToTarget()`, debug log helpers, and reusable-session helpers when session reuse is enabled. The session manager supports three policies: `one-shot`, `reuse`, and `dynamic`. One-shot remains the default. Reuse always connects through a stored Browser Run session. Dynamic starts as one-shot and lets the model opt into persistence by calling `cdp.startSession()`. Durable Object storage can back reusable session ids through `DurableBrowserSessionStore`, and per-key locking prevents competing executions from racing over the same Browser Run session.

Decisions: Reusable and dynamic sessions are only supported with the Browser Rendering binding. Custom `cdpUrl` endpoints are treated as externally managed and stay one-shot. Browser state remains host-side; sandboxed code only receives RPC helpers. Browser helper tools opt into codemode's structured truncation, but truncation stays in codemode rather than in `agents/browser`.

Review focus: Please focus on Browser Run lifecycle management, lock handling, cleanup behavior, and whether the public `agents/browser` exports are the right shape for an experimental provider API. The browser tests cover one-shot, reusable, and dynamic session behavior against the local Browser Rendering binding.
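To make the per-key locking and the three policies concrete, here is a small sketch. It is not the PR's `SessionManager` or `DurableBrowserSessionStore`: `KeyedLock`, `withLock`, and `keepsSessionAfterRun` are hypothetical names, and the lock is in-memory only, whereas the PR describes a store-backed lock.

```typescript
type SessionPolicy = "one-shot" | "reuse" | "dynamic";

// Hypothetical helper: does the browser session outlive this execution?
function keepsSessionAfterRun(
  policy: SessionPolicy,
  startSessionCalled: boolean
): boolean {
  if (policy === "reuse") return true; // always uses a stored session
  if (policy === "dynamic") return startSessionCalled; // opt-in via cdp.startSession()
  return false; // one-shot: nothing persists
}

// Hypothetical in-memory per-key lock; leases for the same key run one at a time.
class KeyedLock {
  private tails = new Map<string, Promise<void>>();

  async withLock<T>(key: string, lease: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    // The new tail settles only after every earlier lease and this one finish.
    const tail = previous.then(() => current);
    this.tails.set(key, tail);
    await previous; // wait for earlier holders of this key
    try {
      return await lease();
    } finally {
      release();
      // Drop the entry if nobody queued behind this lease.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

Holding the lock for the whole lease, rather than only while the session id is read or written, is what stops one execution from closing a session that another is still using.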
Problem: Think's experimental browser tools still carried the older two-tool shape: `browser_search` for CDP spec discovery and `browser_execute` for live browser commands. That duplicated concepts now handled by the codemode browser provider, and it split discovery and execution even though both are part of the same CDP helper surface.

Solution: This moves Think browser tools onto the `agents/browser` codemode provider. `browser_execute` now exposes both CDP spec discovery and browser commands through the `cdp` namespace, so models can call `cdp.spec()` inside the same execution environment they use for CDP commands. The Think browser tests and docs are updated for the new surface, and Think's dependency range now requires the codemode version that includes provider runtimes and result truncation.

Decisions: This removes the experimental `browser_search` tool instead of preserving a compatibility wrapper. The preferred path is a single code-mode browser tool because it gives the model one place to discover protocol metadata and act on it. The agents commit briefly kept compatibility exports so the stack stayed green at that point in history, and this commit removes them once Think no longer imports the old helpers.

Review focus: Please focus on the prompt and API shape for `browser_execute`, the migration from `browser_search` to `cdp.spec()`, and whether the tests cover the important Think-level behavior without duplicating lower-level browser provider tests.
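As an illustration of the single-tool flow, the sketch below shows the kind of code a model might submit to `browser_execute`: discover a command via `cdp.spec()`, then act on it with `cdp.send()`. The helper signatures and the shape returned by `cdp.spec()` are assumptions, not taken from the `agents/browser` source, and `createFakeCdp` is a hypothetical in-memory stand-in so the example runs without a browser. `Page.navigate` and `Runtime.evaluate` are standard Chrome DevTools Protocol commands.

```typescript
// Hypothetical helper surface; the real signatures in agents/browser may differ.
interface CdpNamespace {
  spec(): Promise<{
    domains: { domain: string; commands: { name: string }[] }[];
  }>;
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

// Discovery and execution happen in the same piece of sandboxed code.
async function modelWrittenSnippet(cdp: CdpNamespace): Promise<unknown> {
  const spec = await cdp.spec();
  const page = spec.domains.find((d) => d.domain === "Page");
  if (!page?.commands.some((c) => c.name === "navigate")) {
    throw new Error("Page.navigate is not available");
  }
  await cdp.send("Page.navigate", { url: "https://example.com" });
  return cdp.send("Runtime.evaluate", {
    expression: "document.title",
    returnByValue: true,
  });
}

// In-memory fake that records calls and returns canned responses.
function createFakeCdp(log: string[]): CdpNamespace {
  return {
    async spec() {
      log.push("spec");
      return { domains: [{ domain: "Page", commands: [{ name: "navigate" }] }] };
    },
    async send(method) {
      log.push(method);
      if (method === "Runtime.evaluate") {
        return { result: { value: "Example Domain" } };
      }
      return {};
    },
  };
}
```

An agent that previously told the model to call `browser_search` would, under this shape, instead prompt it to call `cdp.spec()` from inside the code it passes to `browser_execute`.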
Problem: The package changes add reusable browser sessions, dynamic session control, and richer browser result rendering, but the examples did not show how an app should wire those pieces together. Reviewers and users need a concrete reference for exposing Browser Run sessions through agents and surfacing live browser metadata in a UI.

Solution: This updates the assistant and ai-chat examples to demonstrate browser session flows on top of the new provider-backed tools. The assistant example wires Browser Rendering and Worker Loader bindings, adds shared workspace/browser-session test coverage, and updates generated env types and config. The ai-chat example updates client rendering so button-only browser rich results can be displayed cleanly.

Decisions: This commit intentionally stays in examples only and does not add a changeset. The package-level behavior and release notes live in the earlier commits. The examples are kept as integration references rather than adding another abstraction layer over the browser provider APIs.

Review focus: Please focus on whether the examples make the new browser session behavior understandable, whether the UI handles browser result metadata clearly, and whether the config/env changes are limited to what the demos need.
agents
@cloudflare/ai-chat
@cloudflare/codemode
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
is this related to #1492?
This PR refactors browser tooling to run inside codemode and introduces the primitives needed to make those sessions reusable across tool calls. It supersedes #1492 and closes #1501.
Three packages change together because they share a hand-off: `codemode` grows the runtime contract, `agents/browser` implements a provider that satisfies it, and `think` switches from its old hand-rolled browser tools to that provider.
Demo
The clip shows opening a persistent browser session, saving a screenshot to the workspace, displaying the browser session in the UI, interactively manipulating the live browser preview, and saving another screenshot. No audio.
GMT20260603-092551_Clip_Christopher.Little-Savage.s.Clip.06_03_2026.mp4
Why
The old `browser_search` / `browser_execute` tools in `@cloudflare/think` were AI SDK tools that:
`@cloudflare/codemode` (sandboxed code execution, JSON schema generation, truncation).
Codemode already had the right shape for (1), so the work was to make codemode expressive enough that a single provider can own a stateful resource for the lifetime of one execution, and then build the browser session manager on top of it.
Architectural shift
Before:
After:
The model now writes JS against a `cdp` namespace inside the sandbox instead of calling two separate AI SDK tools. Browser session lifecycle moves into a `SessionManager` that the provider creates per code-mode runtime.
What changed, package by package
@cloudflare/codemode (patch)
The executor contract grew two concepts so providers can own resources for the lifetime of one execution:
- `ProviderRuntime`: a `{ fns, dispose? }` pair returned by a new optional `createRuntime()` on `ResolvedProvider`. The executor calls `createRuntime()` at the start of `execute()` and `dispose()` when the run finishes (success or failure). Existing providers that only expose `fns` keep working unchanged.
- `transformResult` on `createCodeTool`: lets a tool transform the final value before it is serialized back to the model. We need this so the browser tool can apply structured truncation without forcing every codemode caller to do it.
Truncation utilities (`truncateResponse`, `truncateResult`, `sandboxResponseText`) move into `packages/codemode/src/truncate.ts` and are re-exported from the package root. `agents/browser` had its own copy; it now imports the shared one.
Tests for both new behaviors live in `packages/codemode/src/tests/executor.test.ts` and `tool.test.ts`.
agents (minor, agents/browser)
New `SessionManager` (`packages/agents/src/browser/session-manager.ts`, ~430 lines) implements the three policies documented in `design/browser-tools.md`:
- `one-shot` (default): a fresh browser session for each execution, released on `dispose()`.
- `reuse`: always connects through a stored Browser Run session.
- `dynamic`: starts as one-shot until the model calls `cdp.startSession()`, then promoted to reusable.
Reuse and dynamic modes require a `BrowserSessionStore` (interface in the same file) so the `sessionId` survives DO hibernation. `DurableBrowserSessionStore` is the SQLite-backed default.
The store must hold a per-key lock for the full browser lease. This prevents two provider runtimes with the same key from racing to create competing sessions, or one from closing a session another is using. The lock contract is documented inline on `BrowserSessionStore` and tested across the new test cases in `session-manager.test.ts`.
Supporting Browser Rendering helpers (`createBrowserSession`, `connectBrowserSession`, `deleteBrowserSession`, etc.) move into a dedicated `browser-run.ts` so they can be reused outside the provider.
`createBrowserProvider()` now contributes both the existing `cdp.spec()` / `cdp.send()` / `cdp.attachToTarget()` and new session-management RPCs: `cdp.sessionInfo()`, `cdp.startSession()`, `cdp.closeSession()`, `cdp.resetSession()`. The provider returns a `ProviderRuntime` so codemode disposes the session correctly.
@cloudflare/think (patch)
`packages/think/src/tools/browser.ts` shrinks from ~140 lines to ~80. It is now a thin wrapper that registers one codemode tool with `createBrowserProvider()` as the provider and `truncateResult` as the result transform. The old `browser_search` AI SDK tool is gone; protocol discovery happens via `cdp.spec()` inside `browser_execute`. Public types are re-exported from `agents/browser` so consumers do not need a second import.
Examples
`examples/assistant` gets a working browser session demo (client UI + worker config + tests) showing the dynamic session mode end to end. `examples/ai-chat` is updated to match the new tool shape.
What to look at first
Suggested review order:
1. `design/browser-tools.md`: the updated design doc explains the three session modes, the lock contract, and the tradeoffs. Start here.
2. `packages/codemode/src/executor-types.ts` and `packages/codemode/src/executor.ts`: the runtime contract change. Small and self-contained.
3. `packages/agents/src/browser/session-manager.ts`: the bulk of the new logic. Tests in `session-manager.test.ts` cover each mode and the lock behavior.
4. `packages/agents/src/browser/browser-run.ts`: Browser Rendering REST helpers. Mostly mechanical.
5. `packages/agents/src/browser/index.ts` and `shared.ts`: provider wiring.
6. `packages/think/src/tools/browser.ts`: the consumer side; mostly deletions.
7. `examples/assistant/src/client.tsx` + `server.ts`: the end-to-end shape from a user's perspective.
Compatibility
- `createRuntime` and `transformResult` are optional. No published API was removed.
- `@cloudflare/think/tools/browser` removes the `browser_search` tool. Any agent that referenced it by name in a system prompt needs to switch to telling the model to call `cdp.spec()` inside `browser_execute`. This is called out in the changeset and in the updated `docs/think/tools.md`.
- The `cdpUrl` escape hatch for custom CDP endpoints is preserved.
Verification
- `npm run check` passes locally.
- `npm run test` passes, including the new `executor.test.ts`, `tool.test.ts`, and `session-manager.test.ts` suites.
- The tests in `packages/agents/src/browser-tests/` run against a real `wrangler dev` instance with the Browser Rendering binding and cover all three session modes.
- `examples/assistant` has been exercised manually against `wrangler dev`.
Changesets
- `@cloudflare/codemode` (patch): `transformResult`, shared truncation exports
- `agents` (minor)
- `@cloudflare/think` (patch): `browser_execute`