1 change: 1 addition & 0 deletions README.md
@@ -33,6 +33,7 @@ integrations/
│ ├── logs/ # Logging utilities
│ ├── mastra/ # Mastra AI agent integration
│ ├── mongodb/ # MongoDB data extraction & storage
│ ├── openai/ # OpenAI Realtime voice agent + browser agent
│ ├── stripe/ # Stripe Issuing + automation
│ ├── temporal/ # Temporal workflow orchestration
│ ├── trigger/ # Trigger.dev background jobs & automation
7 changes: 7 additions & 0 deletions examples/integrations/openai/.env.example
@@ -0,0 +1,7 @@
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
BROWSERBASE_API_KEY=
BROWSERBASE_PROJECT_ID=
OPENAI_REALTIME_MODEL=gpt-realtime-2
OPENAI_REALTIME_VOICE=marin
BROWSE_BIN=browse
7 changes: 7 additions & 0 deletions examples/integrations/openai/.gitignore
@@ -0,0 +1,7 @@
.next
node_modules
.env
.env.local
.env.*.local
.DS_Store
tsconfig.tsbuildinfo
55 changes: 55 additions & 0 deletions examples/integrations/openai/README.md
@@ -0,0 +1,55 @@
# OpenAI Realtime + Browserbase

Give your voice agent access to the whole web.

A **voice agent** (OpenAI Realtime, speech-to-speech) talks with the user. A **browser agent** (Claude driving a Browserbase session) operates a real browser underneath it — opening sites, clicking, and reading pages. They share one live session, so what the agent says stays in sync with what it does.

The user watches the browser work live, and can interrupt or redirect at any time, just by talking.

## Required environment variables

Create `.env.local` or `.env` with:

```bash
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
BROWSERBASE_API_KEY=
BROWSERBASE_PROJECT_ID=
```

Optional:

```bash
OPENAI_REALTIME_MODEL=gpt-realtime-2
OPENAI_REALTIME_VOICE=marin
BROWSE_BIN=browse
```

## Run locally

```bash
pnpm install
pnpm dev
```

Open:

```text
http://127.0.0.1:3002
```

Click **Start voice**, allow the microphone, and just talk — ask it to open a site, search, click, or read a page.

The browser agent uses the Browserbase [Browse CLI](https://www.npmjs.com/package/browse), which must be installed and available on your `PATH`. If it lives elsewhere, point `BROWSE_BIN` at it.
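
As a minimal sketch of that lookup, assuming the server reads `BROWSE_BIN` from the environment and falls back to the bare command name: the helper name `resolveBrowseBin` is hypothetical and is not taken from this example's source.

```typescript
// Hypothetical helper, shown for illustration only.
// It picks the Browse CLI command: an explicit BROWSE_BIN wins,
// otherwise the bare name "browse" is returned so the OS resolves it via PATH.
function resolveBrowseBin(env: Record<string, string | undefined>): string {
  const configured = env.BROWSE_BIN?.trim();
  // Treat an empty or whitespace-only value the same as "not set".
  return configured ? configured : "browse";
}

// Typical call site on the server.
const browseBin = resolveBrowseBin(process.env);
console.log(`Using Browse CLI command: ${browseBin}`);
```

Treating a blank value as unset means an empty `BROWSE_BIN=` line copied from `.env.example` does not override the default.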

## How it works

Two cooperating agents are joined by a server-side connection:

- **Voice plane** — the browser connects to OpenAI Realtime over WebRTC (audio is peer-to-peer). The voice agent has one tool, `control_browser`, which it calls whenever the user wants something done on the web.
- **Server bridge** — the connect route creates the Realtime call, then attaches a server-side WebSocket to the same call so the backend can answer `control_browser` tool calls and speak the result back in the same conversation.
- **Browser plane** — one **persistent Claude agent runs for the whole call**. Each `control_browser` instruction is appended to the same conversation, so the agent remembers everything it has already opened and done (the user can say "go back to the first result and compare"). It drives the browser through compact tools (`navigate`, `click`, `type_text`, `press_key`, `go_back`, `read_page`) against a shared Browserbase session shown in the live-view iframe.
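
The persistent-conversation idea in the last bullet can be sketched roughly as follows. This is an illustration under assumptions, not the example's actual implementation: `PersistentBrowserAgent`, `Responder`, and `handleInstruction` are hypothetical names, and the model call is replaced by a stand-in function.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical stand-in for the model call; a real agent would send the
// history to Claude and run browser tools before producing a reply.
type Responder = (history: readonly Message[]) => Promise<string>;

class PersistentBrowserAgent {
  // One history array lives for the whole call, so later instructions
  // can refer back to earlier pages and actions.
  private readonly history: Message[] = [];
  private readonly respond: Responder;

  constructor(respond: Responder) {
    this.respond = respond;
  }

  // Each control_browser instruction is appended to the same conversation.
  async handleInstruction(instruction: string): Promise<string> {
    this.history.push({ role: "user", content: instruction });
    const reply = await this.respond(this.history);
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }

  get turnCount(): number {
    return this.history.length;
  }
}
```

Because the history is never reset between instructions, a follow-up such as "go back to the first result and compare" reaches the model together with the earlier turns it depends on.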

Because the tool call only returns once the browser work has actually happened — and answers are grounded in (and quoted from) the live page — the spoken conversation stays in sync with the screen instead of narrating ahead of it.
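
A rough sketch of that ordering, assuming the bridge answers a tool call by sending a `function_call_output` item followed by `response.create` over the server-side WebSocket. The function name `answerToolCall`, the `runBrowser` callback, and the `{ instruction }` argument shape are assumptions made for illustration.

```typescript
// Assumed shape of a control_browser call as the bridge receives it.
type ToolCall = { callId: string; argumentsJson: string };

type ClientEvent =
  | {
      type: "conversation.item.create";
      item: { type: "function_call_output"; call_id: string; output: string };
    }
  | { type: "response.create" };

async function answerToolCall(
  call: ToolCall,
  runBrowser: (instruction: string) => Promise<string>,
  send: (event: ClientEvent) => void
): Promise<void> {
  // Assumption: the tool arguments carry a single "instruction" string.
  const { instruction } = JSON.parse(call.argumentsJson) as { instruction: string };

  let output: string;
  try {
    // Wait for the browser work to finish before anything is sent back.
    output = await runBrowser(instruction);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    output = `Browser task failed: ${reason}`;
  }

  // Only now does the voice agent receive the result and get asked to speak.
  send({
    type: "conversation.item.create",
    item: { type: "function_call_output", call_id: call.callId, output }
  });
  send({ type: "response.create" });
}
```

Since nothing is sent until `runBrowser` settles, the voice agent has no result to speak about until the page has actually changed.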

This is a prototype meant to inspire people building voice agents: the same pattern works with any speech-to-speech voice runtime in front of a Browserbase-backed browser agent.
22 changes: 22 additions & 0 deletions examples/integrations/openai/app/api/demo/control/route.ts
@@ -0,0 +1,22 @@
import { NextResponse } from "next/server";
import { z } from "zod";
import { runDemoInstruction } from "../../../../lib/demo-controller";

export const runtime = "nodejs";

const bodySchema = z.object({
  demoId: z.string().uuid(),
  instruction: z.string().min(3),
  interrupt: z.boolean().optional()
});

export async function POST(request: Request) {
  try {
    const parsed = bodySchema.parse(await request.json());
    const snapshot = await runDemoInstruction(parsed);
    return NextResponse.json(snapshot);
  } catch (error) {
    const message = error instanceof Error ? error.message : "Demo control failed.";
    return NextResponse.json({ error: message }, { status: error instanceof z.ZodError ? 400 : 500 });
  }
}
15 changes: 15 additions & 0 deletions examples/integrations/openai/app/api/demo/session/route.ts
@@ -0,0 +1,15 @@
import { NextResponse } from "next/server";
import { getDemoSnapshot } from "../../../../lib/demo-controller";

export const runtime = "nodejs";

export async function GET(request: Request) {
  const url = new URL(request.url);
  const demoId = url.searchParams.get("demoId");

  if (!demoId) {
    return NextResponse.json({ error: "Missing demoId." }, { status: 400 });
  }

  return NextResponse.json(getDemoSnapshot(demoId));
}
48 changes: 48 additions & 0 deletions examples/integrations/openai/app/api/demo/stream/route.ts
@@ -0,0 +1,48 @@
import { getDemoSnapshot, subscribeToDemo } from "../../../../lib/demo-controller";

export const runtime = "nodejs";

export async function GET(request: Request) {
  const url = new URL(request.url);
  const demoId = url.searchParams.get("demoId");

  if (!demoId) {
    return new Response("Missing demoId.", { status: 400 });
  }

  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      const sendSnapshot = (snapshot = getDemoSnapshot(demoId)) => {
        controller.enqueue(
          encoder.encode(`event: snapshot\ndata: ${JSON.stringify(snapshot)}\n\n`)
        );
      };

      sendSnapshot();

      const unsubscribe = subscribeToDemo(demoId, (snapshot) => {
        sendSnapshot(snapshot);
      });

      const heartbeat = setInterval(() => {
        controller.enqueue(encoder.encode("event: ping\ndata: {}\n\n"));
      }, 15000);

      request.signal.addEventListener("abort", () => {
        clearInterval(heartbeat);
        unsubscribe();
        controller.close();
      });
    }
  });

  return new Response(stream, {
    headers: {
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
      "Content-Type": "text/event-stream"
    }
  });
}
57 changes: 57 additions & 0 deletions examples/integrations/openai/app/api/realtime/connect/route.ts
@@ -0,0 +1,57 @@
import { NextResponse } from "next/server";
import { attachRealtimeSideband } from "../../../../lib/openai-realtime-sideband";
import { buildRealtimeSessionConfig } from "../../../../lib/realtime-config";

export const runtime = "nodejs";

function getOpenAiApiKey() {
  return process.env.OPENAI_API_KEY ?? process.env.openai_key ?? null;
}

export async function POST(request: Request) {
  const apiKey = getOpenAiApiKey();
  if (!apiKey) {
    return NextResponse.json({ error: "OPENAI_API_KEY is missing." }, { status: 500 });
  }

  const url = new URL(request.url);
  const demoId = url.searchParams.get("demoId");
  if (!demoId) {
    return NextResponse.json({ error: "Missing demoId." }, { status: 400 });
  }

  const sdp = await request.text();
  if (!sdp.trim()) {
    return NextResponse.json({ error: "Missing SDP offer." }, { status: 400 });
  }

  const formData = new FormData();
  formData.set("sdp", sdp);
  formData.set("session", JSON.stringify(buildRealtimeSessionConfig()));

  const response = await fetch("https://api.openai.com/v1/realtime/calls", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`
    },
    body: formData
  });

  const answerSdp = await response.text();
  if (!response.ok) {
    return new Response(answerSdp || "Failed to create Realtime call.", { status: response.status });
  }

  const location = response.headers.get("Location");
  const callId = location?.split("/").pop() ?? null;
  if (callId) {
    attachRealtimeSideband({ callId, demoId });
  }

  return new Response(answerSdp, {
    headers: {
      "Content-Type": "application/sdp",
      ...(callId ? { "X-OpenAI-Call-Id": callId } : {})
    }
  });
}