diff --git a/.changeset/v020-release.md b/.changeset/v020-release.md
new file mode 100644
index 0000000..53e17c1
--- /dev/null
+++ b/.changeset/v020-release.md
@@ -0,0 +1,9 @@
+---
+"@browseragentprotocol/protocol": minor
+"@browseragentprotocol/logger": minor
+"@browseragentprotocol/client": minor
+"@browseragentprotocol/server-playwright": minor
+"@browseragentprotocol/mcp": minor
+---
+
+v0.2.0 — browser selection, clean tool names, smarter extract
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
new file mode 100644
index 0000000..fd04a4a
--- /dev/null
+++ b/.claude-plugin/plugin.json
@@ -0,0 +1,7 @@
+{
+ "name": "BAPBrowser",
+ "description": "AI-optimized browser automation with semantic selectors, smart observations, and multi-step actions. Navigate, click, fill forms, extract data, and take screenshots across Chrome, Firefox, and WebKit.",
+ "author": {
+ "name": "Browser Agent Protocol"
+ }
+}
diff --git a/.gitignore b/.gitignore
index 8bc0fef..1a20a60 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,3 +33,6 @@ coverage/
# Claude Code instructions (local only)
CLAUDE.md
docs/plans/
+
+# Internal roadmap (local only)
+ROADMAP.md
diff --git a/.mcp.json b/.mcp.json
new file mode 100644
index 0000000..fcf645a
--- /dev/null
+++ b/.mcp.json
@@ -0,0 +1,8 @@
+{
+ "mcpServers": {
+ "bap-browser": {
+ "command": "npx",
+ "args": ["-y", "@browseragentprotocol/mcp@0.2.0"]
+ }
+ }
+}
diff --git a/README.md b/README.md
index 10bdac1..d880a86 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,11 @@
# Browser Agent Protocol (BAP)
-[](https://www.npmjs.com/package/@browseragentprotocol/protocol)
+[](https://www.npmjs.com/package/@browseragentprotocol/mcp)
[](https://opensource.org/licenses/Apache-2.0)
An open standard for AI agents to interact with web browsers.
-> **v0.1.0:** First public release. APIs may evolve based on feedback.
+> **v0.2.0:** Renamed MCP tools, auto-reconnect, multi-context support, streaming, and more. APIs may evolve based on feedback.
## Overview
@@ -49,69 +49,104 @@ BAP (Browser Agent Protocol) provides a standardized way for AI agents to contro
## Quick Start
-### Using with MCP (Recommended for AI Agents)
+BAP works with any MCP-compatible client. The server auto-starts — no separate setup needed.
-BAP works with any MCP-compatible client including Claude Code, Claude Desktop, OpenAI Codex, and Google Antigravity.
+### Claude Code
-**Claude Code:**
+**MCP server** (one command):
```bash
-claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
+claude mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp
+```
+
+**Plugin** (includes SKILL.md for smarter tool usage):
+```bash
+claude plugin add --from https://github.com/browseragentprotocol/bap
```
- 
+ 
Claude Code browsing Hacker News with BAP
-**Claude Desktop** (`claude_desktop_config.json`):
+### Claude Desktop
+
+Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
+
```json
{
"mcpServers": {
"bap-browser": {
"command": "npx",
- "args": ["@browseragentprotocol/mcp"]
+ "args": ["-y", "@browseragentprotocol/mcp"]
}
}
}
```
+Restart Claude Desktop after saving.
+
- 
+ 
Claude Desktop browsing Hacker News with BAP
-**Codex CLI:**
+### Codex CLI
+
```bash
-codex mcp add bap-browser -- npx @browseragentprotocol/mcp
+codex mcp add bap-browser -- npx -y @browseragentprotocol/mcp
+```
+
+Or add to `~/.codex/config.toml`:
+
+```toml
+[mcp_servers.bap-browser]
+command = "npx"
+args = ["-y", "@browseragentprotocol/mcp"]
```
- 
+ 
Codex CLI browsing Hacker News with BAP
-**Codex Desktop** (`~/.codex/config.toml`):
+### Codex Desktop
+
+Add to `~/.codex/config.toml`:
+
```toml
[mcp_servers.bap-browser]
command = "npx"
-args = ["@browseragentprotocol/mcp"]
+args = ["-y", "@browseragentprotocol/mcp"]
```
- 
+ 
Codex Desktop browsing Hacker News with BAP
-> **💡 Tip:** Codex may default to web search. Be explicit: *"Using the bap-browser MCP tools..."*
+### Browser Selection
+
+By default, BAP uses your locally installed Chrome. You can choose a different browser with the `--browser` flag:
+
+```bash
+npx @browseragentprotocol/mcp --browser firefox
+```
+| Value | Browser | Notes |
+|---|---|---|
+| `chrome` (default) | Local Chrome | Falls back to bundled Chromium if not installed |
+| `chromium` | Bundled Chromium | Playwright's built-in Chromium |
+| `firefox` | Firefox | Requires local Firefox |
+| `webkit` | WebKit | Playwright's WebKit engine |
+| `edge` | Microsoft Edge | Requires local Edge |
-**Antigravity** (`mcp_config.json` via "..." → MCP Store → Manage):
+In a JSON MCP config, pass the flag via args:
```json
{
"mcpServers": {
"bap-browser": {
"command": "npx",
- "args": ["@browseragentprotocol/mcp"]
+ "args": ["-y", "@browseragentprotocol/mcp", "--browser", "firefox"]
}
}
}
@@ -255,7 +290,7 @@ const data = await client.extract({
});
```
-> **Note:** `agent/extract` (and `bap_extract` in MCP) uses heuristic-based extraction (CSS patterns). For complex pages, consider using `bap_content` to get page content as markdown and extract data yourself.
+> **Note:** `agent/extract` (and `extract` in MCP) uses heuristic-based extraction (CSS patterns). For complex pages, consider using `content` to get page content as markdown and extract data yourself.
## Server Options
diff --git a/packages/client/src/__tests__/client-methods.test.ts b/packages/client/src/__tests__/client-methods.test.ts
index 85e3445..ee2190d 100644
--- a/packages/client/src/__tests__/client-methods.test.ts
+++ b/packages/client/src/__tests__/client-methods.test.ts
@@ -197,6 +197,43 @@ describe("BAPClient Methods", () => {
expect(parsed.params.headless).toBe(true);
});
+ it("launch() passes channel param correctly", async () => {
+ const { client, transport } = await createConnectedClient();
+ transport.setAutoResponse("browser/launch", { browserId: "browser-2" });
+
+ const result = await client.launch({ browser: "chromium", channel: "chrome", headless: false });
+
+ expect(result.browserId).toBe("browser-2");
+
+ const launchRequest = transport.sentMessages.find((msg) => {
+ const parsed = JSON.parse(msg);
+ return parsed.method === "browser/launch";
+ });
+ expect(launchRequest).toBeDefined();
+ const parsed = JSON.parse(launchRequest!);
+ expect(parsed.params.browser).toBe("chromium");
+ expect(parsed.params.channel).toBe("chrome");
+ expect(parsed.params.headless).toBe(false);
+ });
+
+ it("launch() without channel works (backwards compat)", async () => {
+ const { client, transport } = await createConnectedClient();
+ transport.setAutoResponse("browser/launch", { browserId: "browser-3" });
+
+ const result = await client.launch({ browser: "firefox" });
+
+ expect(result.browserId).toBe("browser-3");
+
+ const launchRequest = transport.sentMessages.find((msg) => {
+ const parsed = JSON.parse(msg);
+ return parsed.method === "browser/launch";
+ });
+ expect(launchRequest).toBeDefined();
+ const parsed = JSON.parse(launchRequest!);
+ expect(parsed.params.browser).toBe("firefox");
+ expect(parsed.params.channel).toBeUndefined();
+ });
+
it("closeBrowser() sends correct request", async () => {
const { client, transport } = await createConnectedClient();
transport.setAutoResponse("browser/close", {});
diff --git a/packages/client/src/index.ts b/packages/client/src/index.ts
index cac28b5..33c6602 100644
--- a/packages/client/src/index.ts
+++ b/packages/client/src/index.ts
@@ -376,7 +376,7 @@ export class BAPClient extends EventEmitter {
this.options = {
token: options.token,
name: options.name ?? "bap-client",
- version: options.version ?? "0.1.0",
+ version: options.version ?? "0.2.0",
timeout: options.timeout ?? 30000,
events: options.events ?? ["page", "console", "network", "dialog"],
};
diff --git a/packages/mcp/README.md b/packages/mcp/README.md
index c3bf6f1..9ab3e2d 100644
--- a/packages/mcp/README.md
+++ b/packages/mcp/README.md
@@ -1,100 +1,63 @@
# @browseragentprotocol/mcp
-MCP (Model Context Protocol) server for Browser Agent Protocol. Enables AI assistants to control web browsers.
-
-## Supported Clients
-
-| Client | Status |
-|--------|--------|
-| Claude Code | Supported |
-| Claude Desktop | Supported |
-| OpenAI Codex | Supported |
-| Google Antigravity | Supported |
-| Any MCP-compatible client | Supported |
+MCP (Model Context Protocol) server for Browser Agent Protocol. Gives any MCP-compatible AI agent full browser control.
## Installation
-### With Claude Code
+### One command — standalone mode
```bash
-# Add the BAP browser server
-claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
+npx @browseragentprotocol/mcp
```
-That's it! Claude Code can now control browsers. Try asking: *"Go to example.com and take a screenshot"*
+This auto-starts a BAP Playwright server and exposes browser tools over MCP stdio. No separate server process needed.
-### With OpenAI Codex
+### Add to an MCP client
+**Claude Code:**
```bash
-# Add the BAP browser server
-codex mcp add bap-browser -- npx @browseragentprotocol/mcp
-```
-
-Or add to your `~/.codex/config.toml`:
-
-```toml
-[mcp_servers.bap-browser]
-command = "npx"
-args = ["@browseragentprotocol/mcp"]
+claude mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp
```
-### With Claude Desktop
-
-Add to your config file:
-
-**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
-**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
-
+**Claude Desktop** — add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"bap-browser": {
"command": "npx",
- "args": ["@browseragentprotocol/mcp"]
+ "args": ["-y", "@browseragentprotocol/mcp"]
}
}
}
```
-Restart Claude Desktop after saving.
-
-### With Google Antigravity
-
-1. Open the MCP Store via the **"..."** dropdown at the top of the editor's agent panel
-2. Click **Manage MCP Servers**
-3. Click **View raw config**
-4. Add the BAP browser server to your `mcp_config.json`:
+**Codex CLI:**
+```bash
+codex mcp add bap-browser -- npx -y @browseragentprotocol/mcp
+```
-```json
-{
- "mcpServers": {
- "bap-browser": {
- "command": "npx",
- "args": ["@browseragentprotocol/mcp"]
- }
- }
-}
+**Codex Desktop** — add to `~/.codex/config.toml`:
+```toml
+[mcp_servers.bap-browser]
+command = "npx"
+args = ["-y", "@browseragentprotocol/mcp"]
```
-5. Save and refresh to load the new configuration
+### Connect to an existing BAP server
-### Standalone
+If you already have a BAP Playwright server running, pass `--url` to skip auto-start:
```bash
-# Start the MCP server (connects to BAP server on localhost:9222)
-npx @browseragentprotocol/mcp
-
-# With custom BAP server URL
-npx @browseragentprotocol/mcp --bap-url ws://localhost:9333
+npx @browseragentprotocol/mcp --url ws://localhost:9222
```
## How It Works
```
┌─────────────┐ MCP ┌─────────────┐ BAP ┌─────────────┐
-│ Claude │ ──────────── │ BAP MCP │ ────────── │ BAP Server │
-│ (or other │ (stdio) │ Server │ (WebSocket)│ (Playwright)│
-│ MCP host) │ │ │ │ │
+│ AI Agent │ ──────────── │ BAP MCP │ ────────── │ BAP Server │
+│ (any MCP │ (stdio) │ Server │ (WebSocket)│ (Playwright)│
+│ client) │ │ │ │ │
└─────────────┘ └─────────────┘ └─────────────┘
│
▼
@@ -103,61 +66,59 @@ npx @browseragentprotocol/mcp --bap-url ws://localhost:9333
└─────────────┘
```
-1. Claude sends tool calls via MCP (stdio transport)
+1. AI agent sends tool calls via MCP (stdio transport)
2. This package translates them to BAP protocol
3. BAP server controls the browser via Playwright
-4. Results flow back to Claude
+4. Results flow back to the agent
## Available Tools
-When connected, Claude has access to these browser automation tools:
-
### Navigation
| Tool | Description |
|------|-------------|
-| `bap_navigate` | Navigate to a URL |
-| `bap_go_back` | Navigate back in browser history |
-| `bap_go_forward` | Navigate forward in browser history |
-| `bap_reload` | Reload the current page |
+| `navigate` | Navigate to a URL |
+| `go_back` | Navigate back in browser history |
+| `go_forward` | Navigate forward in browser history |
+| `reload` | Reload the current page |
### Element Interaction
| Tool | Description |
|------|-------------|
-| `bap_click` | Click an element using semantic selectors |
-| `bap_type` | Type text character by character (first clicks element) |
-| `bap_fill` | Fill a form field (clears existing content first) |
-| `bap_press` | Press keyboard keys (Enter, Tab, shortcuts) |
-| `bap_select` | Select an option from a dropdown |
-| `bap_scroll` | Scroll the page or a specific element |
-| `bap_hover` | Hover over an element |
+| `click` | Click an element using semantic selectors |
+| `type` | Type text character by character (first clicks element) |
+| `fill` | Fill a form field (clears existing content first) |
+| `press` | Press keyboard keys (Enter, Tab, shortcuts) |
+| `select` | Select an option from a dropdown |
+| `scroll` | Scroll the page or a specific element |
+| `hover` | Hover over an element |
### Observation
| Tool | Description |
|------|-------------|
-| `bap_screenshot` | Take a screenshot of the page |
-| `bap_accessibility` | Get the full accessibility tree |
-| `bap_aria_snapshot` | Get a token-efficient YAML accessibility snapshot (~80% fewer tokens) |
-| `bap_content` | Get page text content as text or markdown |
-| `bap_element` | Query element properties (exists, visible, enabled) |
+| `screenshot` | Take a screenshot of the page |
+| `accessibility` | Get the full accessibility tree |
+| `aria_snapshot` | Token-efficient YAML accessibility snapshot (~80% fewer tokens) |
+| `content` | Get page text content as text or markdown |
+| `element` | Query element properties (exists, visible, enabled) |
### Page Management
| Tool | Description |
|------|-------------|
-| `bap_pages` | List all open pages/tabs |
-| `bap_activate_page` | Switch to a different page/tab |
-| `bap_close_page` | Close the current page/tab |
+| `pages` | List all open pages/tabs |
+| `activate_page` | Switch to a different page/tab |
+| `close_page` | Close the current page/tab |
### AI Agent Methods
| Tool | Description |
|------|-------------|
-| `bap_observe` | Get AI-optimized page observation with interactive elements and stable refs |
-| `bap_act` | Execute a sequence of browser actions in a single call |
-| `bap_extract` | Extract structured data from the page using schema and CSS heuristics |
+| `observe` | AI-optimized page observation with interactive elements and stable refs |
+| `act` | Execute a sequence of browser actions in a single call |
+| `extract` | Extract structured data from the page using schema and CSS heuristics |
### Selector Formats
@@ -168,73 +129,24 @@ role:button:Submit # ARIA role + accessible name (recommended)
text:Sign in # Visible text content
label:Email address # Associated label
testid:submit-button # data-testid attribute
-ref:@submitBtn # Stable element reference from bap_observe
+ref:@submitBtn # Stable element reference from observe
css:.btn-primary # CSS selector (fallback)
xpath://button[@type] # XPath selector (fallback)
```
-## Example Conversations
-
-**You:** Go to Hacker News and tell me the top 3 stories
-
-**Claude:** I'll browse to Hacker News and get the top stories for you.
-
-*[Uses bap_navigate, bap_accessibility]*
-
-Here are the top 3 stories on Hacker News right now:
-1. "Show HN: I built a tool for..."
-2. "Why we switched from..."
-3. "The future of..."
-
----
-
-**You:** Fill out the contact form on example.com with my details
-
-**Claude:** I'll navigate to the contact form and fill it out.
-
-*[Uses bap_navigate, bap_fill, bap_click]*
-
-Done! I've filled in the form with your details and submitted it.
-
## CLI Options
```
Options:
- --bap-url, -u BAP server URL (default: ws://localhost:9222)
- --allowed-domains Comma-separated list of allowed domains (e.g., "*.example.com,trusted.org")
- --verbose Enable verbose logging
- --help Show help
-```
-
-## Managing the Server
-
-```bash
-# List configured MCP servers
-claude mcp list
-
-# Get details for the BAP browser server
-claude mcp get bap-browser
-
-# Remove the server
-claude mcp remove bap-browser
-
-# Check server status (within Claude Code)
-/mcp
-```
-
-## Configuration Scopes
-
-When adding the server with Claude Code, you can specify where to store the configuration:
-
-```bash
-# Local scope (default) - only you, only this project
-claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
-
-# User scope - available to you across all projects
-claude mcp add --transport stdio --scope user bap-browser -- npx @browseragentprotocol/mcp
-
-# Project scope - shared with team via .mcp.json
-claude mcp add --transport stdio --scope project bap-browser -- npx @browseragentprotocol/mcp
+ -b, --browser Browser: chrome (default), chromium, firefox, webkit, edge
+ -u, --url Connect to existing BAP server (skips auto-start)
+ -p, --port Port for auto-started server (default: 9222)
+ --headless Run browser headless (default: true)
+ --no-headless Visible browser window
+ --allowed-domains Comma-separated list of allowed domains
+ -v, --verbose Enable verbose logging to stderr
+ -h, --help Show help
+ --version Show version
```
## Programmatic Usage
@@ -248,14 +160,15 @@ const server = new BAPMCPServer({
verbose: true,
});
-await server.start();
+await server.run();
```
## Requirements
- Node.js >= 20.0.0
-- A running BAP server (`npx @browseragentprotocol/server-playwright`)
-- An MCP-compatible client (Claude Code, Claude Desktop, OpenAI Codex, etc.)
+- An MCP-compatible client
+
+The BAP Playwright server is auto-started by default. To install Playwright browsers manually: `npx playwright install chromium`.
## Troubleshooting
@@ -263,22 +176,23 @@ await server.start();
On native Windows (not WSL), use the `cmd /c` wrapper:
-```bash
-claude mcp add --transport stdio bap-browser -- cmd /c npx @browseragentprotocol/mcp
+```json
+{
+ "mcpServers": {
+ "bap-browser": {
+ "command": "cmd",
+ "args": ["/c", "npx", "-y", "@browseragentprotocol/mcp"]
+ }
+ }
+}
```
-**Server not connecting?**
-
-Make sure the BAP server is running:
-
-```bash
-npx @browseragentprotocol/server-playwright
-```
+**Server not starting?**
-Then check the MCP server status:
+Ensure Playwright browsers are installed:
```bash
-/mcp # Within Claude Code
+npx playwright install chromium
```
## License
diff --git a/packages/mcp/package.json b/packages/mcp/package.json
index 3eeb663..9d2347e 100644
--- a/packages/mcp/package.json
+++ b/packages/mcp/package.json
@@ -41,8 +41,8 @@
"automation",
"ai",
"agent",
- "claude",
- "anthropic"
+ "playwright",
+ "web-scraping"
],
"scripts": {
"build": "tsup src/index.ts src/cli.ts --format cjs,esm --dts --clean",
diff --git a/packages/mcp/src/__tests__/standalone.test.ts b/packages/mcp/src/__tests__/standalone.test.ts
new file mode 100644
index 0000000..6adc397
--- /dev/null
+++ b/packages/mcp/src/__tests__/standalone.test.ts
@@ -0,0 +1,181 @@
+import { describe, it, expect } from "vitest";
+import net from "node:net";
+
+/**
+ * Tests for the standalone server management utilities in cli.ts.
+ *
+ * Since cli.ts runs as a script (calls main() at module level), we test
+ * the port-checking logic by reimplementing the core utility functions
+ * that are used by the standalone server management.
+ */
+
+// ---------------------------------------------------------------------------
+// Port detection tests (mirrors isPortInUse from cli.ts)
+// ---------------------------------------------------------------------------
+
+function isPortInUse(port: number, host: string = "localhost"): Promise {
+ return new Promise((resolve) => {
+ const socket = net.createConnection({ port, host });
+ socket.setTimeout(500);
+ socket.on("connect", () => {
+ socket.destroy();
+ resolve(true);
+ });
+ socket.on("timeout", () => {
+ socket.destroy();
+ resolve(false);
+ });
+ socket.on("error", () => {
+ socket.destroy();
+ resolve(false);
+ });
+ });
+}
+
+async function waitForServer(
+ port: number,
+ host: string = "localhost",
+ timeoutMs: number = 2000,
+ intervalMs: number = 50,
+): Promise {
+ const start = Date.now();
+ while (Date.now() - start < timeoutMs) {
+ if (await isPortInUse(port, host)) {
+ return;
+ }
+ await new Promise((resolve) => setTimeout(resolve, intervalMs));
+ }
+ throw new Error(`Server did not start within ${timeoutMs}ms on port ${port}`);
+}
+
+describe("standalone server utilities", () => {
+ describe("isPortInUse()", () => {
+ it("returns false for a port with nothing listening", async () => {
+ // Use a random high port that's unlikely to be in use
+ const result = await isPortInUse(59999);
+ expect(result).toBe(false);
+ });
+
+ it("returns true when a server is listening on the port", async () => {
+ // Start a temporary TCP server
+ const server = net.createServer();
+ const port = await new Promise((resolve) => {
+ server.listen(0, "localhost", () => {
+ const addr = server.address();
+ if (addr && typeof addr === "object") {
+ resolve(addr.port);
+ }
+ });
+ });
+
+ try {
+ const result = await isPortInUse(port, "localhost");
+ expect(result).toBe(true);
+ } finally {
+ server.close();
+ }
+ });
+ });
+
+ describe("waitForServer()", () => {
+ it("resolves immediately if server is already running", async () => {
+ const server = net.createServer();
+ const port = await new Promise((resolve) => {
+ server.listen(0, "localhost", () => {
+ const addr = server.address();
+ if (addr && typeof addr === "object") {
+ resolve(addr.port);
+ }
+ });
+ });
+
+ try {
+ // Should resolve almost instantly
+ const start = Date.now();
+ await waitForServer(port, "localhost", 2000, 50);
+ const elapsed = Date.now() - start;
+ expect(elapsed).toBeLessThan(500);
+ } finally {
+ server.close();
+ }
+ });
+
+ it("waits for a server that starts after a delay", async () => {
+ const server = net.createServer();
+ // Find a free port
+ const port = await new Promise((resolve) => {
+ const tmp = net.createServer();
+ tmp.listen(0, "localhost", () => {
+ const addr = tmp.address();
+ if (addr && typeof addr === "object") {
+ const p = addr.port;
+ tmp.close(() => resolve(p));
+ }
+ });
+ });
+
+ // Start the server after 200ms
+ const startTimer = setTimeout(() => {
+ server.listen(port, "localhost");
+ }, 200);
+
+ try {
+ await waitForServer(port, "localhost", 3000, 50);
+ // If we get here, the server was detected
+ expect(true).toBe(true);
+ } finally {
+ clearTimeout(startTimer);
+ server.close();
+ }
+ });
+
+ it("throws if server does not start within timeout", async () => {
+ await expect(
+ waitForServer(59998, "localhost", 300, 50)
+ ).rejects.toThrow("Server did not start within 300ms");
+ });
+ });
+
+ describe("CLI argument parsing", () => {
+ it("standalone mode is the default (no --url)", () => {
+ // When no --url is provided, isStandalone should be true
+ const url = undefined;
+ const isStandalone = !url;
+ expect(isStandalone).toBe(true);
+ });
+
+ it("providing --url disables standalone mode", () => {
+ const url = "ws://remote:9222";
+ const isStandalone = !url;
+ expect(isStandalone).toBe(false);
+ });
+
+ it("default port is 9222", () => {
+ const configPort: number | undefined = undefined;
+ const port = configPort ?? 9222;
+ expect(port).toBe(9222);
+ });
+
+ it("custom port overrides default", () => {
+ const configPort: number | undefined = 9333;
+ const port = configPort ?? 9222;
+ expect(port).toBe(9333);
+ });
+
+ it("browser mapping to server-playwright names", () => {
+ const browserMap: Record = {
+ chrome: "chromium",
+ chromium: "chromium",
+ firefox: "firefox",
+ webkit: "webkit",
+ edge: "chromium",
+ };
+
+ expect(browserMap["chrome"]).toBe("chromium");
+ expect(browserMap["chromium"]).toBe("chromium");
+ expect(browserMap["firefox"]).toBe("firefox");
+ expect(browserMap["webkit"]).toBe("webkit");
+ expect(browserMap["edge"]).toBe("chromium");
+ });
+ });
+});
diff --git a/packages/mcp/src/cli.ts b/packages/mcp/src/cli.ts
index e9da4a8..17de706 100644
--- a/packages/mcp/src/cli.ts
+++ b/packages/mcp/src/cli.ts
@@ -3,16 +3,24 @@
* @fileoverview BAP MCP Server CLI
*
* Run the BAP MCP server from the command line.
+ * By default, auto-starts a BAP Playwright server (standalone mode).
+ * Use --url to connect to an existing BAP server instead.
*
* Usage:
- * bap-mcp # Use defaults (ws://localhost:9222)
- * bap-mcp --url ws://host:port # Custom BAP server URL
+ * bap-mcp # Standalone: auto-starts BAP server
+ * bap-mcp --url ws://host:port # Connect to existing BAP server
+ * bap-mcp --browser firefox # Use Firefox
* bap-mcp --verbose # Enable verbose logging
* bap-mcp --allowed-domains example.com,api.example.com
*/
+import { spawn, type ChildProcess } from "node:child_process";
+import fs from "node:fs";
+import net from "node:net";
+import path from "node:path";
+import { fileURLToPath } from "node:url";
import { Logger, icons, pc } from "@browseragentprotocol/logger";
-import { BAPMCPServer } from "./index.js";
+import { BAPMCPServer, type BrowserChoice } from "./index.js";
// MCP servers should log to stderr to avoid interfering with stdio transport
const log = new Logger({ prefix: "BAP MCP", stderr: true });
@@ -23,7 +31,10 @@ const log = new Logger({ prefix: "BAP MCP", stderr: true });
interface CLIArgs {
url?: string;
+ port?: number;
+ browser?: string;
verbose?: boolean;
+ headless?: boolean;
allowedDomains?: string[];
help?: boolean;
version?: boolean;
@@ -42,8 +53,18 @@ function parseArgs(): CLIArgs {
args.version = true;
} else if (arg === "--url" || arg === "-u" || arg === "--bap-url") {
args.url = argv[++i];
+ } else if (arg === "--port" || arg === "-p") {
+ args.port = parseInt(argv[++i] ?? "9222", 10);
+ } else if (arg === "--browser" || arg === "-b") {
+ args.browser = argv[++i];
} else if (arg === "--verbose" || arg === "-v") {
args.verbose = true;
+ } else if (arg === "--headless") {
+ args.headless = true;
+ } else if (arg === "--headless=true") {
+ args.headless = true;
+ } else if (arg === "--headless=false" || arg === "--no-headless") {
+ args.headless = false;
} else if (arg === "--allowed-domains") {
args.allowedDomains = argv[++i]?.split(",").map((d) => d.trim());
}
@@ -59,45 +80,232 @@ ${pc.bold("BAP MCP Server")} ${pc.dim("- Browser Agent Protocol as MCP")}
${pc.cyan("USAGE")}
${pc.dim("$")} npx @browseragentprotocol/mcp ${pc.dim("[OPTIONS]")}
+ By default, auto-starts a local BAP Playwright server (standalone mode).
+ Pass ${pc.yellow("--url")} to connect to an existing BAP server instead.
+
${pc.cyan("OPTIONS")}
- ${pc.yellow("-u, --url")} ${pc.dim("")} BAP server WebSocket URL ${pc.dim("(default: ws://localhost:9222)")}
+ ${pc.yellow("-b, --browser")} ${pc.dim("")} Browser: chrome ${pc.dim("(default)")}, chromium, firefox, webkit, edge
+ ${pc.yellow("-u, --url")} ${pc.dim("")} Connect to existing BAP server ${pc.dim("(skips auto-start)")}
+ ${pc.yellow("-p, --port")} ${pc.dim("")} Port for auto-started server ${pc.dim("(default: 9222)")}
+ ${pc.yellow("--headless")} Run browser in headless mode ${pc.dim("(default: true)")}
+ ${pc.yellow("--no-headless")} Run with visible browser window
${pc.yellow("-v, --verbose")} Enable verbose logging to stderr
${pc.yellow("--allowed-domains")} ${pc.dim("")} Comma-separated list of allowed domains
${pc.yellow("-h, --help")} Show this help message
${pc.yellow("--version")} Show version
${pc.cyan("EXAMPLES")}
- ${pc.dim("# Start with defaults (connect to localhost:9222)")}
+ ${pc.dim("# Standalone mode (auto-starts server, uses local Chrome)")}
${pc.dim("$")} npx @browseragentprotocol/mcp
- ${pc.dim("# Connect to a remote BAP server")}
- ${pc.dim("$")} npx @browseragentprotocol/mcp --url ws://192.168.1.100:9222
+ ${pc.dim("# Use Firefox")}
+ ${pc.dim("$")} npx @browseragentprotocol/mcp --browser firefox
+
+ ${pc.dim("# Visible browser window")}
+ ${pc.dim("$")} npx @browseragentprotocol/mcp --no-headless
- ${pc.dim("# Enable verbose logging")}
- ${pc.dim("$")} npx @browseragentprotocol/mcp --verbose
+ ${pc.dim("# Connect to a remote BAP server (skips auto-start)")}
+ ${pc.dim("$")} npx @browseragentprotocol/mcp --url ws://192.168.1.100:9222
- ${pc.dim("# Restrict to specific domains (security)")}
+ ${pc.dim("# Domain allowlist")}
${pc.dim("$")} npx @browseragentprotocol/mcp --allowed-domains example.com,api.example.com
-${pc.cyan("CLAUDE DESKTOP")}
- Add to ${pc.dim("claude_desktop_config.json")}:
+${pc.cyan("MCP CLIENT SETUP")}
+ ${pc.dim("Add to any MCP-compatible client (config examples):")}
+ ${pc.dim("JSON config:")}
${pc.dim("{")}
${pc.dim('"mcpServers"')}: {
${pc.green('"bap-browser"')}: {
"command": "npx",
- "args": ["@browseragentprotocol/mcp"]
+ "args": ["-y", "@browseragentprotocol/mcp"]
}
}
${pc.dim("}")}
-${pc.cyan("CLAUDE CODE")}
- ${pc.dim("$")} claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
+ ${pc.dim("CLI:")}
+ ${pc.dim("$")} ${pc.dim("")} mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp
-${pc.dim("For more information:")} ${pc.cyan("https://github.com/browseragentprotocol/bap")}
+${pc.dim("Docs:")} ${pc.cyan("https://github.com/browseragentprotocol/bap")}
`);
}
+// =============================================================================
+// Standalone Server Management
+// =============================================================================
+
+/**
+ * Check if a port is available by attempting a TCP connection.
+ * Returns true if something is already listening on the port.
+ */
+function isPortInUse(port: number, host: string = "localhost"): Promise {
+ return new Promise((resolve) => {
+ const socket = net.createConnection({ port, host });
+ socket.setTimeout(500);
+ socket.on("connect", () => {
+ socket.destroy();
+ resolve(true);
+ });
+ socket.on("timeout", () => {
+ socket.destroy();
+ resolve(false);
+ });
+ socket.on("error", () => {
+ socket.destroy();
+ resolve(false);
+ });
+ });
+}
+
+/**
+ * Wait for a server to become available on the given port.
+ * Polls with the specified interval until timeout.
+ */
+async function waitForServer(
+ port: number,
+ host: string = "localhost",
+ timeoutMs: number = 15000,
+ intervalMs: number = 150,
+): Promise {
+ const start = Date.now();
+
+ while (Date.now() - start < timeoutMs) {
+ if (await isPortInUse(port, host)) {
+ return;
+ }
+ await new Promise((resolve) => setTimeout(resolve, intervalMs));
+ }
+
+ throw new Error(
+ `BAP server did not start within ${timeoutMs / 1000}s on port ${port}. ` +
+ `Ensure Playwright browsers are installed: npx playwright install chromium`
+ );
+}
+
+/**
+ * Resolve the command to start the server-playwright CLI.
+ *
+ * In monorepo development, the sibling package's built CLI is used directly
+ * to avoid npx overhead. In published (npm install) usage, falls back to npx.
+ */
+function resolveServerCommand(): { command: string; args: string[] } {
+ try {
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+ const siblingCli = path.resolve(__dirname, "../../server-playwright/dist/cli.js");
+
+ if (fs.existsSync(siblingCli)) {
+ return { command: "node", args: [siblingCli] };
+ }
+ } catch {
+ // import.meta.url resolution failed — not in ESM context, fall through
+ }
+
+ return { command: "npx", args: ["-y", "@browseragentprotocol/server-playwright"] };
+}
+
+interface StandaloneServerOptions {
+ port: number;
+ host: string;
+ browser: string;
+ headless: boolean;
+ verbose: boolean;
+}
+
+/**
+ * Start the BAP Playwright server as a child process.
+ *
+ * If a server is already listening on the target port, reuses it and returns
+ * null (caller should not attempt to kill it on shutdown).
+ *
+ * Otherwise, spawns the server-playwright CLI, waits for it to be ready,
+ * and returns the ChildProcess handle for lifecycle management.
+ */
+async function startStandaloneServer(
+ options: StandaloneServerOptions,
+): Promise {
+ const { port, host, browser, headless, verbose } = options;
+
+ // Reuse an existing server if one is already on this port
+ if (await isPortInUse(port, host)) {
+ log.info(`BAP server already running on ${host}:${port}, reusing`);
+ return null;
+ }
+
+ log.info(`Starting BAP Playwright server on port ${port}...`);
+
+ const { command, args } = resolveServerCommand();
+ const serverArgs = [
+ ...args,
+ "--port", port.toString(),
+ "--host", host,
+ headless ? "--headless" : "--no-headless",
+ ];
+
+ // Map MCP-level browser names to server-playwright's accepted values
+ const browserMap: Record = {
+ chrome: "chromium",
+ chromium: "chromium",
+ firefox: "firefox",
+ webkit: "webkit",
+ edge: "chromium",
+ };
+ serverArgs.push("--browser", browserMap[browser] ?? "chromium");
+
+ if (verbose) {
+ serverArgs.push("--debug");
+ }
+
+ const child = spawn(command, serverArgs, {
+ stdio: ["ignore", "pipe", "pipe"],
+ detached: false,
+ env: { ...process.env },
+ });
+
+ // Pipe server output to stderr when verbose (MCP uses stdout for stdio transport)
+ if (verbose) {
+ child.stdout?.on("data", (data: Buffer) => {
+ process.stderr.write(`[BAP Server] ${data.toString()}`);
+ });
+ child.stderr?.on("data", (data: Buffer) => {
+ process.stderr.write(`[BAP Server] ${data.toString()}`);
+ });
+ }
+
+ child.on("error", (err) => {
+ log.error("Failed to start BAP server", err);
+ });
+
+ child.on("exit", (code, signal) => {
+ if (code !== null && code !== 0) {
+ log.error(`BAP server exited with code ${code}`);
+ } else if (signal && verbose) {
+ log.info(`BAP server stopped (${signal})`);
+ }
+ });
+
+ // Wait for the server to become available
+ try {
+ await waitForServer(port, host);
+ log.info(`BAP server ready on ws://${host}:${port}`);
+ } catch (err) {
+ child.kill("SIGTERM");
+ throw err;
+ }
+
+ return child;
+}
+
+/**
+ * Kill a child process gracefully (SIGTERM), escalating to SIGKILL after 500ms.
+ */
+async function killServer(child: ChildProcess): Promise {
+ child.kill("SIGTERM");
+ await new Promise((resolve) => setTimeout(resolve, 500));
+ if (!child.killed) {
+ child.kill("SIGKILL");
+ }
+}
+
// =============================================================================
// Main
// =============================================================================
@@ -112,54 +320,84 @@ async function main(): Promise {
if (args.version) {
console.error(
- `${icons.connection} BAP MCP Server ${pc.dim("v0.1.0-alpha.1")}`
+ `${icons.connection} BAP MCP Server ${pc.dim("v0.2.0")}`
);
process.exit(0);
}
- // Set log level based on verbose flag
if (args.verbose) {
log.setLevel("debug");
}
- const server = new BAPMCPServer({
- bapServerUrl: args.url,
- verbose: args.verbose,
- allowedDomains: args.allowedDomains,
- });
+ // Determine mode: standalone (auto-start server) vs connect to existing
+ const isStandalone = !args.url;
+ const port = args.port ?? 9222;
+ const host = "localhost";
+ const bapServerUrl = args.url ?? `ws://${host}:${port}`;
+ let serverProcess: ChildProcess | null = null;
- // Handle graceful shutdown
- const shutdown = async (signal: string) => {
- if (args.verbose) {
- log.info(`${signal} received, shutting down...`);
- }
- await server.close();
- if (args.verbose) {
- log.success("Server stopped");
+ try {
+ if (isStandalone) {
+ if (args.verbose) {
+ log.info("Standalone mode: auto-starting BAP Playwright server");
+ }
+
+ serverProcess = await startStandaloneServer({
+ port,
+ host,
+ browser: args.browser ?? "chrome",
+ headless: args.headless ?? true,
+ verbose: args.verbose ?? false,
+ });
}
- process.exit(0);
- };
- process.on("SIGINT", () => shutdown("SIGINT"));
- process.on("SIGTERM", () => shutdown("SIGTERM"));
+ const server = new BAPMCPServer({
+ bapServerUrl,
+ browser: args.browser as BrowserChoice | undefined,
+ headless: args.headless ?? true,
+ verbose: args.verbose,
+ allowedDomains: args.allowedDomains,
+ });
- // Handle uncaught errors
- process.on("uncaughtException", (error) => {
- log.error("Uncaught exception", error);
- process.exit(1);
- });
+ // Graceful shutdown — clean up MCP server and child process
+ const shutdown = async (signal: string) => {
+ if (args.verbose) {
+ log.info(`${signal} received, shutting down...`);
+ }
- process.on("unhandledRejection", (reason) => {
- log.error("Unhandled rejection", reason);
- process.exit(1);
- });
+ await server.close();
+
+ if (serverProcess) {
+ await killServer(serverProcess);
+ }
+
+ if (args.verbose) {
+ log.success("Server stopped");
+ }
+ process.exit(0);
+ };
+
+ process.on("SIGINT", () => shutdown("SIGINT"));
+ process.on("SIGTERM", () => shutdown("SIGTERM"));
+
+ process.on("uncaughtException", (error) => {
+ log.error("Uncaught exception", error);
+ serverProcess?.kill("SIGTERM");
+ process.exit(1);
+ });
+
+ process.on("unhandledRejection", (reason) => {
+ log.error("Unhandled rejection", reason);
+ serverProcess?.kill("SIGTERM");
+ process.exit(1);
+ });
- try {
if (args.verbose) {
- log.info(`Connecting to BAP server at ${args.url ?? "ws://localhost:9222"}`);
+ log.info(`Connecting to BAP server at ${bapServerUrl}`);
}
await server.run();
} catch (error) {
+ serverProcess?.kill("SIGTERM");
log.error("Failed to start server", error);
process.exit(1);
}
diff --git a/packages/mcp/src/index.ts b/packages/mcp/src/index.ts
index 89f9318..ee4e040 100644
--- a/packages/mcp/src/index.ts
+++ b/packages/mcp/src/index.ts
@@ -1,14 +1,14 @@
/**
* @fileoverview BAP MCP Integration
* @module @browseragentprotocol/mcp
- * @version 0.1.0
+ * @version 0.2.0
*
* Exposes Browser Agent Protocol as an MCP (Model Context Protocol) server.
- * Allows AI agents like Claude to control browsers through standardized MCP tools.
+ * Allows AI agents to control browsers through standardized MCP tools.
*
* TODO (MEDIUM): Add input validation on tool arguments before passing to BAP client
* TODO (MEDIUM): Enforce session timeout (maxSessionDuration) - currently unused
- * TODO (MEDIUM): Add resource cleanup on partial failure in ensureClient()
+ * TODO (MEDIUM): Add resource cleanup on partial failure in ensureClient() — DONE (v0.2.0)
* TODO (LOW): parseSelector should validate empty/whitespace-only strings
* TODO (LOW): Consider sanitizing URLs in verbose logging to prevent token leakage
*/
@@ -45,13 +45,19 @@ import {
// Types
// =============================================================================
+export type BrowserChoice = "chrome" | "chromium" | "firefox" | "webkit" | "edge";
+
export interface BAPMCPServerOptions {
/** BAP server URL (default: ws://localhost:9222) */
bapServerUrl?: string;
- /** Server name for MCP (default: bap-browser) */
+ /** Server name for MCP (default: BAPBrowser) */
name?: string;
/** Server version (default: 1.0.0) */
version?: string;
+ /** Browser to use: chrome (default), chromium, firefox, webkit, edge */
+ browser?: BrowserChoice;
+ /** Run browser in headless mode (default: true) */
+ headless?: boolean;
/** Enable verbose logging */
verbose?: boolean;
/** Allowed domains for navigation (empty = all allowed) */
@@ -195,7 +201,7 @@ function formatSelectorForDisplay(selector: BAPSelector): string {
const TOOLS: Tool[] = [
// Navigation
{
- name: "bap_navigate",
+ name: "navigate",
description:
"Navigate the browser to a URL. Use this to open web pages. Returns the page title and URL after navigation.",
inputSchema: {
@@ -217,7 +223,7 @@ const TOOLS: Tool[] = [
// Element Interaction
{
- name: "bap_click",
+ name: "click",
description:
'Click an element on the page. Use semantic selectors like "role:button:Submit" or "text:Sign in" for reliability.',
inputSchema: {
@@ -237,7 +243,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_type",
+ name: "type",
description:
"Type text into an input field. First clicks the element, then types the text character by character.",
inputSchema: {
@@ -260,9 +266,9 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_fill",
+ name: "fill",
description:
- "Fill an input field with text (clears existing content first). Faster than bap_type for form filling.",
+ "Fill an input field with text (clears existing content first). Faster than type for form filling.",
inputSchema: {
type: "object",
properties: {
@@ -279,7 +285,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_press",
+ name: "press",
description: "Press a keyboard key. Use for Enter, Tab, Escape, or keyboard shortcuts like Ctrl+A.",
inputSchema: {
type: "object",
@@ -297,7 +303,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_select",
+ name: "select",
description: "Select an option from a dropdown/select element.",
inputSchema: {
type: "object",
@@ -315,7 +321,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_scroll",
+ name: "scroll",
description: "Scroll the page or a specific element.",
inputSchema: {
type: "object",
@@ -337,7 +343,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_hover",
+ name: "hover",
description: "Hover over an element. Useful for triggering hover menus or tooltips.",
inputSchema: {
type: "object",
@@ -353,7 +359,7 @@ const TOOLS: Tool[] = [
// Observations
{
- name: "bap_screenshot",
+ name: "screenshot",
description:
"Take a screenshot of the current page. Returns the image as base64. Use for visual verification.",
inputSchema: {
@@ -363,11 +369,20 @@ const TOOLS: Tool[] = [
type: "boolean",
description: "Capture full page including scrollable content (default: false)",
},
+ format: {
+ type: "string",
+ enum: ["jpeg", "png"],
+ description: "Image format (default: jpeg). JPEG is ~60% smaller than PNG for typical pages.",
+ },
+ quality: {
+ type: "number",
+ description: "JPEG quality 0-100 (default: 80). Ignored for PNG.",
+ },
},
},
},
{
- name: "bap_accessibility",
+ name: "accessibility",
description:
"Get the accessibility tree of the page. Returns a structured representation ideal for understanding page layout and finding elements. RECOMMENDED: Use this before interacting with elements.",
inputSchema: {
@@ -381,7 +396,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_aria_snapshot",
+ name: "aria_snapshot",
description:
"Get a token-efficient YAML snapshot of the page accessibility tree. ~80% fewer tokens than full accessibility tree. Best for LLM context.",
inputSchema: {
@@ -395,7 +410,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_content",
+ name: "content",
description:
"Get page content as text or markdown. Useful for reading article content or extracting text.",
inputSchema: {
@@ -410,7 +425,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_element",
+ name: "element",
description:
"Query properties of a specific element. Check if an element exists, is visible, enabled, etc.",
inputSchema: {
@@ -435,7 +450,7 @@ const TOOLS: Tool[] = [
// Page Management
{
- name: "bap_pages",
+ name: "pages",
description: "List all open pages/tabs. Returns page IDs and URLs.",
inputSchema: {
type: "object",
@@ -443,21 +458,21 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_activate_page",
+ name: "activate_page",
description: "Switch to a different page/tab by its ID.",
inputSchema: {
type: "object",
properties: {
pageId: {
type: "string",
- description: "ID of the page to activate (from bap_pages)",
+ description: "ID of the page to activate (from pages)",
},
},
required: ["pageId"],
},
},
{
- name: "bap_close_page",
+ name: "close_page",
description: "Close the current page/tab.",
inputSchema: {
type: "object",
@@ -465,7 +480,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_go_back",
+ name: "go_back",
description: "Navigate back in browser history.",
inputSchema: {
type: "object",
@@ -473,7 +488,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_go_forward",
+ name: "go_forward",
description: "Navigate forward in browser history.",
inputSchema: {
type: "object",
@@ -481,7 +496,7 @@ const TOOLS: Tool[] = [
},
},
{
- name: "bap_reload",
+ name: "reload",
description: "Reload the current page.",
inputSchema: {
type: "object",
@@ -491,7 +506,7 @@ const TOOLS: Tool[] = [
// Agent (Composite Actions, Observations, and Data Extraction)
{
- name: "bap_act",
+ name: "act",
description: `Execute a sequence of browser actions in a single call.
Useful for multi-step flows like login, form submission, or navigation sequences.
Each step can have conditions and error handling. More efficient than calling actions individually.`,
@@ -546,7 +561,7 @@ Each step can have conditions and error handling. More efficient than calling ac
},
},
{
- name: "bap_observe",
+ name: "observe",
description: `Get an AI-optimized observation of the current page.
Returns interactive elements with pre-computed selectors, making it easy to determine
what actions are possible. Supports stable element refs and annotated screenshots.
@@ -583,10 +598,10 @@ RECOMMENDED: Use this before complex interactions to understand the page.`,
},
},
{
- name: "bap_extract",
+ name: "extract",
description: `Extract structured data from the current page.
Uses CSS heuristics to find lists, tables, and labeled data matching your schema.
-Works best with standard HTML patterns (ul/ol, tables, cards). For complex pages, use bap_content instead.`,
+Works best with standard HTML patterns (ul/ol, tables, cards). For complex pages, use content instead.`,
inputSchema: {
type: "object",
properties: {
@@ -659,6 +674,21 @@ const RESOURCES: Resource[] = [
},
];
+// =============================================================================
+// Browser Resolution
+// =============================================================================
+
+function resolveBrowser(browser: BrowserChoice): { browser: "chromium" | "firefox" | "webkit"; channel?: string } {
+ switch (browser) {
+ case "chrome": return { browser: "chromium", channel: "chrome" };
+ case "chromium": return { browser: "chromium" };
+ case "firefox": return { browser: "firefox" };
+ case "webkit": return { browser: "webkit" };
+ case "edge": return { browser: "chromium", channel: "msedge" };
+ default: return { browser: "chromium", channel: "chrome" };
+ }
+}
+
// =============================================================================
// BAP MCP Server
// =============================================================================
@@ -675,8 +705,10 @@ export class BAPMCPServer {
constructor(options: BAPMCPServerOptions = {}) {
this.options = {
bapServerUrl: options.bapServerUrl ?? "ws://localhost:9222",
- name: options.name ?? "bap-browser",
+ name: options.name ?? "BAPBrowser",
version: options.version ?? "1.0.0",
+ browser: options.browser ?? "chrome",
+ headless: options.headless ?? true,
verbose: options.verbose ?? false,
allowedDomains: options.allowedDomains ?? [],
maxSessionDuration: options.maxSessionDuration ?? 3600,
@@ -730,28 +762,93 @@ export class BAPMCPServer {
}
/**
- * Ensure BAP client is connected
+ * Ensure BAP client is connected.
+ * - On first call: creates transport, connects, launches browser
+ * - On subsequent calls: checks if connection is still alive, reconnects if needed
*/
private async ensureClient(): Promise {
- if (this.client) {
- return this.client;
+ // If we have a client, verify the connection is still alive
+ if (this.client && this.transport) {
+ try {
+ // Lightweight health check — list pages to confirm the connection works
+ await this.client.listPages();
+ return this.client;
+ } catch {
+ // Connection is dead — clean up and reconnect
+ this.log("Connection lost, reconnecting...");
+ await this.resetClient();
+ }
}
this.log("Connecting to BAP server:", this.options.bapServerUrl);
- this.transport = new WebSocketTransport(this.options.bapServerUrl);
+ this.transport = new WebSocketTransport(this.options.bapServerUrl, {
+ autoReconnect: true,
+ maxReconnectAttempts: 5,
+ reconnectDelay: 1000,
+ });
+
+ // Hook reconnect callbacks for verbose logging
+ this.transport.onReconnecting = (attempt, max) => {
+ this.log(`Reconnecting to BAP server (attempt ${attempt}/${max})...`);
+ };
+ this.transport.onReconnected = () => {
+ this.log("Reconnected to BAP server");
+ };
+ this.transport.onClose = () => {
+ this.log("BAP server connection closed");
+ };
+
this.client = new BAPClient(this.transport);
// Connect and initialize the protocol
await this.client.connect();
- // Launch browser
- await this.client.launch({ headless: false });
+ // Launch browser with configured browser/channel
+ const resolved = resolveBrowser(this.options.browser);
+ const headless = this.options.headless ?? true;
+ try {
+ await this.client.launch({
+ browser: resolved.browser,
+ channel: resolved.channel,
+ headless,
+ });
+ } catch (err) {
+ // If a channel was specified (e.g. local Chrome) and it's not found, fall back to bundled Chromium
+ if (resolved.channel && String(err).includes("Looks like")) {
+ this.log(
+ `Local ${this.options.browser} not found, falling back to bundled Chromium`
+ );
+ await this.client.launch({ browser: "chromium", headless });
+ } else {
+ // Clean up on launch failure to avoid leaking the transport
+ await this.resetClient();
+ throw err;
+ }
+ }
this.log("BAP client connected and browser launched");
return this.client;
}
+ /**
+ * Reset client and transport state for reconnection
+ */
+ private async resetClient(): Promise {
+ try {
+ if (this.client) {
+ await this.client.close();
+ }
+ } catch { /* ignore cleanup errors */ }
+ try {
+ if (this.transport) {
+ await this.transport.close();
+ }
+ } catch { /* ignore cleanup errors */ }
+ this.client = null;
+ this.transport = null;
+ }
+
/**
* Set up MCP request handlers
*/
@@ -815,7 +912,7 @@ export class BAPMCPServer {
switch (name) {
// Navigation
- case "bap_navigate": {
+ case "navigate": {
const url = args.url as string;
// Security check
@@ -853,7 +950,7 @@ export class BAPMCPServer {
}
// Element Interaction
- case "bap_click": {
+ case "click": {
const selector = parseSelector(args.selector as string);
const options = args.clickCount ? { clickCount: args.clickCount as number } : undefined;
await client.click(selector, options);
@@ -862,7 +959,7 @@ export class BAPMCPServer {
};
}
- case "bap_type": {
+ case "type": {
const selector = parseSelector(args.selector as string);
const text = args.text as string;
const delay = args.delay as number | undefined;
@@ -872,7 +969,7 @@ export class BAPMCPServer {
};
}
- case "bap_fill": {
+ case "fill": {
const selector = parseSelector(args.selector as string);
const value = args.value as string;
await client.fill(selector, value);
@@ -881,7 +978,7 @@ export class BAPMCPServer {
};
}
- case "bap_press": {
+ case "press": {
const key = args.key as string;
const selector = args.selector ? parseSelector(args.selector as string) : undefined;
await client.press(key, selector);
@@ -890,7 +987,7 @@ export class BAPMCPServer {
};
}
- case "bap_select": {
+ case "select": {
const selector = parseSelector(args.selector as string);
const value = args.value as string;
await client.select(selector, value);
@@ -899,7 +996,7 @@ export class BAPMCPServer {
};
}
- case "bap_scroll": {
+ case "scroll": {
const direction = (args.direction as ScrollDirection) ?? "down";
const amount = (args.amount as number) ?? 500;
const selector = args.selector ? parseSelector(args.selector as string) : undefined;
@@ -909,7 +1006,7 @@ export class BAPMCPServer {
};
}
- case "bap_hover": {
+ case "hover": {
const selector = parseSelector(args.selector as string);
await client.hover(selector);
return {
@@ -918,9 +1015,11 @@ export class BAPMCPServer {
}
// Observations
- case "bap_screenshot": {
+ case "screenshot": {
const fullPage = args.fullPage as boolean ?? false;
- const result = await client.screenshot({ fullPage });
+ const format = (args.format as string) === "png" ? "png" : "jpeg";
+ const quality = typeof args.quality === "number" ? args.quality : (format === "jpeg" ? 80 : undefined);
+ const result = await client.screenshot({ fullPage, format, quality });
return {
content: [
{
@@ -932,7 +1031,7 @@ export class BAPMCPServer {
};
}
- case "bap_accessibility": {
+ case "accessibility": {
const interestingOnly = args.interestingOnly as boolean ?? true;
const result = await client.accessibility({ interestingOnly });
return {
@@ -945,7 +1044,7 @@ export class BAPMCPServer {
};
}
- case "bap_aria_snapshot": {
+ case "aria_snapshot": {
const selector = args.selector ? parseSelector(args.selector as string) : undefined;
const result = await client.ariaSnapshot(selector);
return {
@@ -958,7 +1057,7 @@ export class BAPMCPServer {
};
}
- case "bap_content": {
+ case "content": {
const format = (args.format as ContentFormat) ?? "text";
const result = await client.content(format);
return {
@@ -966,7 +1065,7 @@ export class BAPMCPServer {
};
}
- case "bap_element": {
+ case "element": {
const selector = parseSelector(args.selector as string);
const properties = (args.properties as ElementProperty[]) ?? ["visible", "enabled"];
const result = await client.element(selector, properties);
@@ -981,7 +1080,7 @@ export class BAPMCPServer {
}
// Page Management
- case "bap_pages": {
+ case "pages": {
const result = await client.listPages();
const text = result.pages
.map((p) => `${p.id === result.activePage ? "* " : " "}${p.id}: ${p.url} (${p.title})`)
@@ -991,7 +1090,7 @@ export class BAPMCPServer {
};
}
- case "bap_activate_page": {
+ case "activate_page": {
const pageId = args.pageId as string;
await client.activatePage(pageId);
return {
@@ -999,28 +1098,28 @@ export class BAPMCPServer {
};
}
- case "bap_close_page": {
+ case "close_page": {
await client.closePage();
return {
content: [{ type: "text", text: "Closed current page" }],
};
}
- case "bap_go_back": {
+ case "go_back": {
await client.goBack();
return {
content: [{ type: "text", text: "Navigated back" }],
};
}
- case "bap_go_forward": {
+ case "go_forward": {
await client.goForward();
return {
content: [{ type: "text", text: "Navigated forward" }],
};
}
- case "bap_reload": {
+ case "reload": {
await client.reload();
return {
content: [{ type: "text", text: "Reloaded page" }],
@@ -1028,7 +1127,7 @@ export class BAPMCPServer {
}
// Agent (Composite Actions, Observations, and Data Extraction)
- case "bap_act": {
+ case "act": {
interface InputStep {
label?: string;
action: string;
@@ -1094,7 +1193,7 @@ export class BAPMCPServer {
};
}
- case "bap_observe": {
+ case "observe": {
const annotate = args.annotateScreenshot as boolean;
const result = await client.observe({
includeScreenshot: (args.includeScreenshot as boolean) || annotate,
@@ -1166,7 +1265,7 @@ export class BAPMCPServer {
return { content };
}
- case "bap_extract": {
+ case "extract": {
const result = await client.extract({
instruction: args.instruction as string,
schema: args.schema as ExtractionSchema,
@@ -1300,14 +1399,7 @@ export class BAPMCPServer {
* Close the server and clean up
*/
async close(): Promise {
- if (this.client) {
- await this.client.close();
- this.client = null;
- }
- if (this.transport) {
- await this.transport.close();
- this.transport = null;
- }
+ await this.resetClient();
await this.server.close();
this.log("BAP MCP Server closed");
}
diff --git a/packages/protocol/src/types/methods.ts b/packages/protocol/src/types/methods.ts
index 9d87de5..d9b0a3a 100644
--- a/packages/protocol/src/types/methods.ts
+++ b/packages/protocol/src/types/methods.ts
@@ -42,6 +42,7 @@ export type ProxyConfig = z.infer;
/** browser/launch parameters */
export const BrowserLaunchParamsSchema = z.object({
browser: BrowserTypeSchema.optional(),
+ channel: z.string().optional(),
headless: z.boolean().optional(),
args: z.array(z.string()).optional(),
proxy: ProxyConfigSchema.optional(),
diff --git a/packages/protocol/src/types/protocol.ts b/packages/protocol/src/types/protocol.ts
index 1ae91b9..77de263 100644
--- a/packages/protocol/src/types/protocol.ts
+++ b/packages/protocol/src/types/protocol.ts
@@ -10,7 +10,7 @@ import { z } from "zod";
// =============================================================================
/** Current BAP protocol version */
-export const BAP_VERSION = "0.1.0";
+export const BAP_VERSION = "0.2.0";
// =============================================================================
// JSON-RPC 2.0 Schemas
@@ -70,18 +70,18 @@ export const JSONRPCNotificationSchema = z.object({
});
export type JSONRPCNotification = z.infer;
-/** Union of all JSON-RPC response types */
+/** Union of all JSON-RPC response types (error first to prevent z.unknown() from swallowing errors) */
export const JSONRPCResponseSchema = z.union([
- JSONRPCSuccessResponseSchema,
JSONRPCErrorResponseSchema,
+ JSONRPCSuccessResponseSchema,
]);
export type JSONRPCResponse = z.infer;
-/** Union of all JSON-RPC message types */
+/** Union of all JSON-RPC message types (error before success to prevent z.unknown() from swallowing errors) */
export const JSONRPCMessageSchema = z.union([
JSONRPCRequestSchema,
- JSONRPCSuccessResponseSchema,
JSONRPCErrorResponseSchema,
+ JSONRPCSuccessResponseSchema,
JSONRPCNotificationSchema,
]);
export type JSONRPCMessage = z.infer;
diff --git a/packages/python-sdk/package.json b/packages/python-sdk/package.json
index e28d347..232de4b 100644
--- a/packages/python-sdk/package.json
+++ b/packages/python-sdk/package.json
@@ -1,6 +1,6 @@
{
"name": "@browseragentprotocol/python-client",
- "version": "0.1.0",
+ "version": "0.2.0",
"private": true,
"description": "Python SDK for Browser Agent Protocol (BAP) - build scripts only",
"scripts": {
@@ -9,5 +9,10 @@
"lint": "python -m ruff check src/browseragentprotocol || true",
"lint:fix": "python -m ruff check --fix src/browseragentprotocol || true",
"clean": "rm -rf dist build *.egg-info .mypy_cache .ruff_cache __pycache__ src/**/__pycache__"
+ },
+ "turbo": {
+ "build": {
+ "outputs": []
+ }
}
}
diff --git a/packages/python-sdk/pyproject.toml b/packages/python-sdk/pyproject.toml
index 7010f88..7f7fe45 100644
--- a/packages/python-sdk/pyproject.toml
+++ b/packages/python-sdk/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "browser-agent-protocol"
-version = "0.1.0a1"
+version = "0.2.0"
description = "Python SDK for the Browser Agent Protocol (BAP) - control browsers with AI agents"
readme = "README.md"
license = { text = "Apache-2.0" }
@@ -55,10 +55,10 @@ dev = [
bap = "browseragentprotocol.cli:main"
[project.urls]
-Homepage = "https://github.com/anthropics/browser-agent-protocol"
-Documentation = "https://github.com/anthropics/browser-agent-protocol#readme"
-Repository = "https://github.com/anthropics/browser-agent-protocol"
-Issues = "https://github.com/anthropics/browser-agent-protocol/issues"
+Homepage = "https://github.com/browseragentprotocol/bap"
+Documentation = "https://github.com/browseragentprotocol/bap#readme"
+Repository = "https://github.com/browseragentprotocol/bap"
+Issues = "https://github.com/browseragentprotocol/bap/issues"
[build-system]
requires = ["hatchling"]
diff --git a/packages/python-sdk/src/browseragentprotocol/__init__.py b/packages/python-sdk/src/browseragentprotocol/__init__.py
index 64b1a88..65fc2d0 100644
--- a/packages/python-sdk/src/browseragentprotocol/__init__.py
+++ b/packages/python-sdk/src/browseragentprotocol/__init__.py
@@ -54,7 +54,7 @@ async def main():
```
"""
-__version__ = "0.1.0a1"
+__version__ = "0.2.0"
# Main client classes
from browseragentprotocol.client import BAPClient
diff --git a/packages/python-sdk/src/browseragentprotocol/client.py b/packages/python-sdk/src/browseragentprotocol/client.py
index 0f4aebb..cd3be9f 100644
--- a/packages/python-sdk/src/browseragentprotocol/client.py
+++ b/packages/python-sdk/src/browseragentprotocol/client.py
@@ -108,7 +108,7 @@ def __init__(
*,
token: str | None = None,
name: str = "bap-client-python",
- version: str = "0.1.0",
+ version: str = "0.2.0",
timeout: float = 30.0,
events: list[str] | None = None,
):
@@ -274,6 +274,7 @@ def capabilities(self) -> dict[str, Any] | None:
async def launch(
self,
browser: Literal["chromium", "firefox", "webkit"] | None = None,
+ channel: str | None = None,
headless: bool | None = None,
args: list[str] | None = None,
**kwargs: Any,
@@ -283,6 +284,7 @@ async def launch(
Args:
browser: Browser type (chromium, firefox, webkit)
+ channel: Playwright channel (e.g. "chrome", "msedge")
headless: Run in headless mode
args: Additional browser arguments
**kwargs: Additional launch options
@@ -293,6 +295,8 @@ async def launch(
params: dict[str, Any] = {}
if browser is not None:
params["browser"] = browser
+ if channel is not None:
+ params["channel"] = channel
if headless is not None:
params["headless"] = headless
if args is not None:
diff --git a/packages/python-sdk/src/browseragentprotocol/context.py b/packages/python-sdk/src/browseragentprotocol/context.py
index 0fcba35..40f1b0c 100644
--- a/packages/python-sdk/src/browseragentprotocol/context.py
+++ b/packages/python-sdk/src/browseragentprotocol/context.py
@@ -17,7 +17,7 @@ async def bap_client(
*,
token: str | None = None,
name: str = "bap-client-python",
- version: str = "0.1.0",
+ version: str = "0.2.0",
timeout: float = 30.0,
events: list[str] | None = None,
browser: Literal["chromium", "firefox", "webkit"] | None = None,
diff --git a/packages/python-sdk/src/browseragentprotocol/sync_client.py b/packages/python-sdk/src/browseragentprotocol/sync_client.py
index 379305f..6c1fb58 100644
--- a/packages/python-sdk/src/browseragentprotocol/sync_client.py
+++ b/packages/python-sdk/src/browseragentprotocol/sync_client.py
@@ -74,7 +74,7 @@ def __init__(
*,
token: str | None = None,
name: str = "bap-client-python-sync",
- version: str = "0.1.0",
+ version: str = "0.2.0",
timeout: float = 30.0,
events: list[str] | None = None,
):
@@ -151,13 +151,14 @@ def capabilities(self) -> dict[str, Any] | None:
def launch(
self,
browser: Literal["chromium", "firefox", "webkit"] | None = None,
+ channel: str | None = None,
headless: bool | None = None,
args: list[str] | None = None,
**kwargs: Any,
) -> BrowserLaunchResult:
"""Launch a browser instance."""
return self._run(
- self._async_client.launch(browser=browser, headless=headless, args=args, **kwargs)
+ self._async_client.launch(browser=browser, channel=channel, headless=headless, args=args, **kwargs)
)
def close_browser(self, browser_id: str | None = None) -> None:
diff --git a/packages/python-sdk/src/browseragentprotocol/types/methods.py b/packages/python-sdk/src/browseragentprotocol/types/methods.py
index 9da65e6..c27a342 100644
--- a/packages/python-sdk/src/browseragentprotocol/types/methods.py
+++ b/packages/python-sdk/src/browseragentprotocol/types/methods.py
@@ -84,6 +84,7 @@ class BrowserLaunchParams(BaseModel):
"""Parameters for browser/launch."""
browser: Literal["chromium", "firefox", "webkit"] | None = None
+ channel: str | None = None
headless: bool | None = None
args: list[str] | None = None
env: dict[str, str] | None = None
diff --git a/packages/python-sdk/src/browseragentprotocol/types/protocol.py b/packages/python-sdk/src/browseragentprotocol/types/protocol.py
index 3f449d4..cb3deb8 100644
--- a/packages/python-sdk/src/browseragentprotocol/types/protocol.py
+++ b/packages/python-sdk/src/browseragentprotocol/types/protocol.py
@@ -12,7 +12,7 @@
# Protocol Version
# =============================================================================
-BAP_VERSION = "0.1.0"
+BAP_VERSION = "0.2.0"
# =============================================================================
# Request ID
diff --git a/packages/server-playwright/README.md b/packages/server-playwright/README.md
index cffb3a4..eb8dee7 100644
--- a/packages/server-playwright/README.md
+++ b/packages/server-playwright/README.md
@@ -81,8 +81,8 @@ await client.close();
### With MCP (for AI agents)
```bash
-# Add to Claude Code
-claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
+# Add to any MCP-compatible client via CLI
+npx @browseragentprotocol/mcp
```
## Programmatic Usage
diff --git a/packages/server-playwright/src/cli.ts b/packages/server-playwright/src/cli.ts
index b28ddde..be5bf19 100644
--- a/packages/server-playwright/src/cli.ts
+++ b/packages/server-playwright/src/cli.ts
@@ -76,7 +76,7 @@ function parseArgs(): Partial {
process.exit(0);
} else if (arg === "--version" || arg === "-v") {
console.log(
- `${icons.server} BAP Playwright Server ${pc.dim("v0.1.0-alpha.1")}`
+ `${icons.server} BAP Playwright Server ${pc.dim("v0.2.0")}`
);
process.exit(0);
}
@@ -167,7 +167,7 @@ async function main(): Promise {
console.log(
banner({
title: "BAP Playwright Server",
- version: "0.1.0-alpha.1",
+ version: "0.2.0",
subtitle: "Browser Agent Protocol",
})
);
diff --git a/packages/server-playwright/src/server.ts b/packages/server-playwright/src/server.ts
index 6c97a99..4f18856 100644
--- a/packages/server-playwright/src/server.ts
+++ b/packages/server-playwright/src/server.ts
@@ -151,7 +151,7 @@ const ScopeProfiles = {
readonly: ['page:read', 'observe:*'] as BAPScope[],
standard: [
'browser:launch', 'browser:close', 'page:*',
- 'action:click', 'action:type', 'action:fill', 'action:scroll', 'action:select',
+ 'action:*',
'observe:*', 'emulate:viewport',
] as BAPScope[],
full: ['browser:*', 'page:*', 'action:*', 'observe:*', 'emulate:*', 'trace:*'] as BAPScope[],
@@ -396,6 +396,8 @@ export interface BAPServerOptions {
host?: string;
/** Default browser type */
defaultBrowser?: "chromium" | "firefox" | "webkit";
+ /** Default Playwright channel (e.g. "chrome", "msedge") */
+ defaultChannel?: string;
/** Default headless mode */
headless?: boolean;
/** Enable debug logging */
@@ -477,9 +479,10 @@ const BLOCKED_BROWSER_ARGS = [
/**
* SECURE-BY-DEFAULT: All security options enabled by default
*/
-const DEFAULT_OPTIONS: Required> & {
+const DEFAULT_OPTIONS: Required> & {
authToken: string | undefined;
authTokenEnvVar: string;
+ defaultChannel: string | undefined;
security: Required;
limits: Required;
authorization: Required;
@@ -489,6 +492,7 @@ const DEFAULT_OPTIONS: Required 0 ? sanitizedArgs : undefined,
proxy: params.proxy,
downloadsPath: validatedDownloadsPath,
});
// Create the default context
- const defaultContext = await state.browser.newContext();
+ // Force deviceScaleFactor: 1 for consistent screenshot sizes across platforms
+ // (retina Macs default to 2x, which doubles pixel count and inflates payloads)
+ const defaultContext = await state.browser.newContext({
+ deviceScaleFactor: 1,
+ });
const version = state.browser.version();
// Use crypto.randomUUID for unique IDs
const contextId = `ctx-${randomUUID().slice(0, 8)}`;
@@ -1872,16 +1883,19 @@ export class BAPPlaywrightServer extends EventEmitter {
this.checkRateLimit(state, 'screenshot');
// Playwright only supports "png" and "jpeg" for screenshots
+ // Default to JPEG — ~60% smaller payloads, reducing LLM token cost
const screenshotType = (options?.format === "jpeg" || options?.format === "png")
? options.format
- : "png";
+ : "jpeg";
const buffer = await page.screenshot({
fullPage: options?.fullPage,
clip: options?.clip,
type: screenshotType,
- quality: options?.quality,
- scale: options?.scale,
+ quality: options?.quality ?? (screenshotType === "jpeg" ? 80 : undefined),
+ // Default to CSS scale to ensure consistent 1x screenshots regardless
+ // of device pixel ratio (prevents 2x images on retina displays)
+ scale: options?.scale ?? "css",
});
// Parse image dimensions from the buffer
@@ -1890,14 +1904,14 @@ export class BAPPlaywrightServer extends EventEmitter {
let width: number;
let height: number;
- const format = options?.format ?? "png";
+ const format = screenshotType;
if (format === "png" && buffer[0] === 0x89 && buffer[1] === 0x50) {
// PNG: Read dimensions from IHDR chunk (offset 16 for width, 20 for height)
width = buffer.readUInt32BE(16);
height = buffer.readUInt32BE(20);
- } else if ((format === "jpeg" || format === "webp") && buffer.length > 0) {
- // For JPEG/WebP, fall back to viewport dimensions or clip
+ } else if (format === "jpeg" && buffer.length > 0) {
+ // For JPEG, fall back to viewport dimensions or clip
// (Parsing JPEG headers is complex, use viewport as approximation)
const viewport = page.viewportSize() ?? { width: 1280, height: 720 };
if (options?.clip) {
@@ -2679,12 +2693,19 @@ export class BAPPlaywrightServer extends EventEmitter {
// Screenshot (with optional annotation)
if (params.includeScreenshot || params.annotateScreenshot) {
const viewport = page.viewportSize();
- let buffer = await page.screenshot({ type: "png" });
+ // Use JPEG by default for ~60% smaller payloads (less LLM token cost)
+ // Annotations require PNG for sharp badge rendering
+ const useAnnotation = params.annotateScreenshot && interactiveElements && interactiveElements.length > 0;
+ const obsFormat = useAnnotation ? "png" as const : "jpeg" as const;
+ let buffer = await page.screenshot({
+ type: obsFormat,
+ quality: obsFormat === "jpeg" ? 80 : undefined,
+ });
let annotated = false;
let annotationMap: AnnotationMapping[] | undefined;
// Apply annotation if requested
- if (params.annotateScreenshot && interactiveElements && interactiveElements.length > 0) {
+ if (useAnnotation && interactiveElements) {
const annotationOpts: AnnotationOptions = typeof params.annotateScreenshot === 'object'
? params.annotateScreenshot
: { enabled: true };
@@ -2704,7 +2725,7 @@ export class BAPPlaywrightServer extends EventEmitter {
result.screenshot = {
data: buffer.toString("base64"),
- format: "png",
+ format: obsFormat,
width: viewport?.width ?? 0,
height: viewport?.height ?? 0,
annotated,
@@ -2919,123 +2940,480 @@ export class BAPPlaywrightServer extends EventEmitter {
}
/**
- * Extract data from content based on instruction and schema
- * This is a basic implementation - a full implementation would use LLM
+ * Extract data from content based on instruction and schema.
+ *
+ * Strategy:
+ * 1. Scope to the main content area (skip nav/header/footer).
+ * 2. For lists: find repeating item containers, then use schema
+ * property names to locate child elements within each item.
+ * 3. For objects: search for labeled values.
+ * 4. Coerce types based on schema (string, number, boolean).
*/
private async extractDataFromContent(
page: PlaywrightPage,
- content: string,
- _instruction: string, // Used for future LLM-based extraction
+ _content: string,
+ _instruction: string,
schema: { type: string; properties?: Record; items?: unknown },
mode: string,
includeSourceRefs: boolean
): Promise<{ data: unknown; sources?: { ref: string; selector: BAPSelector; text?: string }[]; confidence: number }> {
- // Basic extraction logic based on common patterns
- // This extracts data by finding elements that match the schema structure
-
const sources: { ref: string; selector: BAPSelector; text?: string }[] = [];
+ // ── Step 1: Scope to main content area ──────────────────────────
+ // Try semantic landmarks first, fall back to body
+ const contentRoot = await this.findContentRoot(page);
+
if (schema.type === "array" || mode === "list") {
- // Extract list of items
- const items: unknown[] = [];
+ const items = await this.extractList(
+ page, contentRoot, schema, includeSourceRefs, sources
+ );
+ return {
+ data: items,
+ sources: includeSourceRefs ? sources : undefined,
+ confidence: items.length > 0 ? 0.8 : 0.3,
+ };
+ }
- // Try to find list-like elements based on the instruction
- const listSelectors = [
- 'ul li', 'ol li', '[role="listitem"]', 'tr', '.item', '.card', '[class*="item"]', '[class*="card"]'
- ];
+ if (mode === "table") {
+ const rows = await this.extractTable(
+ page, contentRoot, schema, includeSourceRefs, sources
+ );
+ return {
+ data: rows,
+ sources: includeSourceRefs ? sources : undefined,
+ confidence: rows.length > 0 ? 0.8 : 0.3,
+ };
+ }
+
+ if (schema.type === "object" && schema.properties) {
+ const result = await this.extractObject(
+ page, contentRoot, schema, includeSourceRefs, sources
+ );
+ return {
+ data: result.data,
+ sources: includeSourceRefs ? sources : undefined,
+ confidence: result.confidence,
+ };
+ }
+
+ // Default: return scoped text content
+ const text = await contentRoot.textContent() ?? "";
+ return {
+ data: text.trim().slice(0, 5000),
+ sources: includeSourceRefs ? sources : undefined,
+ confidence: 0.5,
+ };
+ }
+
+ /**
+ * Find the main content area of the page, skipping nav/header/footer.
+ * Returns a Locator scoped to the best content root.
+ */
+ private async findContentRoot(page: PlaywrightPage) {
+ // Try semantic landmarks in priority order
+ const candidates = ['main', '[role="main"]', '#content', '.content', '#main', '.page', '.container', '[role="document"]'];
+ for (const sel of candidates) {
+ try {
+ const loc = page.locator(sel).first();
+ if (await loc.count() > 0) {
+ // Verify it has substantial content (not just a wrapper with nav inside)
+ const text = await loc.textContent() ?? "";
+ if (text.trim().length > 100) return loc;
+ }
+ } catch { /* continue */ }
+ }
+ // Fallback: body
+ return page.locator('body');
+ }
+
+ /**
+ * Extract a list of items from repeating elements.
+ * Uses schema property names to locate child values within each item container.
+ */
+ private async extractList(
+ _page: PlaywrightPage,
+ root: ReturnType,
+ schema: { items?: unknown },
+ includeSourceRefs: boolean,
+ sources: { ref: string; selector: BAPSelector; text?: string }[]
+ ): Promise {
+ const itemSchema = schema.items as { type?: string; properties?: Record } | undefined;
+ const isObjectItems = itemSchema?.type === 'object' && itemSchema.properties;
+
+ // ── Find the best repeating container ────────────────────────────
+ // Selectors are ordered by semantic priority: article > role > class-name > generic.
+ // Use the FIRST selector with 2+ matches rather than the one with the most,
+ // because generic selectors (ul li) often match sidebar/nav noise.
+ const containerSelectors = [
+ 'article', '[role="listitem"]',
+ '.product', '.card', '.item', '.listing', '.result', '.entry', '.post',
+ '[class*="product"]', '[class*="card"]', '[class*="item"]',
+ 'table tbody tr',
+ 'ol li', 'ul li',
+ ];
+
+ let bestSelector = '';
+ let bestCount = 0;
+
+ for (const sel of containerSelectors) {
+ try {
+ const count = await root.locator(sel).count();
+ if (count >= 2) {
+ bestSelector = sel;
+ bestCount = count;
+ break; // First semantic match wins
+ }
+ } catch { /* continue */ }
+ }
+
+ if (!bestSelector || bestCount === 0) return [];
- for (const selector of listSelectors) {
+ const elements = await root.locator(bestSelector).all();
+ const items: unknown[] = [];
+ const limit = Math.min(elements.length, 100);
+
+ for (let i = 0; i < limit; i++) {
+ const el = elements[i]!;
+
+ // Skip elements that are likely not visible or too small
+ try {
+ const box = await el.boundingBox();
+ if (box && (box.width < 10 || box.height < 10)) continue;
+ } catch { /* proceed anyway */ }
+
+ if (isObjectItems && itemSchema?.properties) {
+ // ── Schema-aware extraction: match property names to child elements ──
+ const obj = await this.extractPropertiesFromElement(el, itemSchema.properties);
+ // Only include if at least one property has a non-empty value
+ const hasValue = Object.values(obj).some(v => v !== null && v !== undefined && v !== '');
+ if (hasValue) {
+ items.push(obj);
+ if (includeSourceRefs) {
+ sources.push({
+ ref: `@s${items.length}`,
+ selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` },
+ text: Object.values(obj).filter(Boolean).join(' | ').slice(0, 100),
+ });
+ }
+ }
+ } else {
+ // Simple string items
+ const text = await el.textContent();
+ if (text?.trim()) {
+ items.push(text.trim());
+ if (includeSourceRefs) {
+ sources.push({
+ ref: `@s${items.length}`,
+ selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` },
+ text: text.trim().slice(0, 100),
+ });
+ }
+ }
+ }
+ }
+
+ // ── Fallback: if schema-aware extraction produced 0 items from matched ──
+ // elements, retry with text-based extraction (extract each property by
+ // splitting each container's inner text). This handles cases where CSS
+ // class names don't align with schema property names.
+ if (items.length === 0 && isObjectItems && itemSchema?.properties && elements.length > 0) {
+ const propNames = Object.keys(itemSchema.properties);
+ for (let i = 0; i < limit; i++) {
+ const el = elements[i]!;
try {
- const elements = await page.locator(selector).all();
- if (elements.length > 0) {
- for (let i = 0; i < Math.min(elements.length, 50); i++) {
- const el = elements[i];
- const text = await el.textContent();
- if (text && text.trim()) {
- if (schema.items && typeof schema.items === 'object' && 'type' in schema.items) {
- if ((schema.items as { type: string }).type === 'string') {
- items.push(text.trim());
- } else if ((schema.items as { type: string }).type === 'object') {
- // Try to extract object structure
- items.push({ text: text.trim() });
- }
+ const box = await el.boundingBox();
+ if (box && (box.width < 10 || box.height < 10)) continue;
+ } catch { /* proceed */ }
+
+ const fullText = await el.textContent() ?? '';
+ if (!fullText.trim()) continue;
+
+ const obj: Record = {};
+ // Try to extract known patterns from the full text
+ for (const key of propNames) {
+ const kl = key.toLowerCase();
+ if (kl === 'title' || kl === 'name') {
+ // First link's title attribute, or first heading text
+ try {
+ const heading = el.locator('h1, h2, h3, h4, h5, h6').first();
+ if (await heading.count() > 0) {
+ const link = heading.locator('a').first();
+ if (await link.count() > 0) {
+ obj[key] = await link.getAttribute('title') ?? await link.textContent() ?? null;
} else {
- items.push(text.trim());
+ obj[key] = await heading.textContent() ?? null;
}
+ }
+ } catch { /* skip */ }
+ } else if (kl === 'price' || kl === 'cost' || kl === 'amount') {
+ const priceMatch = fullText.match(/[$€£¥]\s*[\d,.]+|[\d,.]+\s*[$€£¥]/);
+ if (priceMatch) obj[key] = priceMatch[0].trim();
+ } else if (kl === 'url' || kl === 'link' || kl === 'href') {
+ try {
+ const link = el.locator('a').first();
+ if (await link.count() > 0) obj[key] = await link.getAttribute('href');
+ } catch { /* skip */ }
+ } else if (kl === 'rating') {
+ // Try star-rating class pattern (e.g., "star-rating Three")
+ try {
+ const ratingEl = el.locator('[class*="rating"], [class*="star"]').first();
+ if (await ratingEl.count() > 0) {
+ const cls = await ratingEl.getAttribute('class') ?? '';
+ const parts = cls.split(/\s+/).filter(c => !c.toLowerCase().includes('rating') && !c.toLowerCase().includes('star') && c.length > 0);
+ if (parts.length > 0) obj[key] = parts[parts.length - 1];
+ }
+ } catch { /* skip */ }
+ } else if (kl === 'availability' || kl === 'stock' || kl === 'status') {
+ try {
+ const stockEl = el.locator('[class*="avail"], [class*="stock"], .availability, .stock').first();
+ if (await stockEl.count() > 0) {
+ obj[key] = (await stockEl.textContent())?.trim() ?? null;
+ }
+ } catch { /* skip */ }
+ }
+ }
- if (includeSourceRefs) {
- sources.push({
- ref: `@s${i + 1}`,
- selector: { type: 'css', value: `${selector}:nth-child(${i + 1})` },
- text: text.trim().slice(0, 100),
- });
- }
+ const hasValue = Object.values(obj).some(v => v !== null && v !== undefined && v !== '');
+ if (hasValue) {
+ items.push(obj);
+ if (includeSourceRefs) {
+ sources.push({
+ ref: `@s${items.length}`,
+ selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` },
+ text: Object.values(obj).filter(Boolean).join(' | ').slice(0, 100),
+ });
+ }
+ }
+ }
+ }
+
+ return items;
+ }
+
+ /**
+ * Extract property values from a single element container.
+ * For each schema property, tries class-name matching, then attribute
+ * matching, then falls back to positional heuristics.
+ */
+ private async extractPropertiesFromElement(
+ el: ReturnType,
+ properties: Record
+ ): Promise> {
+ const result: Record = {};
+
+ for (const [key, propSchema] of Object.entries(properties)) {
+ const keyLower = key.toLowerCase();
+ let value: string | null = null;
+
+ // Strategy 1: Find child element whose class or tag contains the property name
+ const classSelectors = [
+ `[class*="${keyLower}"]`,
+ `[data-${keyLower}]`,
+ `.${keyLower}`,
+ ];
+
+ for (const sel of classSelectors) {
+ try {
+ const child = el.locator(sel).first();
+ if (await child.count() > 0) {
+ // For links, prefer title attribute (common for truncated titles)
+ if (keyLower === 'title' || keyLower === 'name') {
+ value = await child.getAttribute('title') ?? await child.textContent();
+ } else {
+ value = await child.textContent();
+ }
+ // If text is empty, try extracting from class name (e.g. "star-rating Three" → "Three")
+ if (!value?.trim()) {
+ const cls = await child.getAttribute('class') ?? '';
+ const clsParts = cls.split(/\s+/).filter(c => !c.includes(keyLower) && c.length > 0);
+ if (clsParts.length > 0) value = clsParts[clsParts.length - 1] ?? null;
+ }
+ if (value?.trim()) break;
+ }
+ } catch { /* continue */ }
+ }
+
+ // Strategy 2: For known property patterns, try specific selectors
+ if (!value?.trim()) {
+ try {
+ if (keyLower === 'title' || keyLower === 'name') {
+ // Headings, links with title attribute
+ for (const sel of ['h1 a', 'h2 a', 'h3 a', 'h4 a', 'h1', 'h2', 'h3', 'h4', 'a[title]']) {
+ const child = el.locator(sel).first();
+ if (await child.count() > 0) {
+ value = await child.getAttribute('title') ?? await child.textContent();
+ if (value?.trim()) break;
}
}
- if (items.length > 0) break;
+ } else if (keyLower === 'price' || keyLower === 'cost' || keyLower === 'amount') {
+ // Price patterns
+ const text = await el.textContent() ?? '';
+ const priceMatch = text.match(/[$€£¥]\s*[\d,.]+|[\d,.]+\s*[$€£¥]/);
+ if (priceMatch) value = priceMatch[0].trim();
+ } else if (keyLower === 'url' || keyLower === 'link' || keyLower === 'href') {
+ const link = el.locator('a').first();
+ if (await link.count() > 0) {
+ value = await link.getAttribute('href');
+ }
+ } else if (keyLower === 'image' || keyLower === 'img' || keyLower === 'thumbnail') {
+ const img = el.locator('img').first();
+ if (await img.count() > 0) {
+ value = await img.getAttribute('src');
+ }
}
- } catch {
- // Continue to next selector
- }
+ } catch { /* continue */ }
}
- return { data: items, sources: includeSourceRefs ? sources : undefined, confidence: items.length > 0 ? 0.7 : 0.3 };
+ // Coerce type
+ const trimmed = value?.trim() ?? null;
+ if (trimmed === null) {
+ result[key] = null;
+ } else if (propSchema.type === 'number') {
+ const num = parseFloat(trimmed.replace(/[^0-9.-]/g, ''));
+ result[key] = isNaN(num) ? trimmed : num;
+ } else if (propSchema.type === 'boolean') {
+ result[key] = ['true', 'yes', '1', 'in stock', 'available'].includes(trimmed.toLowerCase());
+ } else {
+ result[key] = trimmed;
+ }
}
- if (schema.type === "object" && schema.properties) {
- // Extract object with properties
- const result: Record = {};
- const properties = schema.properties as Record;
-
- for (const [key, propSchema] of Object.entries(properties)) {
- // Try to find content matching this property
- const searchTerms = [key, propSchema.description].filter(Boolean);
-
- for (const term of searchTerms) {
- if (!term) continue;
-
- // Look for labels or headings containing the term
- const labelSelectors = [
- `label:has-text("${term}")`,
- `th:has-text("${term}")`,
- `dt:has-text("${term}")`,
- `[class*="${term.toLowerCase()}"]`,
- ];
-
- for (const selector of labelSelectors) {
- try {
- const label = await page.locator(selector).first();
- if (await label.count() > 0) {
- // Try to find associated value
- const parent = label.locator('..');
- const siblingText = await parent.textContent();
- if (siblingText) {
- const value = siblingText.replace(new RegExp(term, 'gi'), '').trim();
- if (value) {
- result[key] = propSchema.type === 'number' ? parseFloat(value) || value : value;
- break;
+ return result;
+ }
+
+ /**
+ * Extract tabular data from an HTML table.
+ */
+ private async extractTable(
+ _page: PlaywrightPage,
+ root: ReturnType,
+ schema: { items?: unknown },
+ includeSourceRefs: boolean,
+ sources: { ref: string; selector: BAPSelector; text?: string }[]
+ ): Promise {
+ const rows: unknown[] = [];
+ const itemSchema = schema.items as { properties?: Record } | undefined;
+
+ try {
+ // Find table headers to map columns
+ const headers: string[] = [];
+ const thElements = await root.locator('table th').all();
+ for (const th of thElements) {
+ headers.push((await th.textContent() ?? '').trim().toLowerCase());
+ }
+
+ // Extract rows
+ const trElements = await root.locator('table tbody tr').all();
+ const limit = Math.min(trElements.length, 100);
+
+ for (let i = 0; i < limit; i++) {
+ const tr = trElements[i]!;
+ const cells = await tr.locator('td').all();
+ const obj: Record = {};
+
+ if (itemSchema?.properties) {
+ // Map schema properties to table columns by name
+ for (const [key, propSchema] of Object.entries(itemSchema.properties)) {
+ const colIdx = headers.findIndex(h => h.includes(key.toLowerCase()));
+ if (colIdx >= 0 && colIdx < cells.length) {
+ const text = (await cells[colIdx]!.textContent() ?? '').trim();
+ obj[key] = propSchema.type === 'number' ? (parseFloat(text.replace(/[^0-9.-]/g, '')) || text) : text;
+ }
+ }
+ } else {
+ // No schema properties — use headers as keys
+ for (let c = 0; c < cells.length; c++) {
+ const key = c < headers.length ? headers[c]! : `col${c}`;
+ obj[key] = (await cells[c]!.textContent() ?? '').trim();
+ }
+ }
+
+ if (Object.values(obj).some(v => v !== null && v !== undefined && v !== '')) {
+ rows.push(obj);
+ if (includeSourceRefs) {
+ sources.push({
+ ref: `@s${rows.length}`,
+ selector: { type: 'css', value: `table tbody tr:nth-child(${i + 1})` },
+ });
+ }
+ }
+ }
+ } catch { /* table extraction failed */ }
+
+ return rows;
+ }
+
+ /**
+ * Extract a single object from page content.
+ */
+ private async extractObject(
+ page: PlaywrightPage,
+ root: ReturnType,
+ schema: { properties?: Record },
+ includeSourceRefs: boolean,
+ sources: { ref: string; selector: BAPSelector; text?: string }[]
+ ): Promise<{ data: Record; confidence: number }> {
+ const result: Record = {};
+ const properties = schema.properties as Record;
+
+ for (const [key, propSchema] of Object.entries(properties)) {
+ const searchTerms = [key, propSchema.description].filter(Boolean);
+
+ for (const term of searchTerms) {
+ if (!term) continue;
+
+ const labelSelectors = [
+ `label:has-text("${term}")`,
+ `th:has-text("${term}")`,
+ `dt:has-text("${term}")`,
+ `[class*="${term.toLowerCase()}"]`,
+ ];
+
+ for (const selector of labelSelectors) {
+ try {
+ const label = root.locator(selector).first();
+ if (await label.count() > 0) {
+ const parent = label.locator('..');
+ const siblingText = await parent.textContent();
+ if (siblingText) {
+ const value = siblingText.replace(new RegExp(term, 'gi'), '').trim();
+ if (value) {
+ result[key] = propSchema.type === 'number' ? parseFloat(value) || value : value;
+ if (includeSourceRefs) {
+ sources.push({
+ ref: `@s${Object.keys(result).length}`,
+ selector: { type: 'css', value: selector },
+ text: value.slice(0, 100),
+ });
}
+ break;
}
}
- } catch {
- // Continue
}
- }
+ } catch { /* continue */ }
}
}
+ }
- return {
- data: Object.keys(result).length > 0 ? result : { raw: content.slice(0, 1000) },
- sources: includeSourceRefs ? sources : undefined,
- confidence: Object.keys(result).length > 0 ? 0.6 : 0.2
- };
+ // Fallback for meta-based extraction (og:title, meta description, etc.)
+ if (Object.keys(result).length === 0) {
+ try {
+ for (const key of Object.keys(properties)) {
+ if (key === 'title' || key === 'name') {
+ result[key] = await page.title();
+ } else if (key === 'description') {
+ const desc = await page.locator('meta[name="description"]').getAttribute('content');
+ if (desc) result[key] = desc;
+ } else if (key === 'url') {
+ result[key] = page.url();
+ }
+ }
+ } catch { /* continue */ }
}
- // Default: return text content
return {
- data: content.trim().slice(0, 5000),
- sources: includeSourceRefs ? sources : undefined,
- confidence: 0.5
+ data: Object.keys(result).length > 0 ? result : { raw: (await root.textContent() ?? "").slice(0, 1000) },
+ confidence: Object.keys(result).length > 0 ? 0.7 : 0.2,
};
}
diff --git a/skills/bap-browser/LICENSE.txt b/skills/bap-browser/LICENSE.txt
new file mode 100644
index 0000000..5fe2868
--- /dev/null
+++ b/skills/bap-browser/LICENSE.txt
@@ -0,0 +1,208 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to the Licensor for inclusion in the Work by the copyright
+ owner or by an individual or Legal Entity authorized to submit on behalf
+ of the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+---
+
+MIT License
+
+Copyright (c) 2024-2025 Model Context Protocol a Series of LF Projects, LLC.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+---
+
+Creative Commons Attribution 4.0 International (CC-BY-4.0)
+
+Documentation in this project (excluding specifications) is licensed under
+CC-BY-4.0. See https://creativecommons.org/licenses/by/4.0/legalcode for
+the full license text.
\ No newline at end of file
diff --git a/skills/bap-browser/SKILL.md b/skills/bap-browser/SKILL.md
new file mode 100644
index 0000000..736ab42
--- /dev/null
+++ b/skills/bap-browser/SKILL.md
@@ -0,0 +1,241 @@
+---
+name: bap-browser
+description: "AI-powered browser automation via Browser Agent Protocol. Use when the user wants to visit a website, open a webpage, go to a URL, search on Google, look something up online, check a website, read a webpage, book a flight, order food, buy something online, check email or weather, download a file, compare prices, find product reviews, take a screenshot of a page, scrape or extract data from a site, monitor a webpage for changes, test a web application, automate web tasks, interact with a web page, log in to a site, submit or fill out a form, shop online, sign up for a service, browse the web, research a topic online, check stock prices, track a package, read the news, post on social media, or any task that requires controlling a web browser. Provides semantic selectors, batched multi-step actions, and structured data extraction for fast, token-efficient browser automation."
+license: See LICENSE.txt (Apache-2.0)
+---
+
+# BAP Browser Automation
+
+You have BAP (Browser Agent Protocol) tools available. BAP wraps a real browser and exposes it through semantic, AI-native APIs. This document defines how to use them well.
+
+## Quick Start
+
+For most browser tasks, you only need three tools:
+
+1. **`navigate`** — open a URL
+2. **`observe`** — see what's on the page (returns interactive elements with stable refs)
+3. **`act`** — batch multiple interactions into a single call
+
+```
+navigate({ url: "https://example.com/login" })
+observe({ includeScreenshot: true })
+act({
+ steps: [
+ { action: "action/fill", selector: "@e1", value: "user@example.com" },
+ { action: "action/fill", selector: "@e2", value: "password123" },
+ { action: "action/click", selector: "role:button:Sign in" }
+ ]
+})
+```
+
+Read on for the full tool reference, selector guide, and advanced patterns.
+
+## Decision: Which Tool?
+
+**I need to open a page** → `navigate`
+
+**I need to understand what's on the page:**
+- I want interactive elements with stable refs → `observe` (set `includeScreenshot: true` for visual context)
+- I want the page structure cheaply → `aria_snapshot` (preferred — ~80% fewer tokens than `accessibility`)
+- I want to read article/body text → `content` with `format: "markdown"`
+- I want a visual capture → `screenshot`
+
+**I need to interact with something:**
+- Single click → `click`
+- Fill a form field (replaces content) → `fill`
+- Type character-by-character (autocomplete, search-as-you-type) → `type`
+- Press Enter/Tab/Escape/keyboard shortcut → `press`
+- Select from dropdown → `select`
+- Scroll to reveal more → `scroll`
+- Trigger hover menu → `hover`
+
+**I need to do multiple things at once** → `act` (batch 2–50 steps, single round-trip)
+
+**I need structured data from the page** → `extract` (give it a JSON schema)
+
+**I need to check an element's state** → `element` (visible? enabled? checked? value?)
+
+**I need to manage tabs** → `pages` / `activate_page` / `close_page`
+
+**I need to go back/forward/reload** → `go_back` / `go_forward` / `reload`
+
+## Selectors
+
+Every interaction tool takes a `selector` parameter. Use this priority:
+
+```
+role:button:Submit ← Best. ARIA role + accessible name. Survives redesigns.
+text:Sign in ← Visible text content.
+label:Email address ← Form label association.
+placeholder:Search... ← Input placeholder text.
+testId:submit-btn ← data-testid attribute.
+ref:@e3 (or just @e3) ← Stable ref from a prior observe call.
+css:.btn-primary ← Last resort. Fragile.
+#element-id ← Shorthand for CSS ID selector.
+```
+
+**Rules:**
+- Always prefer `role:` for buttons, links, inputs, checkboxes. They survive DOM changes.
+- Use `text:` when there's no clear ARIA role.
+- Never copy CSS selectors from page source. They break across deployments.
+- If you don't know what selectors are available, call `observe` first and use the returned refs.
+
+## The Observe → Act Pattern
+
+For any multi-step interaction on a page you haven't seen yet:
+
+**Step 1: Observe.**
+```
+observe({ includeScreenshot: true, maxElements: 30 })
+```
+Returns interactive elements with stable refs (`@e1`, `@e2`, ...) and optional annotated screenshot. Now you know exactly what's on the page.
+
+**Step 2: Act.**
+Batch all your actions into one call:
+```
+act({
+ steps: [
+ { action: "action/fill", selector: "@e1", value: "user@example.com" },
+ { action: "action/fill", selector: "@e2", value: "hunter2" },
+ { action: "action/click", selector: "role:button:Sign in" }
+ ]
+})
+```
+
+This pattern turns 4+ round-trips into 2. Use it.
+
+## Efficiency Rules
+
+1. **`aria_snapshot` over `accessibility`.** Same structure, ~80% fewer tokens.
+2. **`observe` with `maxElements`.** Default is 50. Set it lower when you can: `maxElements: 20`.
+3. **`observe` with `filterRoles`.** Focus: `filterRoles: ["button", "link", "textbox"]`.
+4. **`act` over individual calls.** A login flow is 1 `act`, not 3 separate fill/click calls.
+5. **`extract` over manual parsing.** Define a JSON schema. Let BAP extract. Don't scrape HTML.
+6. **`content({ format: "markdown" })` over screenshots for text.** Markdown is compact and parseable.
+7. **`fill` over `type` for form fields.** `fill` clears and sets; `type` sends keystrokes one at a time.
+
+## Recipes
+
+### Login
+```
+act({
+ steps: [
+ { action: "page/navigate", url: "https://app.example.com/login" },
+ { action: "action/fill", selector: "label:Email", value: "user@example.com" },
+ { action: "action/fill", selector: "label:Password", value: "password123" },
+ { action: "action/click", selector: "role:button:Sign in" }
+ ]
+})
+```
+
+### Extract a table of data
+```
+navigate({ url: "https://store.example.com/products" })
+extract({
+ instruction: "Extract product listings",
+ mode: "list",
+ schema: {
+ type: "array",
+ items: {
+ type: "object",
+ properties: {
+ name: { type: "string" },
+ price: { type: "number" },
+ inStock: { type: "boolean" }
+ }
+ }
+ }
+})
+```
+
+### Read an article
+```
+navigate({ url: "https://blog.example.com/post", waitUntil: "networkidle" })
+content({ format: "markdown" })
+```
+
+### Complex form with observe
+```
+observe({ filterRoles: ["textbox", "combobox", "checkbox"] })
+act({
+ steps: [
+ { action: "action/fill", selector: "@e1", value: "Jane Doe" },
+ { action: "action/fill", selector: "@e2", value: "jane@example.com" },
+ { action: "action/select", selector: "@e3", value: "Canada" },
+ { action: "action/check", selector: "@e4" },
+ { action: "action/click", selector: "role:button:Submit" }
+ ]
+})
+```
+
+### Search with autocomplete
+```
+type({ selector: "role:combobox:Search", text: "browser agent", delay: 100 })
+press({ key: "ArrowDown" })
+press({ key: "Enter" })
+```
+
+### Google search
+```
+navigate({ url: "https://www.google.com" })
+act({
+ steps: [
+ { action: "action/fill", selector: "role:combobox:Search", value: "best noise cancelling headphones 2025" },
+ { action: "action/click", selector: "role:button:Google Search" }
+ ]
+})
+content({ format: "markdown" })
+```
+
+### Compare prices across sites
+```
+navigate({ url: "https://store-a.example.com/product" })
+extract({
+ instruction: "Extract the product name and price",
+ mode: "single",
+ schema: {
+ type: "object",
+ properties: {
+ name: { type: "string" },
+ price: { type: "number" },
+ currency: { type: "string" }
+ }
+ }
+})
+navigate({ url: "https://store-b.example.com/product" })
+extract({
+ instruction: "Extract the product name and price",
+ mode: "single",
+ schema: {
+ type: "object",
+ properties: {
+ name: { type: "string" },
+ price: { type: "number" },
+ currency: { type: "string" }
+ }
+ }
+})
+```
+
+## Error Recovery
+
+| Problem | Fix |
+|---------|-----|
+| Element not found | `observe` the page again — the DOM changed. Use fresh refs. |
+| Navigation timeout | Use `waitUntil: "domcontentloaded"` instead of `"networkidle"`. |
+| Stale ref | Refs persist within a page but invalidate after navigation. Re-observe. |
+| Click intercepted | `scroll` to the element first, or use `press({ key: "Enter", selector: "..." })`. |
+| Page loaded but blank | Wait, then `reload`. Some SPAs hydrate slowly. |
+
+## Do Not
+
+- Use CSS selectors copied from browser DevTools. They break.
+- Call `accessibility` when `aria_snapshot` works. Wastes tokens.
+- Make individual click/fill calls when `act` can batch them.
+- Take a screenshot to read text. Use `content({ format: "markdown" })`.
+- Skip `observe` on pages you haven't seen. You'll guess wrong.
+- Parse raw HTML. Use `extract` with a schema.
+
+---
+
+For advanced patterns (multi-tab workflows, waiting strategies, annotated screenshots, nested extraction, batched error handling), see [references/REFERENCE.md](references/REFERENCE.md).
diff --git a/skills/bap-browser/references/REFERENCE.md b/skills/bap-browser/references/REFERENCE.md
new file mode 100644
index 0000000..d86f4ea
--- /dev/null
+++ b/skills/bap-browser/references/REFERENCE.md
@@ -0,0 +1,78 @@
+# Advanced BAP Patterns
+
+## Multi-tab workflows
+
+Use `pages` to list all open tabs, `activate_page` to switch between them, and `close_page` to clean up. Useful for comparing content across tabs or handling pop-ups.
+
+```
+navigate({ url: "https://a.example.com" })
+navigate({ url: "https://b.example.com" }) // opens in new tab
+pages() // returns [{id, url}, ...]
+activate_page({ pageId: "page-1" }) // switch back to first tab
+```
+
+## Waiting strategies
+
+The `waitUntil` parameter on `navigate` controls when the page is considered loaded:
+
+| Value | When to use |
+|-------|-------------|
+| `"load"` | Default. Fine for most pages. |
+| `"domcontentloaded"` | Faster. Use when you don't need images/fonts. |
+| `"networkidle"` | Slowest but most complete. Use for SPAs that fetch data after load. |
+
+If a page renders dynamically after navigation, use `observe` or `aria_snapshot` with a short delay rather than relying on `networkidle`.
+
+## Annotated screenshots (Set-of-Marks)
+
+`observe` supports `annotateScreenshot: true` which overlays numbered markers on each interactive element. Useful for visual debugging or confirming which element a ref points to.
+
+```
+observe({ includeScreenshot: true, annotateScreenshot: true, maxElements: 20 })
+```
+
+The returned screenshot will have numbered badges corresponding to element refs.
+
+## Nested extraction with complex schemas
+
+`extract` supports deeply nested JSON schemas. Use `mode: "single"` for a single object, `mode: "list"` for arrays, or `mode: "table"` for tabular data.
+
+```
+extract({
+ instruction: "Extract job listings with company details",
+ mode: "list",
+ schema: {
+ type: "array",
+ items: {
+ type: "object",
+ properties: {
+ title: { type: "string" },
+ company: {
+ type: "object",
+ properties: {
+ name: { type: "string" },
+ location: { type: "string" }
+ }
+ },
+ salary: { type: "number" },
+ remote: { type: "boolean" }
+ }
+ }
+ }
+})
+```
+
+## Error handling in batched actions
+
+`act` accepts `stopOnFirstError` (default: `true`). Set to `false` if you want to continue executing steps even when one fails — useful for best-effort form fills where some fields may not exist.
+
+```
+act({
+ stopOnFirstError: false,
+ steps: [
+ { action: "action/fill", selector: "label:First name", value: "Jane" },
+ { action: "action/fill", selector: "label:Middle name", value: "A." }, // may not exist
+ { action: "action/fill", selector: "label:Last name", value: "Doe" }
+ ]
+})
+```