diff --git a/.changeset/v020-release.md b/.changeset/v020-release.md new file mode 100644 index 0000000..53e17c1 --- /dev/null +++ b/.changeset/v020-release.md @@ -0,0 +1,9 @@ +--- +"@browseragentprotocol/protocol": minor +"@browseragentprotocol/logger": minor +"@browseragentprotocol/client": minor +"@browseragentprotocol/server-playwright": minor +"@browseragentprotocol/mcp": minor +--- + +v0.2.0 — browser selection, clean tool names, smarter extract diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json new file mode 100644 index 0000000..fd04a4a --- /dev/null +++ b/.claude-plugin/plugin.json @@ -0,0 +1,7 @@ +{ + "name": "BAPBrowser", + "description": "AI-optimized browser automation with semantic selectors, smart observations, and multi-step actions. Navigate, click, fill forms, extract data, and take screenshots across Chrome, Firefox, and WebKit.", + "author": { + "name": "Browser Agent Protocol" + } +} diff --git a/.gitignore b/.gitignore index 8bc0fef..1a20a60 100644 --- a/.gitignore +++ b/.gitignore @@ -33,3 +33,6 @@ coverage/ # Claude Code instructions (local only) CLAUDE.md docs/plans/ + +# Internal roadmap (local only) +ROADMAP.md diff --git a/.mcp.json b/.mcp.json new file mode 100644 index 0000000..fcf645a --- /dev/null +++ b/.mcp.json @@ -0,0 +1,8 @@ +{ + "mcpServers": { + "bap-browser": { + "command": "npx", + "args": ["-y", "@browseragentprotocol/mcp@0.2.0"] + } + } +} diff --git a/README.md b/README.md index 10bdac1..d880a86 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,11 @@ # Browser Agent Protocol (BAP) -[![npm version](https://badge.fury.io/js/@browseragentprotocol%2Fprotocol.svg)](https://www.npmjs.com/package/@browseragentprotocol/protocol) +[![npm version](https://badge.fury.io/js/@browseragentprotocol%2Fmcp.svg)](https://www.npmjs.com/package/@browseragentprotocol/mcp) [![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) An open standard for AI agents to interact with web browsers. -> **v0.1.0:** First public release. APIs may evolve based on feedback. +> **v0.2.0:** Renamed MCP tools, auto-reconnect, multi-context support, streaming, and more. APIs may evolve based on feedback. ## Overview @@ -49,69 +49,104 @@ BAP (Browser Agent Protocol) provides a standardized way for AI agents to contro ## Quick Start -### Using with MCP (Recommended for AI Agents) +BAP works with any MCP-compatible client. The server auto-starts — no separate setup needed. -BAP works with any MCP-compatible client including Claude Code, Claude Desktop, OpenAI Codex, and Google Antigravity. +### Claude Code -**Claude Code:** +**MCP server** (one command): ```bash -claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp +claude mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp +``` + +**Plugin** (includes SKILL.md for smarter tool usage): +```bash +claude plugin add --from https://github.com/browseragentprotocol/bap ```

- BAP with Claude Code
+ BAP in Claude Code
Claude Code browsing Hacker News with BAP

-**Claude Desktop** (`claude_desktop_config.json`): +### Claude Desktop + +Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows): + ```json { "mcpServers": { "bap-browser": { "command": "npx", - "args": ["@browseragentprotocol/mcp"] + "args": ["-y", "@browseragentprotocol/mcp"] } } } ``` +Restart Claude Desktop after saving. +

- BAP with Claude Desktop
+ BAP in Claude Desktop
Claude Desktop browsing Hacker News with BAP

-**Codex CLI:** +### Codex CLI + ```bash -codex mcp add bap-browser -- npx @browseragentprotocol/mcp +codex mcp add bap-browser -- npx -y @browseragentprotocol/mcp +``` + +Or add to `~/.codex/config.toml`: + +```toml +[mcp_servers.bap-browser] +command = "npx" +args = ["-y", "@browseragentprotocol/mcp"] ```

- BAP with OpenAI Codex CLI
+ BAP in Codex CLI
Codex CLI browsing Hacker News with BAP

-**Codex Desktop** (`~/.codex/config.toml`): +### Codex Desktop + +Add to `~/.codex/config.toml`: + ```toml [mcp_servers.bap-browser] command = "npx" -args = ["@browseragentprotocol/mcp"] +args = ["-y", "@browseragentprotocol/mcp"] ```

- BAP with OpenAI Codex Desktop
+ BAP in Codex Desktop
Codex Desktop browsing Hacker News with BAP

-> **💡 Tip:** Codex may default to web search. Be explicit: *"Using the bap-browser MCP tools..."* +### Browser Selection + +By default, BAP uses your locally installed Chrome. You can choose a different browser with the `--browser` flag: + +```bash +npx @browseragentprotocol/mcp --browser firefox +``` +| Value | Browser | Notes | +|---|---|---| +| `chrome` (default) | Local Chrome | Falls back to bundled Chromium if not installed | +| `chromium` | Bundled Chromium | Playwright's built-in Chromium | +| `firefox` | Firefox | Requires local Firefox | +| `webkit` | WebKit | Playwright's WebKit engine | +| `edge` | Microsoft Edge | Requires local Edge | -**Antigravity** (`mcp_config.json` via "..." → MCP Store → Manage): +In a JSON MCP config, pass the flag via args: ```json { "mcpServers": { "bap-browser": { "command": "npx", - "args": ["@browseragentprotocol/mcp"] + "args": ["-y", "@browseragentprotocol/mcp", "--browser", "firefox"] } } } @@ -255,7 +290,7 @@ const data = await client.extract({ }); ``` -> **Note:** `agent/extract` (and `bap_extract` in MCP) uses heuristic-based extraction (CSS patterns). For complex pages, consider using `bap_content` to get page content as markdown and extract data yourself. +> **Note:** `agent/extract` (and `extract` in MCP) uses heuristic-based extraction (CSS patterns). For complex pages, consider using `content` to get page content as markdown and extract data yourself. ## Server Options diff --git a/packages/client/src/__tests__/client-methods.test.ts b/packages/client/src/__tests__/client-methods.test.ts index 85e3445..ee2190d 100644 --- a/packages/client/src/__tests__/client-methods.test.ts +++ b/packages/client/src/__tests__/client-methods.test.ts @@ -197,6 +197,43 @@ describe("BAPClient Methods", () => { expect(parsed.params.headless).toBe(true); }); + it("launch() passes channel param correctly", async () => { + const { client, transport } = await createConnectedClient(); + transport.setAutoResponse("browser/launch", { browserId: "browser-2" }); + + const result = await client.launch({ browser: "chromium", channel: "chrome", headless: false }); + + expect(result.browserId).toBe("browser-2"); + + const launchRequest = transport.sentMessages.find((msg) => { + const parsed = JSON.parse(msg); + return parsed.method === "browser/launch"; + }); + expect(launchRequest).toBeDefined(); + const parsed = JSON.parse(launchRequest!); + expect(parsed.params.browser).toBe("chromium"); + expect(parsed.params.channel).toBe("chrome"); + expect(parsed.params.headless).toBe(false); + }); + + it("launch() without channel works (backwards compat)", async () => { + const { client, transport } = await createConnectedClient(); + transport.setAutoResponse("browser/launch", { browserId: "browser-3" }); + + const result = await client.launch({ browser: "firefox" }); + + expect(result.browserId).toBe("browser-3"); + + const launchRequest = transport.sentMessages.find((msg) => { + const parsed = JSON.parse(msg); + return parsed.method === "browser/launch"; + }); + expect(launchRequest).toBeDefined(); + const parsed = JSON.parse(launchRequest!); + expect(parsed.params.browser).toBe("firefox"); + expect(parsed.params.channel).toBeUndefined(); + }); + it("closeBrowser() sends correct request", async () => { const { client, transport } = await createConnectedClient(); transport.setAutoResponse("browser/close", {}); diff --git a/packages/client/src/index.ts b/packages/client/src/index.ts index cac28b5..33c6602 100644 --- a/packages/client/src/index.ts +++ b/packages/client/src/index.ts @@ -376,7 +376,7 @@ export class BAPClient extends EventEmitter { this.options = { token: options.token, name: options.name ?? "bap-client", - version: options.version ?? "0.1.0", + version: options.version ?? "0.2.0", timeout: options.timeout ?? 30000, events: options.events ?? ["page", "console", "network", "dialog"], }; diff --git a/packages/mcp/README.md b/packages/mcp/README.md index c3bf6f1..9ab3e2d 100644 --- a/packages/mcp/README.md +++ b/packages/mcp/README.md @@ -1,100 +1,63 @@ # @browseragentprotocol/mcp -MCP (Model Context Protocol) server for Browser Agent Protocol. Enables AI assistants to control web browsers. - -## Supported Clients - -| Client | Status | -|--------|--------| -| Claude Code | Supported | -| Claude Desktop | Supported | -| OpenAI Codex | Supported | -| Google Antigravity | Supported | -| Any MCP-compatible client | Supported | +MCP (Model Context Protocol) server for Browser Agent Protocol. Gives any MCP-compatible AI agent full browser control. ## Installation -### With Claude Code +### One command — standalone mode ```bash -# Add the BAP browser server -claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp +npx @browseragentprotocol/mcp ``` -That's it! Claude Code can now control browsers. Try asking: *"Go to example.com and take a screenshot"* +This auto-starts a BAP Playwright server and exposes browser tools over MCP stdio. No separate server process needed. -### With OpenAI Codex +### Add to an MCP client +**Claude Code:** ```bash -# Add the BAP browser server -codex mcp add bap-browser -- npx @browseragentprotocol/mcp -``` - -Or add to your `~/.codex/config.toml`: - -```toml -[mcp_servers.bap-browser] -command = "npx" -args = ["@browseragentprotocol/mcp"] +claude mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp ``` -### With Claude Desktop - -Add to your config file: - -**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json` -**Windows**: `%APPDATA%\Claude\claude_desktop_config.json` - +**Claude Desktop** — add to `claude_desktop_config.json`: ```json { "mcpServers": { "bap-browser": { "command": "npx", - "args": ["@browseragentprotocol/mcp"] + "args": ["-y", "@browseragentprotocol/mcp"] } } } ``` -Restart Claude Desktop after saving. - -### With Google Antigravity - -1. Open the MCP Store via the **"..."** dropdown at the top of the editor's agent panel -2. Click **Manage MCP Servers** -3. Click **View raw config** -4. Add the BAP browser server to your `mcp_config.json`: +**Codex CLI:** +```bash +codex mcp add bap-browser -- npx -y @browseragentprotocol/mcp +``` -```json -{ - "mcpServers": { - "bap-browser": { - "command": "npx", - "args": ["@browseragentprotocol/mcp"] - } - } -} +**Codex Desktop** — add to `~/.codex/config.toml`: +```toml +[mcp_servers.bap-browser] +command = "npx" +args = ["-y", "@browseragentprotocol/mcp"] ``` -5. Save and refresh to load the new configuration +### Connect to an existing BAP server -### Standalone +If you already have a BAP Playwright server running, pass `--url` to skip auto-start: ```bash -# Start the MCP server (connects to BAP server on localhost:9222) -npx @browseragentprotocol/mcp - -# With custom BAP server URL -npx @browseragentprotocol/mcp --bap-url ws://localhost:9333 +npx @browseragentprotocol/mcp --url ws://localhost:9222 ``` ## How It Works ``` ┌─────────────┐ MCP ┌─────────────┐ BAP ┌─────────────┐ -│ Claude │ ──────────── │ BAP MCP │ ────────── │ BAP Server │ -│ (or other │ (stdio) │ Server │ (WebSocket)│ (Playwright)│ -│ MCP host) │ │ │ │ │ +│ AI Agent │ ──────────── │ BAP MCP │ ────────── │ BAP Server │ +│ (any MCP │ (stdio) │ Server │ (WebSocket)│ (Playwright)│ +│ client) │ │ │ │ │ └─────────────┘ └─────────────┘ └─────────────┘ │ ▼ @@ -103,61 +66,59 @@ npx @browseragentprotocol/mcp --bap-url ws://localhost:9333 └─────────────┘ ``` -1. Claude sends tool calls via MCP (stdio transport) +1. AI agent sends tool calls via MCP (stdio transport) 2. This package translates them to BAP protocol 3. BAP server controls the browser via Playwright -4. Results flow back to Claude +4. Results flow back to the agent ## Available Tools -When connected, Claude has access to these browser automation tools: - ### Navigation | Tool | Description | |------|-------------| -| `bap_navigate` | Navigate to a URL | -| `bap_go_back` | Navigate back in browser history | -| `bap_go_forward` | Navigate forward in browser history | -| `bap_reload` | Reload the current page | +| `navigate` | Navigate to a URL | +| `go_back` | Navigate back in browser history | +| `go_forward` | Navigate forward in browser history | +| `reload` | Reload the current page | ### Element Interaction | Tool | Description | |------|-------------| -| `bap_click` | Click an element using semantic selectors | -| `bap_type` | Type text character by character (first clicks element) | -| `bap_fill` | Fill a form field (clears existing content first) | -| `bap_press` | Press keyboard keys (Enter, Tab, shortcuts) | -| `bap_select` | Select an option from a dropdown | -| `bap_scroll` | Scroll the page or a specific element | -| `bap_hover` | Hover over an element | +| `click` | Click an element using semantic selectors | +| `type` | Type text character by character (first clicks element) | +| `fill` | Fill a form field (clears existing content first) | +| `press` | Press keyboard keys (Enter, Tab, shortcuts) | +| `select` | Select an option from a dropdown | +| `scroll` | Scroll the page or a specific element | +| `hover` | Hover over an element | ### Observation | Tool | Description | |------|-------------| -| `bap_screenshot` | Take a screenshot of the page | -| `bap_accessibility` | Get the full accessibility tree | -| `bap_aria_snapshot` | Get a token-efficient YAML accessibility snapshot (~80% fewer tokens) | -| `bap_content` | Get page text content as text or markdown | -| `bap_element` | Query element properties (exists, visible, enabled) | +| `screenshot` | Take a screenshot of the page | +| `accessibility` | Get the full accessibility tree | +| `aria_snapshot` | Token-efficient YAML accessibility snapshot (~80% fewer tokens) | +| `content` | Get page text content as text or markdown | +| `element` | Query element properties (exists, visible, enabled) | ### Page Management | Tool | Description | |------|-------------| -| `bap_pages` | List all open pages/tabs | -| `bap_activate_page` | Switch to a different page/tab | -| `bap_close_page` | Close the current page/tab | +| `pages` | List all open pages/tabs | +| `activate_page` | Switch to a different page/tab | +| `close_page` | Close the current page/tab | ### AI Agent Methods | Tool | Description | |------|-------------| -| `bap_observe` | Get AI-optimized page observation with interactive elements and stable refs | -| `bap_act` | Execute a sequence of browser actions in a single call | -| `bap_extract` | Extract structured data from the page using schema and CSS heuristics | +| `observe` | AI-optimized page observation with interactive elements and stable refs | +| `act` | Execute a sequence of browser actions in a single call | +| `extract` | Extract structured data from the page using schema and CSS heuristics | ### Selector Formats @@ -168,73 +129,24 @@ role:button:Submit # ARIA role + accessible name (recommended) text:Sign in # Visible text content label:Email address # Associated label testid:submit-button # data-testid attribute -ref:@submitBtn # Stable element reference from bap_observe +ref:@submitBtn # Stable element reference from observe css:.btn-primary # CSS selector (fallback) xpath://button[@type] # XPath selector (fallback) ``` -## Example Conversations - -**You:** Go to Hacker News and tell me the top 3 stories - -**Claude:** I'll browse to Hacker News and get the top stories for you. - -*[Uses bap_navigate, bap_accessibility]* - -Here are the top 3 stories on Hacker News right now: -1. "Show HN: I built a tool for..." -2. "Why we switched from..." -3. "The future of..." - ---- - -**You:** Fill out the contact form on example.com with my details - -**Claude:** I'll navigate to the contact form and fill it out. - -*[Uses bap_navigate, bap_fill, bap_click]* - -Done! I've filled in the form with your details and submitted it. - ## CLI Options ``` Options: - --bap-url, -u BAP server URL (default: ws://localhost:9222) - --allowed-domains Comma-separated list of allowed domains (e.g., "*.example.com,trusted.org") - --verbose Enable verbose logging - --help Show help -``` - -## Managing the Server - -```bash -# List configured MCP servers -claude mcp list - -# Get details for the BAP browser server -claude mcp get bap-browser - -# Remove the server -claude mcp remove bap-browser - -# Check server status (within Claude Code) -/mcp -``` - -## Configuration Scopes - -When adding the server with Claude Code, you can specify where to store the configuration: - -```bash -# Local scope (default) - only you, only this project -claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp - -# User scope - available to you across all projects -claude mcp add --transport stdio --scope user bap-browser -- npx @browseragentprotocol/mcp - -# Project scope - shared with team via .mcp.json -claude mcp add --transport stdio --scope project bap-browser -- npx @browseragentprotocol/mcp + -b, --browser Browser: chrome (default), chromium, firefox, webkit, edge + -u, --url Connect to existing BAP server (skips auto-start) + -p, --port Port for auto-started server (default: 9222) + --headless Run browser headless (default: true) + --no-headless Visible browser window + --allowed-domains Comma-separated list of allowed domains + -v, --verbose Enable verbose logging to stderr + -h, --help Show help + --version Show version ``` ## Programmatic Usage @@ -248,14 +160,15 @@ const server = new BAPMCPServer({ verbose: true, }); -await server.start(); +await server.run(); ``` ## Requirements - Node.js >= 20.0.0 -- A running BAP server (`npx @browseragentprotocol/server-playwright`) -- An MCP-compatible client (Claude Code, Claude Desktop, OpenAI Codex, etc.) +- An MCP-compatible client + +The BAP Playwright server is auto-started by default. To install Playwright browsers manually: `npx playwright install chromium`. ## Troubleshooting @@ -263,22 +176,23 @@ await server.start(); On native Windows (not WSL), use the `cmd /c` wrapper: -```bash -claude mcp add --transport stdio bap-browser -- cmd /c npx @browseragentprotocol/mcp +```json +{ + "mcpServers": { + "bap-browser": { + "command": "cmd", + "args": ["/c", "npx", "-y", "@browseragentprotocol/mcp"] + } + } +} ``` -**Server not connecting?** - -Make sure the BAP server is running: - -```bash -npx @browseragentprotocol/server-playwright -``` +**Server not starting?** -Then check the MCP server status: +Ensure Playwright browsers are installed: ```bash -/mcp # Within Claude Code +npx playwright install chromium ``` ## License diff --git a/packages/mcp/package.json b/packages/mcp/package.json index 3eeb663..9d2347e 100644 --- a/packages/mcp/package.json +++ b/packages/mcp/package.json @@ -41,8 +41,8 @@ "automation", "ai", "agent", - "claude", - "anthropic" + "playwright", + "web-scraping" ], "scripts": { "build": "tsup src/index.ts src/cli.ts --format cjs,esm --dts --clean", diff --git a/packages/mcp/src/__tests__/standalone.test.ts b/packages/mcp/src/__tests__/standalone.test.ts new file mode 100644 index 0000000..6adc397 --- /dev/null +++ b/packages/mcp/src/__tests__/standalone.test.ts @@ -0,0 +1,181 @@ +import { describe, it, expect } from "vitest"; +import net from "node:net"; + +/** + * Tests for the standalone server management utilities in cli.ts. + * + * Since cli.ts runs as a script (calls main() at module level), we test + * the port-checking logic by reimplementing the core utility functions + * that are used by the standalone server management. + */ + +// --------------------------------------------------------------------------- +// Port detection tests (mirrors isPortInUse from cli.ts) +// --------------------------------------------------------------------------- + +function isPortInUse(port: number, host: string = "localhost"): Promise { + return new Promise((resolve) => { + const socket = net.createConnection({ port, host }); + socket.setTimeout(500); + socket.on("connect", () => { + socket.destroy(); + resolve(true); + }); + socket.on("timeout", () => { + socket.destroy(); + resolve(false); + }); + socket.on("error", () => { + socket.destroy(); + resolve(false); + }); + }); +} + +async function waitForServer( + port: number, + host: string = "localhost", + timeoutMs: number = 2000, + intervalMs: number = 50, +): Promise { + const start = Date.now(); + while (Date.now() - start < timeoutMs) { + if (await isPortInUse(port, host)) { + return; + } + await new Promise((resolve) => setTimeout(resolve, intervalMs)); + } + throw new Error(`Server did not start within ${timeoutMs}ms on port ${port}`); +} + +describe("standalone server utilities", () => { + describe("isPortInUse()", () => { + it("returns false for a port with nothing listening", async () => { + // Use a random high port that's unlikely to be in use + const result = await isPortInUse(59999); + expect(result).toBe(false); + }); + + it("returns true when a server is listening on the port", async () => { + // Start a temporary TCP server + const server = net.createServer(); + const port = await new Promise((resolve) => { + server.listen(0, "localhost", () => { + const addr = server.address(); + if (addr && typeof addr === "object") { + resolve(addr.port); + } + }); + }); + + try { + const result = await isPortInUse(port, "localhost"); + expect(result).toBe(true); + } finally { + server.close(); + } + }); + }); + + describe("waitForServer()", () => { + it("resolves immediately if server is already running", async () => { + const server = net.createServer(); + const port = await new Promise((resolve) => { + server.listen(0, "localhost", () => { + const addr = server.address(); + if (addr && typeof addr === "object") { + resolve(addr.port); + } + }); + }); + + try { + // Should resolve almost instantly + const start = Date.now(); + await waitForServer(port, "localhost", 2000, 50); + const elapsed = Date.now() - start; + expect(elapsed).toBeLessThan(500); + } finally { + server.close(); + } + }); + + it("waits for a server that starts after a delay", async () => { + const server = net.createServer(); + // Find a free port + const port = await new Promise((resolve) => { + const tmp = net.createServer(); + tmp.listen(0, "localhost", () => { + const addr = tmp.address(); + if (addr && typeof addr === "object") { + const p = addr.port; + tmp.close(() => resolve(p)); + } + }); + }); + + // Start the server after 200ms + const startTimer = setTimeout(() => { + server.listen(port, "localhost"); + }, 200); + + try { + await waitForServer(port, "localhost", 3000, 50); + // If we get here, the server was detected + expect(true).toBe(true); + } finally { + clearTimeout(startTimer); + server.close(); + } + }); + + it("throws if server does not start within timeout", async () => { + await expect( + waitForServer(59998, "localhost", 300, 50) + ).rejects.toThrow("Server did not start within 300ms"); + }); + }); + + describe("CLI argument parsing", () => { + it("standalone mode is the default (no --url)", () => { + // When no --url is provided, isStandalone should be true + const url = undefined; + const isStandalone = !url; + expect(isStandalone).toBe(true); + }); + + it("providing --url disables standalone mode", () => { + const url = "ws://remote:9222"; + const isStandalone = !url; + expect(isStandalone).toBe(false); + }); + + it("default port is 9222", () => { + const configPort: number | undefined = undefined; + const port = configPort ?? 9222; + expect(port).toBe(9222); + }); + + it("custom port overrides default", () => { + const configPort: number | undefined = 9333; + const port = configPort ?? 9222; + expect(port).toBe(9333); + }); + + it("browser mapping to server-playwright names", () => { + const browserMap: Record = { + chrome: "chromium", + chromium: "chromium", + firefox: "firefox", + webkit: "webkit", + edge: "chromium", + }; + + expect(browserMap["chrome"]).toBe("chromium"); + expect(browserMap["chromium"]).toBe("chromium"); + expect(browserMap["firefox"]).toBe("firefox"); + expect(browserMap["webkit"]).toBe("webkit"); + expect(browserMap["edge"]).toBe("chromium"); + }); + }); +}); diff --git a/packages/mcp/src/cli.ts b/packages/mcp/src/cli.ts index e9da4a8..17de706 100644 --- a/packages/mcp/src/cli.ts +++ b/packages/mcp/src/cli.ts @@ -3,16 +3,24 @@ * @fileoverview BAP MCP Server CLI * * Run the BAP MCP server from the command line. + * By default, auto-starts a BAP Playwright server (standalone mode). + * Use --url to connect to an existing BAP server instead. * * Usage: - * bap-mcp # Use defaults (ws://localhost:9222) - * bap-mcp --url ws://host:port # Custom BAP server URL + * bap-mcp # Standalone: auto-starts BAP server + * bap-mcp --url ws://host:port # Connect to existing BAP server + * bap-mcp --browser firefox # Use Firefox * bap-mcp --verbose # Enable verbose logging * bap-mcp --allowed-domains example.com,api.example.com */ +import { spawn, type ChildProcess } from "node:child_process"; +import fs from "node:fs"; +import net from "node:net"; +import path from "node:path"; +import { fileURLToPath } from "node:url"; import { Logger, icons, pc } from "@browseragentprotocol/logger"; -import { BAPMCPServer } from "./index.js"; +import { BAPMCPServer, type BrowserChoice } from "./index.js"; // MCP servers should log to stderr to avoid interfering with stdio transport const log = new Logger({ prefix: "BAP MCP", stderr: true }); @@ -23,7 +31,10 @@ const log = new Logger({ prefix: "BAP MCP", stderr: true }); interface CLIArgs { url?: string; + port?: number; + browser?: string; verbose?: boolean; + headless?: boolean; allowedDomains?: string[]; help?: boolean; version?: boolean; @@ -42,8 +53,18 @@ function parseArgs(): CLIArgs { args.version = true; } else if (arg === "--url" || arg === "-u" || arg === "--bap-url") { args.url = argv[++i]; + } else if (arg === "--port" || arg === "-p") { + args.port = parseInt(argv[++i] ?? "9222", 10); + } else if (arg === "--browser" || arg === "-b") { + args.browser = argv[++i]; } else if (arg === "--verbose" || arg === "-v") { args.verbose = true; + } else if (arg === "--headless") { + args.headless = true; + } else if (arg === "--headless=true") { + args.headless = true; + } else if (arg === "--headless=false" || arg === "--no-headless") { + args.headless = false; } else if (arg === "--allowed-domains") { args.allowedDomains = argv[++i]?.split(",").map((d) => d.trim()); } @@ -59,45 +80,232 @@ ${pc.bold("BAP MCP Server")} ${pc.dim("- Browser Agent Protocol as MCP")} ${pc.cyan("USAGE")} ${pc.dim("$")} npx @browseragentprotocol/mcp ${pc.dim("[OPTIONS]")} + By default, auto-starts a local BAP Playwright server (standalone mode). + Pass ${pc.yellow("--url")} to connect to an existing BAP server instead. + ${pc.cyan("OPTIONS")} - ${pc.yellow("-u, --url")} ${pc.dim("")} BAP server WebSocket URL ${pc.dim("(default: ws://localhost:9222)")} + ${pc.yellow("-b, --browser")} ${pc.dim("")} Browser: chrome ${pc.dim("(default)")}, chromium, firefox, webkit, edge + ${pc.yellow("-u, --url")} ${pc.dim("")} Connect to existing BAP server ${pc.dim("(skips auto-start)")} + ${pc.yellow("-p, --port")} ${pc.dim("")} Port for auto-started server ${pc.dim("(default: 9222)")} + ${pc.yellow("--headless")} Run browser in headless mode ${pc.dim("(default: true)")} + ${pc.yellow("--no-headless")} Run with visible browser window ${pc.yellow("-v, --verbose")} Enable verbose logging to stderr ${pc.yellow("--allowed-domains")} ${pc.dim("")} Comma-separated list of allowed domains ${pc.yellow("-h, --help")} Show this help message ${pc.yellow("--version")} Show version ${pc.cyan("EXAMPLES")} - ${pc.dim("# Start with defaults (connect to localhost:9222)")} + ${pc.dim("# Standalone mode (auto-starts server, uses local Chrome)")} ${pc.dim("$")} npx @browseragentprotocol/mcp - ${pc.dim("# Connect to a remote BAP server")} - ${pc.dim("$")} npx @browseragentprotocol/mcp --url ws://192.168.1.100:9222 + ${pc.dim("# Use Firefox")} + ${pc.dim("$")} npx @browseragentprotocol/mcp --browser firefox + + ${pc.dim("# Visible browser window")} + ${pc.dim("$")} npx @browseragentprotocol/mcp --no-headless - ${pc.dim("# Enable verbose logging")} - ${pc.dim("$")} npx @browseragentprotocol/mcp --verbose + ${pc.dim("# Connect to a remote BAP server (skips auto-start)")} + ${pc.dim("$")} npx @browseragentprotocol/mcp --url ws://192.168.1.100:9222 - ${pc.dim("# Restrict to specific domains (security)")} + ${pc.dim("# Domain allowlist")} ${pc.dim("$")} npx @browseragentprotocol/mcp --allowed-domains example.com,api.example.com -${pc.cyan("CLAUDE DESKTOP")} - Add to ${pc.dim("claude_desktop_config.json")}: +${pc.cyan("MCP CLIENT SETUP")} + ${pc.dim("Add to any MCP-compatible client (config examples):")} + ${pc.dim("JSON config:")} ${pc.dim("{")} ${pc.dim('"mcpServers"')}: { ${pc.green('"bap-browser"')}: { "command": "npx", - "args": ["@browseragentprotocol/mcp"] + "args": ["-y", "@browseragentprotocol/mcp"] } } ${pc.dim("}")} -${pc.cyan("CLAUDE CODE")} - ${pc.dim("$")} claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp + ${pc.dim("CLI:")} + ${pc.dim("$")} ${pc.dim("")} mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp -${pc.dim("For more information:")} ${pc.cyan("https://github.com/browseragentprotocol/bap")} +${pc.dim("Docs:")} ${pc.cyan("https://github.com/browseragentprotocol/bap")} `); } +// ============================================================================= +// Standalone Server Management +// ============================================================================= + +/** + * Check if a port is available by attempting a TCP connection. + * Returns true if something is already listening on the port. + */ +function isPortInUse(port: number, host: string = "localhost"): Promise { + return new Promise((resolve) => { + const socket = net.createConnection({ port, host }); + socket.setTimeout(500); + socket.on("connect", () => { + socket.destroy(); + resolve(true); + }); + socket.on("timeout", () => { + socket.destroy(); + resolve(false); + }); + socket.on("error", () => { + socket.destroy(); + resolve(false); + }); + }); +} + +/** + * Wait for a server to become available on the given port. + * Polls with the specified interval until timeout. + */ +async function waitForServer( + port: number, + host: string = "localhost", + timeoutMs: number = 15000, + intervalMs: number = 150, +): Promise { + const start = Date.now(); + + while (Date.now() - start < timeoutMs) { + if (await isPortInUse(port, host)) { + return; + } + await new Promise((resolve) => setTimeout(resolve, intervalMs)); + } + + throw new Error( + `BAP server did not start within ${timeoutMs / 1000}s on port ${port}. ` + + `Ensure Playwright browsers are installed: npx playwright install chromium` + ); +} + +/** + * Resolve the command to start the server-playwright CLI. + * + * In monorepo development, the sibling package's built CLI is used directly + * to avoid npx overhead. In published (npm install) usage, falls back to npx. + */ +function resolveServerCommand(): { command: string; args: string[] } { + try { + const __dirname = path.dirname(fileURLToPath(import.meta.url)); + const siblingCli = path.resolve(__dirname, "../../server-playwright/dist/cli.js"); + + if (fs.existsSync(siblingCli)) { + return { command: "node", args: [siblingCli] }; + } + } catch { + // import.meta.url resolution failed — not in ESM context, fall through + } + + return { command: "npx", args: ["-y", "@browseragentprotocol/server-playwright"] }; +} + +interface StandaloneServerOptions { + port: number; + host: string; + browser: string; + headless: boolean; + verbose: boolean; +} + +/** + * Start the BAP Playwright server as a child process. + * + * If a server is already listening on the target port, reuses it and returns + * null (caller should not attempt to kill it on shutdown). + * + * Otherwise, spawns the server-playwright CLI, waits for it to be ready, + * and returns the ChildProcess handle for lifecycle management. + */ +async function startStandaloneServer( + options: StandaloneServerOptions, +): Promise { + const { port, host, browser, headless, verbose } = options; + + // Reuse an existing server if one is already on this port + if (await isPortInUse(port, host)) { + log.info(`BAP server already running on ${host}:${port}, reusing`); + return null; + } + + log.info(`Starting BAP Playwright server on port ${port}...`); + + const { command, args } = resolveServerCommand(); + const serverArgs = [ + ...args, + "--port", port.toString(), + "--host", host, + headless ? "--headless" : "--no-headless", + ]; + + // Map MCP-level browser names to server-playwright's accepted values + const browserMap: Record = { + chrome: "chromium", + chromium: "chromium", + firefox: "firefox", + webkit: "webkit", + edge: "chromium", + }; + serverArgs.push("--browser", browserMap[browser] ?? "chromium"); + + if (verbose) { + serverArgs.push("--debug"); + } + + const child = spawn(command, serverArgs, { + stdio: ["ignore", "pipe", "pipe"], + detached: false, + env: { ...process.env }, + }); + + // Pipe server output to stderr when verbose (MCP uses stdout for stdio transport) + if (verbose) { + child.stdout?.on("data", (data: Buffer) => { + process.stderr.write(`[BAP Server] ${data.toString()}`); + }); + child.stderr?.on("data", (data: Buffer) => { + process.stderr.write(`[BAP Server] ${data.toString()}`); + }); + } + + child.on("error", (err) => { + log.error("Failed to start BAP server", err); + }); + + child.on("exit", (code, signal) => { + if (code !== null && code !== 0) { + log.error(`BAP server exited with code ${code}`); + } else if (signal && verbose) { + log.info(`BAP server stopped (${signal})`); + } + }); + + // Wait for the server to become available + try { + await waitForServer(port, host); + log.info(`BAP server ready on ws://${host}:${port}`); + } catch (err) { + child.kill("SIGTERM"); + throw err; + } + + return child; +} + +/** + * Kill a child process gracefully (SIGTERM), escalating to SIGKILL after 500ms. + */ +async function killServer(child: ChildProcess): Promise { + child.kill("SIGTERM"); + await new Promise((resolve) => setTimeout(resolve, 500)); + if (!child.killed) { + child.kill("SIGKILL"); + } +} + // ============================================================================= // Main // ============================================================================= @@ -112,54 +320,84 @@ async function main(): Promise { if (args.version) { console.error( - `${icons.connection} BAP MCP Server ${pc.dim("v0.1.0-alpha.1")}` + `${icons.connection} BAP MCP Server ${pc.dim("v0.2.0")}` ); process.exit(0); } - // Set log level based on verbose flag if (args.verbose) { log.setLevel("debug"); } - const server = new BAPMCPServer({ - bapServerUrl: args.url, - verbose: args.verbose, - allowedDomains: args.allowedDomains, - }); + // Determine mode: standalone (auto-start server) vs connect to existing + const isStandalone = !args.url; + const port = args.port ?? 9222; + const host = "localhost"; + const bapServerUrl = args.url ?? `ws://${host}:${port}`; + let serverProcess: ChildProcess | null = null; - // Handle graceful shutdown - const shutdown = async (signal: string) => { - if (args.verbose) { - log.info(`${signal} received, shutting down...`); - } - await server.close(); - if (args.verbose) { - log.success("Server stopped"); + try { + if (isStandalone) { + if (args.verbose) { + log.info("Standalone mode: auto-starting BAP Playwright server"); + } + + serverProcess = await startStandaloneServer({ + port, + host, + browser: args.browser ?? "chrome", + headless: args.headless ?? true, + verbose: args.verbose ?? false, + }); } - process.exit(0); - }; - process.on("SIGINT", () => shutdown("SIGINT")); - process.on("SIGTERM", () => shutdown("SIGTERM")); + const server = new BAPMCPServer({ + bapServerUrl, + browser: args.browser as BrowserChoice | undefined, + headless: args.headless ?? true, + verbose: args.verbose, + allowedDomains: args.allowedDomains, + }); - // Handle uncaught errors - process.on("uncaughtException", (error) => { - log.error("Uncaught exception", error); - process.exit(1); - }); + // Graceful shutdown — clean up MCP server and child process + const shutdown = async (signal: string) => { + if (args.verbose) { + log.info(`${signal} received, shutting down...`); + } - process.on("unhandledRejection", (reason) => { - log.error("Unhandled rejection", reason); - process.exit(1); - }); + await server.close(); + + if (serverProcess) { + await killServer(serverProcess); + } + + if (args.verbose) { + log.success("Server stopped"); + } + process.exit(0); + }; + + process.on("SIGINT", () => shutdown("SIGINT")); + process.on("SIGTERM", () => shutdown("SIGTERM")); + + process.on("uncaughtException", (error) => { + log.error("Uncaught exception", error); + serverProcess?.kill("SIGTERM"); + process.exit(1); + }); + + process.on("unhandledRejection", (reason) => { + log.error("Unhandled rejection", reason); + serverProcess?.kill("SIGTERM"); + process.exit(1); + }); - try { if (args.verbose) { - log.info(`Connecting to BAP server at ${args.url ?? "ws://localhost:9222"}`); + log.info(`Connecting to BAP server at ${bapServerUrl}`); } await server.run(); } catch (error) { + serverProcess?.kill("SIGTERM"); log.error("Failed to start server", error); process.exit(1); } diff --git a/packages/mcp/src/index.ts b/packages/mcp/src/index.ts index 89f9318..ee4e040 100644 --- a/packages/mcp/src/index.ts +++ b/packages/mcp/src/index.ts @@ -1,14 +1,14 @@ /** * @fileoverview BAP MCP Integration * @module @browseragentprotocol/mcp - * @version 0.1.0 + * @version 0.2.0 * * Exposes Browser Agent Protocol as an MCP (Model Context Protocol) server. - * Allows AI agents like Claude to control browsers through standardized MCP tools. + * Allows AI agents to control browsers through standardized MCP tools. * * TODO (MEDIUM): Add input validation on tool arguments before passing to BAP client * TODO (MEDIUM): Enforce session timeout (maxSessionDuration) - currently unused - * TODO (MEDIUM): Add resource cleanup on partial failure in ensureClient() + * TODO (MEDIUM): Add resource cleanup on partial failure in ensureClient() — DONE (v0.2.0) * TODO (LOW): parseSelector should validate empty/whitespace-only strings * TODO (LOW): Consider sanitizing URLs in verbose logging to prevent token leakage */ @@ -45,13 +45,19 @@ import { // Types // ============================================================================= +export type BrowserChoice = "chrome" | "chromium" | "firefox" | "webkit" | "edge"; + export interface BAPMCPServerOptions { /** BAP server URL (default: ws://localhost:9222) */ bapServerUrl?: string; - /** Server name for MCP (default: bap-browser) */ + /** Server name for MCP (default: BAPBrowser) */ name?: string; /** Server version (default: 1.0.0) */ version?: string; + /** Browser to use: chrome (default), chromium, firefox, webkit, edge */ + browser?: BrowserChoice; + /** Run browser in headless mode (default: true) */ + headless?: boolean; /** Enable verbose logging */ verbose?: boolean; /** Allowed domains for navigation (empty = all allowed) */ @@ -195,7 +201,7 @@ function formatSelectorForDisplay(selector: BAPSelector): string { const TOOLS: Tool[] = [ // Navigation { - name: "bap_navigate", + name: "navigate", description: "Navigate the browser to a URL. Use this to open web pages. Returns the page title and URL after navigation.", inputSchema: { @@ -217,7 +223,7 @@ const TOOLS: Tool[] = [ // Element Interaction { - name: "bap_click", + name: "click", description: 'Click an element on the page. Use semantic selectors like "role:button:Submit" or "text:Sign in" for reliability.', inputSchema: { @@ -237,7 +243,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_type", + name: "type", description: "Type text into an input field. First clicks the element, then types the text character by character.", inputSchema: { @@ -260,9 +266,9 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_fill", + name: "fill", description: - "Fill an input field with text (clears existing content first). Faster than bap_type for form filling.", + "Fill an input field with text (clears existing content first). Faster than type for form filling.", inputSchema: { type: "object", properties: { @@ -279,7 +285,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_press", + name: "press", description: "Press a keyboard key. Use for Enter, Tab, Escape, or keyboard shortcuts like Ctrl+A.", inputSchema: { type: "object", @@ -297,7 +303,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_select", + name: "select", description: "Select an option from a dropdown/select element.", inputSchema: { type: "object", @@ -315,7 +321,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_scroll", + name: "scroll", description: "Scroll the page or a specific element.", inputSchema: { type: "object", @@ -337,7 +343,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_hover", + name: "hover", description: "Hover over an element. Useful for triggering hover menus or tooltips.", inputSchema: { type: "object", @@ -353,7 +359,7 @@ const TOOLS: Tool[] = [ // Observations { - name: "bap_screenshot", + name: "screenshot", description: "Take a screenshot of the current page. Returns the image as base64. Use for visual verification.", inputSchema: { @@ -363,11 +369,20 @@ const TOOLS: Tool[] = [ type: "boolean", description: "Capture full page including scrollable content (default: false)", }, + format: { + type: "string", + enum: ["jpeg", "png"], + description: "Image format (default: jpeg). JPEG is ~60% smaller than PNG for typical pages.", + }, + quality: { + type: "number", + description: "JPEG quality 0-100 (default: 80). Ignored for PNG.", + }, }, }, }, { - name: "bap_accessibility", + name: "accessibility", description: "Get the accessibility tree of the page. Returns a structured representation ideal for understanding page layout and finding elements. RECOMMENDED: Use this before interacting with elements.", inputSchema: { @@ -381,7 +396,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_aria_snapshot", + name: "aria_snapshot", description: "Get a token-efficient YAML snapshot of the page accessibility tree. ~80% fewer tokens than full accessibility tree. Best for LLM context.", inputSchema: { @@ -395,7 +410,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_content", + name: "content", description: "Get page content as text or markdown. Useful for reading article content or extracting text.", inputSchema: { @@ -410,7 +425,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_element", + name: "element", description: "Query properties of a specific element. Check if an element exists, is visible, enabled, etc.", inputSchema: { @@ -435,7 +450,7 @@ const TOOLS: Tool[] = [ // Page Management { - name: "bap_pages", + name: "pages", description: "List all open pages/tabs. Returns page IDs and URLs.", inputSchema: { type: "object", @@ -443,21 +458,21 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_activate_page", + name: "activate_page", description: "Switch to a different page/tab by its ID.", inputSchema: { type: "object", properties: { pageId: { type: "string", - description: "ID of the page to activate (from bap_pages)", + description: "ID of the page to activate (from pages)", }, }, required: ["pageId"], }, }, { - name: "bap_close_page", + name: "close_page", description: "Close the current page/tab.", inputSchema: { type: "object", @@ -465,7 +480,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_go_back", + name: "go_back", description: "Navigate back in browser history.", inputSchema: { type: "object", @@ -473,7 +488,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_go_forward", + name: "go_forward", description: "Navigate forward in browser history.", inputSchema: { type: "object", @@ -481,7 +496,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_reload", + name: "reload", description: "Reload the current page.", inputSchema: { type: "object", @@ -491,7 +506,7 @@ const TOOLS: Tool[] = [ // Agent (Composite Actions, Observations, and Data Extraction) { - name: "bap_act", + name: "act", description: `Execute a sequence of browser actions in a single call. Useful for multi-step flows like login, form submission, or navigation sequences. Each step can have conditions and error handling. More efficient than calling actions individually.`, @@ -546,7 +561,7 @@ Each step can have conditions and error handling. More efficient than calling ac }, }, { - name: "bap_observe", + name: "observe", description: `Get an AI-optimized observation of the current page. Returns interactive elements with pre-computed selectors, making it easy to determine what actions are possible. Supports stable element refs and annotated screenshots. @@ -583,10 +598,10 @@ RECOMMENDED: Use this before complex interactions to understand the page.`, }, }, { - name: "bap_extract", + name: "extract", description: `Extract structured data from the current page. Uses CSS heuristics to find lists, tables, and labeled data matching your schema. -Works best with standard HTML patterns (ul/ol, tables, cards). For complex pages, use bap_content instead.`, +Works best with standard HTML patterns (ul/ol, tables, cards). For complex pages, use content instead.`, inputSchema: { type: "object", properties: { @@ -659,6 +674,21 @@ const RESOURCES: Resource[] = [ }, ]; +// ============================================================================= +// Browser Resolution +// ============================================================================= + +function resolveBrowser(browser: BrowserChoice): { browser: "chromium" | "firefox" | "webkit"; channel?: string } { + switch (browser) { + case "chrome": return { browser: "chromium", channel: "chrome" }; + case "chromium": return { browser: "chromium" }; + case "firefox": return { browser: "firefox" }; + case "webkit": return { browser: "webkit" }; + case "edge": return { browser: "chromium", channel: "msedge" }; + default: return { browser: "chromium", channel: "chrome" }; + } +} + // ============================================================================= // BAP MCP Server // ============================================================================= @@ -675,8 +705,10 @@ export class BAPMCPServer { constructor(options: BAPMCPServerOptions = {}) { this.options = { bapServerUrl: options.bapServerUrl ?? "ws://localhost:9222", - name: options.name ?? "bap-browser", + name: options.name ?? "BAPBrowser", version: options.version ?? "1.0.0", + browser: options.browser ?? "chrome", + headless: options.headless ?? true, verbose: options.verbose ?? false, allowedDomains: options.allowedDomains ?? [], maxSessionDuration: options.maxSessionDuration ?? 3600, @@ -730,28 +762,93 @@ export class BAPMCPServer { } /** - * Ensure BAP client is connected + * Ensure BAP client is connected. + * - On first call: creates transport, connects, launches browser + * - On subsequent calls: checks if connection is still alive, reconnects if needed */ private async ensureClient(): Promise { - if (this.client) { - return this.client; + // If we have a client, verify the connection is still alive + if (this.client && this.transport) { + try { + // Lightweight health check — list pages to confirm the connection works + await this.client.listPages(); + return this.client; + } catch { + // Connection is dead — clean up and reconnect + this.log("Connection lost, reconnecting..."); + await this.resetClient(); + } } this.log("Connecting to BAP server:", this.options.bapServerUrl); - this.transport = new WebSocketTransport(this.options.bapServerUrl); + this.transport = new WebSocketTransport(this.options.bapServerUrl, { + autoReconnect: true, + maxReconnectAttempts: 5, + reconnectDelay: 1000, + }); + + // Hook reconnect callbacks for verbose logging + this.transport.onReconnecting = (attempt, max) => { + this.log(`Reconnecting to BAP server (attempt ${attempt}/${max})...`); + }; + this.transport.onReconnected = () => { + this.log("Reconnected to BAP server"); + }; + this.transport.onClose = () => { + this.log("BAP server connection closed"); + }; + this.client = new BAPClient(this.transport); // Connect and initialize the protocol await this.client.connect(); - // Launch browser - await this.client.launch({ headless: false }); + // Launch browser with configured browser/channel + const resolved = resolveBrowser(this.options.browser); + const headless = this.options.headless ?? true; + try { + await this.client.launch({ + browser: resolved.browser, + channel: resolved.channel, + headless, + }); + } catch (err) { + // If a channel was specified (e.g. local Chrome) and it's not found, fall back to bundled Chromium + if (resolved.channel && String(err).includes("Looks like")) { + this.log( + `Local ${this.options.browser} not found, falling back to bundled Chromium` + ); + await this.client.launch({ browser: "chromium", headless }); + } else { + // Clean up on launch failure to avoid leaking the transport + await this.resetClient(); + throw err; + } + } this.log("BAP client connected and browser launched"); return this.client; } + /** + * Reset client and transport state for reconnection + */ + private async resetClient(): Promise { + try { + if (this.client) { + await this.client.close(); + } + } catch { /* ignore cleanup errors */ } + try { + if (this.transport) { + await this.transport.close(); + } + } catch { /* ignore cleanup errors */ } + this.client = null; + this.transport = null; + } + /** * Set up MCP request handlers */ @@ -815,7 +912,7 @@ export class BAPMCPServer { switch (name) { // Navigation - case "bap_navigate": { + case "navigate": { const url = args.url as string; // Security check @@ -853,7 +950,7 @@ export class BAPMCPServer { } // Element Interaction - case "bap_click": { + case "click": { const selector = parseSelector(args.selector as string); const options = args.clickCount ? { clickCount: args.clickCount as number } : undefined; await client.click(selector, options); @@ -862,7 +959,7 @@ export class BAPMCPServer { }; } - case "bap_type": { + case "type": { const selector = parseSelector(args.selector as string); const text = args.text as string; const delay = args.delay as number | undefined; @@ -872,7 +969,7 @@ export class BAPMCPServer { }; } - case "bap_fill": { + case "fill": { const selector = parseSelector(args.selector as string); const value = args.value as string; await client.fill(selector, value); @@ -881,7 +978,7 @@ export class BAPMCPServer { }; } - case "bap_press": { + case "press": { const key = args.key as string; const selector = args.selector ? parseSelector(args.selector as string) : undefined; await client.press(key, selector); @@ -890,7 +987,7 @@ export class BAPMCPServer { }; } - case "bap_select": { + case "select": { const selector = parseSelector(args.selector as string); const value = args.value as string; await client.select(selector, value); @@ -899,7 +996,7 @@ export class BAPMCPServer { }; } - case "bap_scroll": { + case "scroll": { const direction = (args.direction as ScrollDirection) ?? "down"; const amount = (args.amount as number) ?? 500; const selector = args.selector ? parseSelector(args.selector as string) : undefined; @@ -909,7 +1006,7 @@ export class BAPMCPServer { }; } - case "bap_hover": { + case "hover": { const selector = parseSelector(args.selector as string); await client.hover(selector); return { @@ -918,9 +1015,11 @@ export class BAPMCPServer { } // Observations - case "bap_screenshot": { + case "screenshot": { const fullPage = args.fullPage as boolean ?? false; - const result = await client.screenshot({ fullPage }); + const format = (args.format as string) === "png" ? "png" : "jpeg"; + const quality = typeof args.quality === "number" ? args.quality : (format === "jpeg" ? 80 : undefined); + const result = await client.screenshot({ fullPage, format, quality }); return { content: [ { @@ -932,7 +1031,7 @@ export class BAPMCPServer { }; } - case "bap_accessibility": { + case "accessibility": { const interestingOnly = args.interestingOnly as boolean ?? true; const result = await client.accessibility({ interestingOnly }); return { @@ -945,7 +1044,7 @@ export class BAPMCPServer { }; } - case "bap_aria_snapshot": { + case "aria_snapshot": { const selector = args.selector ? parseSelector(args.selector as string) : undefined; const result = await client.ariaSnapshot(selector); return { @@ -958,7 +1057,7 @@ export class BAPMCPServer { }; } - case "bap_content": { + case "content": { const format = (args.format as ContentFormat) ?? "text"; const result = await client.content(format); return { @@ -966,7 +1065,7 @@ export class BAPMCPServer { }; } - case "bap_element": { + case "element": { const selector = parseSelector(args.selector as string); const properties = (args.properties as ElementProperty[]) ?? ["visible", "enabled"]; const result = await client.element(selector, properties); @@ -981,7 +1080,7 @@ export class BAPMCPServer { } // Page Management - case "bap_pages": { + case "pages": { const result = await client.listPages(); const text = result.pages .map((p) => `${p.id === result.activePage ? "* " : " "}${p.id}: ${p.url} (${p.title})`) @@ -991,7 +1090,7 @@ export class BAPMCPServer { }; } - case "bap_activate_page": { + case "activate_page": { const pageId = args.pageId as string; await client.activatePage(pageId); return { @@ -999,28 +1098,28 @@ export class BAPMCPServer { }; } - case "bap_close_page": { + case "close_page": { await client.closePage(); return { content: [{ type: "text", text: "Closed current page" }], }; } - case "bap_go_back": { + case "go_back": { await client.goBack(); return { content: [{ type: "text", text: "Navigated back" }], }; } - case "bap_go_forward": { + case "go_forward": { await client.goForward(); return { content: [{ type: "text", text: "Navigated forward" }], }; } - case "bap_reload": { + case "reload": { await client.reload(); return { content: [{ type: "text", text: "Reloaded page" }], @@ -1028,7 +1127,7 @@ export class BAPMCPServer { } // Agent (Composite Actions, Observations, and Data Extraction) - case "bap_act": { + case "act": { interface InputStep { label?: string; action: string; @@ -1094,7 +1193,7 @@ export class BAPMCPServer { }; } - case "bap_observe": { + case "observe": { const annotate = args.annotateScreenshot as boolean; const result = await client.observe({ includeScreenshot: (args.includeScreenshot as boolean) || annotate, @@ -1166,7 +1265,7 @@ export class BAPMCPServer { return { content }; } - case "bap_extract": { + case "extract": { const result = await client.extract({ instruction: args.instruction as string, schema: args.schema as ExtractionSchema, @@ -1300,14 +1399,7 @@ export class BAPMCPServer { * Close the server and clean up */ async close(): Promise { - if (this.client) { - await this.client.close(); - this.client = null; - } - if (this.transport) { - await this.transport.close(); - this.transport = null; - } + await this.resetClient(); await this.server.close(); this.log("BAP MCP Server closed"); } diff --git a/packages/protocol/src/types/methods.ts b/packages/protocol/src/types/methods.ts index 9d87de5..d9b0a3a 100644 --- a/packages/protocol/src/types/methods.ts +++ b/packages/protocol/src/types/methods.ts @@ -42,6 +42,7 @@ export type ProxyConfig = z.infer; /** browser/launch parameters */ export const BrowserLaunchParamsSchema = z.object({ browser: BrowserTypeSchema.optional(), + channel: z.string().optional(), headless: z.boolean().optional(), args: z.array(z.string()).optional(), proxy: ProxyConfigSchema.optional(), diff --git a/packages/protocol/src/types/protocol.ts b/packages/protocol/src/types/protocol.ts index 1ae91b9..77de263 100644 --- a/packages/protocol/src/types/protocol.ts +++ b/packages/protocol/src/types/protocol.ts @@ -10,7 +10,7 @@ import { z } from "zod"; // ============================================================================= /** Current BAP protocol version */ -export const BAP_VERSION = "0.1.0"; +export const BAP_VERSION = "0.2.0"; // ============================================================================= // JSON-RPC 2.0 Schemas @@ -70,18 +70,18 @@ export const JSONRPCNotificationSchema = z.object({ }); export type JSONRPCNotification = z.infer; -/** Union of all JSON-RPC response types */ +/** Union of all JSON-RPC response types (error first to prevent z.unknown() from swallowing errors) */ export const JSONRPCResponseSchema = z.union([ - JSONRPCSuccessResponseSchema, JSONRPCErrorResponseSchema, + JSONRPCSuccessResponseSchema, ]); export type JSONRPCResponse = z.infer; -/** Union of all JSON-RPC message types */ +/** Union of all JSON-RPC message types (error before success to prevent z.unknown() from swallowing errors) */ export const JSONRPCMessageSchema = z.union([ JSONRPCRequestSchema, - JSONRPCSuccessResponseSchema, JSONRPCErrorResponseSchema, + JSONRPCSuccessResponseSchema, JSONRPCNotificationSchema, ]); export type JSONRPCMessage = z.infer; diff --git a/packages/python-sdk/package.json b/packages/python-sdk/package.json index e28d347..232de4b 100644 --- a/packages/python-sdk/package.json +++ b/packages/python-sdk/package.json @@ -1,6 +1,6 @@ { "name": "@browseragentprotocol/python-client", - "version": "0.1.0", + "version": "0.2.0", "private": true, "description": "Python SDK for Browser Agent Protocol (BAP) - build scripts only", "scripts": { @@ -9,5 +9,10 @@ "lint": "python -m ruff check src/browseragentprotocol || true", "lint:fix": "python -m ruff check --fix src/browseragentprotocol || true", "clean": "rm -rf dist build *.egg-info .mypy_cache .ruff_cache __pycache__ src/**/__pycache__" + }, + "turbo": { + "build": { + "outputs": [] + } } } diff --git a/packages/python-sdk/pyproject.toml b/packages/python-sdk/pyproject.toml index 7010f88..7f7fe45 100644 --- a/packages/python-sdk/pyproject.toml +++ b/packages/python-sdk/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "browser-agent-protocol" -version = "0.1.0a1" +version = "0.2.0" description = "Python SDK for the Browser Agent Protocol (BAP) - control browsers with AI agents" readme = "README.md" license = { text = "Apache-2.0" } @@ -55,10 +55,10 @@ dev = [ bap = "browseragentprotocol.cli:main" [project.urls] -Homepage = "https://github.com/anthropics/browser-agent-protocol" -Documentation = "https://github.com/anthropics/browser-agent-protocol#readme" -Repository = "https://github.com/anthropics/browser-agent-protocol" -Issues = "https://github.com/anthropics/browser-agent-protocol/issues" +Homepage = "https://github.com/browseragentprotocol/bap" +Documentation = "https://github.com/browseragentprotocol/bap#readme" +Repository = "https://github.com/browseragentprotocol/bap" +Issues = "https://github.com/browseragentprotocol/bap/issues" [build-system] requires = ["hatchling"] diff --git a/packages/python-sdk/src/browseragentprotocol/__init__.py b/packages/python-sdk/src/browseragentprotocol/__init__.py index 64b1a88..65fc2d0 100644 --- a/packages/python-sdk/src/browseragentprotocol/__init__.py +++ b/packages/python-sdk/src/browseragentprotocol/__init__.py @@ -54,7 +54,7 @@ async def main(): ``` """ -__version__ = "0.1.0a1" +__version__ = "0.2.0" # Main client classes from browseragentprotocol.client import BAPClient diff --git a/packages/python-sdk/src/browseragentprotocol/client.py b/packages/python-sdk/src/browseragentprotocol/client.py index 0f4aebb..cd3be9f 100644 --- a/packages/python-sdk/src/browseragentprotocol/client.py +++ b/packages/python-sdk/src/browseragentprotocol/client.py @@ -108,7 +108,7 @@ def __init__( *, token: str | None = None, name: str = "bap-client-python", - version: str = "0.1.0", + version: str = "0.2.0", timeout: float = 30.0, events: list[str] | None = None, ): @@ -274,6 +274,7 @@ def capabilities(self) -> dict[str, Any] | None: async def launch( self, browser: Literal["chromium", "firefox", "webkit"] | None = None, + channel: str | None = None, headless: bool | None = None, args: list[str] | None = None, **kwargs: Any, @@ -283,6 +284,7 @@ async def launch( Args: browser: Browser type (chromium, firefox, webkit) + channel: Playwright channel (e.g. "chrome", "msedge") headless: Run in headless mode args: Additional browser arguments **kwargs: Additional launch options @@ -293,6 +295,8 @@ async def launch( params: dict[str, Any] = {} if browser is not None: params["browser"] = browser + if channel is not None: + params["channel"] = channel if headless is not None: params["headless"] = headless if args is not None: diff --git a/packages/python-sdk/src/browseragentprotocol/context.py b/packages/python-sdk/src/browseragentprotocol/context.py index 0fcba35..40f1b0c 100644 --- a/packages/python-sdk/src/browseragentprotocol/context.py +++ b/packages/python-sdk/src/browseragentprotocol/context.py @@ -17,7 +17,7 @@ async def bap_client( *, token: str | None = None, name: str = "bap-client-python", - version: str = "0.1.0", + version: str = "0.2.0", timeout: float = 30.0, events: list[str] | None = None, browser: Literal["chromium", "firefox", "webkit"] | None = None, diff --git a/packages/python-sdk/src/browseragentprotocol/sync_client.py b/packages/python-sdk/src/browseragentprotocol/sync_client.py index 379305f..6c1fb58 100644 --- a/packages/python-sdk/src/browseragentprotocol/sync_client.py +++ b/packages/python-sdk/src/browseragentprotocol/sync_client.py @@ -74,7 +74,7 @@ def __init__( *, token: str | None = None, name: str = "bap-client-python-sync", - version: str = "0.1.0", + version: str = "0.2.0", timeout: float = 30.0, events: list[str] | None = None, ): @@ -151,13 +151,14 @@ def capabilities(self) -> dict[str, Any] | None: def launch( self, browser: Literal["chromium", "firefox", "webkit"] | None = None, + channel: str | None = None, headless: bool | None = None, args: list[str] | None = None, **kwargs: Any, ) -> BrowserLaunchResult: """Launch a browser instance.""" return self._run( - self._async_client.launch(browser=browser, headless=headless, args=args, **kwargs) + self._async_client.launch(browser=browser, channel=channel, headless=headless, args=args, **kwargs) ) def close_browser(self, browser_id: str | None = None) -> None: diff --git a/packages/python-sdk/src/browseragentprotocol/types/methods.py b/packages/python-sdk/src/browseragentprotocol/types/methods.py index 9da65e6..c27a342 100644 --- a/packages/python-sdk/src/browseragentprotocol/types/methods.py +++ b/packages/python-sdk/src/browseragentprotocol/types/methods.py @@ -84,6 +84,7 @@ class BrowserLaunchParams(BaseModel): """Parameters for browser/launch.""" browser: Literal["chromium", "firefox", "webkit"] | None = None + channel: str | None = None headless: bool | None = None args: list[str] | None = None env: dict[str, str] | None = None diff --git a/packages/python-sdk/src/browseragentprotocol/types/protocol.py b/packages/python-sdk/src/browseragentprotocol/types/protocol.py index 3f449d4..cb3deb8 100644 --- a/packages/python-sdk/src/browseragentprotocol/types/protocol.py +++ b/packages/python-sdk/src/browseragentprotocol/types/protocol.py @@ -12,7 +12,7 @@ # Protocol Version # ============================================================================= -BAP_VERSION = "0.1.0" +BAP_VERSION = "0.2.0" # ============================================================================= # Request ID diff --git a/packages/server-playwright/README.md b/packages/server-playwright/README.md index cffb3a4..eb8dee7 100644 --- a/packages/server-playwright/README.md +++ b/packages/server-playwright/README.md @@ -81,8 +81,8 @@ await client.close(); ### With MCP (for AI agents) ```bash -# Add to Claude Code -claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp +# Add to any MCP-compatible client via CLI +npx @browseragentprotocol/mcp ``` ## Programmatic Usage diff --git a/packages/server-playwright/src/cli.ts b/packages/server-playwright/src/cli.ts index b28ddde..be5bf19 100644 --- a/packages/server-playwright/src/cli.ts +++ b/packages/server-playwright/src/cli.ts @@ -76,7 +76,7 @@ function parseArgs(): Partial { process.exit(0); } else if (arg === "--version" || arg === "-v") { console.log( - `${icons.server} BAP Playwright Server ${pc.dim("v0.1.0-alpha.1")}` + `${icons.server} BAP Playwright Server ${pc.dim("v0.2.0")}` ); process.exit(0); } @@ -167,7 +167,7 @@ async function main(): Promise { console.log( banner({ title: "BAP Playwright Server", - version: "0.1.0-alpha.1", + version: "0.2.0", subtitle: "Browser Agent Protocol", }) ); diff --git a/packages/server-playwright/src/server.ts b/packages/server-playwright/src/server.ts index 6c97a99..4f18856 100644 --- a/packages/server-playwright/src/server.ts +++ b/packages/server-playwright/src/server.ts @@ -151,7 +151,7 @@ const ScopeProfiles = { readonly: ['page:read', 'observe:*'] as BAPScope[], standard: [ 'browser:launch', 'browser:close', 'page:*', - 'action:click', 'action:type', 'action:fill', 'action:scroll', 'action:select', + 'action:*', 'observe:*', 'emulate:viewport', ] as BAPScope[], full: ['browser:*', 'page:*', 'action:*', 'observe:*', 'emulate:*', 'trace:*'] as BAPScope[], @@ -396,6 +396,8 @@ export interface BAPServerOptions { host?: string; /** Default browser type */ defaultBrowser?: "chromium" | "firefox" | "webkit"; + /** Default Playwright channel (e.g. "chrome", "msedge") */ + defaultChannel?: string; /** Default headless mode */ headless?: boolean; /** Enable debug logging */ @@ -477,9 +479,10 @@ const BLOCKED_BROWSER_ARGS = [ /** * SECURE-BY-DEFAULT: All security options enabled by default */ -const DEFAULT_OPTIONS: Required> & { +const DEFAULT_OPTIONS: Required> & { authToken: string | undefined; authTokenEnvVar: string; + defaultChannel: string | undefined; security: Required; limits: Required; authorization: Required; @@ -489,6 +492,7 @@ const DEFAULT_OPTIONS: Required 0 ? sanitizedArgs : undefined, proxy: params.proxy, downloadsPath: validatedDownloadsPath, }); // Create the default context - const defaultContext = await state.browser.newContext(); + // Force deviceScaleFactor: 1 for consistent screenshot sizes across platforms + // (retina Macs default to 2x, which doubles pixel count and inflates payloads) + const defaultContext = await state.browser.newContext({ + deviceScaleFactor: 1, + }); const version = state.browser.version(); // Use crypto.randomUUID for unique IDs const contextId = `ctx-${randomUUID().slice(0, 8)}`; @@ -1872,16 +1883,19 @@ export class BAPPlaywrightServer extends EventEmitter { this.checkRateLimit(state, 'screenshot'); // Playwright only supports "png" and "jpeg" for screenshots + // Default to JPEG — ~60% smaller payloads, reducing LLM token cost const screenshotType = (options?.format === "jpeg" || options?.format === "png") ? options.format - : "png"; + : "jpeg"; const buffer = await page.screenshot({ fullPage: options?.fullPage, clip: options?.clip, type: screenshotType, - quality: options?.quality, - scale: options?.scale, + quality: options?.quality ?? (screenshotType === "jpeg" ? 80 : undefined), + // Default to CSS scale to ensure consistent 1x screenshots regardless + // of device pixel ratio (prevents 2x images on retina displays) + scale: options?.scale ?? "css", }); // Parse image dimensions from the buffer @@ -1890,14 +1904,14 @@ export class BAPPlaywrightServer extends EventEmitter { let width: number; let height: number; - const format = options?.format ?? "png"; + const format = screenshotType; if (format === "png" && buffer[0] === 0x89 && buffer[1] === 0x50) { // PNG: Read dimensions from IHDR chunk (offset 16 for width, 20 for height) width = buffer.readUInt32BE(16); height = buffer.readUInt32BE(20); - } else if ((format === "jpeg" || format === "webp") && buffer.length > 0) { - // For JPEG/WebP, fall back to viewport dimensions or clip + } else if (format === "jpeg" && buffer.length > 0) { + // For JPEG, fall back to viewport dimensions or clip // (Parsing JPEG headers is complex, use viewport as approximation) const viewport = page.viewportSize() ?? { width: 1280, height: 720 }; if (options?.clip) { @@ -2679,12 +2693,19 @@ export class BAPPlaywrightServer extends EventEmitter { // Screenshot (with optional annotation) if (params.includeScreenshot || params.annotateScreenshot) { const viewport = page.viewportSize(); - let buffer = await page.screenshot({ type: "png" }); + // Use JPEG by default for ~60% smaller payloads (less LLM token cost) + // Annotations require PNG for sharp badge rendering + const useAnnotation = params.annotateScreenshot && interactiveElements && interactiveElements.length > 0; + const obsFormat = useAnnotation ? "png" as const : "jpeg" as const; + let buffer = await page.screenshot({ + type: obsFormat, + quality: obsFormat === "jpeg" ? 80 : undefined, + }); let annotated = false; let annotationMap: AnnotationMapping[] | undefined; // Apply annotation if requested - if (params.annotateScreenshot && interactiveElements && interactiveElements.length > 0) { + if (useAnnotation && interactiveElements) { const annotationOpts: AnnotationOptions = typeof params.annotateScreenshot === 'object' ? params.annotateScreenshot : { enabled: true }; @@ -2704,7 +2725,7 @@ export class BAPPlaywrightServer extends EventEmitter { result.screenshot = { data: buffer.toString("base64"), - format: "png", + format: obsFormat, width: viewport?.width ?? 0, height: viewport?.height ?? 0, annotated, @@ -2919,123 +2940,480 @@ export class BAPPlaywrightServer extends EventEmitter { } /** - * Extract data from content based on instruction and schema - * This is a basic implementation - a full implementation would use LLM + * Extract data from content based on instruction and schema. + * + * Strategy: + * 1. Scope to the main content area (skip nav/header/footer). + * 2. For lists: find repeating item containers, then use schema + * property names to locate child elements within each item. + * 3. For objects: search for labeled values. + * 4. Coerce types based on schema (string, number, boolean). */ private async extractDataFromContent( page: PlaywrightPage, - content: string, - _instruction: string, // Used for future LLM-based extraction + _content: string, + _instruction: string, schema: { type: string; properties?: Record; items?: unknown }, mode: string, includeSourceRefs: boolean ): Promise<{ data: unknown; sources?: { ref: string; selector: BAPSelector; text?: string }[]; confidence: number }> { - // Basic extraction logic based on common patterns - // This extracts data by finding elements that match the schema structure - const sources: { ref: string; selector: BAPSelector; text?: string }[] = []; + // ── Step 1: Scope to main content area ────────────────────────── + // Try semantic landmarks first, fall back to body + const contentRoot = await this.findContentRoot(page); + if (schema.type === "array" || mode === "list") { - // Extract list of items - const items: unknown[] = []; + const items = await this.extractList( + page, contentRoot, schema, includeSourceRefs, sources + ); + return { + data: items, + sources: includeSourceRefs ? sources : undefined, + confidence: items.length > 0 ? 0.8 : 0.3, + }; + } - // Try to find list-like elements based on the instruction - const listSelectors = [ - 'ul li', 'ol li', '[role="listitem"]', 'tr', '.item', '.card', '[class*="item"]', '[class*="card"]' - ]; + if (mode === "table") { + const rows = await this.extractTable( + page, contentRoot, schema, includeSourceRefs, sources + ); + return { + data: rows, + sources: includeSourceRefs ? sources : undefined, + confidence: rows.length > 0 ? 0.8 : 0.3, + }; + } + + if (schema.type === "object" && schema.properties) { + const result = await this.extractObject( + page, contentRoot, schema, includeSourceRefs, sources + ); + return { + data: result.data, + sources: includeSourceRefs ? sources : undefined, + confidence: result.confidence, + }; + } + + // Default: return scoped text content + const text = await contentRoot.textContent() ?? ""; + return { + data: text.trim().slice(0, 5000), + sources: includeSourceRefs ? sources : undefined, + confidence: 0.5, + }; + } + + /** + * Find the main content area of the page, skipping nav/header/footer. + * Returns a Locator scoped to the best content root. + */ + private async findContentRoot(page: PlaywrightPage) { + // Try semantic landmarks in priority order + const candidates = ['main', '[role="main"]', '#content', '.content', '#main', '.page', '.container', '[role="document"]']; + for (const sel of candidates) { + try { + const loc = page.locator(sel).first(); + if (await loc.count() > 0) { + // Verify it has substantial content (not just a wrapper with nav inside) + const text = await loc.textContent() ?? ""; + if (text.trim().length > 100) return loc; + } + } catch { /* continue */ } + } + // Fallback: body + return page.locator('body'); + } + + /** + * Extract a list of items from repeating elements. + * Uses schema property names to locate child values within each item container. + */ + private async extractList( + _page: PlaywrightPage, + root: ReturnType, + schema: { items?: unknown }, + includeSourceRefs: boolean, + sources: { ref: string; selector: BAPSelector; text?: string }[] + ): Promise { + const itemSchema = schema.items as { type?: string; properties?: Record } | undefined; + const isObjectItems = itemSchema?.type === 'object' && itemSchema.properties; + + // ── Find the best repeating container ──────────────────────────── + // Selectors are ordered by semantic priority: article > role > class-name > generic. + // Use the FIRST selector with 2+ matches rather than the one with the most, + // because generic selectors (ul li) often match sidebar/nav noise. + const containerSelectors = [ + 'article', '[role="listitem"]', + '.product', '.card', '.item', '.listing', '.result', '.entry', '.post', + '[class*="product"]', '[class*="card"]', '[class*="item"]', + 'table tbody tr', + 'ol li', 'ul li', + ]; + + let bestSelector = ''; + let bestCount = 0; + + for (const sel of containerSelectors) { + try { + const count = await root.locator(sel).count(); + if (count >= 2) { + bestSelector = sel; + bestCount = count; + break; // First semantic match wins + } + } catch { /* continue */ } + } + + if (!bestSelector || bestCount === 0) return []; - for (const selector of listSelectors) { + const elements = await root.locator(bestSelector).all(); + const items: unknown[] = []; + const limit = Math.min(elements.length, 100); + + for (let i = 0; i < limit; i++) { + const el = elements[i]!; + + // Skip elements that are likely not visible or too small + try { + const box = await el.boundingBox(); + if (box && (box.width < 10 || box.height < 10)) continue; + } catch { /* proceed anyway */ } + + if (isObjectItems && itemSchema?.properties) { + // ── Schema-aware extraction: match property names to child elements ── + const obj = await this.extractPropertiesFromElement(el, itemSchema.properties); + // Only include if at least one property has a non-empty value + const hasValue = Object.values(obj).some(v => v !== null && v !== undefined && v !== ''); + if (hasValue) { + items.push(obj); + if (includeSourceRefs) { + sources.push({ + ref: `@s${items.length}`, + selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` }, + text: Object.values(obj).filter(Boolean).join(' | ').slice(0, 100), + }); + } + } + } else { + // Simple string items + const text = await el.textContent(); + if (text?.trim()) { + items.push(text.trim()); + if (includeSourceRefs) { + sources.push({ + ref: `@s${items.length}`, + selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` }, + text: text.trim().slice(0, 100), + }); + } + } + } + } + + // ── Fallback: if schema-aware extraction produced 0 items from matched ── + // elements, retry with text-based extraction (extract each property by + // splitting each container's inner text). This handles cases where CSS + // class names don't align with schema property names. + if (items.length === 0 && isObjectItems && itemSchema?.properties && elements.length > 0) { + const propNames = Object.keys(itemSchema.properties); + for (let i = 0; i < limit; i++) { + const el = elements[i]!; try { - const elements = await page.locator(selector).all(); - if (elements.length > 0) { - for (let i = 0; i < Math.min(elements.length, 50); i++) { - const el = elements[i]; - const text = await el.textContent(); - if (text && text.trim()) { - if (schema.items && typeof schema.items === 'object' && 'type' in schema.items) { - if ((schema.items as { type: string }).type === 'string') { - items.push(text.trim()); - } else if ((schema.items as { type: string }).type === 'object') { - // Try to extract object structure - items.push({ text: text.trim() }); - } + const box = await el.boundingBox(); + if (box && (box.width < 10 || box.height < 10)) continue; + } catch { /* proceed */ } + + const fullText = await el.textContent() ?? ''; + if (!fullText.trim()) continue; + + const obj: Record = {}; + // Try to extract known patterns from the full text + for (const key of propNames) { + const kl = key.toLowerCase(); + if (kl === 'title' || kl === 'name') { + // First link's title attribute, or first heading text + try { + const heading = el.locator('h1, h2, h3, h4, h5, h6').first(); + if (await heading.count() > 0) { + const link = heading.locator('a').first(); + if (await link.count() > 0) { + obj[key] = await link.getAttribute('title') ?? await link.textContent() ?? null; } else { - items.push(text.trim()); + obj[key] = await heading.textContent() ?? null; } + } + } catch { /* skip */ } + } else if (kl === 'price' || kl === 'cost' || kl === 'amount') { + const priceMatch = fullText.match(/[$€£¥]\s*[\d,.]+|[\d,.]+\s*[$€£¥]/); + if (priceMatch) obj[key] = priceMatch[0].trim(); + } else if (kl === 'url' || kl === 'link' || kl === 'href') { + try { + const link = el.locator('a').first(); + if (await link.count() > 0) obj[key] = await link.getAttribute('href'); + } catch { /* skip */ } + } else if (kl === 'rating') { + // Try star-rating class pattern (e.g., "star-rating Three") + try { + const ratingEl = el.locator('[class*="rating"], [class*="star"]').first(); + if (await ratingEl.count() > 0) { + const cls = await ratingEl.getAttribute('class') ?? ''; + const parts = cls.split(/\s+/).filter(c => !c.toLowerCase().includes('rating') && !c.toLowerCase().includes('star') && c.length > 0); + if (parts.length > 0) obj[key] = parts[parts.length - 1]; + } + } catch { /* skip */ } + } else if (kl === 'availability' || kl === 'stock' || kl === 'status') { + try { + const stockEl = el.locator('[class*="avail"], [class*="stock"], .availability, .stock').first(); + if (await stockEl.count() > 0) { + obj[key] = (await stockEl.textContent())?.trim() ?? null; + } + } catch { /* skip */ } + } + } - if (includeSourceRefs) { - sources.push({ - ref: `@s${i + 1}`, - selector: { type: 'css', value: `${selector}:nth-child(${i + 1})` }, - text: text.trim().slice(0, 100), - }); - } + const hasValue = Object.values(obj).some(v => v !== null && v !== undefined && v !== ''); + if (hasValue) { + items.push(obj); + if (includeSourceRefs) { + sources.push({ + ref: `@s${items.length}`, + selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` }, + text: Object.values(obj).filter(Boolean).join(' | ').slice(0, 100), + }); + } + } + } + } + + return items; + } + + /** + * Extract property values from a single element container. + * For each schema property, tries class-name matching, then attribute + * matching, then falls back to positional heuristics. + */ + private async extractPropertiesFromElement( + el: ReturnType, + properties: Record + ): Promise> { + const result: Record = {}; + + for (const [key, propSchema] of Object.entries(properties)) { + const keyLower = key.toLowerCase(); + let value: string | null = null; + + // Strategy 1: Find child element whose class or tag contains the property name + const classSelectors = [ + `[class*="${keyLower}"]`, + `[data-${keyLower}]`, + `.${keyLower}`, + ]; + + for (const sel of classSelectors) { + try { + const child = el.locator(sel).first(); + if (await child.count() > 0) { + // For links, prefer title attribute (common for truncated titles) + if (keyLower === 'title' || keyLower === 'name') { + value = await child.getAttribute('title') ?? await child.textContent(); + } else { + value = await child.textContent(); + } + // If text is empty, try extracting from class name (e.g. "star-rating Three" → "Three") + if (!value?.trim()) { + const cls = await child.getAttribute('class') ?? ''; + const clsParts = cls.split(/\s+/).filter(c => !c.includes(keyLower) && c.length > 0); + if (clsParts.length > 0) value = clsParts[clsParts.length - 1] ?? null; + } + if (value?.trim()) break; + } + } catch { /* continue */ } + } + + // Strategy 2: For known property patterns, try specific selectors + if (!value?.trim()) { + try { + if (keyLower === 'title' || keyLower === 'name') { + // Headings, links with title attribute + for (const sel of ['h1 a', 'h2 a', 'h3 a', 'h4 a', 'h1', 'h2', 'h3', 'h4', 'a[title]']) { + const child = el.locator(sel).first(); + if (await child.count() > 0) { + value = await child.getAttribute('title') ?? await child.textContent(); + if (value?.trim()) break; } } - if (items.length > 0) break; + } else if (keyLower === 'price' || keyLower === 'cost' || keyLower === 'amount') { + // Price patterns + const text = await el.textContent() ?? ''; + const priceMatch = text.match(/[$€£¥]\s*[\d,.]+|[\d,.]+\s*[$€£¥]/); + if (priceMatch) value = priceMatch[0].trim(); + } else if (keyLower === 'url' || keyLower === 'link' || keyLower === 'href') { + const link = el.locator('a').first(); + if (await link.count() > 0) { + value = await link.getAttribute('href'); + } + } else if (keyLower === 'image' || keyLower === 'img' || keyLower === 'thumbnail') { + const img = el.locator('img').first(); + if (await img.count() > 0) { + value = await img.getAttribute('src'); + } } - } catch { - // Continue to next selector - } + } catch { /* continue */ } } - return { data: items, sources: includeSourceRefs ? sources : undefined, confidence: items.length > 0 ? 0.7 : 0.3 }; + // Coerce type + const trimmed = value?.trim() ?? null; + if (trimmed === null) { + result[key] = null; + } else if (propSchema.type === 'number') { + const num = parseFloat(trimmed.replace(/[^0-9.-]/g, '')); + result[key] = isNaN(num) ? trimmed : num; + } else if (propSchema.type === 'boolean') { + result[key] = ['true', 'yes', '1', 'in stock', 'available'].includes(trimmed.toLowerCase()); + } else { + result[key] = trimmed; + } } - if (schema.type === "object" && schema.properties) { - // Extract object with properties - const result: Record = {}; - const properties = schema.properties as Record; - - for (const [key, propSchema] of Object.entries(properties)) { - // Try to find content matching this property - const searchTerms = [key, propSchema.description].filter(Boolean); - - for (const term of searchTerms) { - if (!term) continue; - - // Look for labels or headings containing the term - const labelSelectors = [ - `label:has-text("${term}")`, - `th:has-text("${term}")`, - `dt:has-text("${term}")`, - `[class*="${term.toLowerCase()}"]`, - ]; - - for (const selector of labelSelectors) { - try { - const label = await page.locator(selector).first(); - if (await label.count() > 0) { - // Try to find associated value - const parent = label.locator('..'); - const siblingText = await parent.textContent(); - if (siblingText) { - const value = siblingText.replace(new RegExp(term, 'gi'), '').trim(); - if (value) { - result[key] = propSchema.type === 'number' ? parseFloat(value) || value : value; - break; + return result; + } + + /** + * Extract tabular data from an HTML table. + */ + private async extractTable( + _page: PlaywrightPage, + root: ReturnType, + schema: { items?: unknown }, + includeSourceRefs: boolean, + sources: { ref: string; selector: BAPSelector; text?: string }[] + ): Promise { + const rows: unknown[] = []; + const itemSchema = schema.items as { properties?: Record } | undefined; + + try { + // Find table headers to map columns + const headers: string[] = []; + const thElements = await root.locator('table th').all(); + for (const th of thElements) { + headers.push((await th.textContent() ?? '').trim().toLowerCase()); + } + + // Extract rows + const trElements = await root.locator('table tbody tr').all(); + const limit = Math.min(trElements.length, 100); + + for (let i = 0; i < limit; i++) { + const tr = trElements[i]!; + const cells = await tr.locator('td').all(); + const obj: Record = {}; + + if (itemSchema?.properties) { + // Map schema properties to table columns by name + for (const [key, propSchema] of Object.entries(itemSchema.properties)) { + const colIdx = headers.findIndex(h => h.includes(key.toLowerCase())); + if (colIdx >= 0 && colIdx < cells.length) { + const text = (await cells[colIdx]!.textContent() ?? '').trim(); + obj[key] = propSchema.type === 'number' ? (parseFloat(text.replace(/[^0-9.-]/g, '')) || text) : text; + } + } + } else { + // No schema properties — use headers as keys + for (let c = 0; c < cells.length; c++) { + const key = c < headers.length ? headers[c]! : `col${c}`; + obj[key] = (await cells[c]!.textContent() ?? '').trim(); + } + } + + if (Object.values(obj).some(v => v !== null && v !== undefined && v !== '')) { + rows.push(obj); + if (includeSourceRefs) { + sources.push({ + ref: `@s${rows.length}`, + selector: { type: 'css', value: `table tbody tr:nth-child(${i + 1})` }, + }); + } + } + } + } catch { /* table extraction failed */ } + + return rows; + } + + /** + * Extract a single object from page content. + */ + private async extractObject( + page: PlaywrightPage, + root: ReturnType, + schema: { properties?: Record }, + includeSourceRefs: boolean, + sources: { ref: string; selector: BAPSelector; text?: string }[] + ): Promise<{ data: Record; confidence: number }> { + const result: Record = {}; + const properties = schema.properties as Record; + + for (const [key, propSchema] of Object.entries(properties)) { + const searchTerms = [key, propSchema.description].filter(Boolean); + + for (const term of searchTerms) { + if (!term) continue; + + const labelSelectors = [ + `label:has-text("${term}")`, + `th:has-text("${term}")`, + `dt:has-text("${term}")`, + `[class*="${term.toLowerCase()}"]`, + ]; + + for (const selector of labelSelectors) { + try { + const label = root.locator(selector).first(); + if (await label.count() > 0) { + const parent = label.locator('..'); + const siblingText = await parent.textContent(); + if (siblingText) { + const value = siblingText.replace(new RegExp(term, 'gi'), '').trim(); + if (value) { + result[key] = propSchema.type === 'number' ? parseFloat(value) || value : value; + if (includeSourceRefs) { + sources.push({ + ref: `@s${Object.keys(result).length}`, + selector: { type: 'css', value: selector }, + text: value.slice(0, 100), + }); } + break; } } - } catch { - // Continue } - } + } catch { /* continue */ } } } + } - return { - data: Object.keys(result).length > 0 ? result : { raw: content.slice(0, 1000) }, - sources: includeSourceRefs ? sources : undefined, - confidence: Object.keys(result).length > 0 ? 0.6 : 0.2 - }; + // Fallback for meta-based extraction (og:title, meta description, etc.) + if (Object.keys(result).length === 0) { + try { + for (const key of Object.keys(properties)) { + if (key === 'title' || key === 'name') { + result[key] = await page.title(); + } else if (key === 'description') { + const desc = await page.locator('meta[name="description"]').getAttribute('content'); + if (desc) result[key] = desc; + } else if (key === 'url') { + result[key] = page.url(); + } + } + } catch { /* continue */ } } - // Default: return text content return { - data: content.trim().slice(0, 5000), - sources: includeSourceRefs ? sources : undefined, - confidence: 0.5 + data: Object.keys(result).length > 0 ? result : { raw: (await root.textContent() ?? "").slice(0, 1000) }, + confidence: Object.keys(result).length > 0 ? 0.7 : 0.2, }; } diff --git a/skills/bap-browser/LICENSE.txt b/skills/bap-browser/LICENSE.txt new file mode 100644 index 0000000..5fe2868 --- /dev/null +++ b/skills/bap-browser/LICENSE.txt @@ -0,0 +1,208 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to the Licensor for inclusion in the Work by the copyright + owner or by an individual or Legal Entity authorized to submit on behalf + of the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + +--- + +MIT License + +Copyright (c) 2024-2025 Model Context Protocol a Series of LF Projects, LLC. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +--- + +Creative Commons Attribution 4.0 International (CC-BY-4.0) + +Documentation in this project (excluding specifications) is licensed under +CC-BY-4.0. See https://creativecommons.org/licenses/by/4.0/legalcode for +the full license text. \ No newline at end of file diff --git a/skills/bap-browser/SKILL.md b/skills/bap-browser/SKILL.md new file mode 100644 index 0000000..736ab42 --- /dev/null +++ b/skills/bap-browser/SKILL.md @@ -0,0 +1,241 @@ +--- +name: bap-browser +description: "AI-powered browser automation via Browser Agent Protocol. Use when the user wants to visit a website, open a webpage, go to a URL, search on Google, look something up online, check a website, read a webpage, book a flight, order food, buy something online, check email or weather, download a file, compare prices, find product reviews, take a screenshot of a page, scrape or extract data from a site, monitor a webpage for changes, test a web application, automate web tasks, interact with a web page, log in to a site, submit or fill out a form, shop online, sign up for a service, browse the web, research a topic online, check stock prices, track a package, read the news, post on social media, or any task that requires controlling a web browser. Provides semantic selectors, batched multi-step actions, and structured data extraction for fast, token-efficient browser automation." +license: See LICENSE.txt (Apache-2.0) +--- + +# BAP Browser Automation + +You have BAP (Browser Agent Protocol) tools available. BAP wraps a real browser and exposes it through semantic, AI-native APIs. This document defines how to use them well. + +## Quick Start + +For most browser tasks, you only need three tools: + +1. **`navigate`** — open a URL +2. **`observe`** — see what's on the page (returns interactive elements with stable refs) +3. **`act`** — batch multiple interactions into a single call + +``` +navigate({ url: "https://example.com/login" }) +observe({ includeScreenshot: true }) +act({ + steps: [ + { action: "action/fill", selector: "@e1", value: "user@example.com" }, + { action: "action/fill", selector: "@e2", value: "password123" }, + { action: "action/click", selector: "role:button:Sign in" } + ] +}) +``` + +Read on for the full tool reference, selector guide, and advanced patterns. + +## Decision: Which Tool? + +**I need to open a page** → `navigate` + +**I need to understand what's on the page:** +- I want interactive elements with stable refs → `observe` (set `includeScreenshot: true` for visual context) +- I want the page structure cheaply → `aria_snapshot` (preferred — ~80% fewer tokens than `accessibility`) +- I want to read article/body text → `content` with `format: "markdown"` +- I want a visual capture → `screenshot` + +**I need to interact with something:** +- Single click → `click` +- Fill a form field (replaces content) → `fill` +- Type character-by-character (autocomplete, search-as-you-type) → `type` +- Press Enter/Tab/Escape/keyboard shortcut → `press` +- Select from dropdown → `select` +- Scroll to reveal more → `scroll` +- Trigger hover menu → `hover` + +**I need to do multiple things at once** → `act` (batch 2–50 steps, single round-trip) + +**I need structured data from the page** → `extract` (give it a JSON schema) + +**I need to check an element's state** → `element` (visible? enabled? checked? value?) + +**I need to manage tabs** → `pages` / `activate_page` / `close_page` + +**I need to go back/forward/reload** → `go_back` / `go_forward` / `reload` + +## Selectors + +Every interaction tool takes a `selector` parameter. Use this priority: + +``` +role:button:Submit ← Best. ARIA role + accessible name. Survives redesigns. +text:Sign in ← Visible text content. +label:Email address ← Form label association. +placeholder:Search... ← Input placeholder text. +testId:submit-btn ← data-testid attribute. +ref:@e3 (or just @e3) ← Stable ref from a prior observe call. +css:.btn-primary ← Last resort. Fragile. +#element-id ← Shorthand for CSS ID selector. +``` + +**Rules:** +- Always prefer `role:` for buttons, links, inputs, checkboxes. They survive DOM changes. +- Use `text:` when there's no clear ARIA role. +- Never copy CSS selectors from page source. They break across deployments. +- If you don't know what selectors are available, call `observe` first and use the returned refs. + +## The Observe → Act Pattern + +For any multi-step interaction on a page you haven't seen yet: + +**Step 1: Observe.** +``` +observe({ includeScreenshot: true, maxElements: 30 }) +``` +Returns interactive elements with stable refs (`@e1`, `@e2`, ...) and optional annotated screenshot. Now you know exactly what's on the page. + +**Step 2: Act.** +Batch all your actions into one call: +``` +act({ + steps: [ + { action: "action/fill", selector: "@e1", value: "user@example.com" }, + { action: "action/fill", selector: "@e2", value: "hunter2" }, + { action: "action/click", selector: "role:button:Sign in" } + ] +}) +``` + +This pattern turns 4+ round-trips into 2. Use it. + +## Efficiency Rules + +1. **`aria_snapshot` over `accessibility`.** Same structure, ~80% fewer tokens. +2. **`observe` with `maxElements`.** Default is 50. Set it lower when you can: `maxElements: 20`. +3. **`observe` with `filterRoles`.** Focus: `filterRoles: ["button", "link", "textbox"]`. +4. **`act` over individual calls.** A login flow is 1 `act`, not 3 separate fill/click calls. +5. **`extract` over manual parsing.** Define a JSON schema. Let BAP extract. Don't scrape HTML. +6. **`content({ format: "markdown" })` over screenshots for text.** Markdown is compact and parseable. +7. **`fill` over `type` for form fields.** `fill` clears and sets; `type` sends keystrokes one at a time. + +## Recipes + +### Login +``` +act({ + steps: [ + { action: "page/navigate", url: "https://app.example.com/login" }, + { action: "action/fill", selector: "label:Email", value: "user@example.com" }, + { action: "action/fill", selector: "label:Password", value: "password123" }, + { action: "action/click", selector: "role:button:Sign in" } + ] +}) +``` + +### Extract a table of data +``` +navigate({ url: "https://store.example.com/products" }) +extract({ + instruction: "Extract product listings", + mode: "list", + schema: { + type: "array", + items: { + type: "object", + properties: { + name: { type: "string" }, + price: { type: "number" }, + inStock: { type: "boolean" } + } + } + } +}) +``` + +### Read an article +``` +navigate({ url: "https://blog.example.com/post", waitUntil: "networkidle" }) +content({ format: "markdown" }) +``` + +### Complex form with observe +``` +observe({ filterRoles: ["textbox", "combobox", "checkbox"] }) +act({ + steps: [ + { action: "action/fill", selector: "@e1", value: "Jane Doe" }, + { action: "action/fill", selector: "@e2", value: "jane@example.com" }, + { action: "action/select", selector: "@e3", value: "Canada" }, + { action: "action/check", selector: "@e4" }, + { action: "action/click", selector: "role:button:Submit" } + ] +}) +``` + +### Search with autocomplete +``` +type({ selector: "role:combobox:Search", text: "browser agent", delay: 100 }) +press({ key: "ArrowDown" }) +press({ key: "Enter" }) +``` + +### Google search +``` +navigate({ url: "https://www.google.com" }) +act({ + steps: [ + { action: "action/fill", selector: "role:combobox:Search", value: "best noise cancelling headphones 2025" }, + { action: "action/click", selector: "role:button:Google Search" } + ] +}) +content({ format: "markdown" }) +``` + +### Compare prices across sites +``` +navigate({ url: "https://store-a.example.com/product" }) +extract({ + instruction: "Extract the product name and price", + mode: "single", + schema: { + type: "object", + properties: { + name: { type: "string" }, + price: { type: "number" }, + currency: { type: "string" } + } + } +}) +navigate({ url: "https://store-b.example.com/product" }) +extract({ + instruction: "Extract the product name and price", + mode: "single", + schema: { + type: "object", + properties: { + name: { type: "string" }, + price: { type: "number" }, + currency: { type: "string" } + } + } +}) +``` + +## Error Recovery + +| Problem | Fix | +|---------|-----| +| Element not found | `observe` the page again — the DOM changed. Use fresh refs. | +| Navigation timeout | Use `waitUntil: "domcontentloaded"` instead of `"networkidle"`. | +| Stale ref | Refs persist within a page but invalidate after navigation. Re-observe. | +| Click intercepted | `scroll` to the element first, or use `press({ key: "Enter", selector: "..." })`. | +| Page loaded but blank | Wait, then `reload`. Some SPAs hydrate slowly. | + +## Do Not + +- Use CSS selectors copied from browser DevTools. They break. +- Call `accessibility` when `aria_snapshot` works. Wastes tokens. +- Make individual click/fill calls when `act` can batch them. +- Take a screenshot to read text. Use `content({ format: "markdown" })`. +- Skip `observe` on pages you haven't seen. You'll guess wrong. +- Parse raw HTML. Use `extract` with a schema. + +--- + +For advanced patterns (multi-tab workflows, waiting strategies, annotated screenshots, nested extraction, batched error handling), see [references/REFERENCE.md](references/REFERENCE.md). diff --git a/skills/bap-browser/references/REFERENCE.md b/skills/bap-browser/references/REFERENCE.md new file mode 100644 index 0000000..d86f4ea --- /dev/null +++ b/skills/bap-browser/references/REFERENCE.md @@ -0,0 +1,78 @@ +# Advanced BAP Patterns + +## Multi-tab workflows + +Use `pages` to list all open tabs, `activate_page` to switch between them, and `close_page` to clean up. Useful for comparing content across tabs or handling pop-ups. + +``` +navigate({ url: "https://a.example.com" }) +navigate({ url: "https://b.example.com" }) // opens in new tab +pages() // returns [{id, url}, ...] +activate_page({ pageId: "page-1" }) // switch back to first tab +``` + +## Waiting strategies + +The `waitUntil` parameter on `navigate` controls when the page is considered loaded: + +| Value | When to use | +|-------|-------------| +| `"load"` | Default. Fine for most pages. | +| `"domcontentloaded"` | Faster. Use when you don't need images/fonts. | +| `"networkidle"` | Slowest but most complete. Use for SPAs that fetch data after load. | + +If a page renders dynamically after navigation, use `observe` or `aria_snapshot` with a short delay rather than relying on `networkidle`. + +## Annotated screenshots (Set-of-Marks) + +`observe` supports `annotateScreenshot: true` which overlays numbered markers on each interactive element. Useful for visual debugging or confirming which element a ref points to. + +``` +observe({ includeScreenshot: true, annotateScreenshot: true, maxElements: 20 }) +``` + +The returned screenshot will have numbered badges corresponding to element refs. + +## Nested extraction with complex schemas + +`extract` supports deeply nested JSON schemas. Use `mode: "single"` for a single object, `mode: "list"` for arrays, or `mode: "table"` for tabular data. + +``` +extract({ + instruction: "Extract job listings with company details", + mode: "list", + schema: { + type: "array", + items: { + type: "object", + properties: { + title: { type: "string" }, + company: { + type: "object", + properties: { + name: { type: "string" }, + location: { type: "string" } + } + }, + salary: { type: "number" }, + remote: { type: "boolean" } + } + } + } +}) +``` + +## Error handling in batched actions + +`act` accepts `stopOnFirstError` (default: `true`). Set to `false` if you want to continue executing steps even when one fails — useful for best-effort form fills where some fields may not exist. + +``` +act({ + stopOnFirstError: false, + steps: [ + { action: "action/fill", selector: "label:First name", value: "Jane" }, + { action: "action/fill", selector: "label:Middle name", value: "A." }, // may not exist + { action: "action/fill", selector: "label:Last name", value: "Doe" } + ] +}) +```