From 9b398fe4624afedf56f9588c4b68a32771c7f508 Mon Sep 17 00:00:00 2001
From: Piyush
Date: Thu, 12 Feb 2026 06:26:40 +0000
Subject: [PATCH 01/10] feat: add plugin, skill, and standalone auto-start mode

Plugin (.claude-plugin + .mcp.json):
- Plugin metadata and MCP server config for plugin directory submission
- Zero-config: npx @browseragentprotocol/mcp auto-starts everything

Skill (skills/bap-browser/):
- Agent-directive SKILL.md: tool decision tree, selector priority,
  observe-act pattern, efficiency rules, recipes, error recovery
- Apache-2.0 license

MCP CLI auto-start (packages/mcp/src/cli.ts):
- Standalone mode (default): spawns BAP Playwright server as child process
- Port detection: reuses existing server if one is already running
- Graceful shutdown: SIGTERM with SIGKILL escalation
- resolveServerCommand checks fs.existsSync before falling back to npx
- New flags: --port, --headless, --no-headless
- Vendor-neutral help text with generic MCP client setup examples

MCP package README:
- Vendor-neutral rewrite: generic MCP client config examples
- Documents standalone mode, new CLI options, Windows troubleshooting

Root README:
- Quick start restructured: leads with universal MCP setup
- Plugin install as separate section
- Vendor-neutral screenshot captions and alt text

Tests:
- standalone.test.ts: port detection, server wait polling, timeout, CLI args
---
 .claude-plugin/plugin.json                    |   7 +
 .mcp.json                                     |   8 +
 README.md                                     |  79 +++--
 packages/mcp/README.md                        | 235 ++++---------
 packages/mcp/package.json                     |   4 +-
 packages/mcp/src/__tests__/standalone.test.ts | 183 ++++++++++
 packages/mcp/src/cli.ts                       | 329 +++++++++++++++---
 skills/bap-browser/LICENSE.txt                | 208 +++++++++++
 skills/bap-browser/SKILL.md                   | 173 +++++++++
 9 files changed, 984 insertions(+), 242 deletions(-)
 create mode 100644 .claude-plugin/plugin.json
 create mode 100644 .mcp.json
 create mode 100644 packages/mcp/src/__tests__/standalone.test.ts
 create mode 100644 skills/bap-browser/LICENSE.txt
 create mode 100644 skills/bap-browser/SKILL.md

diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
new file mode 100644
index 0000000..f04af10
--- /dev/null
+++ b/.claude-plugin/plugin.json
@@ -0,0 +1,7 @@
+{
+  "name": "bap-browser",
+  "description": "AI-optimized browser automation with semantic selectors, smart observations, and multi-step actions. Navigate, click, fill forms, extract data, and take screenshots across Chrome, Firefox, and WebKit.",
+  "author": {
+    "name": "Browser Agent Protocol"
+  }
+}
diff --git a/.mcp.json b/.mcp.json
new file mode 100644
index 0000000..8b192cc
--- /dev/null
+++ b/.mcp.json
@@ -0,0 +1,8 @@
+{
+  "mcpServers": {
+    "bap-browser": {
+      "command": "npx",
+      "args": ["-y", "@browseragentprotocol/mcp@latest"]
+    }
+  }
+}
diff --git a/README.md b/README.md
index 10bdac1..5a878b8 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
 
 An open standard for AI agents to interact with web browsers.
 
-> **v0.1.0:** First public release. APIs may evolve based on feedback.
+> **v0.2.0:** Renamed MCP tools, auto-reconnect, multi-context support, streaming, and more. APIs may evolve based on feedback.
 
 ## Overview
 
@@ -49,69 +49,86 @@ BAP (Browser Agent Protocol) provides a standardized way for AI agents to contro
 
 ## Quick Start
 
-### Using with MCP (Recommended for AI Agents)
+### Using with MCP (Recommended)
 
-BAP works with any MCP-compatible client including Claude Code, Claude Desktop, OpenAI Codex, and Google Antigravity.
+BAP works with any MCP-compatible client. The server auto-starts — no separate setup needed.
 
-**Claude Code:**
+**Add via CLI** (works with most MCP clients):
 ```bash
-claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
+ mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp
 ```
 
-

- BAP with Claude Code
- Claude Code browsing Hacker News with BAP -

-
-**Claude Desktop** (`claude_desktop_config.json`):
+**Add via JSON config** (most MCP desktop clients):
 ```json
 {
   "mcpServers": {
     "bap-browser": {
       "command": "npx",
-      "args": ["@browseragentprotocol/mcp"]
+      "args": ["-y", "@browseragentprotocol/mcp"]
     }
   }
 }
 ```
 
+**Add via TOML config** (Codex Desktop):
+```toml
+[mcp_servers.bap-browser]
+command = "npx"
+args = ["-y", "@browseragentprotocol/mcp"]
+```
+
+### Plugin Install
+
+BAP is also available as a plugin for MCP clients with plugin directories.
+
+
+Screenshots +

- BAP with Claude Desktop
- Claude Desktop browsing Hacker News with BAP + BAP in a terminal MCP client
+ Browsing Hacker News with BAP

-**Codex CLI:** -```bash -codex mcp add bap-browser -- npx @browseragentprotocol/mcp -``` +

+ BAP in a desktop MCP client
+ Desktop MCP client browsing Hacker News with BAP +

- BAP with OpenAI Codex CLI
+ BAP in Codex CLI
Codex CLI browsing Hacker News with BAP

-**Codex Desktop** (`~/.codex/config.toml`): -```toml -[mcp_servers.bap-browser] -command = "npx" -args = ["@browseragentprotocol/mcp"] -``` -

- BAP with OpenAI Codex Desktop
+ BAP in Codex Desktop
Codex Desktop browsing Hacker News with BAP

-> **💡 Tip:** Codex may default to web search. Be explicit: *"Using the bap-browser MCP tools..."* +
+
+### Browser Selection
+
+By default, BAP uses your locally installed Chrome. You can choose a different browser with the `--browser` flag:
+
+```bash
+npx @browseragentprotocol/mcp --browser firefox
+```
+| Value | Browser | Notes |
+|---|---|---|
+| `chrome` (default) | Local Chrome | Falls back to bundled Chromium if not installed |
+| `chromium` | Bundled Chromium | Playwright's built-in Chromium |
+| `firefox` | Firefox | Requires local Firefox |
+| `webkit` | WebKit | Playwright's WebKit engine |
+| `edge` | Microsoft Edge | Requires local Edge |
 
-**Antigravity** (`mcp_config.json` via "..." → MCP Store → Manage):
+In a JSON MCP config, pass the flag via args:
 ```json
 {
   "mcpServers": {
     "bap-browser": {
       "command": "npx",
-      "args": ["@browseragentprotocol/mcp"]
+      "args": ["-y", "@browseragentprotocol/mcp", "--browser", "firefox"]
     }
   }
 }
@@ -255,7 +272,7 @@ const data = await client.extract({
 });
 ```
 
-> **Note:** `agent/extract` (and `bap_extract` in MCP) uses heuristic-based extraction (CSS patterns). For complex pages, consider using `bap_content` to get page content as markdown and extract data yourself.
+> **Note:** `agent/extract` (and `extract` in MCP) uses heuristic-based extraction (CSS patterns). For complex pages, consider using `content` to get page content as markdown and extract data yourself.
 
 ## Server Options
 
diff --git a/packages/mcp/README.md b/packages/mcp/README.md
index c3bf6f1..4f54770 100644
--- a/packages/mcp/README.md
+++ b/packages/mcp/README.md
@@ -1,100 +1,58 @@
 # @browseragentprotocol/mcp
 
-MCP (Model Context Protocol) server for Browser Agent Protocol. Enables AI assistants to control web browsers.
-
-## Supported Clients
-
-| Client | Status |
-|--------|--------|
-| Claude Code | Supported |
-| Claude Desktop | Supported |
-| OpenAI Codex | Supported |
-| Google Antigravity | Supported |
-| Any MCP-compatible client | Supported |
+MCP (Model Context Protocol) server for Browser Agent Protocol. Gives any MCP-compatible AI agent full browser control.
 
 ## Installation
 
-### With Claude Code
+### One command — standalone mode
 
 ```bash
-# Add the BAP browser server
-claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
-```
-
-That's it! Claude Code can now control browsers. Try asking: *"Go to example.com and take a screenshot"*
-
-### With OpenAI Codex
-
-```bash
-# Add the BAP browser server
-codex mcp add bap-browser -- npx @browseragentprotocol/mcp
-```
-
-Or add to your `~/.codex/config.toml`:
-
-```toml
-[mcp_servers.bap-browser]
-command = "npx"
-args = ["@browseragentprotocol/mcp"]
+npx @browseragentprotocol/mcp
 ```
 
-### With Claude Desktop
+This auto-starts a BAP Playwright server and exposes browser tools over MCP stdio. No separate server process needed.
 
-Add to your config file:
-
-**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
-**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
+### Add to an MCP client
 
+**JSON config** (most MCP clients):
 ```json
 {
   "mcpServers": {
     "bap-browser": {
       "command": "npx",
-      "args": ["@browseragentprotocol/mcp"]
+      "args": ["-y", "@browseragentprotocol/mcp"]
     }
   }
 }
 ```
 
-Restart Claude Desktop after saving.
-
-### With Google Antigravity
-
-1. Open the MCP Store via the **"..."** dropdown at the top of the editor's agent panel
-2. Click **Manage MCP Servers**
-3. Click **View raw config**
-4. Add the BAP browser server to your `mcp_config.json`:
+**CLI** (most MCP clients):
+```bash
+ mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp
+```
 
-```json
-{
-  "mcpServers": {
-    "bap-browser": {
-      "command": "npx",
-      "args": ["@browseragentprotocol/mcp"]
-    }
-  }
-}
+**TOML config** (Codex Desktop):
+```toml
+[mcp_servers.bap-browser]
+command = "npx"
+args = ["-y", "@browseragentprotocol/mcp"]
 ```
 
-5. Save and refresh to load the new configuration
+### Connect to an existing BAP server
 
-### Standalone
+If you already have a BAP Playwright server running, pass `--url` to skip auto-start:
 
 ```bash
-# Start the MCP server (connects to BAP server on localhost:9222)
-npx @browseragentprotocol/mcp
-
-# With custom BAP server URL
-npx @browseragentprotocol/mcp --bap-url ws://localhost:9333
+npx @browseragentprotocol/mcp --url ws://localhost:9222
 ```
 
 ## How It Works
 
 ```
 ┌─────────────┐     MCP      ┌─────────────┐    BAP     ┌─────────────┐
-│   Claude    │ ──────────── │   BAP MCP   │ ────────── │ BAP Server  │
-│  (or other  │   (stdio)    │   Server    │ (WebSocket)│ (Playwright)│
-│  MCP host)  │              │             │            │             │
+│  AI Agent   │ ──────────── │   BAP MCP   │ ────────── │ BAP Server  │
+│  (any MCP   │   (stdio)    │   Server    │ (WebSocket)│ (Playwright)│
+│   client)   │              │             │            │             │
 └─────────────┘              └─────────────┘            └─────────────┘
                                                                │
                                                                ▼
@@ -103,61 +61,59 @@ npx @browseragentprotocol/mcp --bap-url ws://localhost:9333
                                                         └─────────────┘
 ```
 
-1. Claude sends tool calls via MCP (stdio transport)
+1. AI agent sends tool calls via MCP (stdio transport)
 2. This package translates them to BAP protocol
 3. BAP server controls the browser via Playwright
-4. Results flow back to Claude
+4. Results flow back to the agent
 
 ## Available Tools
 
-When connected, Claude has access to these browser automation tools:
-
 ### Navigation
 
 | Tool | Description |
 |------|-------------|
-| `bap_navigate` | Navigate to a URL |
-| `bap_go_back` | Navigate back in browser history |
-| `bap_go_forward` | Navigate forward in browser history |
-| `bap_reload` | Reload the current page |
+| `navigate` | Navigate to a URL |
+| `go_back` | Navigate back in browser history |
+| `go_forward` | Navigate forward in browser history |
+| `reload` | Reload the current page |
 
 ### Element Interaction
 
 | Tool | Description |
 |------|-------------|
-| `bap_click` | Click an element using semantic selectors |
-| `bap_type` | Type text character by character (first clicks element) |
-| `bap_fill` | Fill a form field (clears existing content first) |
-| `bap_press` | Press keyboard keys (Enter, Tab, shortcuts) |
-| `bap_select` | Select an option from a dropdown |
-| `bap_scroll` | Scroll the page or a specific element |
-| `bap_hover` | Hover over an element |
+| `click` | Click an element using semantic selectors |
+| `type` | Type text character by character (first clicks element) |
+| `fill` | Fill a form field (clears existing content first) |
+| `press` | Press keyboard keys (Enter, Tab, shortcuts) |
+| `select` | Select an option from a dropdown |
+| `scroll` | Scroll the page or a specific element |
+| `hover` | Hover over an element |
 
 ### Observation
 
 | Tool | Description |
 |------|-------------|
-| `bap_screenshot` | Take a screenshot of the page |
-| `bap_accessibility` | Get the full accessibility tree |
-| `bap_aria_snapshot` | Get a token-efficient YAML accessibility snapshot (~80% fewer tokens) |
-| `bap_content` | Get page text content as text or markdown |
-| `bap_element` | Query element properties (exists, visible, enabled) |
+| `screenshot` | Take a screenshot of the page |
+| `accessibility` | Get the full accessibility tree |
+| `aria_snapshot` | Token-efficient YAML accessibility snapshot (~80% fewer tokens) |
+| `content` | Get page text content as text or markdown |
+| `element` | Query element properties (exists, visible, enabled) |
 
 ### Page Management
 
 | Tool | Description |
 |------|-------------|
-| `bap_pages` | List all open pages/tabs |
-| `bap_activate_page` | Switch to a different page/tab |
-| `bap_close_page` | Close the current page/tab |
+| `pages` | List all open pages/tabs |
+| `activate_page` | Switch to a different page/tab |
+| `close_page` | Close the current page/tab |
 
 ### AI Agent Methods
 
 | Tool | Description |
 |------|-------------|
-| `bap_observe` | Get AI-optimized page observation with interactive elements and stable refs |
-| `bap_act` | Execute a sequence of browser actions in a single call |
-| `bap_extract` | Extract structured data from the page using schema and CSS heuristics |
+| `observe` | AI-optimized page observation with interactive elements and stable refs |
+| `act` | Execute a sequence of browser actions in a single call |
+| `extract` | Extract structured data from the page using schema and CSS heuristics |
 
 ### Selector Formats
 
@@ -168,73 +124,24 @@ role:button:Submit # ARIA role + accessible name (recommended)
 text:Sign in              # Visible text content
 label:Email address       # Associated label
 testid:submit-button      # data-testid attribute
-ref:@submitBtn            # Stable element reference from bap_observe
+ref:@submitBtn            # Stable element reference from observe
 css:.btn-primary          # CSS selector (fallback)
 xpath://button[@type]     # XPath selector (fallback)
 ```
 
-## Example Conversations
-
-**You:** Go to Hacker News and tell me the top 3 stories
-
-**Claude:** I'll browse to Hacker News and get the top stories for you.
-
-*[Uses bap_navigate, bap_accessibility]*
-
-Here are the top 3 stories on Hacker News right now:
-1. "Show HN: I built a tool for..."
-2. "Why we switched from..."
-3. "The future of..."
-
----
-
-**You:** Fill out the contact form on example.com with my details
-
-**Claude:** I'll navigate to the contact form and fill it out.
-
-*[Uses bap_navigate, bap_fill, bap_click]*
-
-Done! I've filled in the form with your details and submitted it.
-
 ## CLI Options
 
 ```
 Options:
-  --bap-url, -u       BAP server URL (default: ws://localhost:9222)
-  --allowed-domains   Comma-separated list of allowed domains (e.g., "*.example.com,trusted.org")
-  --verbose           Enable verbose logging
-  --help              Show help
-```
-
-## Managing the Server
-
-```bash
-# List configured MCP servers
-claude mcp list
-
-# Get details for the BAP browser server
-claude mcp get bap-browser
-
-# Remove the server
-claude mcp remove bap-browser
-
-# Check server status (within Claude Code)
-/mcp
-```
-
-## Configuration Scopes
-
-When adding the server with Claude Code, you can specify where to store the configuration:
-
-```bash
-# Local scope (default) - only you, only this project
-claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
-
-# User scope - available to you across all projects
-claude mcp add --transport stdio --scope user bap-browser -- npx @browseragentprotocol/mcp
-
-# Project scope - shared with team via .mcp.json
-claude mcp add --transport stdio --scope project bap-browser -- npx @browseragentprotocol/mcp
+  -b, --browser       Browser: chrome (default), chromium, firefox, webkit, edge
+  -u, --url           Connect to existing BAP server (skips auto-start)
+  -p, --port          Port for auto-started server (default: 9222)
+  --headless          Run browser headless (default: true)
+  --no-headless       Visible browser window
+  --allowed-domains   Comma-separated list of allowed domains
+  -v, --verbose       Enable verbose logging to stderr
+  -h, --help          Show help
+  --version           Show version
 ```
 
 ## Programmatic Usage
@@ -248,14 +155,15 @@ const server = new BAPMCPServer({
   verbose: true,
 });
 
-await server.start();
+await server.run();
 ```
 
 ## Requirements
 
 - Node.js >= 20.0.0
-- A running BAP server (`npx @browseragentprotocol/server-playwright`)
-- An MCP-compatible client (Claude Code, Claude Desktop, OpenAI Codex, etc.)
+- An MCP-compatible client
+
+The BAP Playwright server is auto-started by default. To install Playwright browsers manually: `npx playwright install chromium`.
 
 ## Troubleshooting
 
@@ -263,22 +171,23 @@ await server.start();
 
 On native Windows (not WSL), use the `cmd /c` wrapper:
 
-```bash
-claude mcp add --transport stdio bap-browser -- cmd /c npx @browseragentprotocol/mcp
+```json
+{
+  "mcpServers": {
+    "bap-browser": {
+      "command": "cmd",
+      "args": ["/c", "npx", "-y", "@browseragentprotocol/mcp"]
+    }
+  }
+}
 ```
 
-**Server not connecting?**
-
-Make sure the BAP server is running:
-
-```bash
-npx @browseragentprotocol/server-playwright
-```
+**Server not starting?**
 
-Then check the MCP server status:
+Ensure Playwright browsers are installed:
 
 ```bash
-/mcp  # Within Claude Code
+npx playwright install chromium
 ```
 
 ## License
diff --git a/packages/mcp/package.json b/packages/mcp/package.json
index 3eeb663..9d2347e 100644
--- a/packages/mcp/package.json
+++ b/packages/mcp/package.json
@@ -41,8 +41,8 @@
     "automation",
     "ai",
     "agent",
-    "claude",
-    "anthropic"
+    "playwright",
+    "web-scraping"
   ],
   "scripts": {
     "build": "tsup src/index.ts src/cli.ts --format cjs,esm --dts --clean",
diff --git a/packages/mcp/src/__tests__/standalone.test.ts b/packages/mcp/src/__tests__/standalone.test.ts
new file mode 100644
index 0000000..74d4137
--- /dev/null
+++ b/packages/mcp/src/__tests__/standalone.test.ts
@@ -0,0 +1,183 @@
+import { describe, it, expect } from "vitest";
+import net from "node:net";
+
+/**
+ * Tests for the standalone server management utilities in cli.ts.
+ *
+ * Since cli.ts runs as a script (calls main() at module level), we test
+ * the port-checking logic by reimplementing the core utility functions
+ * that are used by the standalone server management.
+ */
+
+// ---------------------------------------------------------------------------
+// Port detection tests (mirrors isPortInUse from cli.ts)
+// ---------------------------------------------------------------------------
+
+function isPortInUse(port: number, host: string = "localhost"): Promise<boolean> {
+  return new Promise((resolve) => {
+    const socket = net.createConnection({ port, host });
+    socket.setTimeout(500);
+    socket.on("connect", () => {
+      socket.destroy();
+      resolve(true);
+    });
+    socket.on("timeout", () => {
+      socket.destroy();
+      resolve(false);
+    });
+    socket.on("error", () => {
+      socket.destroy();
+      resolve(false);
+    });
+  });
+}
+
+async function waitForServer(
+  port: number,
+  host: string = "localhost",
+  timeoutMs: number = 2000,
+  intervalMs: number = 50,
+): Promise<void> {
+  const start = Date.now();
+  while (Date.now() - start < timeoutMs) {
+    if (await isPortInUse(port, host)) {
+      return;
+    }
+    await new Promise((resolve) => setTimeout(resolve, intervalMs));
+  }
+  throw new Error(`Server did not start within ${timeoutMs}ms on port ${port}`);
+}
+
+describe("standalone server utilities", () => {
+  describe("isPortInUse()", () => {
+    it("returns false for a port with nothing listening", async () => {
+      // Use a random high port that's unlikely to be in use
+      const result = await isPortInUse(59999);
+      expect(result).toBe(false);
+    });
+
+    it("returns true when a server is listening on the port", async () => {
+      // Start a temporary TCP server
+      const server = net.createServer();
+      const port = await new Promise<number>((resolve) => {
+        server.listen(0, "localhost", () => {
+          const addr = server.address();
+          if (addr && typeof addr === "object") {
+            resolve(addr.port);
+          }
+        });
+      });
+
+      try {
+        const result = await isPortInUse(port, "localhost");
+        expect(result).toBe(true);
+      } finally {
+        server.close();
+      }
+    });
+  });
+
+  describe("waitForServer()", () => {
+    it("resolves immediately if server is already running", async () => {
+      const server = net.createServer();
+      const port = await new Promise<number>((resolve) => {
+        server.listen(0, "localhost", () => {
+          const addr = server.address();
+          if (addr && typeof addr === "object") {
+            resolve(addr.port);
+          }
+        });
+      });
+
+      try {
+        // Should resolve almost instantly
+        const start = Date.now();
+        await waitForServer(port, "localhost", 2000, 50);
+        const elapsed = Date.now() - start;
+        expect(elapsed).toBeLessThan(500);
+      } finally {
+        server.close();
+      }
+    });
+
+    it("waits for a server that starts after a delay", async () => {
+      const server = net.createServer();
+      let port: number;
+
+      // Find a free port
+      port = await new Promise<number>((resolve) => {
+        const tmp = net.createServer();
+        tmp.listen(0, "localhost", () => {
+          const addr = tmp.address();
+          if (addr && typeof addr === "object") {
+            const p = addr.port;
+            tmp.close(() => resolve(p));
+          }
+        });
+      });
+
+      // Start the server after 200ms
+      const startTimer = setTimeout(() => {
+        server.listen(port, "localhost");
+      }, 200);
+
+      try {
+        await waitForServer(port, "localhost", 3000, 50);
+        // If we get here, the server was detected
+        expect(true).toBe(true);
+      } finally {
+        clearTimeout(startTimer);
+        server.close();
+      }
+    });
+
+    it("throws if server does not start within timeout", async () => {
+      await expect(
+        waitForServer(59998, "localhost", 300, 50)
+      ).rejects.toThrow("Server did not start within 300ms");
+    });
+  });
+
+  describe("CLI argument parsing", () => {
+    it("standalone mode is the default (no --url)", () => {
+      // When no --url is provided, isStandalone should be true
+      const url = undefined;
+      const isStandalone = !url;
+      expect(isStandalone).toBe(true);
+    });
+
+    it("providing --url disables standalone mode", () => {
+      const url = "ws://remote:9222";
+      const isStandalone = !url;
+      expect(isStandalone).toBe(false);
+    });
+
+    it("default port is 9222", () => {
+      const configPort: number | undefined = undefined;
+      const port = configPort ?? 9222;
+      expect(port).toBe(9222);
+    });
+
+    it("custom port overrides default", () => {
+      const configPort: number | undefined = 9333;
+      const port = configPort ?? 9222;
+      expect(port).toBe(9333);
+    });
+
+    it("browser mapping to server-playwright names", () => {
+      const browserMap: Record<string, string> = {
+        chrome: "chromium",
+        chromium: "chromium",
+        firefox: "firefox",
+        webkit: "webkit",
+        edge: "chromium",
+      };
+
+      expect(browserMap["chrome"]).toBe("chromium");
+      expect(browserMap["chromium"]).toBe("chromium");
+      expect(browserMap["firefox"]).toBe("firefox");
+      expect(browserMap["webkit"]).toBe("webkit");
+      expect(browserMap["edge"]).toBe("chromium");
+    });
+  });
+});
diff --git a/packages/mcp/src/cli.ts b/packages/mcp/src/cli.ts
index e9da4a8..eee45a5 100644
--- a/packages/mcp/src/cli.ts
+++ b/packages/mcp/src/cli.ts
@@ -3,16 +3,24 @@
  * @fileoverview BAP MCP Server CLI
  *
  * Run the BAP MCP server from the command line.
+ * By default, auto-starts a BAP Playwright server (standalone mode).
+ * Use --url to connect to an existing BAP server instead.
  *
  * Usage:
- *   bap-mcp                          # Use defaults (ws://localhost:9222)
- *   bap-mcp --url ws://host:port     # Custom BAP server URL
+ *   bap-mcp                          # Standalone: auto-starts BAP server
+ *   bap-mcp --url ws://host:port     # Connect to existing BAP server
+ *   bap-mcp --browser firefox        # Use Firefox
  *   bap-mcp --verbose                # Enable verbose logging
  *   bap-mcp --allowed-domains example.com,api.example.com
  */
 
+import { spawn, type ChildProcess } from "node:child_process";
+import fs from "node:fs";
+import net from "node:net";
+import path from "node:path";
+import { fileURLToPath } from "node:url";
 import { Logger, icons, pc } from "@browseragentprotocol/logger";
-import { BAPMCPServer } from "./index.js";
+import { BAPMCPServer, type BrowserChoice } from "./index.js";
 
 // MCP servers should log to stderr to avoid interfering with stdio transport
 const log = new Logger({ prefix: "BAP MCP", stderr: true });
@@ -23,7 +31,10 @@ const log = new Logger({ prefix: "BAP MCP", stderr: true });
 
 interface CLIArgs {
   url?: string;
+  port?: number;
+  browser?: string;
   verbose?: boolean;
+  headless?: boolean;
   allowedDomains?: string[];
   help?: boolean;
   version?: boolean;
@@ -42,8 +53,18 @@ function parseArgs(): CLIArgs {
       args.version = true;
     } else if (arg === "--url" || arg === "-u" || arg === "--bap-url") {
       args.url = argv[++i];
+    } else if (arg === "--port" || arg === "-p") {
+      args.port = parseInt(argv[++i] ?? "9222", 10);
+    } else if (arg === "--browser" || arg === "-b") {
+      args.browser = argv[++i];
     } else if (arg === "--verbose" || arg === "-v") {
       args.verbose = true;
+    } else if (arg === "--headless") {
+      args.headless = true;
+    } else if (arg === "--headless=true") {
+      args.headless = true;
+    } else if (arg === "--headless=false" || arg === "--no-headless") {
+      args.headless = false;
     } else if (arg === "--allowed-domains") {
       args.allowedDomains = argv[++i]?.split(",").map((d) => d.trim());
     }
@@ -59,45 +80,232 @@ ${pc.bold("BAP MCP Server")} ${pc.dim("- Browser Agent Protocol as MCP")}
 ${pc.cyan("USAGE")}
   ${pc.dim("$")} npx @browseragentprotocol/mcp ${pc.dim("[OPTIONS]")}
 
+  By default, auto-starts a local BAP Playwright server (standalone mode).
+  Pass ${pc.yellow("--url")} to connect to an existing BAP server instead.
+
 ${pc.cyan("OPTIONS")}
-  ${pc.yellow("-u, --url")} ${pc.dim("")} BAP server WebSocket URL ${pc.dim("(default: ws://localhost:9222)")}
+  ${pc.yellow("-b, --browser")} ${pc.dim("")} Browser: chrome ${pc.dim("(default)")}, chromium, firefox, webkit, edge
+  ${pc.yellow("-u, --url")} ${pc.dim("")} Connect to existing BAP server ${pc.dim("(skips auto-start)")}
+  ${pc.yellow("-p, --port")} ${pc.dim("")} Port for auto-started server ${pc.dim("(default: 9222)")}
+  ${pc.yellow("--headless")} Run browser in headless mode ${pc.dim("(default: true)")}
+  ${pc.yellow("--no-headless")} Run with visible browser window
   ${pc.yellow("-v, --verbose")} Enable verbose logging to stderr
   ${pc.yellow("--allowed-domains")} ${pc.dim("")} Comma-separated list of allowed domains
   ${pc.yellow("-h, --help")} Show this help message
   ${pc.yellow("--version")} Show version
 
 ${pc.cyan("EXAMPLES")}
-  ${pc.dim("# Start with defaults (connect to localhost:9222)")}
+  ${pc.dim("# Standalone mode (auto-starts server, uses local Chrome)")}
   ${pc.dim("$")} npx @browseragentprotocol/mcp
 
-  ${pc.dim("# Connect to a remote BAP server")}
-  ${pc.dim("$")} npx @browseragentprotocol/mcp --url ws://192.168.1.100:9222
+  ${pc.dim("# Use Firefox")}
+  ${pc.dim("$")} npx @browseragentprotocol/mcp --browser firefox
+
+  ${pc.dim("# Visible browser window")}
+  ${pc.dim("$")} npx @browseragentprotocol/mcp --no-headless
 
-  ${pc.dim("# Enable verbose logging")}
-  ${pc.dim("$")} npx @browseragentprotocol/mcp --verbose
+  ${pc.dim("# Connect to a remote BAP server (skips auto-start)")}
+  ${pc.dim("$")} npx @browseragentprotocol/mcp --url ws://192.168.1.100:9222
 
-  ${pc.dim("# Restrict to specific domains (security)")}
+  ${pc.dim("# Domain allowlist")}
   ${pc.dim("$")} npx @browseragentprotocol/mcp --allowed-domains example.com,api.example.com
 
-${pc.cyan("CLAUDE DESKTOP")}
-  Add to ${pc.dim("claude_desktop_config.json")}:
+${pc.cyan("MCP CLIENT SETUP")}
+  ${pc.dim("Add to any MCP-compatible client (config examples):")}
 
+  ${pc.dim("JSON config:")}
   ${pc.dim("{")}
     ${pc.dim('"mcpServers"')}: {
       ${pc.green('"bap-browser"')}: {
         "command": "npx",
-        "args": ["@browseragentprotocol/mcp"]
+        "args": ["-y", "@browseragentprotocol/mcp"]
       }
     }
   ${pc.dim("}")}
 
-${pc.cyan("CLAUDE CODE")}
-  ${pc.dim("$")} claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp
+  ${pc.dim("CLI:")}
+  ${pc.dim("$")} ${pc.dim("")} mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp
 
-${pc.dim("For more information:")} ${pc.cyan("https://github.com/browseragentprotocol/bap")}
+${pc.dim("Docs:")} ${pc.cyan("https://github.com/browseragentprotocol/bap")}
 `);
 }
 
+// =============================================================================
+// Standalone Server Management
+// =============================================================================
+
+/**
+ * Check if a port is available by attempting a TCP connection.
+ * Returns true if something is already listening on the port.
+ */ +function isPortInUse(port: number, host: string = "localhost"): Promise { + return new Promise((resolve) => { + const socket = net.createConnection({ port, host }); + socket.setTimeout(500); + socket.on("connect", () => { + socket.destroy(); + resolve(true); + }); + socket.on("timeout", () => { + socket.destroy(); + resolve(false); + }); + socket.on("error", () => { + socket.destroy(); + resolve(false); + }); + }); +} + +/** + * Wait for a server to become available on the given port. + * Polls with the specified interval until timeout. + */ +async function waitForServer( + port: number, + host: string = "localhost", + timeoutMs: number = 15000, + intervalMs: number = 150, +): Promise { + const start = Date.now(); + + while (Date.now() - start < timeoutMs) { + if (await isPortInUse(port, host)) { + return; + } + await new Promise((resolve) => setTimeout(resolve, intervalMs)); + } + + throw new Error( + `BAP server did not start within ${timeoutMs / 1000}s on port ${port}. ` + + `Ensure Playwright browsers are installed: npx playwright install chromium` + ); +} + +/** + * Resolve the command to start the server-playwright CLI. + * + * In monorepo development, the sibling package's built CLI is used directly + * to avoid npx overhead. In published (npm install) usage, falls back to npx. 
+ */ +function resolveServerCommand(): { command: string; args: string[] } { + try { + const __dirname = path.dirname(fileURLToPath(import.meta.url)); + const siblingCli = path.resolve(__dirname, "../../server-playwright/dist/cli.js"); + + if (fs.existsSync(siblingCli)) { + return { command: "node", args: [siblingCli] }; + } + } catch { + // import.meta.url resolution failed — not in ESM context, fall through + } + + return { command: "npx", args: ["-y", "@browseragentprotocol/server-playwright"] }; +} + +interface StandaloneServerOptions { + port: number; + host: string; + browser: string; + headless: boolean; + verbose: boolean; +} + +/** + * Start the BAP Playwright server as a child process. + * + * If a server is already listening on the target port, reuses it and returns + * null (caller should not attempt to kill it on shutdown). + * + * Otherwise, spawns the server-playwright CLI, waits for it to be ready, + * and returns the ChildProcess handle for lifecycle management. + */ +async function startStandaloneServer( + options: StandaloneServerOptions, +): Promise { + const { port, host, browser, headless, verbose } = options; + + // Reuse an existing server if one is already on this port + if (await isPortInUse(port, host)) { + log.info(`BAP server already running on ${host}:${port}, reusing`); + return null; + } + + log.info(`Starting BAP Playwright server on port ${port}...`); + + const { command, args } = resolveServerCommand(); + const serverArgs = [ + ...args, + "--port", port.toString(), + "--host", host, + headless ? "--headless" : "--no-headless", + ]; + + // Map MCP-level browser names to server-playwright's accepted values + const browserMap: Record = { + chrome: "chromium", + chromium: "chromium", + firefox: "firefox", + webkit: "webkit", + edge: "chromium", + }; + serverArgs.push("--browser", browserMap[browser] ?? 
"chromium"); + + if (verbose) { + serverArgs.push("--debug"); + } + + const child = spawn(command, serverArgs, { + stdio: ["ignore", "pipe", "pipe"], + detached: false, + env: { ...process.env }, + }); + + // Pipe server output to stderr when verbose (MCP uses stdout for stdio transport) + if (verbose) { + child.stdout?.on("data", (data: Buffer) => { + process.stderr.write(`[BAP Server] ${data.toString()}`); + }); + child.stderr?.on("data", (data: Buffer) => { + process.stderr.write(`[BAP Server] ${data.toString()}`); + }); + } + + child.on("error", (err) => { + log.error("Failed to start BAP server", err); + }); + + child.on("exit", (code, signal) => { + if (code !== null && code !== 0) { + log.error(`BAP server exited with code ${code}`); + } else if (signal && verbose) { + log.info(`BAP server stopped (${signal})`); + } + }); + + // Wait for the server to become available + try { + await waitForServer(port, host); + log.info(`BAP server ready on ws://${host}:${port}`); + } catch (err) { + child.kill("SIGTERM"); + throw err; + } + + return child; +} + +/** + * Kill a child process gracefully (SIGTERM), escalating to SIGKILL after 500ms. 
+ */ +async function killServer(child: ChildProcess): Promise<void> { + child.kill("SIGTERM"); + await new Promise((resolve) => setTimeout(resolve, 500)); + if (child.exitCode === null && child.signalCode === null) { + child.kill("SIGKILL"); + } +} + // ============================================================================= // Main // ============================================================================= @@ -112,54 +320,83 @@ async function main(): Promise<void> { if (args.version) { console.error( - `${icons.connection} BAP MCP Server ${pc.dim("v0.1.0-alpha.1")}` + `${icons.connection} BAP MCP Server ${pc.dim("v0.2.0")}` ); process.exit(0); } - // Set log level based on verbose flag if (args.verbose) { log.setLevel("debug"); } - const server = new BAPMCPServer({ - bapServerUrl: args.url, - verbose: args.verbose, - allowedDomains: args.allowedDomains, - }); + // Determine mode: standalone (auto-start server) vs connect to existing + const isStandalone = !args.url; + const port = args.port ?? 9222; + const host = "localhost"; + const bapServerUrl = args.url ?? `ws://${host}:${port}`; + let serverProcess: ChildProcess | null = null; - // Handle graceful shutdown - const shutdown = async (signal: string) => { - if (args.verbose) { - log.info(`${signal} received, shutting down...`); - } - await server.close(); - if (args.verbose) { - log.success("Server stopped"); + try { + if (isStandalone) { + if (args.verbose) { + log.info("Standalone mode: auto-starting BAP Playwright server"); + } + + serverProcess = await startStandaloneServer({ + port, + host, + browser: args.browser ?? "chrome", + headless: args.headless ?? true, + verbose: args.verbose ?? 
false, + }); } - process.exit(0); - }; - process.on("SIGINT", () => shutdown("SIGINT")); - process.on("SIGTERM", () => shutdown("SIGTERM")); + const server = new BAPMCPServer({ + bapServerUrl, + browser: args.browser as BrowserChoice | undefined, + verbose: args.verbose, + allowedDomains: args.allowedDomains, + }); - // Handle uncaught errors - process.on("uncaughtException", (error) => { - log.error("Uncaught exception", error); - process.exit(1); - }); + // Graceful shutdown — clean up MCP server and child process + const shutdown = async (signal: string) => { + if (args.verbose) { + log.info(`${signal} received, shutting down...`); + } - process.on("unhandledRejection", (reason) => { - log.error("Unhandled rejection", reason); - process.exit(1); - }); + await server.close(); + + if (serverProcess) { + await killServer(serverProcess); + } + + if (args.verbose) { + log.success("Server stopped"); + } + process.exit(0); + }; + + process.on("SIGINT", () => shutdown("SIGINT")); + process.on("SIGTERM", () => shutdown("SIGTERM")); + + process.on("uncaughtException", (error) => { + log.error("Uncaught exception", error); + serverProcess?.kill("SIGTERM"); + process.exit(1); + }); + + process.on("unhandledRejection", (reason) => { + log.error("Unhandled rejection", reason); + serverProcess?.kill("SIGTERM"); + process.exit(1); + }); - try { if (args.verbose) { - log.info(`Connecting to BAP server at ${args.url ?? "ws://localhost:9222"}`); + log.info(`Connecting to BAP server at ${bapServerUrl}`); } await server.run(); } catch (error) { + serverProcess?.kill("SIGTERM"); log.error("Failed to start server", error); process.exit(1); } diff --git a/skills/bap-browser/LICENSE.txt b/skills/bap-browser/LICENSE.txt new file mode 100644 index 0000000..5fe2868 --- /dev/null +++ b/skills/bap-browser/LICENSE.txt @@ -0,0 +1,208 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. 
Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to the Licensor for inclusion in the Work by the copyright + owner or by an individual or Legal Entity authorized to submit on behalf + of the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of 
the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + +--- + +MIT License + +Copyright (c) 2024-2025 Model Context Protocol a Series of LF Projects, LLC. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +--- + +Creative Commons Attribution 4.0 International (CC-BY-4.0) + +Documentation in this project (excluding specifications) is licensed under +CC-BY-4.0. See https://creativecommons.org/licenses/by/4.0/legalcode for +the full license text. \ No newline at end of file diff --git a/skills/bap-browser/SKILL.md b/skills/bap-browser/SKILL.md new file mode 100644 index 0000000..d54de85 --- /dev/null +++ b/skills/bap-browser/SKILL.md @@ -0,0 +1,173 @@ +--- +name: bap-browser +description: "Browser automation via Browser Agent Protocol (BAP). Apply when using BAP MCP tools: navigate, click, fill, type, observe, act, extract, screenshot, aria_snapshot, content, scroll, hover, press, select, element, pages, activate_page, close_page, go_back, go_forward, reload, accessibility." 
+license: See LICENSE.txt (Apache-2.0) +--- + +# BAP Browser Automation + +You have BAP (Browser Agent Protocol) tools available. BAP wraps a real browser and exposes it through semantic, AI-native APIs. This document defines how to use them well. + +## Decision: Which Tool? + +**I need to open a page** → `navigate` + +**I need to understand what's on the page:** +- I want interactive elements with stable refs → `observe` (set `includeScreenshot: true` for visual context) +- I want the page structure cheaply → `aria_snapshot` (preferred — ~80% fewer tokens than `accessibility`) +- I want to read article/body text → `content` with `format: "markdown"` +- I want a visual capture → `screenshot` + +**I need to interact with something:** +- Single click → `click` +- Fill a form field (replaces content) → `fill` +- Type character-by-character (autocomplete, search-as-you-type) → `type` +- Press Enter/Tab/Escape/keyboard shortcut → `press` +- Select from dropdown → `select` +- Scroll to reveal more → `scroll` +- Trigger hover menu → `hover` + +**I need to do multiple things at once** → `act` (batch 2–50 steps, single round-trip) + +**I need structured data from the page** → `extract` (give it a JSON schema) + +**I need to check an element's state** → `element` (visible? enabled? checked? value?) + +**I need to manage tabs** → `pages` / `activate_page` / `close_page` + +**I need to go back/forward/reload** → `go_back` / `go_forward` / `reload` + +## Selectors + +Every interaction tool takes a `selector` parameter. Use this priority: + +``` +role:button:Submit ← Best. ARIA role + accessible name. Survives redesigns. +text:Sign in ← Visible text content. +label:Email address ← Form label association. +placeholder:Search... ← Input placeholder text. +testId:submit-btn ← data-testid attribute. +ref:@e3 (or just @e3) ← Stable ref from a prior observe call. +css:.btn-primary ← Last resort. Fragile. +#element-id ← Shorthand for CSS ID selector. 
+``` + +**Rules:** +- Always prefer `role:` for buttons, links, inputs, checkboxes. They survive DOM changes. +- Use `text:` when there's no clear ARIA role. +- Never copy CSS selectors from page source. They break across deployments. +- If you don't know what selectors are available, call `observe` first and use the returned refs. + +## The Observe → Act Pattern + +For any multi-step interaction on a page you haven't seen yet: + +**Step 1: Observe.** +``` +observe({ includeScreenshot: true, maxElements: 30 }) +``` +Returns interactive elements with stable refs (`@e1`, `@e2`, ...) and optional annotated screenshot. Now you know exactly what's on the page. + +**Step 2: Act.** +Batch all your actions into one call: +``` +act({ + steps: [ + { action: "action/fill", selector: "@e1", value: "user@example.com" }, + { action: "action/fill", selector: "@e2", value: "hunter2" }, + { action: "action/click", selector: "role:button:Sign in" } + ] +}) +``` + +This pattern turns 4+ round-trips into 2. Use it. + +## Efficiency Rules + +1. **`aria_snapshot` over `accessibility`.** Same structure, ~80% fewer tokens. +2. **`observe` with `maxElements`.** Default is 50. Set it lower when you can: `maxElements: 20`. +3. **`observe` with `filterRoles`.** Focus: `filterRoles: ["button", "link", "textbox"]`. +4. **`act` over individual calls.** A login flow is 1 `act`, not 3 separate fill/click calls. +5. **`extract` over manual parsing.** Define a JSON schema. Let BAP extract. Don't scrape HTML. +6. **`content({ format: "markdown" })` over screenshots for text.** Markdown is compact and parseable. +7. **`fill` over `type` for form fields.** `fill` clears and sets; `type` sends keystrokes one at a time. 
+ +## Recipes + +### Login +``` +act({ + steps: [ + { action: "page/navigate", url: "https://app.example.com/login" }, + { action: "action/fill", selector: "label:Email", value: "user@example.com" }, + { action: "action/fill", selector: "label:Password", value: "password123" }, + { action: "action/click", selector: "role:button:Sign in" } + ] +}) +``` + +### Extract a table of data +``` +navigate({ url: "https://store.example.com/products" }) +extract({ + instruction: "Extract product listings", + mode: "list", + schema: { + type: "array", + items: { + type: "object", + properties: { + name: { type: "string" }, + price: { type: "number" }, + inStock: { type: "boolean" } + } + } + } +}) +``` + +### Read an article +``` +navigate({ url: "https://blog.example.com/post", waitUntil: "networkidle" }) +content({ format: "markdown" }) +``` + +### Complex form with observe +``` +observe({ filterRoles: ["textbox", "combobox", "checkbox"] }) +act({ + steps: [ + { action: "action/fill", selector: "@e1", value: "Jane Doe" }, + { action: "action/fill", selector: "@e2", value: "jane@example.com" }, + { action: "action/select", selector: "@e3", value: "Canada" }, + { action: "action/check", selector: "@e4" }, + { action: "action/click", selector: "role:button:Submit" } + ] +}) +``` + +### Search with autocomplete +``` +type({ selector: "role:combobox:Search", text: "browser agent", delay: 100 }) +press({ key: "ArrowDown" }) +press({ key: "Enter" }) +``` + +## Error Recovery + +| Problem | Fix | +|---------|-----| +| Element not found | `observe` the page again — the DOM changed. Use fresh refs. | +| Navigation timeout | Use `waitUntil: "domcontentloaded"` instead of `"networkidle"`. | +| Stale ref | Refs persist within a page but invalidate after navigation. Re-observe. | +| Click intercepted | `scroll` to the element first, or use `press({ key: "Enter", selector: "..." })`. | +| Page loaded but blank | Wait, then `reload`. Some SPAs hydrate slowly. 
| + +## Do Not + +- Use CSS selectors copied from browser DevTools. They break. +- Call `accessibility` when `aria_snapshot` works. Wastes tokens. +- Make individual click/fill calls when `act` can batch them. +- Take a screenshot to read text. Use `content({ format: "markdown" })`. +- Skip `observe` on pages you haven't seen. You'll guess wrong. +- Parse raw HTML. Use `extract` with a schema. From bcbe207d4ad33404a284ebc96cdc29a0b8943678 Mon Sep 17 00:00:00 2001 From: Piyush Date: Thu, 12 Feb 2026 00:56:49 -0600 Subject: [PATCH 02/10] docs(skill): improve SKILL.md quality score to 78/100 --- skills/bap-browser/SKILL.md | 105 +++++++++++++++++++++++++++++++++++- 1 file changed, 104 insertions(+), 1 deletion(-) diff --git a/skills/bap-browser/SKILL.md b/skills/bap-browser/SKILL.md index d54de85..0b229ae 100644 --- a/skills/bap-browser/SKILL.md +++ b/skills/bap-browser/SKILL.md @@ -1,6 +1,6 @@ --- name: bap-browser -description: "Browser automation via Browser Agent Protocol (BAP). Apply when using BAP MCP tools: navigate, click, fill, type, observe, act, extract, screenshot, aria_snapshot, content, scroll, hover, press, select, element, pages, activate_page, close_page, go_back, go_forward, reload, accessibility." +description: "AI-optimized browser automation via Browser Agent Protocol (BAP). Use when the user wants to browse websites, scrape web content, automate browser interactions, fill out web forms, extract structured data from pages, take screenshots, or test web applications. Provides semantic selectors, batched multi-step actions, and structured data extraction. Triggers: navigate, click, fill, type, observe, act, extract, screenshot, aria_snapshot, content, scroll, hover, press, select, element, pages, activate_page, close_page, go_back, go_forward, reload, accessibility." license: See LICENSE.txt (Apache-2.0) --- @@ -8,6 +8,28 @@ license: See LICENSE.txt (Apache-2.0) You have BAP (Browser Agent Protocol) tools available. 
BAP wraps a real browser and exposes it through semantic, AI-native APIs. This document defines how to use them well. +## Quick Start + +For most browser tasks, you only need three tools: + +1. **`navigate`** — open a URL +2. **`observe`** — see what's on the page (returns interactive elements with stable refs) +3. **`act`** — batch multiple interactions into a single call + +``` +navigate({ url: "https://example.com/login" }) +observe({ includeScreenshot: true }) +act({ + steps: [ + { action: "action/fill", selector: "@e1", value: "user@example.com" }, + { action: "action/fill", selector: "@e2", value: "password123" }, + { action: "action/click", selector: "role:button:Sign in" } + ] +}) +``` + +Read on for the full tool reference, selector guide, and advanced patterns. + ## Decision: Which Tool? **I need to open a page** → `navigate` @@ -171,3 +193,84 @@ press({ key: "Enter" }) - Take a screenshot to read text. Use `content({ format: "markdown" })`. - Skip `observe` on pages you haven't seen. You'll guess wrong. - Parse raw HTML. Use `extract` with a schema. + +--- + +## Advanced + +### Multi-tab workflows + +Use `pages` to list all open tabs, `activate_page` to switch between them, and `close_page` to clean up. Useful for comparing content across tabs or handling pop-ups. + +``` +navigate({ url: "https://a.example.com" }) +navigate({ url: "https://b.example.com" }) // opens in new tab +pages() // returns [{id, url}, ...] +activate_page({ pageId: "page-1" }) // switch back to first tab +``` + +### Waiting strategies + +The `waitUntil` parameter on `navigate` controls when the page is considered loaded: + +| Value | When to use | +|-------|-------------| +| `"load"` | Default. Fine for most pages. | +| `"domcontentloaded"` | Faster. Use when you don't need images/fonts. | +| `"networkidle"` | Slowest but most complete. Use for SPAs that fetch data after load. 
| + +If a page renders dynamically after navigation, use `observe` or `aria_snapshot` with a short delay rather than relying on `networkidle`. + +### Annotated screenshots (Set-of-Marks) + +`observe` supports `annotateScreenshot: true` which overlays numbered markers on each interactive element. Useful for visual debugging or confirming which element a ref points to. + +``` +observe({ includeScreenshot: true, annotateScreenshot: true, maxElements: 20 }) +``` + +The returned screenshot will have numbered badges corresponding to element refs. + +### Nested extraction with complex schemas + +`extract` supports deeply nested JSON schemas. Use `mode: "single"` for a single object, `mode: "list"` for arrays, or `mode: "table"` for tabular data. + +``` +extract({ + instruction: "Extract job listings with company details", + mode: "list", + schema: { + type: "array", + items: { + type: "object", + properties: { + title: { type: "string" }, + company: { + type: "object", + properties: { + name: { type: "string" }, + location: { type: "string" } + } + }, + salary: { type: "number" }, + remote: { type: "boolean" } + } + } + } +}) +``` + +### Error handling in batched actions + +`act` accepts `stopOnFirstError` (default: `true`). Set to `false` if you want to continue executing steps even when one fails — useful for best-effort form fills where some fields may not exist. + +``` +act({ + stopOnFirstError: false, + steps: [ + { action: "action/fill", selector: "label:First name", value: "Jane" }, + { action: "action/fill", selector: "label:Middle name", value: "A." 
}, // may not exist + { action: "action/fill", selector: "label:Last name", value: "Doe" } + ] +}) +``` From 588990a7a53f20b357ce10d09baf74e7b1df990a Mon Sep 17 00:00:00 2001 From: Piyush Date: Thu, 12 Feb 2026 21:09:48 -0600 Subject: [PATCH 03/10] =?UTF-8?q?feat:=20v0.2.0=20=E2=80=94=20browser=20se?= =?UTF-8?q?lection,=20clean=20tool=20names,=20smarter=20extract?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Default to local Chrome via new channel param across protocol, client, server, and Python SDK - Rename MCP server to BAPBrowser, remove bap_ prefix from all 22 tool names - Add auto-reconnect with health checks and resource cleanup via resetClient() - Switch screenshot default from PNG to JPEG (~60% smaller), force CSS 1x scale - Rewrite agent/extract with semantic content scoping and schema-aware extraction - Fix Zod union ordering (error before success) to prevent z.unknown() swallowing errors - Bump all packages from 0.1.0 to 0.2.0 - Update GitHub URLs to browseragentprotocol/bap org --- .gitignore | 3 + .../src/__tests__/client-methods.test.ts | 37 ++ packages/client/src/index.ts | 2 +- packages/mcp/src/cli.ts | 1 + packages/mcp/src/index.ts | 226 +++++-- packages/protocol/src/types/methods.ts | 1 + packages/protocol/src/types/protocol.ts | 10 +- packages/python-sdk/package.json | 7 +- packages/python-sdk/pyproject.toml | 10 +- .../src/browseragentprotocol/__init__.py | 2 +- .../src/browseragentprotocol/client.py | 6 +- .../src/browseragentprotocol/context.py | 2 +- .../src/browseragentprotocol/sync_client.py | 5 +- .../src/browseragentprotocol/types/methods.py | 1 + .../browseragentprotocol/types/protocol.py | 2 +- packages/server-playwright/README.md | 4 +- packages/server-playwright/src/cli.ts | 4 +- packages/server-playwright/src/server.ts | 572 +++++++++++++++--- 18 files changed, 709 insertions(+), 186 deletions(-) diff --git a/.gitignore b/.gitignore index 8bc0fef..1a20a60 100644 --- a/.gitignore +++ 
b/.gitignore @@ -33,3 +33,6 @@ coverage/ # Claude Code instructions (local only) CLAUDE.md docs/plans/ + +# Internal roadmap (local only) +ROADMAP.md diff --git a/packages/client/src/__tests__/client-methods.test.ts b/packages/client/src/__tests__/client-methods.test.ts index 85e3445..ee2190d 100644 --- a/packages/client/src/__tests__/client-methods.test.ts +++ b/packages/client/src/__tests__/client-methods.test.ts @@ -197,6 +197,43 @@ describe("BAPClient Methods", () => { expect(parsed.params.headless).toBe(true); }); + it("launch() passes channel param correctly", async () => { + const { client, transport } = await createConnectedClient(); + transport.setAutoResponse("browser/launch", { browserId: "browser-2" }); + + const result = await client.launch({ browser: "chromium", channel: "chrome", headless: false }); + + expect(result.browserId).toBe("browser-2"); + + const launchRequest = transport.sentMessages.find((msg) => { + const parsed = JSON.parse(msg); + return parsed.method === "browser/launch"; + }); + expect(launchRequest).toBeDefined(); + const parsed = JSON.parse(launchRequest!); + expect(parsed.params.browser).toBe("chromium"); + expect(parsed.params.channel).toBe("chrome"); + expect(parsed.params.headless).toBe(false); + }); + + it("launch() without channel works (backwards compat)", async () => { + const { client, transport } = await createConnectedClient(); + transport.setAutoResponse("browser/launch", { browserId: "browser-3" }); + + const result = await client.launch({ browser: "firefox" }); + + expect(result.browserId).toBe("browser-3"); + + const launchRequest = transport.sentMessages.find((msg) => { + const parsed = JSON.parse(msg); + return parsed.method === "browser/launch"; + }); + expect(launchRequest).toBeDefined(); + const parsed = JSON.parse(launchRequest!); + expect(parsed.params.browser).toBe("firefox"); + expect(parsed.params.channel).toBeUndefined(); + }); + it("closeBrowser() sends correct request", async () => { const { client, 
transport } = await createConnectedClient(); transport.setAutoResponse("browser/close", {}); diff --git a/packages/client/src/index.ts b/packages/client/src/index.ts index cac28b5..33c6602 100644 --- a/packages/client/src/index.ts +++ b/packages/client/src/index.ts @@ -376,7 +376,7 @@ export class BAPClient extends EventEmitter { this.options = { token: options.token, name: options.name ?? "bap-client", - version: options.version ?? "0.1.0", + version: options.version ?? "0.2.0", timeout: options.timeout ?? 30000, events: options.events ?? ["page", "console", "network", "dialog"], }; diff --git a/packages/mcp/src/cli.ts b/packages/mcp/src/cli.ts index eee45a5..17de706 100644 --- a/packages/mcp/src/cli.ts +++ b/packages/mcp/src/cli.ts @@ -354,6 +354,7 @@ async function main(): Promise<void> { const server = new BAPMCPServer({ bapServerUrl, browser: args.browser as BrowserChoice | undefined, + headless: args.headless ?? true, verbose: args.verbose, allowedDomains: args.allowedDomains, }); diff --git a/packages/mcp/src/index.ts b/packages/mcp/src/index.ts index 89f9318..ee4e040 100644 --- a/packages/mcp/src/index.ts +++ b/packages/mcp/src/index.ts @@ -1,14 +1,14 @@ /** * @fileoverview BAP MCP Integration * @module @browseragentprotocol/mcp - * @version 0.1.0 + * @version 0.2.0 * * Exposes Browser Agent Protocol as an MCP (Model Context Protocol) server. - * Allows AI agents like Claude to control browsers through standardized MCP tools. + * Allows AI agents to control browsers through standardized MCP tools. 
* * TODO (MEDIUM): Add input validation on tool arguments before passing to BAP client * TODO (MEDIUM): Enforce session timeout (maxSessionDuration) - currently unused - * TODO (MEDIUM): Add resource cleanup on partial failure in ensureClient() + * TODO (MEDIUM): Add resource cleanup on partial failure in ensureClient() — DONE (v0.2.0) * TODO (LOW): parseSelector should validate empty/whitespace-only strings * TODO (LOW): Consider sanitizing URLs in verbose logging to prevent token leakage */ @@ -45,13 +45,19 @@ import { // Types // ============================================================================= +export type BrowserChoice = "chrome" | "chromium" | "firefox" | "webkit" | "edge"; + export interface BAPMCPServerOptions { /** BAP server URL (default: ws://localhost:9222) */ bapServerUrl?: string; - /** Server name for MCP (default: bap-browser) */ + /** Server name for MCP (default: BAPBrowser) */ name?: string; /** Server version (default: 1.0.0) */ version?: string; + /** Browser to use: chrome (default), chromium, firefox, webkit, edge */ + browser?: BrowserChoice; + /** Run browser in headless mode (default: true) */ + headless?: boolean; /** Enable verbose logging */ verbose?: boolean; /** Allowed domains for navigation (empty = all allowed) */ @@ -195,7 +201,7 @@ function formatSelectorForDisplay(selector: BAPSelector): string { const TOOLS: Tool[] = [ // Navigation { - name: "bap_navigate", + name: "navigate", description: "Navigate the browser to a URL. Use this to open web pages. Returns the page title and URL after navigation.", inputSchema: { @@ -217,7 +223,7 @@ const TOOLS: Tool[] = [ // Element Interaction { - name: "bap_click", + name: "click", description: 'Click an element on the page. Use semantic selectors like "role:button:Submit" or "text:Sign in" for reliability.', inputSchema: { @@ -237,7 +243,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_type", + name: "type", description: "Type text into an input field. 
First clicks the element, then types the text character by character.", inputSchema: { @@ -260,9 +266,9 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_fill", + name: "fill", description: - "Fill an input field with text (clears existing content first). Faster than bap_type for form filling.", + "Fill an input field with text (clears existing content first). Faster than type for form filling.", inputSchema: { type: "object", properties: { @@ -279,7 +285,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_press", + name: "press", description: "Press a keyboard key. Use for Enter, Tab, Escape, or keyboard shortcuts like Ctrl+A.", inputSchema: { type: "object", @@ -297,7 +303,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_select", + name: "select", description: "Select an option from a dropdown/select element.", inputSchema: { type: "object", @@ -315,7 +321,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_scroll", + name: "scroll", description: "Scroll the page or a specific element.", inputSchema: { type: "object", @@ -337,7 +343,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_hover", + name: "hover", description: "Hover over an element. Useful for triggering hover menus or tooltips.", inputSchema: { type: "object", @@ -353,7 +359,7 @@ const TOOLS: Tool[] = [ // Observations { - name: "bap_screenshot", + name: "screenshot", description: "Take a screenshot of the current page. Returns the image as base64. Use for visual verification.", inputSchema: { @@ -363,11 +369,20 @@ const TOOLS: Tool[] = [ type: "boolean", description: "Capture full page including scrollable content (default: false)", }, + format: { + type: "string", + enum: ["jpeg", "png"], + description: "Image format (default: jpeg). JPEG is ~60% smaller than PNG for typical pages.", + }, + quality: { + type: "number", + description: "JPEG quality 0-100 (default: 80). 
Ignored for PNG.", + }, }, }, }, { - name: "bap_accessibility", + name: "accessibility", description: "Get the accessibility tree of the page. Returns a structured representation ideal for understanding page layout and finding elements. RECOMMENDED: Use this before interacting with elements.", inputSchema: { @@ -381,7 +396,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_aria_snapshot", + name: "aria_snapshot", description: "Get a token-efficient YAML snapshot of the page accessibility tree. ~80% fewer tokens than full accessibility tree. Best for LLM context.", inputSchema: { @@ -395,7 +410,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_content", + name: "content", description: "Get page content as text or markdown. Useful for reading article content or extracting text.", inputSchema: { @@ -410,7 +425,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_element", + name: "element", description: "Query properties of a specific element. Check if an element exists, is visible, enabled, etc.", inputSchema: { @@ -435,7 +450,7 @@ const TOOLS: Tool[] = [ // Page Management { - name: "bap_pages", + name: "pages", description: "List all open pages/tabs. 
Returns page IDs and URLs.", inputSchema: { type: "object", @@ -443,21 +458,21 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_activate_page", + name: "activate_page", description: "Switch to a different page/tab by its ID.", inputSchema: { type: "object", properties: { pageId: { type: "string", - description: "ID of the page to activate (from bap_pages)", + description: "ID of the page to activate (from pages)", }, }, required: ["pageId"], }, }, { - name: "bap_close_page", + name: "close_page", description: "Close the current page/tab.", inputSchema: { type: "object", @@ -465,7 +480,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_go_back", + name: "go_back", description: "Navigate back in browser history.", inputSchema: { type: "object", @@ -473,7 +488,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_go_forward", + name: "go_forward", description: "Navigate forward in browser history.", inputSchema: { type: "object", @@ -481,7 +496,7 @@ const TOOLS: Tool[] = [ }, }, { - name: "bap_reload", + name: "reload", description: "Reload the current page.", inputSchema: { type: "object", @@ -491,7 +506,7 @@ const TOOLS: Tool[] = [ // Agent (Composite Actions, Observations, and Data Extraction) { - name: "bap_act", + name: "act", description: `Execute a sequence of browser actions in a single call. Useful for multi-step flows like login, form submission, or navigation sequences. Each step can have conditions and error handling. More efficient than calling actions individually.`, @@ -546,7 +561,7 @@ Each step can have conditions and error handling. More efficient than calling ac }, }, { - name: "bap_observe", + name: "observe", description: `Get an AI-optimized observation of the current page. Returns interactive elements with pre-computed selectors, making it easy to determine what actions are possible. Supports stable element refs and annotated screenshots. 
@@ -583,10 +598,10 @@ RECOMMENDED: Use this before complex interactions to understand the page.`, }, }, { - name: "bap_extract", + name: "extract", description: `Extract structured data from the current page. Uses CSS heuristics to find lists, tables, and labeled data matching your schema. -Works best with standard HTML patterns (ul/ol, tables, cards). For complex pages, use bap_content instead.`, +Works best with standard HTML patterns (ul/ol, tables, cards). For complex pages, use content instead.`, inputSchema: { type: "object", properties: { @@ -659,6 +674,21 @@ const RESOURCES: Resource[] = [ }, ]; +// ============================================================================= +// Browser Resolution +// ============================================================================= + +function resolveBrowser(browser: BrowserChoice): { browser: "chromium" | "firefox" | "webkit"; channel?: string } { + switch (browser) { + case "chrome": return { browser: "chromium", channel: "chrome" }; + case "chromium": return { browser: "chromium" }; + case "firefox": return { browser: "firefox" }; + case "webkit": return { browser: "webkit" }; + case "edge": return { browser: "chromium", channel: "msedge" }; + default: return { browser: "chromium", channel: "chrome" }; + } +} + // ============================================================================= // BAP MCP Server // ============================================================================= @@ -675,8 +705,10 @@ export class BAPMCPServer { constructor(options: BAPMCPServerOptions = {}) { this.options = { bapServerUrl: options.bapServerUrl ?? "ws://localhost:9222", - name: options.name ?? "bap-browser", + name: options.name ?? "BAPBrowser", version: options.version ?? "1.0.0", + browser: options.browser ?? "chrome", + headless: options.headless ?? true, verbose: options.verbose ?? false, allowedDomains: options.allowedDomains ?? [], maxSessionDuration: options.maxSessionDuration ?? 
3600, @@ -730,28 +762,93 @@ export class BAPMCPServer { } /** - * Ensure BAP client is connected + * Ensure BAP client is connected. + * - On first call: creates transport, connects, launches browser + * - On subsequent calls: checks if connection is still alive, reconnects if needed */ private async ensureClient(): Promise { - if (this.client) { - return this.client; + // If we have a client, verify the connection is still alive + if (this.client && this.transport) { + try { + // Lightweight health check — list pages to confirm the connection works + await this.client.listPages(); + return this.client; + } catch { + // Connection is dead — clean up and reconnect + this.log("Connection lost, reconnecting..."); + await this.resetClient(); + } } this.log("Connecting to BAP server:", this.options.bapServerUrl); - this.transport = new WebSocketTransport(this.options.bapServerUrl); + this.transport = new WebSocketTransport(this.options.bapServerUrl, { + autoReconnect: true, + maxReconnectAttempts: 5, + reconnectDelay: 1000, + }); + + // Hook reconnect callbacks for verbose logging + this.transport.onReconnecting = (attempt, max) => { + this.log(`Reconnecting to BAP server (attempt ${attempt}/${max})...`); + }; + this.transport.onReconnected = () => { + this.log("Reconnected to BAP server"); + }; + this.transport.onClose = () => { + this.log("BAP server connection closed"); + }; + this.client = new BAPClient(this.transport); // Connect and initialize the protocol await this.client.connect(); - // Launch browser - await this.client.launch({ headless: false }); + // Launch browser with configured browser/channel + const resolved = resolveBrowser(this.options.browser); + const headless = this.options.headless ?? true; + try { + await this.client.launch({ + browser: resolved.browser, + channel: resolved.channel, + headless, + }); + } catch (err) { + // If a channel was specified (e.g. 
local Chrome) and it's not found, fall back to bundled Chromium + if (resolved.channel && String(err).includes("Looks like")) { + this.log( + `Local ${this.options.browser} not found, falling back to bundled Chromium` + ); + await this.client.launch({ browser: "chromium", headless }); + } else { + // Clean up on launch failure to avoid leaking the transport + await this.resetClient(); + throw err; + } + } this.log("BAP client connected and browser launched"); return this.client; } + /** + * Reset client and transport state for reconnection + */ + private async resetClient(): Promise { + try { + if (this.client) { + await this.client.close(); + } + } catch { /* ignore cleanup errors */ } + try { + if (this.transport) { + await this.transport.close(); + } + } catch { /* ignore cleanup errors */ } + this.client = null; + this.transport = null; + } + /** * Set up MCP request handlers */ @@ -815,7 +912,7 @@ export class BAPMCPServer { switch (name) { // Navigation - case "bap_navigate": { + case "navigate": { const url = args.url as string; // Security check @@ -853,7 +950,7 @@ export class BAPMCPServer { } // Element Interaction - case "bap_click": { + case "click": { const selector = parseSelector(args.selector as string); const options = args.clickCount ? { clickCount: args.clickCount as number } : undefined; await client.click(selector, options); @@ -862,7 +959,7 @@ export class BAPMCPServer { }; } - case "bap_type": { + case "type": { const selector = parseSelector(args.selector as string); const text = args.text as string; const delay = args.delay as number | undefined; @@ -872,7 +969,7 @@ export class BAPMCPServer { }; } - case "bap_fill": { + case "fill": { const selector = parseSelector(args.selector as string); const value = args.value as string; await client.fill(selector, value); @@ -881,7 +978,7 @@ export class BAPMCPServer { }; } - case "bap_press": { + case "press": { const key = args.key as string; const selector = args.selector ? 
parseSelector(args.selector as string) : undefined; await client.press(key, selector); @@ -890,7 +987,7 @@ export class BAPMCPServer { }; } - case "bap_select": { + case "select": { const selector = parseSelector(args.selector as string); const value = args.value as string; await client.select(selector, value); @@ -899,7 +996,7 @@ export class BAPMCPServer { }; } - case "bap_scroll": { + case "scroll": { const direction = (args.direction as ScrollDirection) ?? "down"; const amount = (args.amount as number) ?? 500; const selector = args.selector ? parseSelector(args.selector as string) : undefined; @@ -909,7 +1006,7 @@ export class BAPMCPServer { }; } - case "bap_hover": { + case "hover": { const selector = parseSelector(args.selector as string); await client.hover(selector); return { @@ -918,9 +1015,11 @@ export class BAPMCPServer { } // Observations - case "bap_screenshot": { + case "screenshot": { const fullPage = args.fullPage as boolean ?? false; - const result = await client.screenshot({ fullPage }); + const format = (args.format as string) === "png" ? "png" : "jpeg"; + const quality = typeof args.quality === "number" ? args.quality : (format === "jpeg" ? 80 : undefined); + const result = await client.screenshot({ fullPage, format, quality }); return { content: [ { @@ -932,7 +1031,7 @@ export class BAPMCPServer { }; } - case "bap_accessibility": { + case "accessibility": { const interestingOnly = args.interestingOnly as boolean ?? true; const result = await client.accessibility({ interestingOnly }); return { @@ -945,7 +1044,7 @@ export class BAPMCPServer { }; } - case "bap_aria_snapshot": { + case "aria_snapshot": { const selector = args.selector ? parseSelector(args.selector as string) : undefined; const result = await client.ariaSnapshot(selector); return { @@ -958,7 +1057,7 @@ export class BAPMCPServer { }; } - case "bap_content": { + case "content": { const format = (args.format as ContentFormat) ?? 
"text"; const result = await client.content(format); return { @@ -966,7 +1065,7 @@ export class BAPMCPServer { }; } - case "bap_element": { + case "element": { const selector = parseSelector(args.selector as string); const properties = (args.properties as ElementProperty[]) ?? ["visible", "enabled"]; const result = await client.element(selector, properties); @@ -981,7 +1080,7 @@ export class BAPMCPServer { } // Page Management - case "bap_pages": { + case "pages": { const result = await client.listPages(); const text = result.pages .map((p) => `${p.id === result.activePage ? "* " : " "}${p.id}: ${p.url} (${p.title})`) @@ -991,7 +1090,7 @@ export class BAPMCPServer { }; } - case "bap_activate_page": { + case "activate_page": { const pageId = args.pageId as string; await client.activatePage(pageId); return { @@ -999,28 +1098,28 @@ export class BAPMCPServer { }; } - case "bap_close_page": { + case "close_page": { await client.closePage(); return { content: [{ type: "text", text: "Closed current page" }], }; } - case "bap_go_back": { + case "go_back": { await client.goBack(); return { content: [{ type: "text", text: "Navigated back" }], }; } - case "bap_go_forward": { + case "go_forward": { await client.goForward(); return { content: [{ type: "text", text: "Navigated forward" }], }; } - case "bap_reload": { + case "reload": { await client.reload(); return { content: [{ type: "text", text: "Reloaded page" }], @@ -1028,7 +1127,7 @@ export class BAPMCPServer { } // Agent (Composite Actions, Observations, and Data Extraction) - case "bap_act": { + case "act": { interface InputStep { label?: string; action: string; @@ -1094,7 +1193,7 @@ export class BAPMCPServer { }; } - case "bap_observe": { + case "observe": { const annotate = args.annotateScreenshot as boolean; const result = await client.observe({ includeScreenshot: (args.includeScreenshot as boolean) || annotate, @@ -1166,7 +1265,7 @@ export class BAPMCPServer { return { content }; } - case "bap_extract": { + case 
"extract": { const result = await client.extract({ instruction: args.instruction as string, schema: args.schema as ExtractionSchema, @@ -1300,14 +1399,7 @@ export class BAPMCPServer { * Close the server and clean up */ async close(): Promise { - if (this.client) { - await this.client.close(); - this.client = null; - } - if (this.transport) { - await this.transport.close(); - this.transport = null; - } + await this.resetClient(); await this.server.close(); this.log("BAP MCP Server closed"); } diff --git a/packages/protocol/src/types/methods.ts b/packages/protocol/src/types/methods.ts index 9d87de5..d9b0a3a 100644 --- a/packages/protocol/src/types/methods.ts +++ b/packages/protocol/src/types/methods.ts @@ -42,6 +42,7 @@ export type ProxyConfig = z.infer; /** browser/launch parameters */ export const BrowserLaunchParamsSchema = z.object({ browser: BrowserTypeSchema.optional(), + channel: z.string().optional(), headless: z.boolean().optional(), args: z.array(z.string()).optional(), proxy: ProxyConfigSchema.optional(), diff --git a/packages/protocol/src/types/protocol.ts b/packages/protocol/src/types/protocol.ts index 1ae91b9..77de263 100644 --- a/packages/protocol/src/types/protocol.ts +++ b/packages/protocol/src/types/protocol.ts @@ -10,7 +10,7 @@ import { z } from "zod"; // ============================================================================= /** Current BAP protocol version */ -export const BAP_VERSION = "0.1.0"; +export const BAP_VERSION = "0.2.0"; // ============================================================================= // JSON-RPC 2.0 Schemas @@ -70,18 +70,18 @@ export const JSONRPCNotificationSchema = z.object({ }); export type JSONRPCNotification = z.infer; -/** Union of all JSON-RPC response types */ +/** Union of all JSON-RPC response types (error first to prevent z.unknown() from swallowing errors) */ export const JSONRPCResponseSchema = z.union([ - JSONRPCSuccessResponseSchema, JSONRPCErrorResponseSchema, + JSONRPCSuccessResponseSchema, ]); 
export type JSONRPCResponse = z.infer<typeof JSONRPCResponseSchema>; -/** Union of all JSON-RPC message types */ +/** Union of all JSON-RPC message types (error before success to prevent z.unknown() from swallowing errors) */ export const JSONRPCMessageSchema = z.union([ JSONRPCRequestSchema, - JSONRPCSuccessResponseSchema, JSONRPCErrorResponseSchema, + JSONRPCSuccessResponseSchema, JSONRPCNotificationSchema, ]); export type JSONRPCMessage = z.infer<typeof JSONRPCMessageSchema>; diff --git a/packages/python-sdk/package.json b/packages/python-sdk/package.json index e28d347..232de4b 100644 --- a/packages/python-sdk/package.json +++ b/packages/python-sdk/package.json @@ -1,6 +1,6 @@ { "name": "@browseragentprotocol/python-client", - "version": "0.1.0", + "version": "0.2.0", "private": true, "description": "Python SDK for Browser Agent Protocol (BAP) - build scripts only", "scripts": { @@ -9,5 +9,10 @@ "lint": "python -m ruff check src/browseragentprotocol || true", "lint:fix": "python -m ruff check --fix src/browseragentprotocol || true", "clean": "rm -rf dist build *.egg-info .mypy_cache .ruff_cache __pycache__ src/**/__pycache__" + }, + "turbo": { + "build": { + "outputs": [] + } } } diff --git a/packages/python-sdk/pyproject.toml b/packages/python-sdk/pyproject.toml index 7010f88..7f7fe45 100644 --- a/packages/python-sdk/pyproject.toml +++ b/packages/python-sdk/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "browser-agent-protocol" -version = "0.1.0a1" +version = "0.2.0" description = "Python SDK for the Browser Agent Protocol (BAP) - control browsers with AI agents" readme = "README.md" license = { text = "Apache-2.0" } @@ -55,10 +55,10 @@ dev = [ bap = "browseragentprotocol.cli:main" [project.urls] -Homepage = "https://github.com/anthropics/browser-agent-protocol" -Documentation = "https://github.com/anthropics/browser-agent-protocol#readme" -Repository = "https://github.com/anthropics/browser-agent-protocol" -Issues = "https://github.com/anthropics/browser-agent-protocol/issues" +Homepage =
"https://github.com/browseragentprotocol/bap" +Documentation = "https://github.com/browseragentprotocol/bap#readme" +Repository = "https://github.com/browseragentprotocol/bap" +Issues = "https://github.com/browseragentprotocol/bap/issues" [build-system] requires = ["hatchling"] diff --git a/packages/python-sdk/src/browseragentprotocol/__init__.py b/packages/python-sdk/src/browseragentprotocol/__init__.py index 64b1a88..65fc2d0 100644 --- a/packages/python-sdk/src/browseragentprotocol/__init__.py +++ b/packages/python-sdk/src/browseragentprotocol/__init__.py @@ -54,7 +54,7 @@ async def main(): ``` """ -__version__ = "0.1.0a1" +__version__ = "0.2.0" # Main client classes from browseragentprotocol.client import BAPClient diff --git a/packages/python-sdk/src/browseragentprotocol/client.py b/packages/python-sdk/src/browseragentprotocol/client.py index 0f4aebb..cd3be9f 100644 --- a/packages/python-sdk/src/browseragentprotocol/client.py +++ b/packages/python-sdk/src/browseragentprotocol/client.py @@ -108,7 +108,7 @@ def __init__( *, token: str | None = None, name: str = "bap-client-python", - version: str = "0.1.0", + version: str = "0.2.0", timeout: float = 30.0, events: list[str] | None = None, ): @@ -274,6 +274,7 @@ def capabilities(self) -> dict[str, Any] | None: async def launch( self, browser: Literal["chromium", "firefox", "webkit"] | None = None, + channel: str | None = None, headless: bool | None = None, args: list[str] | None = None, **kwargs: Any, @@ -283,6 +284,7 @@ async def launch( Args: browser: Browser type (chromium, firefox, webkit) + channel: Playwright channel (e.g. 
"chrome", "msedge") headless: Run in headless mode args: Additional browser arguments **kwargs: Additional launch options @@ -293,6 +295,8 @@ async def launch( params: dict[str, Any] = {} if browser is not None: params["browser"] = browser + if channel is not None: + params["channel"] = channel if headless is not None: params["headless"] = headless if args is not None: diff --git a/packages/python-sdk/src/browseragentprotocol/context.py b/packages/python-sdk/src/browseragentprotocol/context.py index 0fcba35..40f1b0c 100644 --- a/packages/python-sdk/src/browseragentprotocol/context.py +++ b/packages/python-sdk/src/browseragentprotocol/context.py @@ -17,7 +17,7 @@ async def bap_client( *, token: str | None = None, name: str = "bap-client-python", - version: str = "0.1.0", + version: str = "0.2.0", timeout: float = 30.0, events: list[str] | None = None, browser: Literal["chromium", "firefox", "webkit"] | None = None, diff --git a/packages/python-sdk/src/browseragentprotocol/sync_client.py b/packages/python-sdk/src/browseragentprotocol/sync_client.py index 379305f..6c1fb58 100644 --- a/packages/python-sdk/src/browseragentprotocol/sync_client.py +++ b/packages/python-sdk/src/browseragentprotocol/sync_client.py @@ -74,7 +74,7 @@ def __init__( *, token: str | None = None, name: str = "bap-client-python-sync", - version: str = "0.1.0", + version: str = "0.2.0", timeout: float = 30.0, events: list[str] | None = None, ): @@ -151,13 +151,14 @@ def capabilities(self) -> dict[str, Any] | None: def launch( self, browser: Literal["chromium", "firefox", "webkit"] | None = None, + channel: str | None = None, headless: bool | None = None, args: list[str] | None = None, **kwargs: Any, ) -> BrowserLaunchResult: """Launch a browser instance.""" return self._run( - self._async_client.launch(browser=browser, headless=headless, args=args, **kwargs) + self._async_client.launch(browser=browser, channel=channel, headless=headless, args=args, **kwargs) ) def close_browser(self, browser_id: 
str | None = None) -> None: diff --git a/packages/python-sdk/src/browseragentprotocol/types/methods.py b/packages/python-sdk/src/browseragentprotocol/types/methods.py index 9da65e6..c27a342 100644 --- a/packages/python-sdk/src/browseragentprotocol/types/methods.py +++ b/packages/python-sdk/src/browseragentprotocol/types/methods.py @@ -84,6 +84,7 @@ class BrowserLaunchParams(BaseModel): """Parameters for browser/launch.""" browser: Literal["chromium", "firefox", "webkit"] | None = None + channel: str | None = None headless: bool | None = None args: list[str] | None = None env: dict[str, str] | None = None diff --git a/packages/python-sdk/src/browseragentprotocol/types/protocol.py b/packages/python-sdk/src/browseragentprotocol/types/protocol.py index 3f449d4..cb3deb8 100644 --- a/packages/python-sdk/src/browseragentprotocol/types/protocol.py +++ b/packages/python-sdk/src/browseragentprotocol/types/protocol.py @@ -12,7 +12,7 @@ # Protocol Version # ============================================================================= -BAP_VERSION = "0.1.0" +BAP_VERSION = "0.2.0" # ============================================================================= # Request ID diff --git a/packages/server-playwright/README.md b/packages/server-playwright/README.md index cffb3a4..eb8dee7 100644 --- a/packages/server-playwright/README.md +++ b/packages/server-playwright/README.md @@ -81,8 +81,8 @@ await client.close(); ### With MCP (for AI agents) ```bash -# Add to Claude Code -claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp +# Add to any MCP-compatible client via CLI +npx @browseragentprotocol/mcp ``` ## Programmatic Usage diff --git a/packages/server-playwright/src/cli.ts b/packages/server-playwright/src/cli.ts index b28ddde..be5bf19 100644 --- a/packages/server-playwright/src/cli.ts +++ b/packages/server-playwright/src/cli.ts @@ -76,7 +76,7 @@ function parseArgs(): Partial { process.exit(0); } else if (arg === "--version" || arg === "-v") { console.log( 
- `${icons.server} BAP Playwright Server ${pc.dim("v0.1.0-alpha.1")}` + `${icons.server} BAP Playwright Server ${pc.dim("v0.2.0")}` ); process.exit(0); } @@ -167,7 +167,7 @@ async function main(): Promise { console.log( banner({ title: "BAP Playwright Server", - version: "0.1.0-alpha.1", + version: "0.2.0", subtitle: "Browser Agent Protocol", }) ); diff --git a/packages/server-playwright/src/server.ts b/packages/server-playwright/src/server.ts index 6c97a99..e237641 100644 --- a/packages/server-playwright/src/server.ts +++ b/packages/server-playwright/src/server.ts @@ -151,7 +151,7 @@ const ScopeProfiles = { readonly: ['page:read', 'observe:*'] as BAPScope[], standard: [ 'browser:launch', 'browser:close', 'page:*', - 'action:click', 'action:type', 'action:fill', 'action:scroll', 'action:select', + 'action:*', 'observe:*', 'emulate:viewport', ] as BAPScope[], full: ['browser:*', 'page:*', 'action:*', 'observe:*', 'emulate:*', 'trace:*'] as BAPScope[], @@ -396,6 +396,8 @@ export interface BAPServerOptions { host?: string; /** Default browser type */ defaultBrowser?: "chromium" | "firefox" | "webkit"; + /** Default Playwright channel (e.g. "chrome", "msedge") */ + defaultChannel?: string; /** Default headless mode */ headless?: boolean; /** Enable debug logging */ @@ -477,9 +479,10 @@ const BLOCKED_BROWSER_ARGS = [ /** * SECURE-BY-DEFAULT: All security options enabled by default */ -const DEFAULT_OPTIONS: Required> & { +const DEFAULT_OPTIONS: Required> & { authToken: string | undefined; authTokenEnvVar: string; + defaultChannel: string | undefined; security: Required; limits: Required; authorization: Required; @@ -489,6 +492,7 @@ const DEFAULT_OPTIONS: Required 0 ? 
sanitizedArgs : undefined, proxy: params.proxy, downloadsPath: validatedDownloadsPath, }); // Create the default context - const defaultContext = await state.browser.newContext(); + // Force deviceScaleFactor: 1 for consistent screenshot sizes across platforms + // (retina Macs default to 2x, which doubles pixel count and inflates payloads) + const defaultContext = await state.browser.newContext({ + deviceScaleFactor: 1, + }); const version = state.browser.version(); // Use crypto.randomUUID for unique IDs const contextId = `ctx-${randomUUID().slice(0, 8)}`; @@ -1872,16 +1883,19 @@ export class BAPPlaywrightServer extends EventEmitter { this.checkRateLimit(state, 'screenshot'); // Playwright only supports "png" and "jpeg" for screenshots + // Default to JPEG — ~60% smaller payloads, reducing LLM token cost const screenshotType = (options?.format === "jpeg" || options?.format === "png") ? options.format - : "png"; + : "jpeg"; const buffer = await page.screenshot({ fullPage: options?.fullPage, clip: options?.clip, type: screenshotType, - quality: options?.quality, - scale: options?.scale, + quality: options?.quality ?? (screenshotType === "jpeg" ? 80 : undefined), + // Default to CSS scale to ensure consistent 1x screenshots regardless + // of device pixel ratio (prevents 2x images on retina displays) + scale: options?.scale ?? "css", }); // Parse image dimensions from the buffer @@ -1890,14 +1904,14 @@ export class BAPPlaywrightServer extends EventEmitter { let width: number; let height: number; - const format = options?.format ?? 
"png"; + const format = screenshotType; if (format === "png" && buffer[0] === 0x89 && buffer[1] === 0x50) { // PNG: Read dimensions from IHDR chunk (offset 16 for width, 20 for height) width = buffer.readUInt32BE(16); height = buffer.readUInt32BE(20); - } else if ((format === "jpeg" || format === "webp") && buffer.length > 0) { - // For JPEG/WebP, fall back to viewport dimensions or clip + } else if (format === "jpeg" && buffer.length > 0) { + // For JPEG, fall back to viewport dimensions or clip // (Parsing JPEG headers is complex, use viewport as approximation) const viewport = page.viewportSize() ?? { width: 1280, height: 720 }; if (options?.clip) { @@ -2679,12 +2693,19 @@ export class BAPPlaywrightServer extends EventEmitter { // Screenshot (with optional annotation) if (params.includeScreenshot || params.annotateScreenshot) { const viewport = page.viewportSize(); - let buffer = await page.screenshot({ type: "png" }); + // Use JPEG by default for ~60% smaller payloads (less LLM token cost) + // Annotations require PNG for sharp badge rendering + const useAnnotation = params.annotateScreenshot && interactiveElements && interactiveElements.length > 0; + const obsFormat = useAnnotation ? "png" as const : "jpeg" as const; + let buffer = await page.screenshot({ + type: obsFormat, + quality: obsFormat === "jpeg" ? 80 : undefined, + }); let annotated = false; let annotationMap: AnnotationMapping[] | undefined; // Apply annotation if requested - if (params.annotateScreenshot && interactiveElements && interactiveElements.length > 0) { + if (useAnnotation && interactiveElements) { const annotationOpts: AnnotationOptions = typeof params.annotateScreenshot === 'object' ? params.annotateScreenshot : { enabled: true }; @@ -2704,7 +2725,7 @@ export class BAPPlaywrightServer extends EventEmitter { result.screenshot = { data: buffer.toString("base64"), - format: "png", + format: obsFormat, width: viewport?.width ?? 0, height: viewport?.height ?? 
0, annotated, @@ -2919,123 +2940,480 @@ export class BAPPlaywrightServer extends EventEmitter { } /** - * Extract data from content based on instruction and schema - * This is a basic implementation - a full implementation would use LLM + * Extract data from content based on instruction and schema. + * + * Strategy: + * 1. Scope to the main content area (skip nav/header/footer). + * 2. For lists: find repeating item containers, then use schema + * property names to locate child elements within each item. + * 3. For objects: search for labeled values. + * 4. Coerce types based on schema (string, number, boolean). */ private async extractDataFromContent( page: PlaywrightPage, - content: string, - _instruction: string, // Used for future LLM-based extraction + _content: string, + _instruction: string, schema: { type: string; properties?: Record; items?: unknown }, mode: string, includeSourceRefs: boolean ): Promise<{ data: unknown; sources?: { ref: string; selector: BAPSelector; text?: string }[]; confidence: number }> { - // Basic extraction logic based on common patterns - // This extracts data by finding elements that match the schema structure - const sources: { ref: string; selector: BAPSelector; text?: string }[] = []; + // ── Step 1: Scope to main content area ────────────────────────── + // Try semantic landmarks first, fall back to body + const contentRoot = await this.findContentRoot(page); + if (schema.type === "array" || mode === "list") { - // Extract list of items - const items: unknown[] = []; + const items = await this.extractList( + page, contentRoot, schema, includeSourceRefs, sources + ); + return { + data: items, + sources: includeSourceRefs ? sources : undefined, + confidence: items.length > 0 ? 
0.8 : 0.3, + }; + } - // Try to find list-like elements based on the instruction - const listSelectors = [ - 'ul li', 'ol li', '[role="listitem"]', 'tr', '.item', '.card', '[class*="item"]', '[class*="card"]' - ]; + if (mode === "table") { + const rows = await this.extractTable( + page, contentRoot, schema, includeSourceRefs, sources + ); + return { + data: rows, + sources: includeSourceRefs ? sources : undefined, + confidence: rows.length > 0 ? 0.8 : 0.3, + }; + } + + if (schema.type === "object" && schema.properties) { + const result = await this.extractObject( + page, contentRoot, schema, includeSourceRefs, sources + ); + return { + data: result.data, + sources: includeSourceRefs ? sources : undefined, + confidence: result.confidence, + }; + } + + // Default: return scoped text content + const text = await contentRoot.textContent() ?? ""; + return { + data: text.trim().slice(0, 5000), + sources: includeSourceRefs ? sources : undefined, + confidence: 0.5, + }; + } + + /** + * Find the main content area of the page, skipping nav/header/footer. + * Returns a Locator scoped to the best content root. + */ + private async findContentRoot(page: PlaywrightPage) { + // Try semantic landmarks in priority order + const candidates = ['main', '[role="main"]', '#content', '.content', '#main', '.page', '.container', '[role="document"]']; + for (const sel of candidates) { + try { + const loc = page.locator(sel).first(); + if (await loc.count() > 0) { + // Verify it has substantial content (not just a wrapper with nav inside) + const text = await loc.textContent() ?? ""; + if (text.trim().length > 100) return loc; + } + } catch { /* continue */ } + } + // Fallback: body + return page.locator('body'); + } + + /** + * Extract a list of items from repeating elements. + * Uses schema property names to locate child values within each item container. 
+ */ + private async extractList( + _page: PlaywrightPage, + root: ReturnType, + schema: { items?: unknown }, + includeSourceRefs: boolean, + sources: { ref: string; selector: BAPSelector; text?: string }[] + ): Promise { + const itemSchema = schema.items as { type?: string; properties?: Record } | undefined; + const isObjectItems = itemSchema?.type === 'object' && itemSchema.properties; + + // ── Find the best repeating container ──────────────────────────── + // Selectors are ordered by semantic priority: article > role > class-name > generic. + // Use the FIRST selector with 2+ matches rather than the one with the most, + // because generic selectors (ul li) often match sidebar/nav noise. + const containerSelectors = [ + 'article', '[role="listitem"]', + '.product', '.card', '.item', '.listing', '.result', '.entry', '.post', + '[class*="product"]', '[class*="card"]', '[class*="item"]', + 'table tbody tr', + 'ol li', 'ul li', + ]; + + let bestSelector = ''; + let bestCount = 0; + + for (const sel of containerSelectors) { + try { + const count = await root.locator(sel).count(); + if (count >= 2) { + bestSelector = sel; + bestCount = count; + break; // First semantic match wins + } + } catch { /* continue */ } + } + + if (!bestSelector || bestCount === 0) return []; - for (const selector of listSelectors) { + const elements = await root.locator(bestSelector).all(); + const items: unknown[] = []; + const limit = Math.min(elements.length, 100); + + for (let i = 0; i < limit; i++) { + const el = elements[i]!; + + // Skip elements that are likely not visible or too small + try { + const box = await el.boundingBox(); + if (box && (box.width < 10 || box.height < 10)) continue; + } catch { /* proceed anyway */ } + + if (isObjectItems && itemSchema?.properties) { + // ── Schema-aware extraction: match property names to child elements ── + const obj = await this.extractPropertiesFromElement(el, itemSchema.properties); + // Only include if at least one property has a 
non-empty value + const hasValue = Object.values(obj).some(v => v !== null && v !== undefined && v !== ''); + if (hasValue) { + items.push(obj); + if (includeSourceRefs) { + sources.push({ + ref: `@s${items.length}`, + selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` }, + text: Object.values(obj).filter(Boolean).join(' | ').slice(0, 100), + }); + } + } + } else { + // Simple string items + const text = await el.textContent(); + if (text?.trim()) { + items.push(text.trim()); + if (includeSourceRefs) { + sources.push({ + ref: `@s${items.length}`, + selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` }, + text: text.trim().slice(0, 100), + }); + } + } + } + } + + // ── Fallback: if schema-aware extraction produced 0 items from matched ── + // elements, retry with text-based extraction (extract each property by + // splitting each container's inner text). This handles cases where CSS + // class names don't align with schema property names. + if (items.length === 0 && isObjectItems && itemSchema?.properties && elements.length > 0) { + const propNames = Object.keys(itemSchema.properties); + for (let i = 0; i < limit; i++) { + const el = elements[i]!; try { - const elements = await page.locator(selector).all(); - if (elements.length > 0) { - for (let i = 0; i < Math.min(elements.length, 50); i++) { - const el = elements[i]; - const text = await el.textContent(); - if (text && text.trim()) { - if (schema.items && typeof schema.items === 'object' && 'type' in schema.items) { - if ((schema.items as { type: string }).type === 'string') { - items.push(text.trim()); - } else if ((schema.items as { type: string }).type === 'object') { - // Try to extract object structure - items.push({ text: text.trim() }); - } + const box = await el.boundingBox(); + if (box && (box.width < 10 || box.height < 10)) continue; + } catch { /* proceed */ } + + const fullText = await el.textContent() ?? 
''; + if (!fullText.trim()) continue; + + const obj: Record = {}; + // Try to extract known patterns from the full text + for (const key of propNames) { + const kl = key.toLowerCase(); + if (kl === 'title' || kl === 'name') { + // First link's title attribute, or first heading text + try { + const heading = el.locator('h1, h2, h3, h4, h5, h6').first(); + if (await heading.count() > 0) { + const link = heading.locator('a').first(); + if (await link.count() > 0) { + obj[key] = await link.getAttribute('title') ?? await link.textContent() ?? null; } else { - items.push(text.trim()); + obj[key] = await heading.textContent() ?? null; } + } + } catch { /* skip */ } + } else if (kl === 'price' || kl === 'cost' || kl === 'amount') { + const priceMatch = fullText.match(/[$€£¥]\s*[\d,.]+|[\d,.]+\s*[$€£¥]/); + if (priceMatch) obj[key] = priceMatch[0].trim(); + } else if (kl === 'url' || kl === 'link' || kl === 'href') { + try { + const link = el.locator('a').first(); + if (await link.count() > 0) obj[key] = await link.getAttribute('href'); + } catch { /* skip */ } + } else if (kl === 'rating') { + // Try star-rating class pattern (e.g., "star-rating Three") + try { + const ratingEl = el.locator('[class*="rating"], [class*="star"]').first(); + if (await ratingEl.count() > 0) { + const cls = await ratingEl.getAttribute('class') ?? ''; + const parts = cls.split(/\s+/).filter(c => !c.toLowerCase().includes('rating') && !c.toLowerCase().includes('star') && c.length > 0); + if (parts.length > 0) obj[key] = parts[parts.length - 1]; + } + } catch { /* skip */ } + } else if (kl === 'availability' || kl === 'stock' || kl === 'status') { + try { + const stockEl = el.locator('[class*="avail"], [class*="stock"], .availability, .stock').first(); + if (await stockEl.count() > 0) { + obj[key] = (await stockEl.textContent())?.trim() ?? 
null; + } + } catch { /* skip */ } + } + } - if (includeSourceRefs) { - sources.push({ - ref: `@s${i + 1}`, - selector: { type: 'css', value: `${selector}:nth-child(${i + 1})` }, - text: text.trim().slice(0, 100), - }); - } + const hasValue = Object.values(obj).some(v => v !== null && v !== undefined && v !== ''); + if (hasValue) { + items.push(obj); + if (includeSourceRefs) { + sources.push({ + ref: `@s${items.length}`, + selector: { type: 'css', value: `${bestSelector}:nth-child(${i + 1})` }, + text: Object.values(obj).filter(Boolean).join(' | ').slice(0, 100), + }); + } + } + } + } + + return items; + } + + /** + * Extract property values from a single element container. + * For each schema property, tries class-name matching, then attribute + * matching, then falls back to positional heuristics. + */ + private async extractPropertiesFromElement( + el: ReturnType, + properties: Record + ): Promise> { + const result: Record = {}; + + for (const [key, propSchema] of Object.entries(properties)) { + const keyLower = key.toLowerCase(); + let value: string | null = null; + + // Strategy 1: Find child element whose class or tag contains the property name + const classSelectors = [ + `[class*="${keyLower}"]`, + `[data-${keyLower}]`, + `.${keyLower}`, + ]; + + for (const sel of classSelectors) { + try { + const child = el.locator(sel).first(); + if (await child.count() > 0) { + // For links, prefer title attribute (common for truncated titles) + if (keyLower === 'title' || keyLower === 'name') { + value = await child.getAttribute('title') ?? await child.textContent(); + } else { + value = await child.textContent(); + } + // If text is empty, try extracting from class name (e.g. "star-rating Three" → "Three") + if (!value?.trim()) { + const cls = await child.getAttribute('class') ?? ''; + const clsParts = cls.split(/\s+/).filter(c => !c.includes(keyLower) && c.length > 0); + if (clsParts.length > 0) value = clsParts[clsParts.length - 1] ?? 
null; + } + if (value?.trim()) break; + } + } catch { /* continue */ } + } + + // Strategy 2: For known property patterns, try specific selectors + if (!value?.trim()) { + try { + if (keyLower === 'title' || keyLower === 'name') { + // Headings, links with title attribute + for (const sel of ['h1 a', 'h2 a', 'h3 a', 'h4 a', 'h1', 'h2', 'h3', 'h4', 'a[title]']) { + const child = el.locator(sel).first(); + if (await child.count() > 0) { + value = await child.getAttribute('title') ?? await child.textContent(); + if (value?.trim()) break; } } - if (items.length > 0) break; + } else if (keyLower === 'price' || keyLower === 'cost' || keyLower === 'amount') { + // Price patterns + const text = await el.textContent() ?? ''; + const priceMatch = text.match(/[$€£¥]\s*[\d,.]+|[\d,.]+\s*[$€£¥]/); + if (priceMatch) value = priceMatch[0].trim(); + } else if (keyLower === 'url' || keyLower === 'link' || keyLower === 'href') { + const link = el.locator('a').first(); + if (await link.count() > 0) { + value = await link.getAttribute('href'); + } + } else if (keyLower === 'image' || keyLower === 'img' || keyLower === 'thumbnail') { + const img = el.locator('img').first(); + if (await img.count() > 0) { + value = await img.getAttribute('src'); + } } - } catch { - // Continue to next selector - } + } catch { /* continue */ } } - return { data: items, sources: includeSourceRefs ? sources : undefined, confidence: items.length > 0 ? 0.7 : 0.3 }; + // Coerce type + const trimmed = value?.trim() ?? null; + if (trimmed === null) { + result[key] = null; + } else if (propSchema.type === 'number') { + const num = parseFloat(trimmed.replace(/[^0-9.-]/g, '')); + result[key] = isNaN(num) ? 
trimmed : num; + } else if (propSchema.type === 'boolean') { + result[key] = ['true', 'yes', '1', 'in stock', 'available'].includes(trimmed.toLowerCase()); + } else { + result[key] = trimmed; + } } - if (schema.type === "object" && schema.properties) { - // Extract object with properties - const result: Record = {}; - const properties = schema.properties as Record; - - for (const [key, propSchema] of Object.entries(properties)) { - // Try to find content matching this property - const searchTerms = [key, propSchema.description].filter(Boolean); - - for (const term of searchTerms) { - if (!term) continue; - - // Look for labels or headings containing the term - const labelSelectors = [ - `label:has-text("${term}")`, - `th:has-text("${term}")`, - `dt:has-text("${term}")`, - `[class*="${term.toLowerCase()}"]`, - ]; - - for (const selector of labelSelectors) { - try { - const label = await page.locator(selector).first(); - if (await label.count() > 0) { - // Try to find associated value - const parent = label.locator('..'); - const siblingText = await parent.textContent(); - if (siblingText) { - const value = siblingText.replace(new RegExp(term, 'gi'), '').trim(); - if (value) { - result[key] = propSchema.type === 'number' ? parseFloat(value) || value : value; - break; + return result; + } + + /** + * Extract tabular data from an HTML table. + */ + private async extractTable( + _page: PlaywrightPage, + root: ReturnType, + schema: { items?: unknown }, + includeSourceRefs: boolean, + sources: { ref: string; selector: BAPSelector; text?: string }[] + ): Promise { + const rows: unknown[] = []; + const itemSchema = schema.items as { properties?: Record } | undefined; + + try { + // Find table headers to map columns + const headers: string[] = []; + const thElements = await root.locator('table th').all(); + for (const th of thElements) { + headers.push((await th.textContent() ?? 
'').trim().toLowerCase()); + } + + // Extract rows + const trElements = await root.locator('table tbody tr').all(); + const limit = Math.min(trElements.length, 100); + + for (let i = 0; i < limit; i++) { + const tr = trElements[i]!; + const cells = await tr.locator('td').all(); + const obj: Record = {}; + + if (itemSchema?.properties) { + // Map schema properties to table columns by name + for (const [key, propSchema] of Object.entries(itemSchema.properties)) { + const colIdx = headers.findIndex(h => h.includes(key.toLowerCase())); + if (colIdx >= 0 && colIdx < cells.length) { + const text = (await cells[colIdx]!.textContent() ?? '').trim(); + obj[key] = propSchema.type === 'number' ? (parseFloat(text.replace(/[^0-9.-]/g, '')) || text) : text; + } + } + } else { + // No schema properties — use headers as keys + for (let c = 0; c < cells.length; c++) { + const key = c < headers.length ? headers[c]! : `col${c}`; + obj[key] = (await cells[c]!.textContent() ?? '').trim(); + } + } + + if (Object.values(obj).some(v => v !== null && v !== undefined && v !== '')) { + rows.push(obj); + if (includeSourceRefs) { + sources.push({ + ref: `@s${rows.length}`, + selector: { type: 'css', value: `table tbody tr:nth-child(${i + 1})` }, + }); + } + } + } + } catch { /* table extraction failed */ } + + return rows; + } + + /** + * Extract a single object from page content. 
+ */ + private async extractObject( + page: PlaywrightPage, + root: ReturnType, + schema: { properties?: Record }, + includeSourceRefs: boolean, + sources: { ref: string; selector: BAPSelector; text?: string }[] + ): Promise<{ data: Record; confidence: number }> { + const result: Record = {}; + const properties = schema.properties as Record; + + for (const [key, propSchema] of Object.entries(properties)) { + const searchTerms = [key, propSchema.description].filter(Boolean); + + for (const term of searchTerms) { + if (!term) continue; + + const labelSelectors = [ + `label:has-text("${term}")`, + `th:has-text("${term}")`, + `dt:has-text("${term}")`, + `[class*="${term.toLowerCase()}"]`, + ]; + + for (const selector of labelSelectors) { + try { + const label = root.locator(selector).first(); + if (await label.count() > 0) { + const parent = label.locator('..'); + const siblingText = await parent.textContent(); + if (siblingText) { + const value = siblingText.replace(new RegExp(term, 'gi'), '').trim(); + if (value) { + result[key] = propSchema.type === 'number' ? parseFloat(value) || value : value; + if (includeSourceRefs) { + sources.push({ + ref: `@s${Object.keys(result).length}`, + selector: { type: 'css', value: selector }, + text: value.slice(0, 100), + }); } + break; } } - } catch { - // Continue } - } + } catch { /* continue */ } } } + } - return { - data: Object.keys(result).length > 0 ? result : { raw: content.slice(0, 1000) }, - sources: includeSourceRefs ? sources : undefined, - confidence: Object.keys(result).length > 0 ? 0.6 : 0.2 - }; + // Fallback for meta-based extraction (og:title, meta description, etc.) 
+ if (Object.keys(result).length === 0) { + try { + for (const [key, _propSchema] of Object.entries(properties)) { + if (key === 'title' || key === 'name') { + result[key] = await page.title(); + } else if (key === 'description') { + const desc = await page.locator('meta[name="description"]').getAttribute('content'); + if (desc) result[key] = desc; + } else if (key === 'url') { + result[key] = page.url(); + } + } + } catch { /* continue */ } } - // Default: return text content return { - data: content.trim().slice(0, 5000), - sources: includeSourceRefs ? sources : undefined, - confidence: 0.5 + data: Object.keys(result).length > 0 ? result : { raw: (await root.textContent() ?? "").slice(0, 1000) }, + confidence: Object.keys(result).length > 0 ? 0.7 : 0.2, }; } From 7b5941a0ce62fd0e611e00e8e5a0b0169c36fe0a Mon Sep 17 00:00:00 2001 From: Piyush Date: Thu, 12 Feb 2026 21:10:05 -0600 Subject: [PATCH 04/10] chore: add v0.2.0 changeset --- .changeset/v020-release.md | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 .changeset/v020-release.md diff --git a/.changeset/v020-release.md b/.changeset/v020-release.md new file mode 100644 index 0000000..53e17c1 --- /dev/null +++ b/.changeset/v020-release.md @@ -0,0 +1,9 @@ +--- +"@browseragentprotocol/protocol": minor +"@browseragentprotocol/logger": minor +"@browseragentprotocol/client": minor +"@browseragentprotocol/server-playwright": minor +"@browseragentprotocol/mcp": minor +--- + +v0.2.0 — browser selection, clean tool names, smarter extract From feb67cd90482a4552244fce852ee004ad9946498 Mon Sep 17 00:00:00 2001 From: Piyush Date: Fri, 13 Feb 2026 09:17:44 -0600 Subject: [PATCH 05/10] fix: sync plugin name, pin MCP version, label demo screenshots by client - Rename plugin from bap-browser to BAPBrowser to match MCP server name - Pin .mcp.json to @browseragentprotocol/mcp@0.2.0 instead of @latest - Add client-specific headings (Claude Code, Claude Desktop, Codex CLI, Codex Desktop) to screenshot section --- 
.claude-plugin/plugin.json | 2 +- .mcp.json | 2 +- README.md | 12 ++++++++---- 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index f04af10..fd04a4a 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,5 +1,5 @@ { - "name": "bap-browser", + "name": "BAPBrowser", "description": "AI-optimized browser automation with semantic selectors, smart observations, and multi-step actions. Navigate, click, fill forms, extract data, and take screenshots across Chrome, Firefox, and WebKit.", "author": { "name": "Browser Agent Protocol" diff --git a/.mcp.json b/.mcp.json index 8b192cc..fcf645a 100644 --- a/.mcp.json +++ b/.mcp.json @@ -2,7 +2,7 @@ "mcpServers": { "bap-browser": { "command": "npx", - "args": ["-y", "@browseragentprotocol/mcp@latest"] + "args": ["-y", "@browseragentprotocol/mcp@0.2.0"] } } } diff --git a/README.md b/README.md index 5a878b8..140dda7 100644 --- a/README.md +++ b/README.md @@ -84,21 +84,25 @@ BAP is also available as a plugin for MCP clients with plugin directories.
Screenshots +#### Claude Code

- BAP in a terminal MCP client
- Browsing Hacker News with BAP + BAP in Claude Code
+ Claude Code browsing Hacker News with BAP

+#### Claude Desktop

- BAP in a desktop MCP client
- Desktop MCP client browsing Hacker News with BAP + BAP in Claude Desktop
+ Claude Desktop browsing Hacker News with BAP

+#### Codex CLI

BAP in Codex CLI
Codex CLI browsing Hacker News with BAP

+#### Codex Desktop

BAP in Codex Desktop
Codex Desktop browsing Hacker News with BAP From dfce7d3f9b350dfed88adf0aad2608adf988a369 Mon Sep 17 00:00:00 2001 From: Piyush Date: Fri, 13 Feb 2026 09:25:45 -0600 Subject: [PATCH 06/10] docs: rewrite Quick Start with client-specific integration instructions - Add exact commands for Claude Code (MCP + plugin install), Claude Desktop (config path + JSON), Codex CLI (CLI + TOML), Codex Desktop (TOML) - Move screenshots inline under each client section - Update MCP package README with same client-specific examples - Replace generic placeholder with real commands --- README.md | 80 ++++++++++++++++++++++++++++-------------- packages/mcp/README.md | 13 ++++--- 2 files changed, 62 insertions(+), 31 deletions(-) diff --git a/README.md b/README.md index 140dda7..7dbb4aa 100644 --- a/README.md +++ b/README.md @@ -49,16 +49,32 @@ BAP (Browser Agent Protocol) provides a standardized way for AI agents to contro ## Quick Start -### Using with MCP (Recommended) - BAP works with any MCP-compatible client. The server auto-starts — no separate setup needed. -**Add via CLI** (works with most MCP clients): +### Claude Code + +**MCP server** (one command): ```bash - mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp +claude mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp ``` -**Add via JSON config** (most MCP desktop clients): +**Plugin** (includes SKILL.md for smarter tool usage): +```bash +claude plugin add --from https://github.com/browseragentprotocol/bap +``` + +

+Screenshot +

+ BAP in Claude Code
+ Claude Code browsing Hacker News with BAP +

+
+ +### Claude Desktop + +Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows): + ```json { "mcpServers": { @@ -70,44 +86,54 @@ BAP works with any MCP-compatible client. The server auto-starts — no separate } ``` -**Add via TOML config** (Codex Desktop): -```toml -[mcp_servers.bap-browser] -command = "npx" -args = ["-y", "@browseragentprotocol/mcp"] -``` - -### Plugin Install - -BAP is also available as a plugin for MCP clients with plugin directories. +Restart Claude Desktop after saving.
-Screenshots - -#### Claude Code -

- BAP in Claude Code
- Claude Code browsing Hacker News with BAP -

- -#### Claude Desktop +Screenshot

BAP in Claude Desktop
Claude Desktop browsing Hacker News with BAP

+
+ +### Codex CLI + +```bash +codex mcp add bap-browser -- npx -y @browseragentprotocol/mcp +``` + +Or add to `~/.codex/config.toml`: + +```toml +[mcp_servers.bap-browser] +command = "npx" +args = ["-y", "@browseragentprotocol/mcp"] +``` -#### Codex CLI +
+Screenshot

BAP in Codex CLI
Codex CLI browsing Hacker News with BAP

+
+ +### Codex Desktop + +Add to `~/.codex/config.toml`: + +```toml +[mcp_servers.bap-browser] +command = "npx" +args = ["-y", "@browseragentprotocol/mcp"] +``` -#### Codex Desktop +
+Screenshot

BAP in Codex Desktop
Codex Desktop browsing Hacker News with BAP

-
### Browser Selection diff --git a/packages/mcp/README.md b/packages/mcp/README.md index 4f54770..9ab3e2d 100644 --- a/packages/mcp/README.md +++ b/packages/mcp/README.md @@ -14,7 +14,12 @@ This auto-starts a BAP Playwright server and exposes browser tools over MCP stdi ### Add to an MCP client -**JSON config** (most MCP clients): +**Claude Code:** +```bash +claude mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp +``` + +**Claude Desktop** — add to `claude_desktop_config.json`: ```json { "mcpServers": { @@ -26,12 +31,12 @@ This auto-starts a BAP Playwright server and exposes browser tools over MCP stdi } ``` -**CLI** (most MCP clients): +**Codex CLI:** ```bash - mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp +codex mcp add bap-browser -- npx -y @browseragentprotocol/mcp ``` -**TOML config** (Codex Desktop): +**Codex Desktop** — add to `~/.codex/config.toml`: ```toml [mcp_servers.bap-browser] command = "npx" From 1322a19fcb8504abc4a9fef20860c8922b5eab72 Mon Sep 17 00:00:00 2001 From: Piyush Date: Fri, 13 Feb 2026 09:36:20 -0600 Subject: [PATCH 07/10] docs: visible screenshots, fix npm badge, optimize SKILL.md for discovery - Show all 4 client screenshots inline (remove collapsed `<details>`
blocks) - Fix npm badge to link to @browseragentprotocol/mcp (the install package) - Rewrite SKILL.md description with 25+ natural language trigger phrases covering browsing, search, commerce, utilities, data extraction, and more - Add Google search and price comparison recipes to SKILL.md --- README.md | 14 +----------- skills/bap-browser/SKILL.md | 44 ++++++++++++++++++++++++++++++++++++- 2 files changed, 44 insertions(+), 14 deletions(-) diff --git a/README.md b/README.md index 7dbb4aa..d880a86 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Browser Agent Protocol (BAP) -[![npm version](https://badge.fury.io/js/@browseragentprotocol%2Fprotocol.svg)](https://www.npmjs.com/package/@browseragentprotocol/protocol) +[![npm version](https://badge.fury.io/js/@browseragentprotocol%2Fmcp.svg)](https://www.npmjs.com/package/@browseragentprotocol/mcp) [![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) An open standard for AI agents to interact with web browsers. @@ -63,13 +63,10 @@ claude mcp add --transport stdio bap-browser -- npx -y @browseragentprotocol/mcp claude plugin add --from https://github.com/browseragentprotocol/bap ``` -
-Screenshot

BAP in Claude Code
Claude Code browsing Hacker News with BAP

-
### Claude Desktop @@ -88,13 +85,10 @@ Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) Restart Claude Desktop after saving. -
-Screenshot

BAP in Claude Desktop
Claude Desktop browsing Hacker News with BAP

-
### Codex CLI @@ -110,13 +104,10 @@ command = "npx" args = ["-y", "@browseragentprotocol/mcp"] ``` -
-Screenshot

BAP in Codex CLI
Codex CLI browsing Hacker News with BAP

-
### Codex Desktop @@ -128,13 +119,10 @@ command = "npx" args = ["-y", "@browseragentprotocol/mcp"] ``` -
-Screenshot

BAP in Codex Desktop
Codex Desktop browsing Hacker News with BAP

-
### Browser Selection diff --git a/skills/bap-browser/SKILL.md b/skills/bap-browser/SKILL.md index 0b229ae..a8f4a5b 100644 --- a/skills/bap-browser/SKILL.md +++ b/skills/bap-browser/SKILL.md @@ -1,6 +1,6 @@ --- name: bap-browser -description: "AI-optimized browser automation via Browser Agent Protocol (BAP). Use when the user wants to browse websites, scrape web content, automate browser interactions, fill out web forms, extract structured data from pages, take screenshots, or test web applications. Provides semantic selectors, batched multi-step actions, and structured data extraction. Triggers: navigate, click, fill, type, observe, act, extract, screenshot, aria_snapshot, content, scroll, hover, press, select, element, pages, activate_page, close_page, go_back, go_forward, reload, accessibility." +description: "AI-powered browser automation via Browser Agent Protocol. Use when the user wants to visit a website, open a webpage, go to a URL, search on Google, look something up online, check a website, read a webpage, book a flight, order food, buy something online, check email or weather, download a file, compare prices, find product reviews, take a screenshot of a page, scrape or extract data from a site, monitor a webpage for changes, test a web application, automate web tasks, interact with a web page, log in to a site, submit or fill out a form, shop online, sign up for a service, browse the web, research a topic online, check stock prices, track a package, read the news, post on social media, or any task that requires controlling a web browser. Provides semantic selectors, batched multi-step actions, and structured data extraction for fast, token-efficient browser automation." 
license: See LICENSE.txt (Apache-2.0) --- @@ -175,6 +175,48 @@ press({ key: "ArrowDown" }) press({ key: "Enter" }) ``` +### Google search +``` +navigate({ url: "https://www.google.com" }) +act({ + steps: [ + { action: "action/fill", selector: "role:combobox:Search", value: "best noise cancelling headphones 2025" }, + { action: "action/click", selector: "role:button:Google Search" } + ] +}) +content({ format: "markdown" }) +``` + +### Compare prices across sites +``` +navigate({ url: "https://store-a.example.com/product" }) +extract({ + instruction: "Extract the product name and price", + mode: "single", + schema: { + type: "object", + properties: { + name: { type: "string" }, + price: { type: "number" }, + currency: { type: "string" } + } + } +}) +navigate({ url: "https://store-b.example.com/product" }) +extract({ + instruction: "Extract the product name and price", + mode: "single", + schema: { + type: "object", + properties: { + name: { type: "string" }, + price: { type: "number" }, + currency: { type: "string" } + } + } +}) +``` + ## Error Recovery | Problem | Fix | From b101e39e6e835ee486ae0d9099b573a94179750e Mon Sep 17 00:00:00 2001 From: Piyush Date: Fri, 13 Feb 2026 10:01:01 -0600 Subject: [PATCH 08/10] fix: use const for never-reassigned port in standalone test --- packages/mcp/src/__tests__/standalone.test.ts | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/packages/mcp/src/__tests__/standalone.test.ts b/packages/mcp/src/__tests__/standalone.test.ts index 74d4137..6adc397 100644 --- a/packages/mcp/src/__tests__/standalone.test.ts +++ b/packages/mcp/src/__tests__/standalone.test.ts @@ -102,10 +102,8 @@ describe("standalone server utilities", () => { it("waits for a server that starts after a delay", async () => { const server = net.createServer(); - let port: number; - // Find a free port - port = await new Promise((resolve) => { + const port = await new Promise((resolve) => { const tmp = net.createServer(); tmp.listen(0, "localhost", 
() => { const addr = tmp.address(); From 3f66732078b33acc093d119e956b266e4365ed79 Mon Sep 17 00:00:00 2001 From: Piyush Date: Fri, 13 Feb 2026 10:08:27 -0600 Subject: [PATCH 09/10] docs: move advanced patterns to references/ for progressive disclosure --- skills/bap-browser/SKILL.md | 79 +--------------------- skills/bap-browser/references/REFERENCE.md | 78 +++++++++++++++++++++ 2 files changed, 79 insertions(+), 78 deletions(-) create mode 100644 skills/bap-browser/references/REFERENCE.md diff --git a/skills/bap-browser/SKILL.md b/skills/bap-browser/SKILL.md index a8f4a5b..736ab42 100644 --- a/skills/bap-browser/SKILL.md +++ b/skills/bap-browser/SKILL.md @@ -238,81 +238,4 @@ extract({ --- -## Advanced - -### Multi-tab workflows - -Use `pages` to list all open tabs, `activate_page` to switch between them, and `close_page` to clean up. Useful for comparing content across tabs or handling pop-ups. - -``` -navigate({ url: "https://a.example.com" }) -navigate({ url: "https://b.example.com" }) // opens in new tab -pages() // returns [{id, url}, ...] -activate_page({ pageId: "page-1" }) // switch back to first tab -``` - -### Waiting strategies - -The `waitUntil` parameter on `navigate` controls when the page is considered loaded: - -| Value | When to use | -|-------|-------------| -| `"load"` | Default. Fine for most pages. | -| `"domcontentloaded"` | Faster. Use when you don't need images/fonts. | -| `"networkidle"` | Slowest but most complete. Use for SPAs that fetch data after load. | - -If a page renders dynamically after navigation, use `observe` or `aria_snapshot` with a short delay rather than relying on `networkidle`. - -### Annotated screenshots (Set-of-Marks) - -`observe` supports `annotateScreenshot: true` which overlays numbered markers on each interactive element. Useful for visual debugging or confirming which element a ref points to. 
- -``` -observe({ includeScreenshot: true, annotateScreenshot: true, maxElements: 20 }) -``` - -The returned screenshot will have numbered badges corresponding to element refs. - -### Nested extraction with complex schemas - -`extract` supports deeply nested JSON schemas. Use `mode: "single"` for a single object, `mode: "list"` for arrays, or `mode: "table"` for tabular data. - -``` -extract({ - instruction: "Extract job listings with company details", - mode: "list", - schema: { - type: "array", - items: { - type: "object", - properties: { - title: { type: "string" }, - company: { - type: "object", - properties: { - name: { type: "string" }, - location: { type: "string" } - } - }, - salary: { type: "number" }, - remote: { type: "boolean" } - } - } - } -}) -``` - -### Error handling in batched actions - -`act` accepts `stopOnFirstError` (default: `true`). Set to `false` if you want to continue executing steps even when one fails — useful for best-effort form fills where some fields may not exist. - -``` -act({ - stopOnFirstError: false, - steps: [ - { action: "action/fill", selector: "label:First name", value: "Jane" }, - { action: "action/fill", selector: "label:Middle name", value: "A." }, // may not exist - { action: "action/fill", selector: "label:Last name", value: "Doe" } - ] -}) -``` +For advanced patterns (multi-tab workflows, waiting strategies, annotated screenshots, nested extraction, batched error handling), see [references/REFERENCE.md](references/REFERENCE.md). diff --git a/skills/bap-browser/references/REFERENCE.md b/skills/bap-browser/references/REFERENCE.md new file mode 100644 index 0000000..d86f4ea --- /dev/null +++ b/skills/bap-browser/references/REFERENCE.md @@ -0,0 +1,78 @@ +# Advanced BAP Patterns + +## Multi-tab workflows + +Use `pages` to list all open tabs, `activate_page` to switch between them, and `close_page` to clean up. Useful for comparing content across tabs or handling pop-ups. 
+ +``` +navigate({ url: "https://a.example.com" }) +navigate({ url: "https://b.example.com" }) // opens in new tab +pages() // returns [{id, url}, ...] +activate_page({ pageId: "page-1" }) // switch back to first tab +``` + +## Waiting strategies + +The `waitUntil` parameter on `navigate` controls when the page is considered loaded: + +| Value | When to use | +|-------|-------------| +| `"load"` | Default. Fine for most pages. | +| `"domcontentloaded"` | Faster. Use when you don't need images/fonts. | +| `"networkidle"` | Slowest but most complete. Use for SPAs that fetch data after load. | + +If a page renders dynamically after navigation, use `observe` or `aria_snapshot` with a short delay rather than relying on `networkidle`. + +## Annotated screenshots (Set-of-Marks) + +`observe` supports `annotateScreenshot: true` which overlays numbered markers on each interactive element. Useful for visual debugging or confirming which element a ref points to. + +``` +observe({ includeScreenshot: true, annotateScreenshot: true, maxElements: 20 }) +``` + +The returned screenshot will have numbered badges corresponding to element refs. + +## Nested extraction with complex schemas + +`extract` supports deeply nested JSON schemas. Use `mode: "single"` for a single object, `mode: "list"` for arrays, or `mode: "table"` for tabular data. + +``` +extract({ + instruction: "Extract job listings with company details", + mode: "list", + schema: { + type: "array", + items: { + type: "object", + properties: { + title: { type: "string" }, + company: { + type: "object", + properties: { + name: { type: "string" }, + location: { type: "string" } + } + }, + salary: { type: "number" }, + remote: { type: "boolean" } + } + } + } +}) +``` + +## Error handling in batched actions + +`act` accepts `stopOnFirstError` (default: `true`). Set to `false` if you want to continue executing steps even when one fails — useful for best-effort form fills where some fields may not exist. 
+ +``` +act({ + stopOnFirstError: false, + steps: [ + { action: "action/fill", selector: "label:First name", value: "Jane" }, + { action: "action/fill", selector: "label:Middle name", value: "A." }, // may not exist + { action: "action/fill", selector: "label:Last name", value: "Doe" } + ] +}) +``` From ac4f9b8e528a8780b89ae121e7bf2ca240f57fc0 Mon Sep 17 00:00:00 2001 From: Piyush Date: Fri, 13 Feb 2026 10:09:55 -0600 Subject: [PATCH 10/10] fix: remove unused variable in extract fallback loop --- packages/server-playwright/src/server.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/server-playwright/src/server.ts b/packages/server-playwright/src/server.ts index e237641..4f18856 100644 --- a/packages/server-playwright/src/server.ts +++ b/packages/server-playwright/src/server.ts @@ -3398,7 +3398,7 @@ export class BAPPlaywrightServer extends EventEmitter { // Fallback for meta-based extraction (og:title, meta description, etc.) if (Object.keys(result).length === 0) { try { - for (const [key, _propSchema] of Object.entries(properties)) { + for (const key of Object.keys(properties)) { if (key === 'title' || key === 'name') { result[key] = await page.title(); } else if (key === 'description') {
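Reviewer note: the loop touched by this last hunk is the metadata fallback in `extractObject`, which runs only when label-based matching found nothing. As a rough sketch of that behavior, assuming a plain object in place of the Playwright page (the `PageMeta` interface and the `metaFallback` name are hypothetical and are not part of the patch), the key-only iteration could be exercised in isolation like this:

```typescript
// Hypothetical stand-in for the three page values the fallback reads.
// In the patch these come from page.title(), page.url(), and the
// content attribute of meta[name="description"].
interface PageMeta {
  title: string;
  url: string;
  description: string | null;
}

// Sketch of the fallback: only the property names matter, which is why
// iterating over keys (rather than [key, schema] entries) is sufficient.
function metaFallback(
  propertyNames: string[],
  meta: PageMeta
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of propertyNames) {
    if (key === "title" || key === "name") {
      result[key] = meta.title;
    } else if (key === "description") {
      // Skipped when the page has no meta description.
      if (meta.description) result[key] = meta.description;
    } else if (key === "url") {
      result[key] = meta.url;
    }
    // Any other property name is left unset, as in the patch.
  }
  return result;
}

const example = metaFallback(["name", "url"], {
  title: "Example Domain",
  url: "https://example.com/",
  description: null,
});
console.log(example);
```

Keeping the fallback free of the per-property schema makes it clear that no type coercion happens on this path; the values are returned as the page reports them.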