Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .changeset/webmcp-discovery.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
"@browseragentprotocol/protocol": minor
"@browseragentprotocol/server-playwright": minor
"@browseragentprotocol/client": minor
"@browseragentprotocol/mcp": minor
---

Add WebMCP tool discovery support via new `discovery/discover` protocol method. Detects tools exposed by websites through the W3C WebMCP standard (declarative HTML attributes and imperative navigator.modelContext API). Also available through `agent/observe` with opt-in `includeWebMCPTools` parameter.
25 changes: 21 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,14 +14,15 @@ Install BAP as a Claude Code plugin for browser automation:
/install-plugin https://github.com/browseragentprotocol/bap
```

**23 MCP tools** across five categories:
**23 MCP tools** across six categories:

| Category | Tools |
|----------|-------|
| Navigation | `navigate`, `go_back`, `go_forward`, `reload` |
| Interaction | `click`, `fill`, `type`, `press`, `hover`, `scroll`, `select` |
| Composite | `observe` (see the page), `act` (batch multi-step actions), `extract` (structured JSON) |
| Inspection | `screenshot`, `aria_snapshot`, `accessibility`, `content`, `element` |
| Discovery | `discover_tools` |
| Tabs | `pages`, `activate_page`, `close_page` |

**Example prompts:**
Expand Down Expand Up @@ -50,15 +51,31 @@ npx @browseragentprotocol/mcp

## Why BAP?

- **Composite Actions**: Execute multi-step flows in one command — 40x fewer tokens than one-action-at-a-time
- **Fused Operations**: Combine navigate+observe, act+observe into single server calls — 50-85% fewer roundtrips
- **Composite Actions**: Execute multi-step flows in one command — [up to 27% fewer tool calls](./docs/browser-tools-guide.md#results) vs Playwright MCP in reproducible benchmarks (up to 55% with fused operations)
- **Fused Operations**: Combine navigate+observe, act+observe into single server calls
- **Semantic Selectors**: Target elements by purpose (`role:button:"Submit"`) not position — survives redesigns
- **Structured Extraction**: Extract validated JSON from any page with a schema
- **Two Interfaces**: CLI (`bap act`) for shell-based agents, MCP tools for protocol-native agents
- **Accessibility-First**: Built on accessibility tree inspection, designed for AI comprehension
- **Element References**: Stable refs (`@e1`, `e15`) that persist across observations
- **Screenshot Annotation**: Set-of-Marks overlays with numbered badges for vision models

## BAP & WebMCP

[WebMCP](https://github.com/nicolo-ribaudo/tc39-proposal-model-context-protocol) is a W3C Community Group standard that lets websites expose structured tools to AI agents via browser-native APIs (Chrome 146 Canary). BAP and WebMCP are complementary:

- **BAP** works on every website today — it controls browsers via automation (accessibility tree + Playwright)
- **WebMCP** requires websites to opt in — it exposes tools via HTML attributes or `navigator.modelContext`
- **Together**: BAP uses WebMCP tools when available (`discovery/discover`), falls back to automation when not

BAP is the universal agent-browser layer. WebMCP makes cooperative sites richer. See [docs/webmcp-comparison.md](./docs/webmcp-comparison.md) for a detailed comparison including a [decision guide](./docs/webmcp-comparison.md#decision-guide-when-to-use-what).

## Choosing a Browser Tool

See [Browser Automation for AI Agents: A Decision Guide](./docs/browser-tools-guide.md) for a detailed comparison of BAP with Playwright MCP and Playwright CLI, including reproducible benchmark data.

**Short version**: If you're a coding agent, use **BAP CLI** with `bap install-skill`. If you're an MCP-native agent, use **BAP MCP**. See the [guide](./docs/browser-tools-guide.md#what-should-you-use) for the full decision matrix.

## Quick Start

### CLI — For AI Agents That Run Shell Commands
Expand Down Expand Up @@ -89,7 +106,7 @@ npm i -g @browseragentprotocol/cli
npx @browseragentprotocol/cli <command>
```

See the full [CLI documentation](./packages/cli) for all 26 commands, selector reference, and recipes.
See the full [CLI documentation](./packages/cli) for all 23 commands, selector reference, and recipes.

### MCP — For Protocol-Native Agents

Expand Down
160 changes: 160 additions & 0 deletions TODO-ADOPTION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
# BAP Adoption Plan

**Status:** 3 GitHub stars | **Goal:** 100+ stars, real users, ecosystem traction
**Last updated:** 2026-02-20

---

## Phase 1: Fix the Basics (Week 1)

### 1.1 README Rewrite
- [ ] Lead with a 30-second GIF/video, not architecture diagrams
- [ ] First section: "Why BAP?" with 3 bullet points (40x fewer tokens, semantic selectors, one-command install)
- [ ] Second section: "Quick Start" — 3 lines max to get running
- [ ] Third section: benchmark comparison table (BAP vs Playwright MCP vs screenshot agents)
- [ ] Push architecture, protocol spec, and roadmap below the fold
- [ ] Add badges: npm version, GitHub stars, license, Discord (when ready)

### 1.2 One-Command Experience
- [ ] Verify `npx @anthropic/bap` (or equivalent) works end-to-end with zero config
- [ ] Verify `bap install-skill` works for Claude Code, Codex, Gemini CLI (top 3 platforms)
- [ ] Add `npx bap-mcp` as standalone entry point for MCP users
- [ ] Test the cold-start experience on a clean machine — time it, fix any friction

### 1.3 Plugin Marketplace
- [ ] Follow up on claude-plugins-official submission (submitted 2026-02-20)
- [ ] Once approved: add "Available on Claude Code Plugin Marketplace" badge to README

---

## Phase 2: Demo Content (Week 2-3)

### 2.1 Terminal Demo Video (Highest Leverage)
- [ ] Record with asciinema or screen capture (NOT Sora/AI-generated)
- [ ] Script the demo as a side-by-side comparison:
```
LEFT: Playwright MCP — 10+ messages to fill a form and extract data
RIGHT: BAP — 1 composite action, same result
End card: "3 actions, 1 roundtrip, 40x fewer tokens"
```
- [ ] Keep it under 90 seconds
- [ ] Post to: X/Twitter, r/ClaudeAI, r/LocalLLaMA, LinkedIn

### 2.2 Full Walkthrough Video (Screen Recording)
- [ ] Record a real local session: install BAP, connect to Claude Code, do a real task
- [ ] Show the MCP tool calls in real-time (Claude Code's tool use UI)
- [ ] 3-5 minutes, narrated or with text overlays
- [ ] Post to: YouTube, embed in README

### 2.3 Benchmark Content
- [ ] Use the existing `benchmarks/` repo to generate real numbers
- [ ] Key metrics to highlight:
- Token cost per task (BAP vs Playwright MCP vs Computer Use)
- Roundtrip count per task
- Task completion time
- Success rate on WebVoyager-style benchmarks
- [x] Create a shareable benchmark table/graphic — see `docs/browser-tools-guide.md` (Benchmark Results section)
- [ ] Write a blog post: "We Measured the Token Cost of Browser Agents"

---

## Phase 3: Distribution (Week 3-4)

### 3.1 Go Where the Users Are
- [ ] **Hacker News:** Post the benchmark blog post as a Show HN
- [ ] **r/ClaudeAI:** Post demo video + "I built an alternative to Playwright MCP"
- [ ] **r/LocalLLaMA:** Position as "works with any agent, not just Claude"
- [ ] **X/Twitter:** Thread format — problem → demo → benchmarks → link
- [ ] **LinkedIn:** More polished version of the X thread
- [ ] **Discord servers:** Claude Code community, AI agents communities

### 3.2 Integration Partnerships
- [ ] Open PR/issue on popular agent frameworks to add BAP as a browser backend:
- [ ] LangChain / LangGraph
- [ ] CrewAI
- [ ] AutoGen
- [ ] Pydantic AI
- [ ] Write integration guides: "Use BAP with [framework]" — one page each
- [ ] Reach out to agent framework maintainers directly

### 3.3 Claude Code Ecosystem
- [ ] Plugin marketplace listing (pending)
- [ ] Write a SKILL.md tutorial: "How to write browser automation skills with BAP"
- [ ] Cross-promote from skill-tools and skills.menu

---

## Phase 4: Community (Month 2+)

### 4.1 Developer Experience
- [ ] Set up Discord or GitHub Discussions for community
- [ ] Add "Examples" directory with 5-10 real-world scripts:
- [ ] Form filling (login flow)
- [ ] Data extraction (scrape a table)
- [ ] Multi-page navigation (e-commerce checkout)
- [ ] Screenshot monitoring (visual regression)
- [ ] PDF generation
- [ ] Improve error messages — every error should suggest a fix
- [ ] Add `bap doctor` command that diagnoses common setup issues

### 4.2 Documentation Site
- [ ] Stand up a docs site (can reuse skills.menu infra or Astro Starlight)
- [ ] Pages: Getting Started, Selectors Guide, MCP Integration, CLI Reference, Python SDK, Benchmarks
- [ ] Include interactive "Try It" playground if feasible

### 4.3 Thought Leadership
- [ ] Write "Why Semantic Selectors Beat CSS for AI Agents" (dev.to / blog)
- [ ] Write "The Browser Agent Protocol: An Open Standard" (position piece)
- [ ] Give a talk at a local meetup or AI conference

---

## Messaging Guide

### One-Liner
> Fast, semantic browser control for AI agents. 40x fewer tokens than screenshot-based approaches.

### Elevator Pitch
> AI agents waste massive tokens on browser tasks — screenshot agents send 50KB images every step, DOM tools need 10+ roundtrips for simple forms. BAP uses the accessibility tree (what screen readers use) as a semantic interface, batches actions into single roundtrips, and works with any agent via MCP or WebSocket. One composite action replaces 10 messages. And when sites expose WebMCP tools, BAP discovers and surfaces them automatically — no agent changes needed.

### Differentiators (vs. competitors)
| Them | BAP |
|------|-----|
| Screenshots → pixel coordinates | Accessibility tree → semantic selectors |
| One action per roundtrip | Composite actions (batch N steps in 1 call) |
| CSS selectors that break on redesigns | `role:button:"Submit"` that survives redesigns |
| Tied to one agent platform | Vendor-neutral: MCP + WebSocket + CLI |
| No security model | Scope-based auth, domain filtering, credential redaction |
| No WebMCP support | Auto-discovers WebMCP tools, falls back to automation |

### Target Audiences (in priority order)
1. **Claude Code users** — already using MCP, BAP is a drop-in upgrade over Playwright MCP
2. **AI agent developers (Python)** — building with LangChain, CrewAI, AutoGen; need browser access
3. **AI agent developers (TS)** — building custom agents; need efficient browser protocol
4. **DevTools/testing teams** — interested in semantic selectors for more resilient automation

---

## Anti-Patterns to Avoid

- Don't lead with "protocol" or "standard" — developers adopt tools, not specs
- Don't compare to Playwright directly — Playwright is the engine, BAP is the AI-optimized layer on top
- Don't over-emphasize security features initially — it's important but not what drives first adoption
- Don't make the README longer — make it shorter with better content
- Don't pay for promotion — organic developer content wins long-term
- See `docs/browser-tools-guide.md` for the approved comparison framing — decision guide, not head-to-head battle

---

## Success Metrics

| Milestone | Target | How to Measure |
|-----------|--------|---------------|
| README converts | >5% visitor → star rate | GitHub traffic analytics |
| Demo video | >1K views in first week | Platform analytics |
| HN post | Front page (>50 points) | HN |
| npm weekly downloads | >100/week | npm stats |
| PyPI weekly downloads | >50/week | PyPI stats |
| GitHub stars | 100+ | GitHub |
| External contributors | 3+ PRs from non-maintainers | GitHub |
| Framework integrations | 2+ frameworks ship BAP support | PRs/docs |
Loading
Loading