feat: opt-in browser-use as a conditional tool family (chromedp via EgressProxy) #94

# Add opt-in browser-use as a conditional tool family

**Execution mode for Claude Code:** Execute phase by phase. Run the verification block at the end of each phase before proceeding. **Stop on the first verification failure** and surface it — do not continue to the next phase. Corrections based on the actual codebase always take precedence over patterns inferred from this document.

---

## Summary

Add the ability for a Forge agent to drive a real (headless) browser to accomplish web tasks: navigate, extract content, click, fill, and screenshot. This capability is implemented as a **conditional tool family** — `browser_navigate`, `browser_extract`, `browser_click`, `browser_fill`, `browser_screenshot` — that is registered **only when an active skill opts in** via a new `requires.capabilities: [browser]` declaration. It is **not** a skill, and it is **not** an unconditional builtin.

All browser network traffic is routed through Forge's existing `EgressProxy`, so the egress allowlist, SSRF IP validation, and DNS-rebinding protection apply to browser navigation exactly as they do to `http_request`.

## Motivation & design decision

Two framings were rejected:

1. **Browser-use as a skill.** Skills in Forge do not compose other skills — a skill author orchestrates *tools* from their `SKILL.md` (`http_request`, `kubectl` via `cli_execute`, `web_search`). A browser-as-skill cannot be consumed by another skill, so any author writing a task skill (scrape pricing, fill a portal) would have no clean way to reach browsing. Browsing is substrate, like `http_request` — it belongs in the tool layer.

2. **Browser-use as an unconditional builtin.** Registering the tools in `RegisterAll()` forces the Chromium dependency and proxy machinery onto every agent, including a trivial summarize agent. That breaks "browser is optional."

**Chosen model:** a **conditional tool family**, following the exact precedent of `cli_execute` — a first-class tool any skill can drive, registered by the runner only when a dependency condition is met. The opt-in signal is a skill-declared capability requirement. This solves composition (any skill orchestrates the tools from its `SKILL.md`) and keeps the browser optional (agents with no browser-requiring skill never register the tools and never need Chromium).

**Separation of concerns this produces:** the skill layer holds task/policy ("go to the pricing page, extract the table, alert on >5% change"); the tool layer holds the capability (drive a browser safely through the egress proxy).

## Library choice

Use **`github.com/chromedp/chromedp`** (pure-Go Chrome DevTools Protocol). It needs no Node, imports as a Go library (consistent with the "forge-core imported as a library, never shelled to as a binary" invariant), and connects to a Chromium process over CDP. **Do not** use Playwright (it drags in a Node driver server), and do not shell out to Chromium through `cli_execute`.

---

## Prerequisite reading (architecture-first — do this before writing code)

1. `FORGE_PROJECT_DESIGN.md` — sections: *Skills Architecture*, *SKILL.md Format*, *Tools* (esp. *Conditional Tools* and the `cli_execute` row), *Egress Security* (`EgressEnforcer`, `EgressProxy`, `SafeDialer`, IP validator, redirect policy), *Container Packaging*.
2. The `cli_execute` implementation and **how it is conditionally registered into the runner** (`forge-cli/tools/` — `cli_execute`, `SkillCommandExecutor`; and the registration path in `forge-core/runtime/runner.go`). This is the canonical precedent. **Mirror it; do not invent new wiring.**
3. `forge-skills/requirements/requirements.go` (`AggregateRequirements()`) and `forge-skills/requirements/derive.go` (`DerivedCLIConfig`) — how `bins`/`env`/`guardrails` are merged across skills and surfaced to the runner. The new `capabilities` requirement flows through the same aggregation.
4. `forge-core/security/proxy.go` (`EgressProxy`, `NewEgressProxy`), `safe_dialer.go`, `ip_validator.go` — the proxy already handles HTTP forwarding and CONNECT tunnels via `SafeDialer`.

**Open question to resolve in Phase 0 and report before continuing:** Does `EgressProxy` currently expose a bound network listener (an HTTP forward proxy a separate process can connect to), or is it only wired as an in-process `http.RoundTripper`/handler? The answer determines whether Phase 2 is "add a listener" or "bind the existing handler."

---

## Phase 0 — Investigation & confirmation

**Goal:** Confirm the integration points before any code changes.

**Tasks:**
- Locate the exact function(s) where `cli_execute` is conditionally registered into the runner's tool registry, and how `DerivedCLIConfig` (or equivalent) signals "this dependency is needed."
- Determine `EgressProxy`'s current surface (listener vs. in-process). Report the answer.
- Confirm where the aggregated egress allowlist / `DomainMatcher` is constructed at runtime, so the browser proxy can reuse the **same** matcher.
- Confirm the `Tool` interface signature (`Name`, `Description`, `InputSchema`, `Execute`) and the registry registration call.

**Verification:**
- Produce a short written report (paste into the issue thread) naming: the cli_execute registration function, the runtime egress matcher construction site, the EgressProxy listener status, and the Tool interface location. **Do not proceed until this is posted.**

---

## Phase 1 — `capabilities` requirement plumbing

**Goal:** Add a `requires.capabilities` field to the skill schema and flow it through parse → aggregate → derive. No behavior yet.

**Files (confirm exact paths in Phase 0):**

| File | Change |
|---|---|
| `forge-skills/.../parser` (SKILL.md frontmatter types) | Add `Capabilities []string` under `requires`. |
| `forge-skills/requirements/requirements.go` | `AggregateRequirements()` merges + dedups `capabilities` across active skills (preserve first-occurrence order, mirror bins handling). |
| `forge-skills/requirements/derive.go` | Surface `BrowserRequired bool` (or `Capabilities` set) on the derived config the runner reads. |
| Descriptor/info structs | Thread `capabilities` parser → descriptor → registry → derived config, same as `bins`. |

**Schema addition (SKILL.md):**
```yaml
metadata:
  forge:
    requires:
      capabilities:
        - browser
```

**Verification:**
- `go test ./forge-skills/...` passes.
- Add a unit test: parse a fixture skill declaring `capabilities: [browser]`; assert that `browser` appears in the `AggregateRequirements()` output and that `DerivedCLIConfig` reports the browser requirement as true. With no such skill active, it must report false.
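
The merge rule above (dedup across active skills, first occurrence wins) can be sketched as follows. This is a minimal illustration, not the real `AggregateRequirements()` code: `mergeCapabilities` and `browserRequired` are hypothetical helper names, and the actual types should follow whatever the `bins` handling already uses.

```go
package main

import "fmt"

// mergeCapabilities flattens per-skill capability lists into one slice,
// dropping duplicates while keeping first-occurrence order.
// Hypothetical helper; the real merge lives in AggregateRequirements().
func mergeCapabilities(perSkill [][]string) []string {
	seen := make(map[string]struct{})
	merged := []string{}
	for _, caps := range perSkill {
		for _, c := range caps {
			if _, dup := seen[c]; dup {
				continue
			}
			seen[c] = struct{}{}
			merged = append(merged, c)
		}
	}
	return merged
}

// browserRequired reports whether the merged set contains "browser".
func browserRequired(caps []string) bool {
	for _, c := range caps {
		if c == "browser" {
			return true
		}
	}
	return false
}

func main() {
	caps := mergeCapabilities([][]string{{"browser"}, {}, {"browser"}})
	fmt.Println(caps, browserRequired(caps)) // prints: [browser] true
}
```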

---

## Phase 2 — `EgressProxy` local listener

**Goal:** Make `EgressProxy` reachable by a separate process (the browser) as an HTTP forward proxy bound to localhost, using the agent's existing egress allowlist + `SafeDialer` + IP validation. **This is the load-bearing security phase.**

**Files:**

| File | Change |
|---|---|
| `forge-core/security/proxy.go` | If no listener exists: add `Serve(ctx) (addr string, err error)` (or `ListenAndServe`) binding `127.0.0.1:0` (ephemeral), returning the resolved `host:port`. Ensure CONNECT (HTTPS) and plain HTTP forwarding both go through `SafeDialer`/`safeTransport` and `ValidateHostIP`. Graceful shutdown on ctx cancel. |
| `forge-core/runtime/runner.go` | Start the proxy listener **only when** the browser capability is active (Phase 4 gates this). Construct it with the **same `DomainMatcher`/allowlist** used by `EgressEnforcer`, and the same `allowPrivateIPs`. Expose the resolved address via context/runner field for the browser tool. |

**Constraints:**
- The proxy must enforce the identical allowlist as HTTP tools — no separate, looser policy for the browser.
- `127.0.0.1` only. Never bind a non-localhost interface.

**Verification:**
- Start the proxy in a test against a derived allowlist of `example.com`. Using a plain HTTP client configured to use the proxy:
  - request to `https://example.com` → succeeds (CONNECT tunnel established).
  - request to a non-allowed domain → blocked.
  - request to `http://169.254.169.254/` → blocked by IP validator.
  - request to a private IP with `allowPrivateIPs=false` → blocked.

---

## Phase 3 — Browser tool family

**Goal:** Implement the chromedp-backed browser tools and the process lifecycle, routed through the Phase 2 proxy. (Registration is Phase 4 — keep this phase registration-free.)

**Home:** `forge-cli/tools/browser/` — same package neighborhood as `cli_execute`, because this is process-lifecycle + CDP, not a pure-Go utility.

**Components:**
- `manager.go` — owns one Chromium instance per agent (lazily launched on first browser tool call). Launch flags:
  - `--headless=new` (unless `FORGE_BROWSER_HEADLESS=false`)
  - `--proxy-server=http://<phase2-addr>`
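  - `--proxy-bypass-list=<-loopback>` (nothing may bypass the proxy; the browser must not reach the network directly). An empty list is not enough: Chromium implicitly bypasses the proxy for loopback and link-local hosts, which includes `169.254.169.254`, and `<-loopback>` removes that implicit rule. Confirm the behavior against the Chromium version shipped.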
  - `--no-first-run`, `--no-default-browser-check`, `--disable-extensions`
  - throwaway `--user-data-dir` under the agent workdir; deleted on shutdown (no persisted cookies/profile across runs by default)
  - resolves the Chromium binary via `exec.LookPath` (e.g. `chromium`, `chromium-browser`, `headless-shell`); honor `FORGE_BROWSER_BIN` override.
- Tools (each implements the `Tool` interface):

| Tool | Input | Output |
|---|---|---|
| `browser_navigate` | `url` (string), optional `wait_selector`, `timeout_ms` | final URL, HTTP status of the main document response (when available), page title |
| `browser_extract` | optional `selector`, `mode` (`text`\|`html`\|`links`) | extracted content (defaults to readable text of the current page) |
| `browser_click` | `selector` | post-click URL/title |
| `browser_fill` | `selector`, `value` | confirmation |
| `browser_screenshot` | optional `selector`, `full_page` (bool) | PNG via the existing `file_create` output convention so it can be uploaded to channels |

**Constraints:**
- One shared browser/context per agent; manage tabs/pages internally. Do not spawn a new Chromium per tool call.
- Respect the skill's `timeout_hint` for per-call timeouts (default 120s).
- `browser_screenshot` must emit the same structured JSON shape as `file_create` so the channel runtime uploads it.
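
One way to get "one shared browser per agent, launched lazily" is a `sync.Once` guard around the launch. The sketch below is an assumption about shape only: `Manager`, `browser`, and the injected `launch` function are hypothetical, and the real manager would hold a chromedp context instead of the placeholder struct.

```go
package main

import (
	"fmt"
	"sync"
)

// browser is a placeholder for the handle the real manager would keep.
type browser struct{ id int }

// Manager launches a single browser on first use and hands the same
// instance to every later tool call.
type Manager struct {
	once   sync.Once
	launch func() (*browser, error) // injected so tests can count launches
	b      *browser
	err    error
}

// Get returns the shared browser, launching it on the first call only.
func (m *Manager) Get() (*browser, error) {
	m.once.Do(func() { m.b, m.err = m.launch() })
	return m.b, m.err
}

func main() {
	launches := 0
	m := &Manager{launch: func() (*browser, error) {
		launches++
		return &browser{id: launches}, nil
	}}
	for i := 0; i < 3; i++ {
		if _, err := m.Get(); err != nil {
			panic(err)
		}
	}
	fmt.Println("launches:", launches) // prints: launches: 1
}
```

A consequence of `sync.Once` is that a failed launch is cached too; if retry-after-failure is wanted, a mutex with an explicit "launched" flag would fit better.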

**Verification:**
- Unit test against a local test HTTP server (`httptest`): `browser_navigate` to the test URL, `browser_extract` returns the expected text, `browser_click` follows a link. Run with the browser routed through a Phase 2 proxy whose allowlist includes `127.0.0.1` (localhost is always allowed per egress rules).
- Test is skipped with a clear message if no Chromium binary is present in the CI environment (do not fail CI for a missing browser).

---

## Phase 4 — Conditional registration (the opt-in)

**Goal:** Register the browser tool family in the runner **iff** an active skill declares `capabilities: [browser]` **and** a Chromium binary is found.

**Files:**

| File | Change |
|---|---|
| `forge-core/runtime/runner.go` | Mirror the `cli_execute` conditional-registration path. When derived requirements report browser required: (1) verify Chromium via `exec.LookPath`; (2) start the Phase 2 proxy listener; (3) construct the browser manager with the proxy address; (4) register the five tools into the registry. When not required: register nothing, start no proxy listener, launch no browser. |

**Failure behavior:**
- Capability declared but no Chromium binary found → **fail fast at startup** with an actionable error naming the missing binary and how to provide it (image install / `FORGE_BROWSER_BIN`). Mirror how missing `bins` are reported for `cli_execute`.

**Verification:**
- Agent with a browser-capable skill → `forge run` registers `browser_*` tools (assert via tool registry / `--mock-tools` introspection).
- Agent with no such skill → `browser_*` tools absent; no proxy listener opened; no Chromium launched.
- Capability declared, Chromium absent → startup fails with the actionable error.

---

## Phase 5 — Security analyzer & audit scoring

**Goal:** Treat the browser capability as high-risk and enforce trust-hint consistency.

**Files:**

| File | Change |
|---|---|
| `forge-skills/compiler/security.go` | Recognize `capabilities: [browser]` as a high-risk capability. Add a consistency check: declaring `browser` while `trust_hints.requires_network: false` is a **trust violation** (Critical) — browsing requires network by definition. |
| `forge skills audit` scoring | Score the `browser` capability in the high-risk bucket (`+15`, alongside `bash`/`python`/`node`). Reflect in both `--format text` and `--format json`. |

**Verification:**
- `forge skills audit` on a browser skill reports it high-risk with the browser capability called out.
- A skill declaring `browser` + `requires_network: false` is flagged as a Critical trust violation by the analyzer.
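
The consistency check and the `+15` score can be expressed as a single pass over the skill's declared metadata. This is an illustrative sketch only: `skillMeta`, `finding`, and `analyze` are hypothetical and do not reflect the actual types in `security.go`.

```go
package main

import "fmt"

// skillMeta holds only the fields this check needs (hypothetical type).
type skillMeta struct {
	Capabilities    []string
	RequiresNetwork bool // trust_hints.requires_network
}

type finding struct {
	Severity string
	Message  string
}

const browserRiskPoints = 15 // high-risk bucket, per the scoring row above

// analyze returns the risk points contributed by the browser capability and
// any trust violations it implies.
func analyze(s skillMeta) (int, []finding) {
	hasBrowser := false
	for _, c := range s.Capabilities {
		if c == "browser" {
			hasBrowser = true
			break
		}
	}
	if !hasBrowser {
		return 0, nil
	}
	var findings []finding
	if !s.RequiresNetwork {
		findings = append(findings, finding{
			Severity: "critical",
			Message:  "capability browser declared with trust_hints.requires_network: false",
		})
	}
	return browserRiskPoints, findings
}

func main() {
	score, findings := analyze(skillMeta{Capabilities: []string{"browser"}})
	fmt.Println(score, len(findings)) // prints: 15 1
}
```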

---

## Phase 6 — Guardrails, denied_tools & form-safety

**Goal:** Make browser output and actions governable by the existing guardrail engine, and add baseline form-entry safety.

**Files:**

| File | Change |
|---|---|
| `forge-core/runtime/skill_guardrails.go` + tool exec hooks | Ensure `browser_extract` (and other content-returning browser tools) pass their output through the `AfterToolExec` `deny_output` redaction path, so a skill's `deny_output` patterns redact secrets from extracted page content. |
| Browser tool egress errors | A `browser_navigate` to a domain outside the allowlist must surface the egress denial as a normal tool error (the proxy already blocks it; ensure the tool reports it cleanly rather than hanging/timing out). |
| `browser_fill` safety | Add a default-on guardrail that refuses to fill fields that look like credential/payment inputs (`type=password`, autocomplete tokens such as `cc-number`, `current-password`) unless the skill explicitly sets an opt-in flag in its guardrail config. Document this. |

**Notes:**
- The browser tools must be deniable via the existing `denied_tools` mechanism (a skill or policy can list `browser_navigate` etc.).
- Document the pattern for a skill author who wants a single audited web chokepoint: deny `http_request` so all web interaction flows through the browser tools (or vice versa).

**Verification:**
- A skill with a `deny_output` secret pattern redacts that secret from `browser_extract` output.
- `browser_navigate` to a non-allowlisted domain returns the egress error promptly.
- `browser_fill` on a `type=password` field is refused without the opt-in flag and allowed with it.
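
The `browser_fill` guardrail reduces to a predicate over the target element's `type` and `autocomplete` attributes. A minimal sketch follows, assuming the tool can read both attributes before typing; the function names are hypothetical, and the token list extends the two tokens named above with a few other standard HTML autocomplete values for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sensitiveAutocomplete lists autocomplete tokens treated as credential or
// payment fields. Only cc-number and current-password come from the plan;
// the rest are illustrative additions.
var sensitiveAutocomplete = map[string]bool{
	"current-password": true,
	"new-password":     true,
	"one-time-code":    true,
	"cc-number":        true,
	"cc-csc":           true,
	"cc-exp":           true,
}

// isSensitiveField reports whether an input looks like a credential or
// payment field, based on its type and autocomplete attributes.
func isSensitiveField(inputType, autocomplete string) bool {
	if strings.EqualFold(strings.TrimSpace(inputType), "password") {
		return true
	}
	// autocomplete may hold several space-separated tokens.
	for _, tok := range strings.Fields(strings.ToLower(autocomplete)) {
		if sensitiveAutocomplete[tok] {
			return true
		}
	}
	return false
}

// allowFill refuses sensitive fields unless the skill opted in.
func allowFill(inputType, autocomplete string, optIn bool) error {
	if isSensitiveField(inputType, autocomplete) && !optIn {
		return fmt.Errorf("browser_fill refused: field looks like a credential or payment input (type=%q autocomplete=%q)",
			inputType, autocomplete)
	}
	return nil
}

func main() {
	fmt.Println(allowFill("password", "", false) != nil) // true
	fmt.Println(allowFill("password", "", true) == nil)  // true
	fmt.Println(allowFill("text", "email", false) == nil) // true
}
```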

---

## Phase 7 — Packaging

**Goal:** Ship Chromium in the image only for browser agents; document the prod-egress reality.

**Files:**

| File | Change |
|---|---|
| Dockerfile generation (`ConfigToAgentSpec` / build pipeline) | When the agent's derived requirements include the browser capability, the generated multi-stage Dockerfile installs `chromium`/`headless-shell` (and required fonts/libs). When not, it does not. Document the image-size delta. |
| `forge build` / `forge package` docs | Note: general browsing skills declare broad `egress_domains` and `requires_network: true`; `forge package --prod` rejects `dev-open`, so a browser skill must ship a **curated allowlist** or run under a non-prod profile with audit logging. State this explicitly. |

**Verification:**
- `forge build` on a browser agent produces a Dockerfile that installs Chromium; on a non-browser agent it does not.
- `forge package --prod` on a browser skill configured with `dev-open` egress is rejected with a clear message.

---

## Phase 8 — Docs & reference example

**Goal:** Document the capability and provide one opt-in reference.

**Files:**

| File | Change |
|---|---|
| `FORGE_PROJECT_DESIGN.md` | Add `browser_*` to the *Conditional Tools* table (condition: "active skill declares `capabilities: [browser]` and Chromium present"). Add the `capabilities` field to the *SKILL.md Format* section. Note the EgressProxy-listener routing in the *Egress Security* section. |
| Embedded skill (optional, light) | A `web-browse` instructional skill — frontmatter declaring `requires.capabilities: [browser]`, `trust_hints.requires_network: true`, curated/example `egress_domains`, and a SKILL.md body teaching the LLM how to drive the browser tools. No scripts. Serves as the canonical opt-in example. |

**Verification:**
- `forge skills list` shows the example skill (if added) with correct category/tags/icon.
- Design-doc tables updated; `go test ./...` green.

---

## Global anti-pattern checklist (review before every PR)

- [ ] Browser tools are **never** registered in `RegisterAll()` / are never unconditional builtins.
- [ ] Chromium is **never** `go:embed`'d into the `forge` binary, and **never** added to the `cli_execute` `bins` allowlist or shelled through `cli_execute`.
- [ ] All browser traffic goes through the `EgressProxy` listener; `--proxy-bypass-list` exempts nothing, including Chromium's implicit loopback/link-local bypass (set `<-loopback>`); no direct-network escape hatch.
- [ ] The browser proxy uses the **same** allowlist/`DomainMatcher` and `allowPrivateIPs` as `EgressEnforcer` — no separate looser policy.
- [ ] No Playwright/Node dependency; chromedp only.
- [ ] Proxy listener binds `127.0.0.1` only.
- [ ] Trust level is computed by the analyzer; the skill never self-declares trust.
- [ ] No persisted browser profile/cookies across runs by default (throwaway `user-data-dir`).
- [ ] No CAPTCHA solving; no automatic credential/payment field entry without explicit opt-in.
- [ ] Conditional registration mirrors the `cli_execute` path rather than introducing parallel wiring.

## Acceptance criteria

1. A skill declaring `requires.capabilities: [browser]` causes `browser_*` tools to register; a skill without it does not.
2. Browser navigation is governed by the agent's egress allowlist, SSRF IP validation, and DNS-rebinding protection (verified by the Phase 2 tests).
3. `forge skills audit` scores the browser capability high-risk; a `browser` + `requires_network: false` skill is a trust violation.
4. `deny_output` guardrails redact secrets from extracted page content.
5. `forge build` installs Chromium only for browser agents; `--prod` rejects `dev-open` browser skills.
6. `go test ./...` green; design doc updated.

## Non-goals / scope exclusions

- Headful browsing inside containers (local-dev debugging only via `FORGE_BROWSER_HEADLESS=false`).
- Playwright / Node-based automation.
- Browser tools as unconditional builtins or as a standalone skill.
- Embedding Chromium in the forge binary.
- Multiple browser engines / Firefox / WebKit.
- Persistent cookie/session stores across runs.
- CAPTCHA solving, anti-bot evasion, automated credential or payment entry.
- Remote/registry distribution of the browser tool (covered by the existing registry roadmap).

