feat: opt-in browser-use as a conditional tool family (chromedp via EgressProxy) #94

@initializ-mk

Add opt-in browser-use as a conditional tool family

Execution mode for Claude Code: Execute phase by phase. Run the verification block at the end of each phase before proceeding. Stop on the first verification failure and surface it — do not continue to the next phase. Corrections based on the actual codebase always take precedence over patterns inferred from this document.


Summary

Add the ability for a Forge agent to drive a real (headless) browser to accomplish web tasks: navigate, extract content, click, fill, and screenshot. This capability is implemented as a conditional tool family (browser_navigate, browser_extract, browser_click, browser_fill, browser_screenshot) that is registered only when an active skill opts in via a new requires.capabilities: [browser] declaration. It is not a skill, and it is not an unconditional builtin.

All browser network traffic is routed through Forge's existing EgressProxy, so the egress allowlist, SSRF IP validation, and DNS-rebinding protection apply to browser navigation exactly as they do to http_request.

Motivation & design decision

Two framings were rejected:

  1. Browser-use as a skill. Skills in Forge do not compose other skills — a skill author orchestrates tools from their SKILL.md (http_request, kubectl via cli_execute, web_search). A browser-as-skill cannot be consumed by another skill, so any author writing a task skill (scrape pricing, fill a portal) would have no clean way to reach browsing. Browsing is substrate, like http_request — it belongs in the tool layer.

  2. Browser-use as an unconditional builtin. Registering the tools in RegisterAll() forces the Chromium dependency and proxy machinery onto every agent, including a trivial summarize agent. That breaks "browser is optional."

Chosen model: a conditional tool family, following the exact precedent of cli_execute — a first-class tool any skill can drive, registered by the runner only when a dependency condition is met. The opt-in signal is a skill-declared capability requirement. This solves composition (any skill orchestrates the tools from its SKILL.md) and keeps the browser optional (agents with no browser-requiring skill never register the tools and never need Chromium).

Separation of concerns this produces: the skill layer holds task/policy ("go to the pricing page, extract the table, alert on >5% change"); the tool layer holds the capability (drive a browser safely through the egress proxy).

Library choice

Use github.com/chromedp/chromedp (pure-Go Chrome DevTools Protocol). It needs no Node, imports as a Go library (consistent with the "forge-core imported as a library, never shelled to as a binary" invariant), and connects to a Chromium process over CDP. Do not use Playwright (it drags in a Node driver server), and do not shell out to Chromium through cli_execute.


Prerequisite reading (architecture-first — do this before writing code)

  1. FORGE_PROJECT_DESIGN.md — sections: Skills Architecture, SKILL.md Format, Tools (esp. Conditional Tools and the cli_execute row), Egress Security (EgressEnforcer, EgressProxy, SafeDialer, IP validator, redirect policy), Container Packaging.
  2. The cli_execute implementation and how it is conditionally registered into the runner (forge-cli/tools/cli_execute, SkillCommandExecutor; and the registration path in forge-core/runtime/runner.go). This is the canonical precedent. Mirror it; do not invent new wiring.
  3. forge-skills/requirements/requirements.go (AggregateRequirements()) and forge-skills/requirements/derive.go (DerivedCLIConfig) — how bins/env/guardrails are merged across skills and surfaced to the runner. The new capabilities requirement flows through the same aggregation.
  4. forge-core/security/proxy.go (EgressProxy, NewEgressProxy), safe_dialer.go, ip_validator.go — the proxy already handles HTTP forwarding and CONNECT tunnels via SafeDialer.

Open question to resolve in Phase 0 and report before continuing: Does EgressProxy currently expose a bound network listener (an HTTP forward proxy a separate process can connect to), or is it only wired as an in-process http.RoundTripper/handler? The answer determines whether Phase 2 is "add a listener" or "bind the existing handler."


Phase 0 — Investigation & confirmation

Goal: Confirm the integration points before any code changes.

Tasks:

  • Locate the exact function(s) where cli_execute is conditionally registered into the runner's tool registry, and how DerivedCLIConfig (or equivalent) signals "this dependency is needed."
  • Determine EgressProxy's current surface (listener vs. in-process). Report the answer.
  • Confirm where the aggregated egress allowlist / DomainMatcher is constructed at runtime, so the browser proxy can reuse the same matcher.
  • Confirm the Tool interface signature (Name, Description, InputSchema, Execute) and the registry registration call.

Verification:

  • Produce a short written report (paste into the issue thread) naming: the cli_execute registration function, the runtime egress matcher construction site, the EgressProxy listener status, and the Tool interface location. Do not proceed until this is posted.

Phase 1 — capabilities requirement plumbing

Goal: Add a requires.capabilities field to the skill schema and flow it through parse → aggregate → derive. No behavior yet.

Files (confirm exact paths in Phase 0):

| File | Change |
| --- | --- |
| forge-skills/.../parser (SKILL.md frontmatter types) | Add Capabilities []string under requires. |
| forge-skills/requirements/requirements.go | AggregateRequirements() merges + dedups capabilities across active skills (preserve first-occurrence order, mirror bins handling). |
| forge-skills/requirements/derive.go | Surface BrowserRequired bool (or Capabilities set) on the derived config the runner reads. |
| Descriptor/info structs | Thread capabilities parser → descriptor → registry → derived config, same as bins. |

Schema addition (SKILL.md):

metadata:
  forge:
    requires:
      capabilities:
        - browser

Verification:

  • go test ./forge-skills/... passes.
  • Add a unit test: parse a fixture skill declaring capabilities: [browser]; assert that browser appears in the AggregateRequirements() output and that DerivedCLIConfig reports the browser requirement as true. With no such skill active, it must report false.
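
The merge rule described for AggregateRequirements() (dedup across skills, preserve first-occurrence order) can be sketched as below. This is a minimal illustration under assumptions: mergeCapabilities and browserRequired are hypothetical names, not the actual forge-skills functions, and the real implementation should mirror whatever the bins aggregation already does.

```go
package main

import "fmt"

// mergeCapabilities is a hypothetical helper (not the real forge-skills API).
// It flattens the capability lists of all active skills, drops duplicates,
// and keeps the order in which each capability first appeared.
func mergeCapabilities(perSkill ...[]string) []string {
	seen := make(map[string]struct{})
	var merged []string
	for _, caps := range perSkill {
		for _, c := range caps {
			if _, dup := seen[c]; dup {
				continue
			}
			seen[c] = struct{}{}
			merged = append(merged, c)
		}
	}
	return merged
}

// browserRequired reports whether the merged set contains "browser".
// It stands in for the BrowserRequired flag on the derived config.
func browserRequired(caps []string) bool {
	for _, c := range caps {
		if c == "browser" {
			return true
		}
	}
	return false
}

func main() {
	// Two skills both declare browser; a third declares nothing.
	caps := mergeCapabilities([]string{"browser"}, nil, []string{"browser"})
	fmt.Println(caps, browserRequired(caps))

	// No skill declares the capability.
	fmt.Println(browserRequired(mergeCapabilities()))
}
```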

Phase 2 — EgressProxy local listener

Goal: Make EgressProxy reachable by a separate process (the browser) as an HTTP forward proxy bound to localhost, using the agent's existing egress allowlist + SafeDialer + IP validation. This is the load-bearing security phase.

Files:

| File | Change |
| --- | --- |
| forge-core/security/proxy.go | If no listener exists: add Serve(ctx) (addr string, err error) (or ListenAndServe) binding 127.0.0.1:0 (ephemeral), returning the resolved host:port. Ensure CONNECT (HTTPS) and plain HTTP forwarding both go through SafeDialer/safeTransport and ValidateHostIP. Graceful shutdown on ctx cancel. |
| forge-core/runtime/runner.go | Start the proxy listener only when the browser capability is active (Phase 4 gates this). Construct it with the same DomainMatcher/allowlist used by EgressEnforcer, and the same allowPrivateIPs. Expose the resolved address via context/runner field for the browser tool. |

Constraints:

  • The proxy must enforce the identical allowlist as HTTP tools — no separate, looser policy for the browser.
  • 127.0.0.1 only. Never bind a non-localhost interface.
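
For orientation, the listener shape asked for above (ephemeral loopback port, resolved address returned, shutdown on context cancel) might look roughly like this with the standard library alone. serveLocal, isLoopback, and runDemo are hypothetical names, and the handler here is a placeholder; the real change would bind the existing EgressProxy handler, whose surface Phase 0 is meant to confirm.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveLocal binds handler to an ephemeral port on 127.0.0.1 and returns the
// resolved host:port. The server is shut down when ctx is cancelled.
// Hypothetical sketch: the real method would live on EgressProxy.
func serveLocal(ctx context.Context, handler http.Handler) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // loopback only, OS-chosen port
	if err != nil {
		return "", fmt.Errorf("bind loopback listener: %w", err)
	}
	srv := &http.Server{Handler: handler, ReadHeaderTimeout: 10 * time.Second}

	// Serve returns http.ErrServerClosed once Shutdown has been called.
	go func() { _ = srv.Serve(ln) }()

	go func() {
		<-ctx.Done()
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = srv.Shutdown(shutdownCtx)
	}()
	return ln.Addr().String(), nil
}

// isLoopback reports whether a host:port address uses a loopback IP.
func isLoopback(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// runDemo starts the listener with a placeholder handler and stops it again.
func runDemo() (string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return serveLocal(ctx, http.NotFoundHandler())
}

func main() {
	addr, err := runDemo()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("bound", addr, "loopback:", isLoopback(addr))
}
```

One caveat worth checking during implementation: http.Server.Shutdown does not wait for or close hijacked connections, so if CONNECT tunnels are implemented via hijacking they need their own teardown on ctx cancel.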

Verification:

  • Start the proxy in a test against a derived allowlist of example.com. Using a plain HTTP client configured to use the proxy:
    • request to https://example.com → succeeds (CONNECT tunnel established).
    • request to a non-allowed domain → blocked.
    • request to http://169.254.169.254/ → blocked by IP validator.
    • request to a private IP with allowPrivateIPs=false → blocked.
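
The IP-level cases in this verification list can be illustrated with a small classifier built on net/netip. This is a sketch under assumptions, not Forge's ValidateHostIP: ipAllowed is a hypothetical name, and the policy encoded here (link-local always blocked, private ranges gated by allowPrivateIPs) is inferred from the bullets above and should be checked against ip_validator.go.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipAllowed is a hypothetical stand-in for an SSRF IP check.
// Assumed policy: link-local (which covers the 169.254.169.254 metadata
// address) is always blocked; RFC 1918 / unique-local ranges are allowed
// only when allowPrivateIPs is true.
func ipAllowed(raw string, allowPrivateIPs bool) bool {
	addr, err := netip.ParseAddr(raw)
	if err != nil {
		return false // not an IP literal: fail closed
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d as its IPv4 form

	switch {
	case addr.IsUnspecified(), addr.IsMulticast():
		return false
	case addr.IsLinkLocalUnicast():
		return false
	case addr.IsPrivate():
		return allowPrivateIPs
	}
	// Loopback handling is deliberately left out of this sketch; the issue
	// notes elsewhere that localhost is always allowed per egress rules.
	return true
}

func main() {
	for _, ip := range []string{"169.254.169.254", "10.0.0.5", "93.184.216.34"} {
		fmt.Printf("%-16s allowPrivateIPs=false -> %v\n", ip, ipAllowed(ip, false))
	}
}
```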

Phase 3 — Browser tool family

Goal: Implement the chromedp-backed browser tools and the process lifecycle, routed through the Phase 2 proxy. (Registration is Phase 4 — keep this phase registration-free.)

Home: forge-cli/tools/browser/ — same package neighborhood as cli_execute, because this is process-lifecycle + CDP, not a pure-Go utility.

Components:

  • manager.go — owns one Chromium instance per agent (lazily launched on first browser tool call). Launch flags:
    • --headless=new (unless FORGE_BROWSER_HEADLESS=false)
    • --proxy-server=http://<phase2-addr>
    • --proxy-bypass-list= (empty — nothing bypasses the proxy; the browser must not reach the network directly)
    • --no-first-run, --no-default-browser-check, --disable-extensions
    • throwaway --user-data-dir under the agent workdir; deleted on shutdown (no persisted cookies/profile across runs by default)
    • resolves the Chromium binary via exec.LookPath (e.g. chromium, chromium-browser, headless-shell); honor FORGE_BROWSER_BIN override.
  • Tools (each implements the Tool interface):
| Tool | Input | Output |
| --- | --- | --- |
| browser_navigate | url (string), optional wait_selector, timeout_ms | final URL, HTTP-ish status, page title |
| browser_extract | optional selector, mode (text, html, or links) | extracted content (defaults to readable text of the current page) |
| browser_click | selector | post-click URL/title |
| browser_fill | selector, value | confirmation |
| browser_screenshot | optional selector, full_page (bool) | PNG via the existing file_create output convention so it can be uploaded to channels |
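
As a reference for manager.go, the launch flags and binary resolution listed above could be assembled as follows. This sketch uses only the standard library and builds the raw command line; with chromedp the same values would be passed as allocator options instead. resolveChromium and launchArgs are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// candidateBins are the example binary names given in the issue.
var candidateBins = []string{"chromium", "chromium-browser", "headless-shell"}

// resolveChromium honors the FORGE_BROWSER_BIN override, then falls back to
// a PATH lookup of the candidate names. Hypothetical helper.
func resolveChromium() (string, error) {
	if override := os.Getenv("FORGE_BROWSER_BIN"); override != "" {
		return exec.LookPath(override)
	}
	for _, name := range candidateBins {
		if path, err := exec.LookPath(name); err == nil {
			return path, nil
		}
	}
	return "", errors.New("no Chromium binary found; install one or set FORGE_BROWSER_BIN")
}

// launchArgs builds the Chromium command line from the flags in the issue.
// proxyAddr is the host:port returned by the Phase 2 listener.
func launchArgs(proxyAddr, userDataDir string, headless bool) []string {
	args := []string{
		"--proxy-server=http://" + proxyAddr,
		"--proxy-bypass-list=", // empty: nothing is meant to bypass the proxy
		"--no-first-run",
		"--no-default-browser-check",
		"--disable-extensions",
		"--user-data-dir=" + userDataDir, // throwaway profile, deleted on shutdown
	}
	if headless {
		args = append([]string{"--headless=new"}, args...)
	}
	return args
}

func main() {
	// Headless unless FORGE_BROWSER_HEADLESS=false, as stated in the issue.
	headless := os.Getenv("FORGE_BROWSER_HEADLESS") != "false"
	fmt.Println(launchArgs("127.0.0.1:40123", "/tmp/forge-browser-profile", headless))

	if bin, err := resolveChromium(); err != nil {
		fmt.Println("resolve:", err)
	} else {
		fmt.Println("using", bin)
	}
}
```

Chromium's proxy documentation describes implicit bypass rules for loopback and link-local hosts that apply unless the bypass list contains <-loopback>. Whether an empty --proxy-bypass-list is sufficient to force all traffic through the proxy should be confirmed against the Chromium version in use.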

Constraints:

  • One shared browser/context per agent; manage tabs/pages internally. Do not spawn a new Chromium per tool call.
  • Respect the skill's timeout_hint for per-call timeouts (default 120s).
  • browser_screenshot must emit the same structured JSON shape as file_create so the channel runtime uploads it.
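
The first two constraints (one lazily launched browser shared by every tool call, and a 120s default when no timeout_hint is set) can be sketched as below. The manager type, ensureStarted, and callTimeout are hypothetical, and treating timeout_hint as a number of seconds is an assumption to verify against the skill schema.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const defaultCallTimeout = 120 * time.Second

// callTimeout converts a skill's timeout_hint into a per-call timeout.
// Assumption: the hint is expressed in seconds and 0 or less means "unset".
func callTimeout(timeoutHintSeconds int) time.Duration {
	if timeoutHintSeconds <= 0 {
		return defaultCallTimeout
	}
	return time.Duration(timeoutHintSeconds) * time.Second
}

// manager owns at most one browser per agent. launch is injected so the
// sketch stays free of any real process handling.
type manager struct {
	mu       sync.Mutex
	running  bool
	launches int
	launch   func() error
}

// ensureStarted launches the browser on the first call and is a no-op
// afterwards. A failed launch is not cached, so a later call retries.
func (m *manager) ensureStarted() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.running {
		return nil
	}
	if err := m.launch(); err != nil {
		return fmt.Errorf("launch browser: %w", err)
	}
	m.running = true
	m.launches++
	return nil
}

func main() {
	m := &manager{launch: func() error { return nil }}
	// Three tool calls share a single launch.
	for i := 0; i < 3; i++ {
		if err := m.ensureStarted(); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("launches:", m.launches, "default timeout:", callTimeout(0))
}
```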

Verification:

  • Unit test against a local test HTTP server (httptest): browser_navigate to the test URL, browser_extract returns the expected text, browser_click follows a link. Run with the browser routed through a Phase 2 proxy whose allowlist includes 127.0.0.1 (localhost is always allowed per egress rules).
  • Test is skipped with a clear message if no Chromium binary is present in the CI environment (do not fail CI for a missing browser).

Phase 4 — Conditional registration (the opt-in)

Goal: Register the browser tool family in the runner iff an active skill declares capabilities: [browser] and a Chromium binary is found.

Files:

| File | Change |
| --- | --- |
| forge-core/runtime/runner.go | Mirror the cli_execute conditional-registration path. When derived requirements report browser required: (1) verify Chromium via exec.LookPath; (2) start the Phase 2 proxy listener; (3) construct the browser manager with the proxy address; (4) register the five tools into the registry. When not required: register nothing, start no proxy listener, launch no browser. |

Failure behavior:

  • Capability declared but no Chromium binary found → fail fast at startup with an actionable error naming the missing binary and how to provide it (image install / FORGE_BROWSER_BIN). Mirror how missing bins are reported for cli_execute.
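
The gating and fail-fast behavior described in this phase can be summarized in a few lines. This is only a sketch of the decision logic: registry, lookPathFunc, and registerBrowserTools are hypothetical, the real wiring should mirror the cli_execute path found in Phase 0, and the proxy and manager steps are reduced to a comment.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

var browserTools = []string{
	"browser_navigate", "browser_extract", "browser_click",
	"browser_fill", "browser_screenshot",
}

var chromiumNames = []string{"chromium", "chromium-browser", "headless-shell"}

// registry is a stand-in for the runner's tool registry.
type registry map[string]bool

// lookPathFunc has the same shape as exec.LookPath so tests can stub it.
type lookPathFunc func(name string) (string, error)

// registerBrowserTools registers the tool family only when the capability is
// required and a Chromium binary is found; otherwise it registers nothing.
func registerBrowserTools(reg registry, required bool, lookPath lookPathFunc) error {
	if !required {
		return nil // no proxy listener, no browser, no tools
	}
	found := false
	for _, name := range chromiumNames {
		if _, err := lookPath(name); err == nil {
			found = true
			break
		}
	}
	if !found {
		return fmt.Errorf("browser capability declared but none of %v found on PATH: "+
			"install Chromium in the image or set FORGE_BROWSER_BIN", chromiumNames)
	}
	// The Phase 2 proxy listener and the browser manager would be started here.
	for _, name := range browserTools {
		reg[name] = true
	}
	return nil
}

// Stubs for exercising both branches without a real browser.
func foundChromium(name string) (string, error)   { return "/usr/bin/" + name, nil }
func missingChromium(name string) (string, error) { return "", errors.New(name + ": not found") }

func main() {
	reg := registry{}
	if err := registerBrowserTools(reg, true, exec.LookPath); err != nil {
		fmt.Println("startup error:", err)
		return
	}
	fmt.Println("registered tools:", len(reg))
}
```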

Verification:

  • Agent with a browser-capable skill → forge run registers browser_* tools (assert via tool registry / --mock-tools introspection).
  • Agent with no such skill → browser_* tools absent; no proxy listener opened; no Chromium launched.
  • Capability declared, Chromium absent → startup fails with the actionable error.

Phase 5 — Security analyzer & audit scoring

Goal: Treat the browser capability as high-risk and enforce trust-hint consistency.

Files:

| File | Change |
| --- | --- |
| forge-skills/compiler/security.go | Recognize capabilities: [browser] as a high-risk capability. Add a consistency check: declaring browser while trust_hints.requires_network: false is a trust violation (Critical), because browsing requires network by definition. |
| forge skills audit scoring | Score the browser capability in the high-risk bucket (+15, alongside bash/python/node). Reflect in both --format text and --format json. |

Verification:

  • forge skills audit on a browser skill reports it high-risk with the browser capability called out.
  • A skill declaring browser + requires_network: false is flagged as a Critical trust violation by the analyzer.
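
The scoring and consistency rule above amounts to a small pure function. The sketch below is illustrative only: finding, analyzeCapabilities, and the severity string are hypothetical and should be replaced by the types security.go already uses.

```go
package main

import "fmt"

// finding is a hypothetical analyzer result type.
type finding struct {
	Severity string
	Message  string
}

// highRiskScore is the +15 bucket named in the issue.
const highRiskScore = 15

// analyzeCapabilities returns the risk score contributed by the browser
// capability and any trust violations it implies.
func analyzeCapabilities(capabilities []string, requiresNetwork bool) (int, []finding) {
	hasBrowser := false
	for _, c := range capabilities {
		if c == "browser" {
			hasBrowser = true
			break
		}
	}
	if !hasBrowser {
		return 0, nil
	}
	var findings []finding
	if !requiresNetwork {
		findings = append(findings, finding{
			Severity: "Critical",
			Message:  "capability browser declared with trust_hints.requires_network: false",
		})
	}
	return highRiskScore, findings
}

func main() {
	score, findings := analyzeCapabilities([]string{"browser"}, false)
	fmt.Println("score:", score)
	for _, f := range findings {
		fmt.Printf("%s: %s\n", f.Severity, f.Message)
	}
}
```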

Phase 6 — Guardrails, denied_tools & form-safety

Goal: Make browser output and actions governable by the existing guardrail engine, and add baseline form-entry safety.

Files:

| File | Change |
| --- | --- |
| forge-core/runtime/skill_guardrails.go + tool exec hooks | Ensure browser_extract (and other content-returning browser tools) pass their output through the AfterToolExec deny_output redaction path, so a skill's deny_output patterns redact secrets from extracted page content. |
| Browser tool egress errors | A browser_navigate to a domain outside the allowlist must surface the egress denial as a normal tool error (the proxy already blocks it; ensure the tool reports it cleanly rather than hanging/timing out). |
| browser_fill safety | Add a default-on guardrail that refuses to fill fields that look like credential/payment inputs (type=password, autocomplete tokens such as cc-number, current-password) unless the skill explicitly sets an opt-in flag in its guardrail config. Document this. |
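
To make the deny_output requirement concrete, here is a minimal sketch of pattern-based redaction applied to extracted page text. It assumes deny_output patterns are regular expressions and that matches are replaced with a fixed marker; both are assumptions, and redact is a hypothetical name. The existing AfterToolExec path defines the real behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// redact applies each deny pattern to the tool output and replaces matches
// with a marker. Assumption: patterns use Go regexp (RE2) syntax.
func redact(output string, denyPatterns []string) (string, error) {
	for _, p := range denyPatterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return "", fmt.Errorf("invalid deny_output pattern %q: %w", p, err)
		}
		output = re.ReplaceAllString(output, "[REDACTED]")
	}
	return output, nil
}

func main() {
	page := "Account settings. API token: sk-abc123. Plan: pro."
	clean, err := redact(page, []string{`sk-[a-z0-9]+`})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(clean)
}
```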

Notes:

  • The browser tools must be deniable via the existing denied_tools mechanism (a skill or policy can list browser_navigate etc.).
  • Document the pattern for a skill author who wants a single audited web chokepoint: deny http_request so all web interaction flows through the browser tools (or vice versa).

Verification:

  • A skill with a deny_output secret pattern redacts that secret from browser_extract output.
  • browser_navigate to a non-allowlisted domain returns the egress error promptly.
  • browser_fill on a type=password field is refused without the opt-in flag and allowed with it.
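
The browser_fill guardrail can be expressed as a check on the target element's type and autocomplete attributes. The sketch below is one possible shape: checkFill, looksSensitive, and the allowSensitive flag are hypothetical, and the token list goes beyond the two examples in the issue (new-password, cc-csc, and cc-exp are additions drawn from the standard HTML autocomplete tokens).

```go
package main

import (
	"fmt"
	"strings"
)

// sensitiveAutocomplete lists autocomplete tokens treated as credential or
// payment fields. cc-number and current-password come from the issue; the
// rest are illustrative additions.
var sensitiveAutocomplete = map[string]bool{
	"current-password": true,
	"new-password":     true,
	"cc-number":        true,
	"cc-csc":           true,
	"cc-exp":           true,
}

// looksSensitive inspects the input's type and autocomplete attributes.
// The autocomplete attribute may hold several space-separated tokens.
func looksSensitive(inputType, autocomplete string) bool {
	if strings.EqualFold(strings.TrimSpace(inputType), "password") {
		return true
	}
	for _, tok := range strings.Fields(strings.ToLower(autocomplete)) {
		if sensitiveAutocomplete[tok] {
			return true
		}
	}
	return false
}

// checkFill refuses sensitive fields unless the skill opted in.
func checkFill(inputType, autocomplete string, allowSensitive bool) error {
	if looksSensitive(inputType, autocomplete) && !allowSensitive {
		return fmt.Errorf("browser_fill refused: field looks like a credential or payment input "+
			"(type=%q, autocomplete=%q); opt in via the skill's guardrail config", inputType, autocomplete)
	}
	return nil
}

func main() {
	fmt.Println(checkFill("password", "", false))
	fmt.Println(checkFill("text", "billing cc-number", false))
	fmt.Println(checkFill("email", "email", false))
}
```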

Phase 7 — Packaging

Goal: Ship Chromium in the image only for browser agents; document the prod-egress reality.

Files:

| File | Change |
| --- | --- |
| Dockerfile generation (ConfigToAgentSpec / build pipeline) | When the agent's derived requirements include the browser capability, the generated multi-stage Dockerfile installs chromium/headless-shell (and required fonts/libs). When not, it does not. Document the image-size delta. |
| forge build / forge package docs | Note: general browsing skills declare broad egress_domains and requires_network: true; forge package --prod rejects dev-open, so a browser skill must ship a curated allowlist or run under a non-prod profile with audit logging. State this explicitly. |

Verification:

  • forge build on a browser agent produces a Dockerfile that installs Chromium; on a non-browser agent it does not.
  • forge package --prod on a browser skill configured with dev-open egress is rejected with a clear message.
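
The conditional install in the generated Dockerfile can be pictured as a template branch keyed on the derived browser requirement. Everything specific in this sketch is an assumption: the single-stage layout, the debian:bookworm-slim base, the chromium and fonts-liberation package names, and the renderDockerfile and installsChromium helpers are illustrative, not the actual Forge build pipeline.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// dockerfileTmpl is an illustrative runtime-stage fragment. The base image
// and package names are assumptions, not what forge build emits today.
const dockerfileTmpl = `FROM debian:bookworm-slim
{{- if .Browser}}
RUN apt-get update && apt-get install -y --no-install-recommends chromium fonts-liberation \
    && rm -rf /var/lib/apt/lists/*
{{- end}}
COPY agent /usr/local/bin/agent
ENTRYPOINT ["/usr/local/bin/agent"]
`

var dockerfile = template.Must(template.New("dockerfile").Parse(dockerfileTmpl))

// renderDockerfile emits the fragment, including the Chromium install step
// only when the agent's derived requirements include the browser capability.
func renderDockerfile(browser bool) (string, error) {
	var sb strings.Builder
	if err := dockerfile.Execute(&sb, struct{ Browser bool }{browser}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// installsChromium is a helper for asserting on the rendered output.
func installsChromium(df string) bool {
	return strings.Contains(df, "chromium")
}

func main() {
	for _, browser := range []bool{true, false} {
		df, err := renderDockerfile(browser)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("browser=%v installs chromium: %v\n", browser, installsChromium(df))
	}
}
```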

Phase 8 — Docs & reference example

Goal: Document the capability and provide one opt-in reference.

Files:

| File | Change |
| --- | --- |
| FORGE_PROJECT_DESIGN.md | Add browser_* to the Conditional Tools table (condition: "active skill declares capabilities: [browser] and Chromium present"). Add the capabilities field to the SKILL.md Format section. Note the EgressProxy-listener routing in the Egress Security section. |
| Embedded skill (optional, light) | A web-browse instructional skill: frontmatter declaring requires.capabilities: [browser], trust_hints.requires_network: true, curated/example egress_domains, and a SKILL.md body teaching the LLM how to drive the browser tools. No scripts. Serves as the canonical opt-in example. |

Verification:

  • forge skills list shows the example skill (if added) with correct category/tags/icon.
  • Design-doc tables updated; go test ./... green.

Global anti-pattern checklist (review before every PR)

  • Browser tools are never registered in RegisterAll() / are never unconditional builtins.
  • Chromium is never go:embed'd into the forge binary, and never added to the cli_execute bins allowlist or shelled through cli_execute.
  • All browser traffic goes through the EgressProxy listener; --proxy-bypass-list is empty; no direct-network escape hatch.
  • The browser proxy uses the same allowlist/DomainMatcher and allowPrivateIPs as EgressEnforcer — no separate looser policy.
  • No Playwright/Node dependency; chromedp only.
  • Proxy listener binds 127.0.0.1 only.
  • Trust level is computed by the analyzer; the skill never self-declares trust.
  • No persisted browser profile/cookies across runs by default (throwaway user-data-dir).
  • No CAPTCHA solving; no automatic credential/payment field entry without explicit opt-in.
  • Conditional registration mirrors the cli_execute path rather than introducing parallel wiring.

Acceptance criteria

  1. A skill declaring requires.capabilities: [browser] causes browser_* tools to register; a skill without it does not.
  2. Browser navigation is governed by the agent's egress allowlist, SSRF IP validation, and DNS-rebinding protection (verified by the Phase 2 tests).
  3. forge skills audit scores the browser capability high-risk; a browser + requires_network: false skill is a trust violation.
  4. deny_output guardrails redact secrets from extracted page content.
  5. forge build installs Chromium only for browser agents; --prod rejects dev-open browser skills.
  6. go test ./... green; design doc updated.

Non-goals / scope exclusions

  • Headful browsing inside containers (local-dev debugging only via FORGE_BROWSER_HEADLESS=false).
  • Playwright / Node-based automation.
  • Browser tools as unconditional builtins or as a standalone skill.
  • Embedding Chromium in the forge binary.
  • Multiple browser engines / Firefox / WebKit.
  • Persistent cookie/session stores across runs.
  • CAPTCHA solving, anti-bot evasion, automated credential or payment entry.
  • Remote/registry distribution of the browser tool (covered by the existing registry roadmap).

Labels: enhancement, forge-cli, forge-core, security, tools
