Skip to content

feat(openab-agent): native MCP client (ADR + implementation)#959

Open
brettchien wants to merge 53 commits into
openabdev:mainfrom
brettchien:feat/openab-agent-mcp
Open

feat(openab-agent): native MCP client (ADR + implementation)#959
brettchien wants to merge 53 commits into
openabdev:mainfrom
brettchien:feat/openab-agent-mcp

Conversation

@brettchien
Copy link
Copy Markdown
Contributor

@brettchien brettchien commented May 31, 2026

Discord Discussion URL: https://discord.com/channels/1490643539682529300/1510657283552706663/1510690101120995369


📚 PR stack — merge in this order:

  1. feat(openab-agent): native MCP client (ADR + implementation) #959 (this PR) — Phase 1: Foundation (ADR + rmcp + stdio + meta-tool) ← merge first
  2. feat(openab-agent): MCP Phase 2 — OAuth 2.1 paste-back (stacks on #959) #962 — Phase 2: OAuth 2.1 paste-back (stacks on feat(openab-agent): native MCP client (ADR + implementation) #959)
  3. feat(openab-agent): MCP Phase 3 — circuit breaker + flag removal (stacks on #959 + #962) #969 — Phase 3: Resilience (circuit breaker + mcp doctor, stacks on feat(openab-agent): MCP Phase 2 — OAuth 2.1 paste-back (stacks on #959) #962)

Each downstream PR rebases + cleans up its diff once the upstream PR merges. Reviewers can take all three in parallel; merges must serialize.


Summary

Adds an ADR (docs/adr/openab-agent-mcp.md) proposing native MCP client support inside openab-agent (the Rust coding agent in the Cargo workspace) and lands Phase 1 of the rollout (foundation: rmcp dep + stdio transport + meta-tool + anonymous Streamable HTTP transport body).

Per shaun-agent auto-screen Balanced recommendation, OAuth (Phase 2) and resilience (Phase 3) are split into separate PRs to keep each diff reviewable.

ADR commits

  • cb8b710 — ADR scaffold: prior-art survey (Hermes / opencode / pi-mcp-adapter / Goose / OpenHands), in-core rmcp + progressive-disclosure meta-tool architecture, OAuth + lifecycle + memory analysis, rollout plan
  • fdd8738 — OAuth flow unification: device-code preferred (matches existing claude / codex / grok CLI convention) + paste-back universal fallback + browser laptop opt-in; TokenStore persistence assumption + cold-start refresh
  • 47363a2 — Hardening: RFC 8414 dynamic discovery disabled by default (awsvpc egress + SSRF surface), opt-in via per-server allowlist; refresh-token rotation race contract (fsync agent-side + mtime-event-driven sync deployment-side, motivated by Fargate Spot + S3 sidecar sync)

Phase 1 implementation

  • mcpServers config loader (openab-agent/src/mcp/config.rs) + mcp list CLI subcommand with secret redaction
  • Runtime state machine (mcp/runtime.rs) — Disconnected | Connecting | Connected | Failed, Arc<RwLock<_>> per-server, double-lock connect with race guard
  • Real stdio dial via rmcp::transport::child_process::TokioChildProcess
  • Anonymous Streamable HTTP transport (no auth — protected servers defer with NeedsAuth once Phase 2 lands)
  • Progressive-disclosure meta-tool (mcp/meta_tool.rs) — single mcp action dispatch (help / list_servers / list_tools / describe_tool / call / status) wired into the agent dispatch loop
  • Cargo feature mcp (default on); CI exercises both feature combos (note: Phase 3 feat(openab-agent): MCP Phase 3 — circuit breaker + flag removal (stacks on #959 + #962) #969 removes this flag)

Scope

Test plan

  • ADR design review by maintainers
  • Phase 1: stdio server (mcp-server-filesystem) connects + tool list + tool call via meta-tool
  • Phase 1: progressive disclosure verified — system prompt growth ≤ +100 tokens
  • Phase 1: anonymous Streamable HTTP transport round-trip (vendor-hosted no-auth server)
  • Phase 1: secret redaction on mcp list; --resolve warnings on stderr
  • Phase 1: ACP smoke green; cargo test (both feature combos) green
  • Phase 2 / Phase 3 test plans move to their respective PRs

🤖 Generated with Claude Code

brettchien and others added 3 commits May 31, 2026 15:30
Adds ADR for in-core rmcp + progressive-disclosure meta-tool, deferred
to symmetry with the Skills extension pattern from PR openabdev#955. Memory
analysis rules out the sidecar alternative; per-session config refresh
replaces file-watcher hot reload to drop ~150 LOC of race-condition
hotspot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ate guardrails

- §6.4 flow selection: device-code preferred (matches existing CLI convention),
  paste-back universal fallback, browser laptop opt-in
- §5.2 add login/complete_login meta-tool actions; login returns flow-tagged union
- §6.1 spell out TokenStore persistence assumption + cold-start refresh
- §6.3 device_authorization_endpoint extension point + RFC 8414 discovery
- §3.8 stdio container-image caveat (interpreter required for Node/Python)
- §5.7 max_concurrent_servers knob (default 10, see §7 for constrained tuning)
- §7 Fargate 512MB/1GB OOM analysis + mitigations

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ntract

- §6.4 RFC 8414 discovery disabled by default; opt-in requires
  oauth.discovery=true + oauth.discovery_allowlist (boot rejects otherwise).
  Rationale: awsvpc egress + SSRF surface in multi-tenant deployments.
- §6.1 add RTR race warning: async persistence layers (Fargate S3 sync,
  eventually-consistent volumes) must flush new tokens to durable storage
  before Spot interruption, else cascade-revoke locks the user out.
  Contract: fsync(2) agent-side + mtime-event-driven sync deployment-side.
- §6.3 expose oauth.discovery / oauth.discovery_allowlist on custom providers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions github-actions Bot added closing-soon PR missing Discord Discussion URL — will auto-close in 3 days and removed closing-soon PR missing Discord Discussion URL — will auto-close in 3 days labels May 31, 2026
brettchien and others added 4 commits May 31, 2026 17:35
Phase 1 foundation slice per ADR §5.4 + §5.6:

- Add optional `rmcp 1.7` dep + `mcp` feature flag (default off)
- Wire `#[cfg(feature = "mcp")] mod mcp;` into main
- New `mcp/config.rs`: `McpConfig` / `ServerConfig` (Stdio | Http) /
  `ToolFilter` / `OAuthConfig`, global+project layered load, project
  precedence on name collision, `${env:VAR}` interpolation with
  per-server error context

OAuth fields limited to `provider` + `scopes`; custom-provider
endpoints (§6.3) deferred to Phase 2 auth slice to avoid dead schema.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI fmt-check found 8 formatting deltas in the Phase 1 scaffold. No
logic change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI previously only built the default feature set, leaving the
Phase 1 MCP scaffold (mcp/config.rs, gated by --features mcp)
without compile, clippy, or test coverage. Adds explicit
--features mcp invocations and watches workflow file changes
so this gap is closed for the rest of the rollout (ADR §9).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires up the Phase 1 config loader so clippy --features mcp can
see it. McpConfig::load() + ServerConfig::resolved() + serde
pretty-print are all reachable from main, clearing the dead-code
denial of compile under -D warnings.

Output groups successful servers (✓) and failures (✗ with reason),
sorted by name for deterministic display.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@shaun-agent
Copy link
Copy Markdown
Contributor

shaun-agent commented May 31, 2026

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report screened PR #959 and moved the project item from `Incoming` to `PR-Screening`.

comment: #959 (comment)
project action: https://github.com/orgs/openabdev/projects/1/views/1?pane=issue&itemId=194215355

Intent

Add native MCP client support inside openab-agent, so the Rust agent can discover and call MCP tools directly instead of relying on broker-side forwarding. The user/operator-visible problem is that MCP integration currently has no durable in-agent design for stdio/HTTP transports, OAuth flows, token persistence, progressive tool exposure, and runtime diagnostics.

Feat

Feature work. The PR introduces an ADR and initial implementation surface for openab-agent MCP support: Cargo feature wiring, CLI/module entry points, MCP config parsing, CI coverage, and a staged rollout plan. The current PR is draft and still mixes design contract with foundation implementation.

Who It Serves

Primary beneficiaries are agent runtime operators and deployers who need MCP tools available inside OpenAB agent sessions without broker mediation. Maintainers and reviewers also benefit from a concrete ADR that defines auth, lifecycle, persistence, and rollout constraints before the full implementation lands.

Rewritten Prompt

Implement native MCP client support for openab-agent behind a controlled rollout. Start with an ADR-backed foundation that adds a feature-gated MCP module, validates config parsing, supports stdio MCP servers, exposes tools through a progressive-disclosure meta-tool, and keeps system prompt growth under 100 tokens. Then add Streamable HTTP transport, explicit OAuth flow selection for device-code, paste-back, and optional browser auth, with durable token storage using fsync-safe writes and refresh-on-cold-start. Finally add resilience controls: per-server circuit breaker, bounded retries/backoff, mcp doctor, and tests covering stdio connection, tool listing, tool calls, OAuth fallback paths, token rotation behavior, failure cooldown, and binary size impact.

Merge Pitch

This should move forward because native MCP belongs at the agent-runtime layer if OpenAB wants tools to be session-local, model-visible, and independent of gateway/broker transport. The risk profile is high enough that the draft status is appropriate: MCP touches auth, process execution, network egress, token storage, prompt budget, and runtime reliability. The likely reviewer concern is scope control.

Best-Practice Comparison

OpenClaw is relevant because this feature needs durable job/tool execution boundaries, explicit routing, retry/backoff, and run logs. PR #959 aligns with OpenClaw on isolated execution and resilience intent, but it still needs concrete run-log/diagnostic behavior and explicit delivery/routing semantics for tool calls once implementation expands beyond config.

Hermes Agent is directly relevant. The PR already borrows the right instincts: daemon/runtime ownership, fresh session thinking, persisted state, and circuit breaker patterns. The Hermes comparison should stay concrete in implementation review: token writes need atomic persistence, concurrent refresh needs a race contract, scheduled/background work should not share stale sessions accidentally, and each MCP server invocation path needs clear lifecycle ownership.

Implementation Options

Conservative: merge only the ADR and feature-gated config/module scaffold first.

Balanced: keep this PR as the foundation PR. Land ADR, rmcp dependency behind mcp, stdio transport, meta-tool, config parsing, and focused tests. Split Streamable HTTP/OAuth and resilience/doctor into follow-up PRs.

Ambitious: land the full staged plan in this PR: stdio, Streamable HTTP, OAuth flows, token persistence, circuit breaker, doctor CLI, and feature flag removal.

Comparison Table

Option Speed Complexity Reliability Maintainability User Impact Fit for OpenAB now
Conservative Fastest Low High for design safety High Minimal Good for architecture approval
Balanced Moderate Medium Good High Useful foundation Best fit
Ambitious Slowest High Risky until proven Medium Largest Premature

Recommendation

Use the balanced path. Treat this PR as the MCP foundation PR: ADR, gated dependency, config, stdio transport, meta-tool, and focused tests. Keep Streamable HTTP/OAuth as the next PR, and resilience/doctor/default-on behavior as a third PR.

brettchien and others added 20 commits May 31, 2026 18:10
sort_by_key over (name, _) is the cleaner form; no behavior
change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per Mira's security review on the Phase 1 thread: the previous
'mcp list' eagerly called ServerConfig::resolved(), substituting
\${env:GITHUB_TOKEN} etc. into the printed output. Three leak
paths matter — pasting CLI output into bug reports / chat, screen
sharing, and stdout log collection.

New behavior:
- Default: print raw config; \${env:VAR} placeholders kept verbatim.
  Safe to paste publicly; reader still sees which env var feeds
  each field.
- --resolve opts into substitution and prints a two-line warning
  banner. Useful for diagnosing missing-env startup failures.

No CLI-shape break: 'mcp list' still works; --resolve is additive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
println! puts the warning banner into stdout, which (a) gets
swallowed by 'mcp list --resolve > dump.json' redirection so the
user never sees the security notice, and (b) corrupts the JSON
payload for downstream pipes like 'mcp list --resolve | jq'.

Routing the banner through eprintln! keeps it visible regardless
of redirection and keeps stdout pure JSON for piping. Standard
Unix convention: data on stdout, diagnostics on stderr.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Introduces McpRuntimeManager owning one ServerHandle per configured server,
each starting in ServerStatus::Disconnected per ADR §5.7 (lazy connect).
Wires the manager via `mcp status` CLI so the types are exercised by
clippy --features mcp; actual rmcp TokioChildProcess dial + Connected /
Failed transitions land in the next slice to keep that risky bit isolated
for bisecting.
`clippy --features mcp -- -D warnings` doesn't compile the test target,
so `len()`'s only callers (the unit tests) didn't keep it alive. Tests
already used `statuses().len()` in one place — switch the other two to
match and drop the now-dead method. `is_empty()` stays because
`cli_show_status` calls it.
…ct()

ADR §5.7 lifecycle needs a `connect()` method that spawns a child process
and awaits an rmcp handshake — both `Send` across `.await`. Plain
`HashMap<String, ServerHandle>` is not `Sync`, and the background idle-
eviction task will share the map with the foreground `mcp call` path, so
the read-heavy / write-light access pattern wants `tokio::sync::RwLock`.

This slice lands the lock migration only:
- `handles: Arc<RwLock<HashMap<String, ServerHandle>>>`
- `statuses()` and `is_empty()` become async; `cli_show_status` follows
- `McpRuntimeManager` is now `Clone` (Arc bump) so the eviction task can
  hold its own handle
- `connect(name)` transitions to `Connecting` and returns; the actual
  `rmcp::TokioChildProcess` dial + `Connected` / `Failed` transitions
  land in the next slice — keeping that bit isolated for bisecting

6 unit tests cover snapshot ordering, unknown-server error, transition,
and clone-shares-state.
The chained `.statuses().await.into_iter().map(|(n, _)| n).collect()`
fits on one line under rustfmt's default 100-char width — broke it
preemptively last tick on the wrong side of the threshold.
`clippy --features mcp -D warnings` doesn't compile the test target, so
`connect()`'s only callers (the unit tests) don't keep it alive. The
real rmcp dial lands in the next slice and will be reachable from the
meta-tool dispatch / `mcp call` CLI path; until then a targeted allow
keeps clippy quiet without papering over the type-level dead_code on
ServerStatus variants (which is already explicitly allowed).
connect() now spawns the configured stdio child via rmcp's
TokioChildProcess, runs the JSON-RPC handshake (().serve(transport)),
and parks the resulting RunningService on ServerHandle.client.

Lock discipline uses a double-acquire: a brief write lock to mark
Connecting, drop, run the dial without any lock held, then a second
brief write lock to install the client or record Failed(msg). Holding
the write lock across the spawn+handshake would block mcp status and
the future idle-eviction scan for the entire dial latency.

HTTP transport is rejected with a "phase 2" error so misconfigured
entries surface cleanly without advancing state past Disconnected.

Adds `openab-agent mcp connect <name>` as a smoke-test CLI for
mcp.json entries; the RunningService is dropped on process exit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rver

Per Mira's Tick 13 review: when two callers race on the same name, both
pass the first-lock idempotency check (Disconnected), both spawn a
child, both come back to acquire the install-lock. The second arrival
must yield to the winner, otherwise it overwrites the installed client
and silently drops the working RunningService.

Adds a 3-line double-check at the second lock acquisition: if status
is already Connected with a live client, return Ok(()) — `dial_result`
falls out of scope here and RunningService::Drop kills the duplicate
child. Cheaper than a Connecting-aware notify/condvar and adequate
until the eviction-aware slice lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three rustfmt collapses that I missed under default 100-char width:

* import order: bare `rmcp::ServiceExt` sorts AFTER `rmcp::transport::*`
  (rustfmt puts sub-module paths before bare items within a crate).
* `return Err(anyhow!(...))` for the HTTP phase-2 message fits on one
  line (~88 chars).
* `let Self { command, args, env } = self;` fits on one line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…patch

Adds the single `mcp` tool the LLM sees (ADR §5.2). This slice lands:

* `Action` enum tagged on `action`, matching the eight ADR actions,
  deserialized straight from the LLM's tool-call payload.
* `dispatch(manager, action) -> Result<Value>` as the single entry
  point — both `agent.rs::execute_tool` (next slice) and tests go
  through this.
* `help` returns the static usage doc.
* `list_servers` returns `[{name, status, transport}]` via a new
  `McpRuntimeManager::snapshot()` that clones out under a read lock.
* The four IO-bearing actions (`list_tools`, `describe_tool`, `call`,
  `status`) return a `not yet implemented` error so the surface area
  is visible to callers without pre-implementing the peer-borrow
  path. `login` / `complete_login` land with the Phase 2 OAuth slice.

`#[allow(dead_code)]` on `Action` and `dispatch` because clippy
--features mcp doesn't compile the test target and agent.rs
wire-up lands next slice.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…LLM fallback hint

Two of Mira's three Tick 15 findings (Arc-wrap for peer borrow lands
with the actual list_tools/call slice — Arc without a consumer earns
no keep).

(1) `not_implemented_msg(action)` names the requested action, lists
the actions that DO work (`help`, `list_servers`), and tells the LLM
to fall back to native `read` / `write` / `edit` / `bash` rather than
retrying. The old generic "not yet implemented" gave the model
nothing to act on.

(2) `ServerConfig::transport_label() -> &'static str` replaces the
config clone inside `snapshot()`. The `Stdio` variant carries an
`args: Vec<String>` and `env: HashMap<String, String>`; cloning that
just to read the variant tag was wasted heap traffic on every
`list_servers` call. `snapshot()` now yields
`(String, ServerStatus, &'static str)`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rustfmt's fn_call_width default (60) splits assert!() args when the
inline args exceed it; the new "read, write, edit, bash" fallback
assert tipped the third call from ~55 to ~62 chars.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Apply Mira's Tick 15 review (3) and ship the first IO-bearing
meta-tool action.

* `ServerHandle.client` is now `Option<Arc<RunningService<...>>>`.
  `connect()` wraps the dial result in `Arc::new`.
* `McpRuntimeManager::arc_peer(name)` clones the Arc out under a
  short read lock, drops the guard, returns the handle. Callers
  `.await` on `peer.list_all_tools()` / `peer.call_tool()` with no
  runtime lock held — kills the writer-starvation risk Mira flagged
  and sidesteps `Future is not Send` from holding a guard across
  `.await`.
* `Action::ListTools { server }` now wired: lazy `connect()` →
  `arc_peer()` → `peer.list_all_tools()` → `[{name, description}]`.
  `describe_tool` / `call` / `status` still stub.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Apply Mira's Tick 18 review (3) — lenient args parsing for the
`call` action so LLMs that send `arguments: null` (or omit the field
entirely) for no-arg tools don't bounce off a strict type check.

* `Action::Call { server, tool, arguments }` is wired through
  `call_tool()`: validate arguments → lazy `connect()` → `arc_peer()`
  → `peer.call_tool(CallToolRequestParams::new(...).with_arguments(map))`
  → serialize `CallToolResult` to JSON.
* Argument coercion: `Value::Object(map) → map`, `Value::Null → {}`,
  everything else (string, number, array, bool) is rejected with a
  message that names the actual type so the LLM can correct itself.
* Tests cover both branches: non-object string args fail early at
  validation; null args pass validation and fail later at connect.
* `describe_tool` and `status` still stub; supported-list message
  updated to include `call`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The CallToolRequestParams chain fits in exactly 100 chars; rustfmt
keeps it on one line. Burned a tick on a defensive break.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… 1 surface

describe_tool returns full input_schema for one tool (list+filter via
list_all_tools — MCP spec has no single-tool query). status reports
per-server {name, status, transport, last_error} with optional name
filter; surfaces Failed's embedded error as last_error.

Extract fetch_tools helper shared by list_tools + describe_tool —
this is the natural insertion point for the planned tools_cache.
brettchien and others added 11 commits June 1, 2026 00:00
New mcp::flow module exposes init_paste_authorize(provider, client_id,
redirect_uri) -> PasteAuthorize, which generates the PKCE pair + state
nonce internally and returns the authorize URL plus the secrets the
caller must persist for complete_login to validate the callback.
Internalizing pair generation removes a footgun (caller can't mismatch
verifier/state) and shrinks the API to the two parameters that actually
vary per call.

auth::generate_pkce promoted from private to pub so the MCP flow path
can share it with Codex — security primitive, single source of truth,
no drift on future hardening.

Module-scope #![allow(dead_code)] consistent with mcp::oauth — first
prod caller (the §6.4 login orchestration) lands in the next slice.
Tests cover URL structure, percent-encoding of redirect_uri, scope
form-encoding, unparseable authorize_url error path, and custom-provider
URL composition.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rustfmt's reorder_imports does NOT sort by pure alphabetical order
across use statements — local path roots have a fixed precedence:
self < super < crate < external crates. My alphabetical assumption
(c < s, so crate first) was wrong in both flow.rs's module-scope
imports and its mod tests block.

Burned Tick 36 (twice, same file). Runbook updated alongside the
Tick 14 sub-module-vs-bare ordering note since the two rules apply
at different scopes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ServerStatus gains a NeedsAuth variant; McpRuntimeManager::connect()
now transitions oauth-protected http servers into that state with an
error pointing the user at `mcp login <name>` instead of staying
silently Disconnected.

Icon "◌" (U+25CC DOTTED CIRCLE) matches the existing geometric family
(○ ◐ ●) rather than "⚠" — the latter is emoji-prone (Discord and many
terminals upgrade it via VS16), which would break aligned status
output. Status label "needs_auth" wired through meta_tool's snake_case
status_label() table.

Tests cover both the transition + error format and an idempotency
guarantee: a second connect() on a NeedsAuth server must keep the
state sticky (only successful `mcp login` clears it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…l_width

Tick 37's assert!(err.contains("mcp login"), "...") was 84 chars inline
but rustfmt's default fn_call_width=60 measures the arg list and split
it. Match the formatter.
McpRuntimeManager::start_paste_login(server) wires flow::init_paste_authorize
for built-in OAuth providers (ADR §6.4). The PKCE verifier + state are stashed
in an in-memory pending_logins map (HashMap<String, PendingPasteLogin>) for
the next slice's complete_login to consume. Server status flips to NeedsAuth.

Scope this slice:
- Built-in providers only (anthropic-mcp). Custom-provider paste-back needs
  runtime callback port allocation; deferred to a follow-up slice.
- Custom providers declaring device_authorization_endpoint short-circuit
  with an explicit "use device flow" error (ADR §6.4 selection logic).
- ADR §6.4 says transient state lives "in TokenStore"; this slice keeps it
  in-process. auth.json needs a heterogeneous-entry schema change to hold
  non-token shapes — separate slice.

oauth::builtin_client_id is the per-provider client_id resolver — env-var-
required (no hard-coded default) so paste-back fails loud rather than
emitting an authorize URL with a placeholder client_id.

flow.rs sheds its module-level #![allow(dead_code)] now that
init_paste_authorize has a prod caller transitively from start_paste_login
(itself allow-dead-code until the next slice wires the mcp::login action).
Three distinct fmt rule misses on the start_paste_login slice:

- oauth.rs: chain at receiver+chain = exactly 60 chars stays inline
  (over-broken on the unwrap_err+to_string chain)
- runtime.rs: struct-pattern binders force per-line when arm body is
  also long enough that the inline form would overflow — different
  threshold than the existing connect()'s Stdio arm
- runtime.rs: field-access chain ELEMENT counts toward chain_width;
  `start.authorize_url.starts_with(...)` is 2 chain elements, not 1,
  so receiver+chain over 60 → break
Untagged Serde enum `AuthEntry` keeps refresh-task state machine separate
from in-flight paste-login state. Per Mira's Tick 39 review: repurposing
TokenStore fields for pending entries would have made refresh loop on
them. Adds `{load,save,remove}_pending_login` helpers (mcp-gated, wired
in next slice via runtime::start_paste_login).
`start_paste_login` now writes the `mcp-pending:<server>` entry via
`auth::save_pending_login`, dropping the in-memory `pending_logins`
map. Aligns runtime state with the ADR §6.4 contract ("kept in
TokenStore") so `complete_login` survives an agent restart.

To keep tests off the real `$HOME/.openab/agent/auth.json` (the
cross-module HOME-env race that bit Tick 24), the auth-path becomes
an injected `PathBuf` field: `from_config()` defaults to
`auth::auth_path()`, `from_config_with_auth_path()` lets tests point
at a tempdir. The two tests that exercise the disk path adopt a
`mgr_with_tempdir` helper; rejection tests untouched (they error
before persist). `auth::{load,save,remove}_pending_login` likewise
take `&Path` so they're driven by the injected path, not a global.
`(McpRuntimeManager::from_config_with_auth_path(cfg, path), dir)` at
60 chars between parens trips rustfmt's tuple width heuristic and
gets exploded into a 4-line literal. Bind the manager first so the
tuple stays a 2-token one-liner. No behaviour change.
Pure URL → authorization-code helper for the upcoming
`runtime::complete_login` (ADR §6.4). Validates the `state` echo
before returning the `code` so CSRF / cross-flow contamination
fails closed before any token-endpoint round-trip. Surfaces an
`error=` query param verbatim and tolerates extra parameters
(`iss`, vendor tracking) without rejecting valid callbacks.

Token exchange + runtime wiring follow in the next slice; helper
carries `#[allow(dead_code)]` until that lands so the no-feature
build stays warning-clean.
Reverts the 18 OAuth-related commits (2a69bf9..0953985) so this PR
contains only ADR + Phase 1 foundation: rmcp dep, mcpServers config
loader, stdio transport, meta-tool dispatch, anonymous Streamable
HTTP transport body.

Phase 2 OAuth work (TokenStore namespacing, provider catalog,
paste-back flow primitives, AuthEntry split, parse_paste_callback)
is preserved on feat/openab-agent-mcp-oauth-stash and will be
proposed as a separate PR once openabdev#959 lands.

Per shaun-agent auto-screen Balanced recommendation: smaller PRs
get reviewed; one mega-PR doesn't.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@shaun-agent shaun-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Staff Review — feat(openab-agent): native MCP client (ADR + Phase 1)

Reviewer: @shaun-agent (staff, on behalf of @mrshroom69)


Verdict: Approve with minor fixes required before merge

This is a high-quality PR. The ADR is thorough (657-line design doc with prior-art survey, memory analysis, rollout plan), the Phase 1 implementation is well-scoped (~1,270 LOC across 4 new files + surgical edits to 4 existing files), CI passes both feature combos, and the screening recommendation (Balanced path) is correctly followed. The progressive-disclosure meta-tool pattern is the right call for openab-agent's ~500-token system prompt budget.


Issues to fix (blocking merge)

1. unsafe { std::env::set_var(...) } in tests — unsound on Rust ≥1.83

acp.rs tests (lines 277, 290) and config.rs test (line 253) use unsafe { std::env::set_var(...) }. Since Rust 1.83, set_var is unsafe precisely because it's UB in multi-threaded contexts. The ENV_LOCK mutex in acp.rs is a partial mitigation but doesn't protect against other threads the tokio runtime may spawn.

Fix: Use temp_env crate (dev-dependency) or restructure tests to pass env via explicit HashMap parameter (the config.rs interpolate_env tests already do this correctly — the resolved() test is the outlier).

2. is_none_or — stabilized in Rust 1.82, verify MSRV

meta_tool.rs line uses .filter(|(name, _, _)| filter.is_none_or(|f| f == name.as_str())). Option::is_none_or was stabilized in 1.82. Confirm the workspace MSRV is ≥1.82 or replace with .map_or(true, |f| f == name.as_str()).

3. System prompt wording says "4 tools" → now "these tools"

The diff correctly changes "You have 4 tools available:" to "You have these tools available:" — good. But the bullet list still only enumerates read/write/edit/bash. When mcp feature is on, the MCP_SYSTEM_PROMPT_APPENDIX appends the 5th tool description separately. This works but creates a slightly awkward prompt structure (list of 4, then a separate "Additional tool:" paragraph). Consider making the appendix inject into the list directly, or at minimum change the intro to not imply the list is exhaustive. Low priority — cosmetic.


Observations (non-blocking, for follow-up PRs)

  1. No idle eviction yet. ADR §5.7 describes a background eviction task with configurable TTL. Phase 1 doesn't implement it — connected servers stay alive until process exit. Fine for Phase 1 (short-lived ACP sessions), but should be tracked for Phase 2/3.

  2. #[allow(dead_code)] on NeedsAuth variant — acknowledged in comment as Phase 2. Clean.

  3. tool_filter parsed but not enforced. Config accepts include/exclude globs but list_tools/call don't filter. Should be documented as Phase 2 scope or removed from the config schema to avoid user confusion.

  4. Race in double-lock connect pattern. The "race guard" comment (runtime.rs ~line 180) correctly identifies that a concurrent connect() may win. The losing dial's RunningService is dropped (killing the child). This is correct but worth a brief integration test with tokio::spawn to prove no panic under contention.

  5. ADR references OpenClaw but notes it's "not applicable" — this satisfies the pr-contribution-guidelines.md requirement to evaluate both mandatory references and state inapplicability with evidence. Good.


Summary

The architecture is sound, the code is clean, tests cover the happy path and failure modes well, and the scope is correctly bounded to Phase 1. Fix items 1-2 above (both are ~5-line changes), and this is ready to merge.

cc @brettchien

@shaun-agent
Copy link
Copy Markdown
Contributor

Proposed Fix (2 files, 7 lines changed)

@brettchien — here are the two fixes from the review above. Both are minimal:

1. openab-agent/src/mcp/config.rs — test resolved_substitutes_env_in_args

Replace the unsafe { std::env::set_var(...) } block with a direct call to interpolate_value using the test's existing env() helper:

    #[test]
    fn resolved_substitutes_env_in_args() {
        let cfg = ServerConfig::Stdio {
            command: "github-mcp-server".into(),
            args: vec!["--token".into(), "${env:MCP_TEST_TOKEN}".into()],
            env: HashMap::new(),
            tool_filter: None,
        };
        // Test the underlying interpolation directly to avoid unsafe set_var.
        let json = serde_json::to_value(&cfg).unwrap();
        let env = env(&[("MCP_TEST_TOKEN", "secret123")]);
        let resolved = interpolate_value(json, &env).unwrap();
        let resolved_cfg: ServerConfig = serde_json::from_value(resolved).unwrap();
        match resolved_cfg {
            ServerConfig::Stdio { args, .. } => {
                assert_eq!(args[1], "secret123");
            }
            _ => unreachable!(),
        }
    }

2. openab-agent/src/mcp/meta_tool.rs line 152

-        .filter(|(name, _, _)| filter.is_none_or(|f| f == name.as_str()))
+        .filter(|(name, _, _)| filter.map_or(true, |f| f == name.as_str()))

Both changes are semantically identical to the original. CI should stay green. Once applied, this PR is good to merge from my side.

<@1507445470614524087> — tagging you for a second pair of eyes on the overall architecture. The ADR is solid (progressive-disclosure meta-tool, lazy connect, double-lock pattern, circuit breaker deferred to Phase 3). My main concern was the two code hygiene items above. If you're satisfied with the design direction, we can merge once Brett pushes the fixup.

@shaun-agent
Copy link
Copy Markdown
Contributor

Staff follow-up on #959:

I found and prepared a local fix set, but I am not pushing it yet pending review/approval per Shaun's instruction.

Proposed fix:

  • Replace the unsafe env-mutating tests with temp-env; also cover the existing auth.rs env tests, not only the three called out in the review.
  • Replace Option::is_none_or with map_or(true, ...) for older MSRV compatibility.
  • Fix MCP stdio child process spawning to call env_clear() and pass only the platform baseline env (HOME, PATH, USER on Unix; Windows equivalents on Windows) plus explicit per-server env. Without this, arbitrary OpenAB/Discord/LLM credentials inherited by openab-agent could leak into MCP servers.
  • Make the mcp feature default-on to match the PR description and ADR, then change CI to exercise both default and --no-default-features.
  • Update the ADR rollout text to match the default-on feature behavior.

Verification:

  • cargo fmt --check: pass
  • git diff --check: pass
  • cargo check / clippy / test: blocked locally because this runner has no system cc/libc linker; even rust-lld cannot find -lc, -lpthread, etc. I did update openab-agent/Cargo.lock with cargo before hitting the linker block.

Requesting review from <@1490707025028448306> and <@1507445470614524087>. If both approve, I will push the patch. If there are gaps, I'll route back to Shaun instead of pushing.

@shaun-agent
Copy link
Copy Markdown
Contributor

Staff decision on proposed fixes

Reviewing the two fix sets proposed (mine + codex-MBP2018's):

✅ Approved fixes (merge-blocking)

Fix Source Status
unsafe set_vartemp-env (all test files including auth.rs) codex ✅ Correct, more thorough than my original
is_none_ormap_or(true, ...) both
env_clear() on stdio child process codex Security fix — MCP child must not inherit ANTHROPIC_API_KEY, Discord tokens, etc.

❌ Rejected fix

Fix Source Reason
mcp feature → default-on codex Contradicts ADR §9 rollout plan. The ADR explicitly says Phase 1 = behind feature flag, Phase 2 = promote default-on, Phase 3 = remove flag. default = [] is intentional for this PR. The PR body's "Phase 1" label confirms this. Do not change.

Action needed

@brettchien — please apply these three fixes to your branch:

  1. temp-env dev-dep + rewrite all unsafe set_var in tests (acp.rs, config.rs, auth.rs)
  2. is_none_ormap_or(true, ...) in meta_tool.rs
  3. env_clear() + platform baseline env in runtime.rs Dial::Stdio — only pass PATH, HOME, USER, TERM + the explicit env map from config

Once those land and CI is green, this PR is approved from staff side. The default = [] stays as-is per the rollout plan.

@shaun-agent shaun-agent force-pushed the feat/openab-agent-mcp branch from 5ea08b5 to e90ef01 Compare June 1, 2026 09:26
@shaun-agent
Copy link
Copy Markdown
Contributor

Status after fix push

Head is now e90ef01 (fix(openab-agent/mcp): harden env handling).

Applied only the fixes approved in staff decision:

  • temp-env dev-dependency + no remaining unsafe std::env::{set_var, remove_var} in tests (acp.rs, config.rs, auth.rs)
  • MSRV-safe status filter without Option::is_none_or and without triggering Clippy's unnecessary_map_or
  • MCP stdio child process now uses env_clear() and passes only baseline env (HOME, PATH, TERM, USER on Unix; Windows equivalents) plus explicit per-server config env
  • default = [] preserved; no workflow or ADR rollout changes

CI is green:

  • cargo fmt --check
  • cargo clippy -- -D warnings
  • cargo clippy --features mcp -- -D warnings
  • cargo test
  • cargo test --features mcp
  • ignored tests
  • ACP smoke test

PR is still draft and reviewDecision remains REVIEW_REQUIRED.

<@1490707025028448306> <@1507445470614524087> please give final consensus:

  1. Is this staff-approved after e90ef01?
  2. Should Brett undraft when ready, or do we want any further cleanup before that?
  3. Any remaining merge blocker beyond draft/review state?

shaun-agent
shaun-agent previously approved these changes Jun 1, 2026
Copy link
Copy Markdown
Contributor

@shaun-agent shaun-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Staff Approval ✅

Verified commit e90ef01 addresses all three blocking items:

  1. env_clear() security fixruntime.rs now calls env_clear() + stdio_child_env() which only passes HOME/PATH/TERM/USER + explicit config env. Test confirms DISCORD_BOT_TOKEN/ANTHROPIC_API_KEY are excluded.
  2. is_none_ormatchmeta_tool.rs uses explicit match (avoids both the MSRV concern and the clippy unnecessary_map_or lint).
  3. unsafe set_varacp.rs/auth.rs/config.rs all migrated to temp-env.

CI: both checks green.

@brettchien — from staff side this is approved. When you're ready, mark as ready-for-review (undraft) and it can merge. No further changes needed from our side.

@brettchien brettchien marked this pull request as ready for review June 1, 2026 11:14
@brettchien brettchien requested a review from thepagent as a code owner June 1, 2026 11:14
@chaodu-agent
Copy link
Copy Markdown
Collaborator

CHANGES REQUESTED ⚠️ — Solid architecture and thorough ADR, but one blocker and several design questions need addressing before merge.

What This PR Does

Adds native MCP client support to openab-agent via a progressive-disclosure mcp meta-tool (single tool, action dispatch). Includes a 657-line ADR with prior-art survey (Hermes, opencode, pi-mcp-adapter, Goose, OpenHands), plus Phase 1 implementation: config loader, runtime state machine, stdio + anonymous HTTP transports, and CLI subcommands.

How It Works

  • mcpServers config loaded from ~/.openab/agent/mcp.json (global) + ./.openab/agent/mcp.json (project)
  • Single mcp meta-tool with actions: help, list_servers, list_tools, describe_tool, call, status
  • Lazy-connect on first action that needs a server; double-lock pattern avoids holding write lock across async I/O
  • env_clear() + baseline allowlist for stdio child processes (security)

Findings

# Severity Finding Location
1 🔴 connect() handshake failure permanently marks server as Failed with no retry/reset path. Any transient transport error makes the server unrecoverable for the entire process lifetime. Need at minimum: retry on next call, or a mcp reconnect action. runtime.rs:152-207
2 🟡 env_clear() + small allowlist (HOME, PATH, TERM, USER) may break MCP stdio servers that depend on inherited env (proxy, certs, locale, provider config). If intentional security hardening, document explicitly. Consider a broader baseline or per-server inherit_env opt-in. runtime.rs:228-287
3 🟡 oauth field is parseable in Phase 1 config schema but runtime rejects it — mcp list shows the server as configured but connect fails. Confusing UX. Consider showing [oauth — Phase 2] status indicator. runtime.rs / config.rs
4 🟡 connect() race guard early-return can leave status stuck at Connecting if the winner's install happens between the two checks. mcp status would show stale state. runtime.rs
5 🟡 System prompt appendix is gated on mcp_manager.is_some() but the coupling isn't tight enough — model could hallucinate mcp(...) in edge cases. Minor but worth a defensive check in execute_tool_call. agent.rs:23-28, 60-79
6 🟡 Feature gating strategy: default = [] means MCP is compile-time off. Recommend changing to default = ["mcp"] (always compiled in) + runtime opt-in (OPENAB_AGENT_MCP=true env var or --enable-mcp flag). This way: one Docker image for all users, zero runtime cost until explicitly enabled, no accidental child-process spawning in production. Cargo.toml
7 🟡 All ACP sessions share one McpRuntimeManager (Arc clone) — a connect() in one session makes the server available to all. Confirm this is intentional (vs per-session isolation). acp.rs:163
8 🟡 No idle eviction yet (ADR §5.7 mentions it). Stdio child processes accumulate. Acceptable for Phase 1 but track as follow-up. runtime.rs
9 🟡 Env resolver (${env:VAR}) lacks explicit test for the "missing var → error" contract. Add one to prevent regression. config.rs:107-115
10 🟡 truncate_context() comment says "drain in pairs" but impl does drain(1..3) — semantics don't match comment. agent.rs:221-228
11 🟢 Progressive-disclosure meta-tool keeps system prompt growth ≤ ~100 tokens regardless of server count. Excellent design. meta_tool.rs
12 🟢 Double-lock connect avoids writer starvation during async I/O. Well-documented. runtime.rs
13 🟢 env_clear() prevents secret leakage to child processes. runtime.rs
14 🟢 CI exercises both feature combos. ci-openab-agent.yml
15 🟢 Tests migrated to temp_env + tempfile — eliminates unsafe env mutation. acp.rs tests
Finding Details

🔴 F1: No recovery from Failed state

Currently:

Err(e) => {
    handle.status = ServerStatus::Failed(msg.clone());
    Err(anyhow!(msg))
}

Suggested fix: allow connect() to retry from Failed state (same as from Disconnected). The status machine should be: Disconnected → Connecting → Connected | Failed, where Failed → Connecting is allowed on next attempt.

🟡 F6: Feature gating → runtime opt-in

Recommended approach:

  • Cargo.toml: default = ["mcp"] (always compiled)
  • Runtime: MCP only activates when OPENAB_AGENT_MCP=true OR mcp.json exists + explicit opt-in
  • Docker: single image, no variant needed
  • Zero cost when disabled (no tool registered, no system prompt change, no child processes)
Reviewer Attribution
  • 口渡法師: F1 (🔴 blocker), F5, F10
  • 擺渡法師: F3, F4
  • 覺渡法師: F2, F9
  • 超渡法師: F6, F7, F8, F11-F15, coordination
What's Good (🟢)
  • Thorough ADR with 5-project prior-art survey — best-in-class for this repo
  • Progressive-disclosure pattern (borrowed from pi-mcp-adapter) is the right call
  • Clean module separation: config / runtime / meta_tool
  • mcp list --resolve with secret warning is thoughtful UX
  • Phase split (OAuth → Phase 2, resilience → Phase 3) keeps this PR reviewable

Blocker: F1 must be addressed. F2-F10 are suggestions/confirmations — author can accept, defer with tracking issue, or explain why current behavior is correct.

Reviewed by: 超渡法師 (coordinator), 口渡法師, 擺渡法師, 覺渡法師

@chaodu-agent
Copy link
Copy Markdown
Collaborator

Maintainer clarification on F6 (feature gating strategy):

The desired behavior is a two-layer gate:

Compile-time (Cargo feature)

  • default = ["mcp"] — MCP code is compiled into the standard binary/Docker image
  • Enterprise customers who need a minimal binary can opt out with --no-default-features

Runtime (activation gate)

  • Even with MCP compiled in, it stays completely dormant unless the user explicitly opts in (e.g. --enable-mcp flag, OPENAB_AGENT_MCP=true env var, or similar)
  • Without opt-in: no mcp tool registered, no system prompt change, no child processes spawned — zero observable difference from a non-MCP build
  • The only cost for general users is slightly larger binary size, which is acceptable

Why

  • One Docker image for all users (no variant proliferation)
  • No accidental activation in production — MCP child processes only spawn with explicit intent
  • Enterprise flexibility to compile out entirely when binary size matters

Please adjust the implementation accordingly: flip default = ["mcp"] in Cargo.toml and add a runtime check that gates load_runtime_or_warn() behind an explicit opt-in signal.

…F6/F9/F10 + F2/F7 doc)

Code changes:
- F6: gate `load_runtime_or_warn()` on `OPENAB_AGENT_MCP={1,true,yes,on}`
  env var. Compile-time feature flag is being removed in Phase 3 PR openabdev#969;
  this adds the runtime half so MCP stays dormant unless explicitly opted
  in, preventing accidental activation when mcp.json is present.
- F4: integration test in `mcp::runtime::tests` exercising two concurrent
  `connect()` tasks against a guaranteed-failure server, asserts no
  stuck-Connecting state and that follow-up dials still attempt fresh.
- F5: defensive guard in `Agent::execute_tool_call` for `mcp` tool calls
  when manager is unavailable — returns actionable error instead of
  silently falling through to fs-tool dispatch.
- F9: regression test for missing-env-var error path; asserts the offender
  variable name + server name surface in the chained error message.
- F10: `truncate_context` drain bounds clarified — `3.min(len)` instead of
  `(1 + 2).min(len)`, with comment explaining the preserved first message.

Doc changes (ADR §5.4.1 added):
- F2: rationale for `env_clear()` baseline + explicit `env:` allowlist —
  blocks token leakage through ambient env, matches MCP SDK behavior.
- F7: rationale for single shared `McpRuntimeManager` per process —
  connection pooling, deduplicated handshakes, lifecycle owned by Agent.
@brettchien
Copy link
Copy Markdown
Contributor Author

brettchien commented Jun 1, 2026

@chaodu-agent thanks for the deep review — here's the resolution of the F1-F10 findings.

Fixed in this PR (commit 5af7d86)

# Area Change
F4 mcp/runtime.rs New integration test race_guard_no_stuck_connecting_on_concurrent_failures — two concurrent connect() tasks against a guaranteed-failure server; asserts no stuck-Connecting state, and Failed → fresh-dial works.
F5 agent.rs::execute_tool_call Defensive guard for hallucinated mcp(...) calls when manager is unavailable (or feature off) — returns actionable error instead of silent fallthrough to fs-tool dispatch.
F6 mcp/mod.rs New OPENAB_AGENT_MCP={1,true,yes,on} runtime opt-in gate. MCP stays dormant unless explicitly opted in, even when mcp.json exists at one of the search paths. Pairs with Phase 3 PR #969 which removes the compile-time [features] block — both halves of your two-layer gate directive are landing across the stack.
F9 mcp/config.rs Regression test for missing-env-var error path; uses format!("{:#}", err) so the offender var name + server name surface through the full anyhow context chain.
F10 agent.rs::truncate_context Drain bounds rewritten as 3.min(len) with multi-line comment explaining preserved first user message + user/assistant alternation invariant.

Documented in ADR §5.4.1 (this PR)

# Topic
F2 Rationale for env_clear() baseline + explicit env: allowlist on stdio children — blocks ambient-env token leakage (DISCORD_BOT_TOKEN, GEMINI_API_KEY, etc.) into untrusted Node/Python MCP servers. Per-server inherit_env opt-in noted as a possible future relaxation if a real server hits the allowlist.
F7 Rationale for single shared McpRuntimeManager per process — connection pool, deduplicated handshake, lifecycle owned by Agent. Per-session isolation is explicitly not the goal (would multiply child-process count and OAuth-grant turns).

Addressed by stacked PRs (no separate follow-up needed)

Token-budget PoC — concrete numbers for F11 (🟢)

Your F11 noted "system prompt growth ≤ ~100 tokens regardless of server count" as a 🟢 design strength. We measured the actual cost with cl100k_base tokenizer (close proxy for Claude's tokenization within ±15%) to back the claim with numbers:

Scenario system_prompt tool_defs Total
baseline (no MCP) 103 tok 1 tok 104 tok
MCP on, N=1 server 168 tok 169 tok 337 tok
MCP on, N=3 servers 168 tok 169 tok 337 tok
MCP on, N=10 servers 168 tok 169 tok 337 tok
MCP on, N=100 servers 168 tok 169 tok 337 tok

Δ = 233 tokens, fixed. Higher than your "≤100" eyeball (the mcp tool schema's JSON is the bulk: 167 tok for the action-dispatch enum + per-action descriptions), but the invariance property you praised holds exactly — per-server detail (transport, scopes, tool catalogs) is fetched lazily via mcp(action="list_servers") / mcp(action="list_tools") calls, so cost is paid in tool-result tokens (transient, on-demand), not on every turn in the system prompt. Measurement script + raw output available on request; happy to wire it into a benchmark target if you'd like a regression check.

Pre-gate on 5af7d86

  • cargo fmt --check clean
  • cargo clippy --features mcp -- -D warnings clean
  • cargo test --bin openab-agent — 25/25 passed
  • cargo test --bin openab-agent --features mcp — 55/55 passed
  • Dual cross-review (Mira + Kirin): APPROVED ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants