Skip to content

feat(openab-agent): MCP Phase 2 — OAuth 2.1 paste-back (stacks on #959)#962

Open
brettchien wants to merge 94 commits into
openabdev:mainfrom
brettchien:feat/openab-agent-mcp-oauth
Open

feat(openab-agent): MCP Phase 2 — OAuth 2.1 paste-back (stacks on #959)#962
brettchien wants to merge 94 commits into
openabdev:mainfrom
brettchien:feat/openab-agent-mcp-oauth

Conversation

@brettchien
Copy link
Copy Markdown
Contributor

@brettchien brettchien commented Jun 1, 2026

Discord Discussion URL: https://discord.com/channels/1490643539682529300/1510657283552706663/1510690101120995369


📚 PR stack — merge in this order:

  1. feat(openab-agent): native MCP client (ADR + implementation) #959 — Phase 1: Foundation (ADR + rmcp + stdio + meta-tool)
  2. feat(openab-agent): MCP Phase 2 — OAuth 2.1 paste-back (stacks on #959) #962 (this PR) — Phase 2: OAuth 2.1 paste-back ← stacks on feat(openab-agent): native MCP client (ADR + implementation) #959, merge second
  3. feat(openab-agent): MCP Phase 3 — circuit breaker + flag removal (stacks on #959 + #962) #969 — Phase 3: Resilience (circuit breaker + mcp doctor, stacks on feat(openab-agent): MCP Phase 2 — OAuth 2.1 paste-back (stacks on #959) #962)

Once #959 merges, this branch rebases and the diff collapses to only the 18 OAuth-specific commits. Reviewers can take all three in parallel; merges must serialize.


Summary

Phase 2 of the openab-agent native MCP client (ADR docs/adr/openab-agent-mcp.md §6). Adds OAuth 2.1 paste-back authentication for Streamable HTTP servers.

OAuth commits

commit content ADR ref
2a69bf9 Namespaced TokenStore + fsync §6.1
4c77595 Provider catalog (anthropic / openai / xai) §6.2
da86d5b OAuthConfig fields + discovery boot check §6.3
65f1c4d Custom provider resolution §6.3
04b84f9 ResolvedProvider as enum {Builtin, Custom} §6.3
2400144 Paste-back flow primitives (PKCE S256, state nonce) §6.4
95f33bc NeedsAuth runtime state for protected HTTP servers §6.4
af3a25d start_paste_login + builtin client_id resolver §6.4
7dd718c Split auth.json into TokenStore | PendingPasteLogin §6.4
79b1e2e Persist pending paste-login to auth.json §6.4
bc006e2 parse_paste_callback URL helper (state validation + CSRF) §6.4

Plus fmt fixups: 06acde6, 7ea3e59, f61b498, 7f4ef17, 32232c0, ca91f1d, 111f43a.

Scope

Test plan

  • mcp login <server> happy path against vendor-hosted MCP (Linear, etc.) in headless container
  • State mismatch rejected (RFC 6749 §10.12)
  • PKCE verifier round-trip
  • Refresh-token rotation race: agent fsync + auth.json mtime stable
  • mcp list redacts client_secret (regression on Phase 1 secret-redaction)
  • Both cargo feature combos green

🤖 Generated with Claude Code

brettchien and others added 30 commits May 31, 2026 15:30
Adds ADR for in-core rmcp + progressive-disclosure meta-tool, deferred
to symmetry with the Skills extension pattern from PR openabdev#955. Memory
analysis rules out the sidecar alternative; per-session config refresh
replaces file-watcher hot reload to drop ~150 LOC of race-condition
hotspot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ate guardrails

- §6.4 flow selection: device-code preferred (matches existing CLI convention),
  paste-back universal fallback, browser laptop opt-in
- §5.2 add login/complete_login meta-tool actions; login returns flow-tagged union
- §6.1 spell out TokenStore persistence assumption + cold-start refresh
- §6.3 device_authorization_endpoint extension point + RFC 8414 discovery
- §3.8 stdio container-image caveat (interpreter required for Node/Python)
- §5.7 max_concurrent_servers knob (default 10, see §7 for constrained tuning)
- §7 Fargate 512MB/1GB OOM analysis + mitigations

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ntract

- §6.4 RFC 8414 discovery disabled by default; opt-in requires
  oauth.discovery=true + oauth.discovery_allowlist (boot rejects otherwise).
  Rationale: awsvpc egress + SSRF surface in multi-tenant deployments.
- §6.1 add RTR race warning: async persistence layers (Fargate S3 sync,
  eventually-consistent volumes) must flush new tokens to durable storage
  before Spot interruption, else cascade-revoke locks the user out.
  Contract: fsync(2) agent-side + mtime-event-driven sync deployment-side.
- §6.3 expose oauth.discovery / oauth.discovery_allowlist on custom providers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 foundation slice per ADR §5.4 + §5.6:

- Add optional `rmcp 1.7` dep + `mcp` feature flag (default off)
- Wire `#[cfg(feature = "mcp")] mod mcp;` into main
- New `mcp/config.rs`: `McpConfig` / `ServerConfig` (Stdio | Http) /
  `ToolFilter` / `OAuthConfig`, global+project layered load, project
  precedence on name collision, `${env:VAR}` interpolation with
  per-server error context

OAuth fields limited to `provider` + `scopes`; custom-provider
endpoints (§6.3) deferred to Phase 2 auth slice to avoid dead schema.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI fmt-check found 8 formatting deltas in the Phase 1 scaffold. No
logic change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI previously only built the default feature set, leaving the
Phase 1 MCP scaffold (mcp/config.rs, gated by --features mcp)
without compile, clippy, or test coverage. Adds explicit
--features mcp invocations and watches workflow file changes
so this gap is closed for the rest of the rollout (ADR §9).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires up the Phase 1 config loader so clippy --features mcp can
see it. McpConfig::load() + ServerConfig::resolved() + serde
pretty-print are all reachable from main, clearing the dead-code
denial of compile under -D warnings.

Output groups successful servers (✓) and failures (✗ with reason),
sorted by name for deterministic display.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sort_by_key over (name, _) is the cleaner form; no behavior
change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per Mira's security review on the Phase 1 thread: the previous
'mcp list' eagerly called ServerConfig::resolved(), substituting
\${env:GITHUB_TOKEN} etc. into the printed output. Three leak
paths matter — pasting CLI output into bug reports / chat, screen
sharing, and stdout log collection.

New behavior:
- Default: print raw config; \${env:VAR} placeholders kept verbatim.
  Safe to paste publicly; reader still sees which env var feeds
  each field.
- --resolve opts into substitution and prints a two-line warning
  banner. Useful for diagnosing missing-env startup failures.

No CLI-shape break: 'mcp list' still works; --resolve is additive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
println! puts the warning banner into stdout, which (a) gets
swallowed by 'mcp list --resolve > dump.json' redirection so the
user never sees the security notice, and (b) corrupts the JSON
payload for downstream pipes like 'mcp list --resolve | jq'.

Routing the banner through eprintln! keeps it visible regardless
of redirection and keeps stdout pure JSON for piping. Standard
Unix convention: data on stdout, diagnostics on stderr.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Introduces McpRuntimeManager owning one ServerHandle per configured server,
each starting in ServerStatus::Disconnected per ADR §5.7 (lazy connect).
Wires the manager via `mcp status` CLI so the types are exercised by
clippy --features mcp; actual rmcp TokioChildProcess dial + Connected /
Failed transitions land in the next slice to keep that risky bit isolated
for bisecting.
`clippy --features mcp -- -D warnings` doesn't compile the test target,
so `len()`'s only callers (the unit tests) didn't keep it alive. Tests
already used `statuses().len()` in one place — switch the other two to
match and drop the now-dead method. `is_empty()` stays because
`cli_show_status` calls it.
…ct()

ADR §5.7 lifecycle needs a `connect()` method that spawns a child process
and awaits an rmcp handshake — both `Send` across `.await`. Plain
`HashMap<String, ServerHandle>` is not `Sync`, and the background idle-
eviction task will share the map with the foreground `mcp call` path, so
the read-heavy / write-light access pattern wants `tokio::sync::RwLock`.

This slice lands the lock migration only:
- `handles: Arc<RwLock<HashMap<String, ServerHandle>>>`
- `statuses()` and `is_empty()` become async; `cli_show_status` follows
- `McpRuntimeManager` is now `Clone` (Arc bump) so the eviction task can
  hold its own handle
- `connect(name)` transitions to `Connecting` and returns; the actual
  `rmcp::TokioChildProcess` dial + `Connected` / `Failed` transitions
  land in the next slice — keeping that bit isolated for bisecting

6 unit tests cover snapshot ordering, unknown-server error, transition,
and clone-shares-state.
The chained `.statuses().await.into_iter().map(|(n, _)| n).collect()`
fits on one line under rustfmt's default 100-char width — broke it
preemptively last tick on the wrong side of the threshold.
`clippy --features mcp -D warnings` doesn't compile the test target, so
`connect()`'s only callers (the unit tests) don't keep it alive. The
real rmcp dial lands in the next slice and will be reachable from the
meta-tool dispatch / `mcp call` CLI path; until then a targeted allow
keeps clippy quiet without papering over the type-level dead_code on
ServerStatus variants (which is already explicitly allowed).
connect() now spawns the configured stdio child via rmcp's
TokioChildProcess, runs the JSON-RPC handshake (().serve(transport)),
and parks the resulting RunningService on ServerHandle.client.

Lock discipline uses a double-acquire: a brief write lock to mark
Connecting, drop, run the dial without any lock held, then a second
brief write lock to install the client or record Failed(msg). Holding
the write lock across the spawn+handshake would block mcp status and
the future idle-eviction scan for the entire dial latency.

HTTP transport is rejected with a "phase 2" error so misconfigured
entries surface cleanly without advancing state past Disconnected.

Adds `openab-agent mcp connect <name>` as a smoke-test CLI for
mcp.json entries; the RunningService is dropped on process exit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rver

Per Mira's Tick 13 review: when two callers race on the same name, both
pass the first-lock idempotency check (Disconnected), both spawn a
child, both come back to acquire the install-lock. The second arrival
must yield to the winner, otherwise it overwrites the installed client
and silently drops the working RunningService.

Adds a 3-line double-check at the second lock acquisition: if status
is already Connected with a live client, return Ok(()) — `dial_result`
falls out of scope here and RunningService::Drop kills the duplicate
child. Cheaper than a Connecting-aware notify/condvar and adequate
until the eviction-aware slice lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three rustfmt collapses that I missed under default 100-char width:

* import order: bare `rmcp::ServiceExt` sorts AFTER `rmcp::transport::*`
  (rustfmt puts sub-module paths before bare items within a crate).
* `return Err(anyhow!(...))` for the HTTP phase-2 message fits on one
  line (~88 chars).
* `let Self { command, args, env } = self;` fits on one line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…patch

Adds the single `mcp` tool the LLM sees (ADR §5.2). This slice lands:

* `Action` enum tagged on `action`, matching the eight ADR actions,
  deserialized straight from the LLM's tool-call payload.
* `dispatch(manager, action) -> Result<Value>` as the single entry
  point — both `agent.rs::execute_tool` (next slice) and tests go
  through this.
* `help` returns the static usage doc.
* `list_servers` returns `[{name, status, transport}]` via a new
  `McpRuntimeManager::snapshot()` that clones out under a read lock.
* The four IO-bearing actions (`list_tools`, `describe_tool`, `call`,
  `status`) return a `not yet implemented` error so the surface area
  is visible to callers without pre-implementing the peer-borrow
  path. `login` / `complete_login` land with the Phase 2 OAuth slice.

`#[allow(dead_code)]` on `Action` and `dispatch` because clippy
--features mcp doesn't compile the test target and agent.rs
wire-up lands next slice.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…LLM fallback hint

Two of Mira's three Tick 15 findings (Arc-wrap for peer borrow lands
with the actual list_tools/call slice — Arc without a consumer earns
no keep).

(1) `not_implemented_msg(action)` names the requested action, lists
the actions that DO work (`help`, `list_servers`), and tells the LLM
to fall back to native `read` / `write` / `edit` / `bash` rather than
retrying. The old generic "not yet implemented" gave the model
nothing to act on.

(2) `ServerConfig::transport_label() -> &'static str` replaces the
config clone inside `snapshot()`. The `Stdio` variant carries an
`args: Vec<String>` and `env: HashMap<String, String>`; cloning that
just to read the variant tag was wasted heap traffic on every
`list_servers` call. `snapshot()` now yields
`(String, ServerStatus, &'static str)`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rustfmt's fn_call_width default (60) splits assert!() args when the
inline args exceed it; the new "read, write, edit, bash" fallback
assert tipped the third call from ~55 to ~62 chars.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Apply Mira's Tick 15 review (3) and ship the first IO-bearing
meta-tool action.

* `ServerHandle.client` is now `Option<Arc<RunningService<...>>>`.
  `connect()` wraps the dial result in `Arc::new`.
* `McpRuntimeManager::arc_peer(name)` clones the Arc out under a
  short read lock, drops the guard, returns the handle. Callers
  `.await` on `peer.list_all_tools()` / `peer.call_tool()` with no
  runtime lock held — kills the writer-starvation risk Mira flagged
  and sidesteps `Future is not Send` from holding a guard across
  `.await`.
* `Action::ListTools { server }` now wired: lazy `connect()` →
  `arc_peer()` → `peer.list_all_tools()` → `[{name, description}]`.
  `describe_tool` / `call` / `status` still stub.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Apply Mira's Tick 18 review (3) — lenient args parsing for the
`call` action so LLMs that send `arguments: null` (or omit the field
entirely) for no-arg tools don't bounce off a strict type check.

* `Action::Call { server, tool, arguments }` is wired through
  `call_tool()`: validate arguments → lazy `connect()` → `arc_peer()`
  → `peer.call_tool(CallToolRequestParams::new(...).with_arguments(map))`
  → serialize `CallToolResult` to JSON.
* Argument coercion: `Value::Object(map) → map`, `Value::Null → {}`,
  everything else (string, number, array, bool) is rejected with a
  message that names the actual type so the LLM can correct itself.
* Tests cover both branches: non-object string args fail early at
  validation; null args pass validation and fail later at connect.
* `describe_tool` and `status` still stub; supported-list message
  updated to include `call`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The CallToolRequestParams chain fits in exactly 100 chars; rustfmt
keeps it on one line. Burned a tick on a defensive break.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… 1 surface

describe_tool returns full input_schema for one tool (list+filter via
list_all_tools — MCP spec has no single-tool query). status reports
per-server {name, status, transport, last_error} with optional name
filter; surfaces Failed's embedded error as last_error.

Extract fetch_tools helper shared by list_tools + describe_tool —
this is the natural insertion point for the planned tools_cache.
Surfaces the `mcp` action enum to the LLM via a new ToolDef and routes
calls through McpRuntimeManager when configured. AcpServer constructs
the manager once at startup (warn-and-default on config parse failure)
and clones the cheap Arc-backed handle into each session's Agent.

Tools.rs stays stateless and feature-flag-free — the mcp routing arm
lives in Agent::execute_tool_call where it has access to the manager.
A shared MCP_TOOL_NAME const keeps mcp_tool_def() and the dispatch arm
in sync.
The `mut` on `tools` and `base` is consumed only inside `#[cfg(feature = "mcp")]`
blocks, so the no-feature build saw them as unused. Gate `allow(unused_mut)`
behind `cfg_attr(not(feature = "mcp"), ...)` so both builds stay clean.
`test_session_new` and `test_session_new_missing_key` both mutate
`ANTHROPIC_API_KEY`. cargo runs tests in parallel, so a `set` from
one thread can be observed by the other before the first reads it.
The pre-existing race was timing-tight enough to pass historically;
adding `mcp::load_runtime_or_warn()` to `AcpServer::new()` widened
the gap between set and read enough to flip the race deterministic.

Guarding both tests with a shared `Mutex<()>` serializes them without
adding a `serial_test` dep. Poisoned guard is recovered via
`into_inner()` — we don't care if a prior assertion panicked.
brettchien and others added 16 commits June 1, 2026 10:34
Mira Tick 46 catch: when a provider omits both refresh_token and
expires_in (Figma, Sentry, xAI), the prior `now + 0` fallback set
the token "already expired", causing the next connect() to enter
the refresh path, fail closed on the empty refresh_token, and
bounce the user back to NeedsAuth seconds after a successful
login.

`expires_in: None` now uses the u64::MAX sentinel — `is_expired`
returns false until the provider eventually 401s on use (at which
point `mcp login` re-runs is the correct UX for non-refreshable
tokens). Test now asserts the sentinel directly so the regression
can't sneak back in.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`openab-agent mcp login <name> [--paste URL]` drives the whole
paste-back loop: start_paste_login pins PKCE state, the CLI shows
the authorize URL, blocks on stdin (or --paste for scripted use)
for the redirect URL, then complete_login validates state +
exchanges the code + persists the TokenStore.

--paste exists for CI smoke tests and scripted setups; without it
the CLI is interactive (read_redirect_from_stdin). State-mismatch
or network failure leaves the pending entry intact so the user
can retry with a fresh paste without re-issuing `mcp login`.

Drops the #[allow(dead_code)] markers from start_paste_login,
complete_login, save_namespaced_token_at, remove_pending_login
and PasteLoginStart — all reachable from main via cli_login now,
so dead_code propagates through the call graph (skill rule).

Phase 2 user-facing surface is now complete: list / status /
connect / login. mcp status surfacing of NeedsAuth + the
device-code flow remain for the next slices.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… --paste

Mira's PR openabdev#962 review (Q2) noted that --paste echoes the redirect URL
into shell history and `ps`. PKCE makes either route theoretically
safe, but `echo "<url>" | openab-agent mcp login <name>` leaves no
trace — preferred for CI / scripts.
`auth::list_pending_logins_at` enumerates `mcp-pending:<name>` entries;
`McpRuntimeManager::pending_logins` delegates via injected auth_path.
`cli_show_status` cross-references against configured servers to flag
both "login pending — finish via mcp login <name>" and orphaned
entries with no matching mcp.json server. Introduces `PENDING_PREFIX`
const so the read/write sides can't drift on the literal.
…_status one-pass

Simplify pass on 9bca3de:

- auth.rs: PENDING_PREFIX demoted to private; pending_key() promoted to
  pub fn as the single public constructor. runtime.rs uses it instead
  of formatting against the const, so the namespace literal lives in
  exactly one place.
- mcp/mod.rs: cli_show_status folds the orphan calculation into the
  print loop via pending.remove(name) — the leftover set IS the
  orphans, dropping the second configured: HashSet<&str> pass.
- mcp/mod.rs: drop the cli_login "Security note" docstring — the same
  rationale lives on the Login clap subcommand doc in main.rs, which
  is the user-facing surface.
- runtime.rs: trim the spawn_blocking rationale on pending_logins —
  moved next to list_pending_logins_at where the sync read actually
  lives.

Net -10 LOC across 3 files; ~/ops/openab-agent-ci.sh --quick green.
Phase 2 / ADR §6.5: HTTP+oauth servers now dial with their cached
access_token when one is present and not within REFRESH_SKEW_SECONDS
of expiry. Refresh-token rotation for the expired-but-recoverable
case is the next slice — for now, expired tokens fall through to
NeedsAuth alongside missing tokens, so `mcp login` stays the single
user-actionable path.

- auth::is_expired now pub(crate) — single source of truth for the
  expiry+skew check across codex (get_valid_token) and mcp (connect).
- Dial::Http carries Option<String> auth; Dial::run uses
  StreamableHttpClientTransportConfig::with_uri(url).auth_header(token)
  when present, falls back to from_uri for anonymous HTTP.
- connect() oauth branch tries load_namespaced_token_at first; the
  Ok+!expired arm builds an authenticated Dial, all other arms bounce
  to NeedsAuth.

Tests:
- connect_oauth_with_valid_cached_token_attempts_dial_not_needs_auth
  proves the bearer-injection path reaches handshake (fails at the
  hermetic 127.0.0.1:1 dead address, surfaces Failed not NeedsAuth).
- connect_oauth_with_expired_token_bounces_to_needs_auth pins the
  expiry guard against future refresh-flow regressions.
Phase 2 / ADR §6.6: HTTP+oauth servers with an expired cached token now
attempt RFC 6749 §6 refresh-grant before bouncing to NeedsAuth. On
success the rotated TokenStore is fsynced and the dial proceeds with
the new bearer; on failure the bounce is preserved so `mcp login`
stays the user-actionable path.

- post_token_refresh: sibling of post_token_exchange for grant_type=
  refresh_token (public client, no client_secret).
- try_refresh_oauth_token: resolves client_id from current config (so
  a rotated catalog entry is picked up), POSTs the refresh, builds a
  new TokenStore preserving the previous refresh_token if the provider
  omits one (Google-style rotation), persists via
  save_namespaced_token_at.
- DialPlan two-phase enum lets the inside-lock branch tag "needs
  refresh" and defer the async refresh to outside the write lock —
  slow refresh must not block concurrent `mcp status` reads.

Tests:
- connect_oauth_expired_no_refresh_token_bounces_to_needs_auth pins
  the empty-refresh short-circuit (no POST attempted).
- connect_oauth_expired_with_refresh_token_failed_refresh_bounces_to_needs_auth
  proves the refresh path runs and that any failure surfaces as
  user-actionable NeedsAuth (current fixture's custom provider plus
  dead token endpoint both fail the refresh).

Custom-provider refresh shares the same gap as custom-provider login
— resolve_paste_client errors before the refresh POST runs. Will be
unblocked by the same future slice that wires custom-provider login.
…ected winner

Mira + Kirin caught a race: two concurrent connect() calls on the same
expired-token server can interleave so one wins the refresh + dial
while the other's refresh fails — and the loser was unconditionally
overwriting `Connected` with `NeedsAuth`, giving the user a false
"please log in" signal on a healthy session.

Guard the status mutation with `!matches!(h.status, Connected)` so a
concurrent winner's terminal state is preserved.
Mirrors finish_login (oauth.rs:271) so a hostile/buggy provider returning
a huge expires_in can't trigger a debug-build overflow panic in the
refresh-token rotation path.
…d refresh

post_token_exchange and post_token_refresh diverged only in the form
payload — same Client::builder() boilerplate, same status/body folding,
same response shape. Extract post_token_form so each grant call site is
just the RFC 6749 form tuple + a grant_label for the error context.

Also drops a narrating comment in connect() that recapped the three
match arms below it without adding any non-obvious context.
Adds the §3.1 device authorization request + §3.4 polling tick HTTP
helpers — pure data plumbing, no runtime wiring yet. The polling-tick
response classifier is split out as a pure fn (classify_device_poll) so
the §3.5 error-code → flow-state mapping (authorization_pending,
slow_down, access_denied, expired_token) is unit-testable without a
mock HTTP server.

DeviceAuthResponse defaults interval to 5s per RFC 8628 §3.5 when the
provider omits it. verification_uri_complete (§3.3.1 extension) is
Option since verification_uri + user_code is the always-present
fallback.

Both POST fns + the two types are #[allow(dead_code)] — next slice wires
start_device_login() and the polling loop into McpRuntimeManager.
…loop)

- `start_device_login(name)`: POST §3.1 + spawn detached §3.4 polling task
- `DeviceLoginStart`: public user-facing bundle (user_code / verification_uri / verification_uri_complete / expires_in)
- `resolve_device_client`: Custom-provider-only gate with explicit error
  for missing `device_authorization_endpoint` / `client_id`
- `run_device_poll_loop`: deadline check + SlowDown back-off (+5s) +
  success persistence + NeedsAuth on terminal failure
- `finalize_device_login` / `mark_device_login_failed`: status transitions
  (Disconnected on success keeps MCP handshake out of detached task;
   next `connect()` picks up the cached token via the oauth-aware DialPlan)
- 4 gating tests: unknown server / stdio / missing device endpoint / HTTP reach

#[allow(dead_code)] on the new surface until the next slice wires
`mcp login --device` CLI. Quick gate green.
- main.rs: add `--device` flag to `McpAction::Login` (conflicts_with `paste`)
- mcp/mod.rs: `cli_login_device(name)` — prints verification_uri_complete
  (§3.3.1) when present, falls back to `verification_uri` + `user_code`;
  watches `statuses()` until Disconnected (success) / NeedsAuth (terminal
  failure) or the provider-declared `expires_in` deadline elapses
- runtime.rs: drop `#[allow(dead_code)]` on the Tick-55 device-flow chain
  (`start_device_login` is now called from the CLI surface)
- config.rs: add `oauth.redirect_uri: Option<String>` to `OAuthConfig`
  for the paste-back branch on custom providers (built-ins pin their
  callback in `ProviderSpec`; ignored by the device-code branch)
- oauth.rs: propagate `redirect_uri` through `ResolvedProvider::Custom`
- runtime.rs: in `resolve_paste_client`, accept Custom + client_id +
  redirect_uri; emit precise per-missing-field error otherwise:
  - device endpoint present → "use device flow" (preserved)
  - missing client_id → "requires `oauth.client_id`"
  - missing redirect_uri → "requires `oauth.redirect_uri` (must match
    the redirect URL pre-registered with the provider)"
- Tests: rename `_rejects_custom_provider_for_now` →
  `_rejects_custom_without_redirect_uri`; add
  `_rejects_custom_without_client_id` + success-path
  `_custom_with_client_id_and_redirect_uri_succeeds`

Closes the last Phase 2 paste-back functional gap. ADR §6.3 enumerates
optional custom fields but doesn't currently list `redirect_uri`;
follow-up ADR amendment lands separately.
Adds the field to the §6.3 config-shape list and a follow-up paragraph
explaining why custom paste-back requires it (no local listener; must
match provider OAuth app registration) and why the device-code / built-in
branches ignore it.

Follow-up to 84837ee (custom-provider paste-back support); Mira + Kirin
dual sign-off requested this as a non-blocking docs amendment.
- auth.rs: convert free function `is_expired(&TokenStore) -> bool` into
  `impl TokenStore { pub fn is_expired(&self) -> bool }`. Switch the skew
  addition to `saturating_add` for parity with the rest of the OAuth
  arithmetic (defends `u64::MAX` "never expires" sentinel + any
  near-overflow clock value).
- auth.rs: fold `show_status`'s inline duplicate of the expiry formula
  into the new method.
- mcp/runtime.rs: drop the free-function import + update the cached-token
  DialPlan branch to `store.is_expired()`.
- Tests: rewrite the three existing `is_expired` cases to method form and
  add `_sentinel_u64_max` covering the saturating_add behavior.

Mira Tick 50 non-blocking cleanup.
@brettchien brettchien force-pushed the feat/openab-agent-mcp-oauth branch from 93da300 to e5ea59f Compare June 1, 2026 10:36
brettchien and others added 2 commits June 1, 2026 11:21
Drop stale `#[allow(dead_code)]` markers now that runtime/flow wire all
items through prod paths. Reuse `reqwest::Client` across the device-poll
loop so TLS / TCP handshakes are amortized across the dozens-to-hundreds
of polls a 30-minute device-flow window can produce. Trim session-narrative
parentheticals from doc comments.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reviewer-pass findings:

- Drop dead auth.rs wrappers (`load/save/remove_namespaced_token`) that
  were never wired; runtime/tests use the `_at` siblings directly.
- Extract `build_token_store` helper to dedupe the `TokenStore`
  construction + `u64::MAX` sentinel + refresh-rotation logic across
  `finish_login`, `finalize_device_login`, and `try_refresh_oauth_token`.
- Preserve underlying error context in `complete_login` via
  `.with_context()` instead of `.map_err(|_| anyhow!(...))`.
- Trim session-narrative parentheticals ("Tick N", "Mira's review")
  from doc comments; keep the technical reason that future readers need.
- Refresh stale doc strings that still said "next slice" for behaviour
  Phase 2 itself ships.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@shaun-agent
Copy link
Copy Markdown
Contributor

shaun-agent commented Jun 1, 2026

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report posted screening and moved the project item to `PR-Screening`.

GitHub comment: #962 (comment)
Project action: https://github.com/orgs/openabdev/projects/1, item PVTI_lADOEFbZWM4BUUALzguVxqM, status now PR-Screening

Intent

PR #962 is trying to add Phase 2 of the openab-agent native MCP client: OAuth 2.1 paste-back authentication for Streamable HTTP MCP servers. The operator-visible problem is that protected HTTP MCP servers can be discovered but cannot yet complete a durable auth flow in headless or Discord-driven environments.

Feat

Feature work. It introduces provider resolution, OAuth config/discovery checks, PKCE/state primitives, pending paste-login persistence, token-store namespacing/fsync behavior, and runtime NeedsAuth state for protected HTTP servers. The PR is draft and stacked on #959, so reviewers should treat the current diff as noisy until Phase 1 lands and this branch rebases.

Who It Serves

Primary beneficiaries: agent runtime operators and deployers who need OpenAB agents to connect to protected MCP servers. Secondary beneficiaries: maintainers who need the MCP auth model split into reviewable phases before exposing mcp login UX.

Rewritten Prompt

Implement the OAuth 2.1 paste-back foundation for openab-agent MCP Streamable HTTP servers, stacked after #959. Add typed OAuth configuration, builtin provider metadata for Anthropic/OpenAI/xAI, custom provider resolution, PKCE S256 verifier/challenge generation, CSRF-resistant state handling, pending-login persistence in auth.json, and runtime state that marks protected servers as needing auth. Keep this slice below CLI UX: do not add the final mcp login command wiring unless it can be tested end to end. Include focused tests for provider resolution, state mismatch rejection, PKCE round-trip, secret redaction, auth persistence/fsync behavior, and feature combinations.

Merge Pitch

This should move forward as the next MCP capability slice because it builds the security-critical auth substrate before user-facing command wiring. Risk is medium-high: OAuth state, token persistence, and secret redaction are easy places to create durable security bugs. The likely reviewer concern is not whether OAuth is needed, but whether this draft can be reviewed while stacked on #959 and whether the implementation has enough tests around CSRF/state, PKCE, persistence, and redaction before it becomes the base for CLI login.

Best-Practice Comparison

OpenClaw is relevant mainly as an operational reference: durable job/state persistence, explicit routing, retries, and run logs map to the need for durable auth state and clear auth/runtime transitions. This PR appears aligned on durable persisted state, but should make failure and recovery behavior explicit before CLI UX lands.

Hermes Agent is also relevant: its fresh-session scheduled runs and atomic persisted state are a useful bar for auth.json writes and pending-login recovery. The paste-back design should be checked for atomic writes, lock behavior if multiple agent processes touch auth state, and deterministic recovery after interruption.

Implementation Options

Conservative: wait for #959 to merge, rebase, then review only the OAuth commits with no new scope. Require focused unit tests for PKCE, state mismatch, provider resolution, pending-login serialization, and secret redaction before marking ready.

Balanced: after rebase, merge the OAuth foundation behind the existing MCP feature gates, but split any large runtime behavior into a follow-up if it is not directly needed for paste-back state. Add a small integration-style test around protected HTTP server auth state without requiring a live vendor server.

Ambitious: rebase and complete the full paste-back user path in one PR, including mcp login, callback parsing, token exchange completion, redacted listing, and a mocked end-to-end OAuth server test. This gives reviewers a complete UX but increases review size and security blast radius.

Comparison Table

Option Speed Complexity Reliability Maintainability User Impact Fit for OpenAB now
Conservative Fast after #959 Low Good if tests are added Good, smallest review No immediate login UX Strong fit for stacked draft cleanup
Balanced Moderate Medium Better coverage of runtime behavior Good if runtime scope stays tight Prepares reliable next-slice CLI Best fit
Ambitious Slow High Potentially strong, but larger risk Harder to review and bisect Delivers visible login path Premature while #959 is still pending

Recommendation

Use the balanced path after #959 lands: rebase, collapse the diff to OAuth-only commits, keep mcp login as the next slice, and require focused tests for PKCE/state, provider resolution, persistence/recovery, feature gates, and secret redaction before advancing from draft. If runtime changes remain large after rebase, split nonessential runtime expansion into Phase 3 so Masami or Pahud can review the security-sensitive OAuth core cleanly.

brettchien and others added 5 commits June 1, 2026 12:08
Concurrent connect() callers may each hit `try_refresh_oauth_token`
with the same stale `refresh_token` (no per-server single-flight gate
yet). The connect() race guard prevents state corruption; the worst
case is N duplicate POSTs to the token endpoint. A Phase 3 follow-up
will add per-server single-flight for providers that cascade-revoke
on replayed refresh tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Align ADR with implementation: the device-code polling loop persists
tokens then sets status to Disconnected, so the next connect() reads
the cached token via the oauth-aware dial path. Keeps the rmcp handshake
out of the detached polling task.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Track the most-recent device-poll JoinHandle per server in
`McpRuntimeManager.device_login_tasks`. On a fresh `start_device_login`
the prior task is aborted before the new one is registered, so a retry
after a transient failure can't leave two loops racing to finalize the
same server (would risk double-persisting tokens / contradictory status
transitions).

Phase 2 follow-up openabdev#4 of 4.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add `refresh_locks: HashMap<server, Arc<tokio::Mutex>>` and acquire the
per-server lock before the refresh POST. After taking the lock, re-read
the on-disk token; if a prior waiter already refreshed, return the
cached store without a second POST.

Closes the cascade-revoke window on providers (Google, ...) that
invalidate every issued refresh_token when the same one is replayed
concurrently. Phase 2 follow-up openabdev#1 of 4.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the truncate+write+sync_all path with: write same-dir tmp
(PID + atomic counter for uniqueness, mode 0o600 via create_new),
fsync the tmp, rename(2) onto auth.json, fsync the parent directory.
On crash, the file is either the prior version or the new version —
never half-written.

The old implementation truncated `auth.json` before writing, so a
mid-write crash left a corrupt file that the next startup would fail
to parse. On the ECS deployment that then re-restored a now-revoked
refresh_token from S3 (ADR §6.1). Phase 2 follow-up openabdev#3 of 4.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants