Skip to content

fix(router): harden /router/stats disclosure and decode_target trust#393

Merged
inureyes merged 3 commits into
mainfrom
fix/issue-389-disaggregated-defense-in-depth
Jun 21, 2026
Merged

fix(router): harden /router/stats disclosure and decode_target trust#393
inureyes merged 3 commits into
mainfrom
fix/issue-389-disaggregated-defense-in-depth

Conversation

@inureyes

Copy link
Copy Markdown
Member

Summary

Defense-in-depth hardening for the disaggregated router (issue #201) covering the two MEDIUM findings in issue #389: /router/stats disclosing internal cluster topology, and the prefill node trusting the router-chosen decode_target. Both harden on top of the existing trusted-network-segment threat model rather than replacing it.

Finding 1: /router/stats topology disclosure

GET /router/stats is mounted on the client-facing axum app (same surface as /v1/chat/completions) and returned every prefill/decode peer's raw host:port, so an unauthenticated client that reaches the inference port could map the internal cluster.

Chosen option: (a) redact raw addresses on the client-facing variant (the lowest-risk default; the codebase has no separate management/auth interface that would make option (b) natural). The redacted body reports only the stable router-assigned node ids (prefill-<i> / decode-<i>), roles, health, per-node dispatch counts, and the RouterMetrics snapshot, which is enough to read the load spread and confirm a failed node is marked unreachable without disclosing topology. The body carries "addresses_redacted" so a caller knows which view it received. Full detail stays available via an opt-in env flag MLXCEL_ROUTER_STATS_VERBOSE (1/true/yes/on) for trusted-segment debugging.

Finding 2: prefill node trusts router-chosen decode_target

The prefill node connects to decode_target to ship the KV-cache handoff (which encodes the prompt), so a forged frame from a breached segment could redirect it off-cluster. The prefill role now validates decode_target against a DecodeAllowlist before connecting: an off-list target is rejected (the request fails with a clear error and the KV cache never leaves for an unknown address) instead of connected to.

Allowlist design and why it does NOT break router-driven balancing: the allowlist is built from the prefill node's FULL --decode-peers set, not just its single static handoff peer (.first()), so it covers every decode node the router may select. Operators configure each prefill node's --decode-peers with the same decode pool the router routes over (the shared cluster config), so any router-selectable decode node is on every prefill's allowlist and router-driven decode balancing keeps working. When no allowlist is configured (an empty set, which a correctly configured prefill-only node never has because --decode-peers is required), the prefill stays permissive-with-warning so a missing allowlist source never silently disables balancing. A frame without decode_target (an older router) still falls back to the first static --decode-peers entry, which needs no validation because it is operator config rather than a wire-supplied address. Comparison is canonical (parsed SocketAddr), so equivalent spellings of the same address match.

serve_prefill_role_networked_blocking now takes the full Vec<SocketAddr> decode pool (first entry is the static fallback, whole set is the allowlist); the model-worker dispatch passes its configured decode_peers through unchanged.

What changed

  • src/server/router_front.rs: router_stats handler redacts addresses by default; new pure helpers router_stats_body and router_stats_verbose_from (env-gated verbose view).
  • src/distributed/disaggregated/coordinator.rs: new DecodeAllowlist + DecodeTargetDecision; ServingCoordinator carries the allowlist (with_decode_allowlist builder); run_prefill_role_networked validates decode_target and rejects off-list targets with an error result frame; serve_prefill_role_networked_blocking takes the full decode-peer set.
  • src/server/model_worker.rs: passes the full decode_peers into the prefill role.
  • src/distributed/disaggregated/coordinator_tests.rs: allowlist unit tests (on-list accepted, off-list rejected, unconfigured permissive, canonical-address comparison).
  • src/server/router_front_tests.rs: /router/stats redaction unit tests (redacted by default, verbose includes addresses, env parsing).
  • docs/distributed.md: documents both behaviors, including the critical requirement to configure prefill --decode-peers with the full decode pool.

Test plan

  • cargo check --lib --tests --features metal,accelerate
  • cargo test --lib --features metal,accelerate distributed::disaggregated::coordinator::tests::allowlist (4 passed)
  • cargo test --lib --features metal,accelerate server::router_front::tests::router_stats (3 passed)
  • cargo clippy --lib --tests --features metal,accelerate -- -D warnings (clean)

Closes #389

Defense-in-depth on top of the disaggregated router (issue #201) for the two MEDIUM findings tracked in issue #389. Both go beyond the existing trusted-network-segment baseline rather than replacing it.

Finding 1: /router/stats topology disclosure. The endpoint is mounted on the client-facing axum app (same surface as /v1/chat/completions) and returned every prefill/decode peer's raw host:port, so an unauthenticated client that reaches the inference port could map the internal cluster. The client-facing variant now redacts raw addresses by default and reports only the stable router-assigned node ids (prefill-<i> / decode-<i>), roles, health, per-node dispatch counts, and the RouterMetrics snapshot, which is enough to read the load spread without disclosing topology. The body carries "addresses_redacted" so a caller knows which view it got. MLXCEL_ROUTER_STATS_VERBOSE opts a trusted-segment deployment back into the full address view. The parsing and body builder are pure functions with unit tests.

Finding 2: prefill node trusts the router-chosen decode_target. The prefill node connects to decode_target to ship the KV-cache handoff (which encodes the prompt), so a forged frame from a breached segment could redirect it off-cluster. The prefill role now validates decode_target against a DecodeAllowlist before connecting: an off-list target is rejected (the request fails with a clear error and the KV cache never leaves) instead of connected to.

The allowlist is the crux of the issue: it is built from the prefill node's FULL --decode-peers set, not just its single static handoff peer, so it covers every decode node the router may select. Operators configure each prefill node's --decode-peers with the same decode pool the router routes over (the shared cluster config), so router-driven decode balancing keeps working for any router-selectable decode node. When no allowlist is configured (an empty set, which a correctly configured prefill-only node never has because --decode-peers is required), the prefill stays permissive-with-warning so a missing allowlist source never silently disables balancing. A frame without decode_target (an older router) still falls back to the first static --decode-peers entry, which needs no validation because it is operator config rather than a wire-supplied address. Comparison is canonical (parsed SocketAddr) so equivalent spellings of the same address match. The allowlist decision is a pure function with unit tests for on-list, off-list, and unconfigured cases.

serve_prefill_role_networked_blocking now takes the full Vec<SocketAddr> decode pool (first entry is the static fallback, whole set is the allowlist); the model-worker dispatch passes its configured decode_peers through unchanged.

Both behaviors are documented in docs/distributed.md, including the critical requirement to configure prefill --decode-peers with the full decode pool.
@inureyes inureyes added type:security Security vulnerability or fix priority:low Low priority area:architecture Architecture and code structure changes status:review Under review labels Jun 21, 2026
inureyes added 2 commits June 22, 2026 08:42
…IST, not local --decode-peers

The prefill allowlist was built from the prefill node's own --decode-peers set, which is normally narrower than the router's selectable decode pool (a prefill's --decode-peers is the static fallback, often a single peer, while the router balances over its own registry). Validating against the local set rejected legitimate router-balanced handoffs and broke the #201 multi-node decode balancing, including the existing 2x2 disaggregated E2E (prefill configured with decode0 while the router routes decode0,decode1).

Decouple the allowlist from --decode-peers: build it from a dedicated opt-in env input MLXCEL_DECODE_ALLOWLIST (comma-separated host:port covering the full router-selectable pool, the shared cluster config the issue suggested). When unset the allowlist is empty and validation is permissive-with-warning, so router-driven balancing is never silently disabled by default; --decode-peers.first() remains the static fallback. Unparseable entries are skipped with a warning rather than aborting startup. Docs in distributed.md and CONTINUOUS_BATCHING.md reconciled so --decode-peers keeps its static-fallback meaning.
Suppress repeated WARN spam in the AllowUnchecked path of
run_prefill_role_networked. When MLXCEL_DECODE_ALLOWLIST is unset and
the router populates decode_target on every request (the default
multi-node config per issue #201), the previous code emitted one WARN
per request, flooding the log and making real warnings hard to spot.
A process-wide AtomicBool now gates the tracing::warn! so it fires at
most once per process lifetime. Security posture is unchanged: the node
is still permissive and still forwards; only the log noise is reduced.

Also add a one-line note to docs/distributed.md and
docs/CONTINUOUS_BATCHING.md clarifying that MLXCEL_DECODE_ALLOWLIST
entries must be numeric IP:port values (parsed as SocketAddr); hostname
entries are skipped with a warning and will never match a router-chosen
target.
@inureyes

Copy link
Copy Markdown
Member Author

PR Finalization Complete

Summary

Tests: all 10 coordinator allowlist tests and 3 router_stats tests pass unchanged; no new test needed for the log-guard path (it wraps a tracing macro, not business logic).

Documentation: added a one-line note to docs/distributed.md and docs/CONTINUOUS_BATCHING.md stating that MLXCEL_DECODE_ALLOWLIST entries must be numeric IP:port values (parsed as SocketAddr); hostname entries are skipped with a warning and will not match.

Lint/Format: cargo fmt applied; cargo clippy --lib --tests --features metal,accelerate -- -D warnings clean.

Changes

  • src/distributed/disaggregated/coordinator.rs: added a process-wide static AtomicBool (ALLOWLIST_UNCONFIGURED_WARNED); the AllowUnchecked arm now calls .swap(true, Relaxed) and emits the warning only on the first hit. Ordering added to the existing AtomicBool import. The warning text names MLXCEL_DECODE_ALLOWLIST, describes the permissive posture, and confirms it fires once per process. Security behavior is unchanged.
  • docs/distributed.md: added the IP:port-only format note at the end of the existing allowlist paragraph.
  • docs/CONTINUOUS_BATCHING.md: same note in the corresponding paragraph.

Commit: 7b698b11d. All checks passing. Ready for merge.

@inureyes inureyes added status:done Completed and removed status:review Under review labels Jun 21, 2026
@inureyes inureyes merged commit e5780e2 into main Jun 21, 2026
5 checks passed
@inureyes inureyes deleted the fix/issue-389-disaggregated-defense-in-depth branch June 21, 2026 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:architecture Architecture and code structure changes priority:low Low priority status:done Completed type:security Security vulnerability or fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chore(router): defense-in-depth hardening for /router/stats exposure and decode_target trust

1 participant