
fix(controlplane): bind Flight sessions to connection SNI org, not a username map#652

Merged
fuziontech merged 2 commits into main from fix/flight-sni-org-routing
Jun 1, 2026


@fuziontech
Member

Follow-up to #651, addressing the review blocker from @EDsCODE.

Problem

#651 made Flight auth resolve the org from SNI correctly, but the result was stashed in a process-global map keyed by bare username:

```go
// during auth:
v.orgProvider.userOrg[username] = orgID
// later, at session create:
orgID := p.userOrg[username]
```

Usernames are only unique within an org, so two tenants can both have alice:

  1. alice@acme.<suffix> authenticates → userOrg["alice"] = org-acme
  2. a different alice@beta.<suffix> authenticates → overwrites the entry → userOrg["alice"] = org-beta
  3. connection 1 creates its Flight session and is routed to org-beta's worker stack.
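The three steps above can be replayed with a minimal sketch. A plain Go map stands in for the removed userOrg field. The type and method names (userOrgMap, authenticate, orgForSession, replayCollision) are hypothetical, not the controlplane's actual code. Because the map key is the bare username, the second write silently replaces the first, and connection 1 reads back org-beta.

```go
package main

// userOrgMap mimics the removed process-global map keyed by bare username.
// Hypothetical type, for illustration only.
type userOrgMap map[string]string

// authenticate records the SNI-resolved org under the bare username,
// which is the lossy auth→session handoff described above.
func (m userOrgMap) authenticate(username, orgID string) {
	m[username] = orgID
}

// orgForSession reads the org back by username at session-create time.
func (m userOrgMap) orgForSession(username string) string {
	return m[username]
}

// replayCollision runs the three-step sequence and returns the org that
// connection 1 (alice@acme) ends up routed to.
func replayCollision() string {
	m := userOrgMap{}
	m.authenticate("alice", "org-acme") // 1. alice@acme.<suffix> authenticates
	m.authenticate("alice", "org-beta") // 2. alice@beta.<suffix> overwrites the entry
	return m.orgForSession("alice")     // 3. connection 1 creates its session
}
```

replayCollision returns "org-beta" even though the caller is connection 1 from org-acme. No amount of locking fixes this, because the key itself has already lost the tenant.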

This reintroduced the exact username-collision class the SNI-only identity change was meant to eliminate — the SNI-scoped principal was collapsed back to a username at the auth→session handoff.

Fix

Derive the session's org from the connection's managed hostname (SNI) — the same immutable per-connection identity auth uses — re-resolved at session-create time, and delete the username map entirely:

  • orgRoutedSessionProvider drops userOrg and gains an injected resolveOrg(ctx) (orgID, ok). CreateSession resolves the org from the request context's SNI and fails closed if it doesn't resolve.
  • Production wires resolveOrg to ControlPlane.flightOrgFromContext (extract SNI from the gRPC peer → resolveFlightOrgFromSNI). The Flight validator now only authenticates (ValidateOrgUser against the SNI-resolved org) and stores no routing state. The Postgres-side resolution is untouched.
  • The ctx reaching CreateSession is a timeout-child of the gRPC request ctx, so peer.FromContext still yields the TLS ServerName.

There is now no shared mutable username→org state anywhere in the Flight path; routing identity is a pure function of the connection's hostname.
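The third bullet above relies on context values being inherited by derived contexts. gRPC-Go attaches peer information to the request context as a context value, and a context.WithTimeout child still exposes every value of its parent. The sketch below uses only the standard library. sniCtxKey and sniFromCtx are hypothetical stand-ins for gRPC's internal peer key and for reading the TLS ServerName off the peer.

```go
package main

import (
	"context"
	"time"
)

// sniCtxKey is a hypothetical context key standing in for the key gRPC uses
// to store peer info on the request context.
type sniCtxKey struct{}

// sniFromCtx is a hypothetical stand-in for extracting the TLS ServerName.
func sniFromCtx(ctx context.Context) (string, bool) {
	sni, ok := ctx.Value(sniCtxKey{}).(string)
	return sni, ok && sni != ""
}

// sniSeenByTimeoutChild attaches serverName to a request ctx, derives a
// timeout child the way a session-create path might, and reports what the
// child sees.
func sniSeenByTimeoutChild(serverName string) (string, bool) {
	reqCtx := context.WithValue(context.Background(), sniCtxKey{}, serverName)
	childCtx, cancel := context.WithTimeout(reqCtx, 5*time.Second)
	defer cancel()
	return sniFromCtx(childCtx)
}
```

The inheritance only holds if the session path derives its ctx from the request ctx. A fresh context.Background() would lose the peer, and a resolver built this way would then fail closed.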

Tests

  • TestOrgRoutedSessionProviderRoutesByContextSNINotUsername: two connections sharing username alice from different org contexts each route to their own org (StackForOrg(org-a) then (org-b)), proving the collision is gone.
  • TestOrgRoutedSessionProviderFailsClosedWhenSNIUnresolved: no session created when the SNI doesn't resolve.
  • Updated the validator unit tests (no more userOrg assertion); the durable-reconnect path already carried OrgID correctly and is unchanged.
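The two new provider tests can be approximated with a fake resolver and a recording stack double. This is only a rough sketch of the idea, not the tests in the PR. orgKey, fakeResolveOrg, recordingStacks, createSession and sameUsernameRouting are all hypothetical names. Both "alice" connections should hit their own org, and an unresolvable context should create nothing.

```go
package main

import "context"

// orgKey is a hypothetical context key read by the fake resolver.
type orgKey struct{}

// fakeResolveOrg stands in for flightOrgFromContext in tests.
func fakeResolveOrg(ctx context.Context) (string, bool) {
	org, ok := ctx.Value(orgKey{}).(string)
	return org, ok && org != ""
}

// recordingStacks records every StackForOrg call, like a test double might.
type recordingStacks struct{ calls []string }

func (r *recordingStacks) StackForOrg(orgID string) { r.calls = append(r.calls, orgID) }

// createSession is a minimal, hypothetical routing step: resolve the org
// from ctx, fail closed if unresolved, otherwise request that org's stack.
func createSession(ctx context.Context, username string, stacks *recordingStacks) bool {
	orgID, ok := fakeResolveOrg(ctx)
	if !ok {
		return false
	}
	stacks.StackForOrg(orgID)
	return true
}

// sameUsernameRouting runs two "alice" connections from different org
// contexts, then one connection whose org does not resolve.
func sameUsernameRouting() (calls []string, unresolvedCreated bool) {
	stacks := &recordingStacks{}
	ctxA := context.WithValue(context.Background(), orgKey{}, "org-a")
	ctxB := context.WithValue(context.Background(), orgKey{}, "org-b")
	createSession(ctxA, "alice", stacks)
	createSession(ctxB, "alice", stacks)
	unresolvedCreated = createSession(context.Background(), "alice", stacks)
	return stacks.calls, unresolvedCreated
}
```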

Builds cleanly with both the default build and -tags kubernetes; the controlplane test suite passes on both.

🤖 Generated with Claude Code

…not a username map

Addresses the blocker raised in review of #651: Flight auth resolved the org
from SNI correctly, but stored the result in a process-global
`orgRoutedSessionProvider.userOrg[username] -> orgID` map that `CreateSession`
read back by bare username. Usernames are only unique within an org, so two
tenants sharing a username could race: connection A (org-acme) writes
`userOrg["alice"]=org-acme`, connection B (org-beta) overwrites it, and A's
session then gets created against org-beta's worker stack. This reintroduced the
exact username-collision class the SNI-only identity change was meant to remove.

Fix: derive the org for a session from the connection's managed hostname (SNI) —
the same immutable per-connection identity auth uses — re-resolved at
session-create time, and delete the username map entirely:

- orgRoutedSessionProvider gains an injected `resolveOrg(ctx) -> (orgID, ok)` and
  drops the `userOrg` map. CreateSession resolves the org from the request
  context's SNI and fails closed if it doesn't resolve.
- Production wires resolveOrg to ControlPlane.flightOrgFromContext (extract SNI
  from the gRPC peer → resolveFlightOrgFromSNI). The Postgres-side auth resolution
  is unchanged; the Flight validator now only authenticates and stores no routing
  state.
- Tests: prove two same-username connections route to their own org by context,
  and fail closed when the SNI doesn't resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ession

Follow-up review hardening for the Flight org-routing fix:
- Export flightsqlingress.SNIFromContext and use it from both the auth path and
  controlplane session routing, deleting the duplicated SNI-extraction helper so
  auth and routing can never silently diverge on a connection's hostname.
- Fail closed with a clear error if an orgRoutedSessionProvider is ever
  constructed without an org resolver (defensive; the one production wiring always
  sets it).
- Add TestFlightOrgFromContextResolvesViaSNI: drives the real peer→TLS
  ServerName→extractOrgFromSNI→ResolveSNIPrefix chain (a managed hostname
  resolves; an unknown prefix, an unmanaged hostname, and a missing peer all fail
  closed).
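The SNI→org chain exercised by that test can be sketched with the standard library. This is a sketch under assumptions: the real managed suffix is elided as <suffix> above, so managedSuffix below is a made-up placeholder. prefixFromSNI plays the role of extractOrgFromSNI, and a plain map plays the role of ResolveSNIPrefix. The haveSNI flag models the missing-peer case. None of these names are the controlplane's actual API.

```go
package main

import "strings"

// managedSuffix is a hypothetical managed domain; the real suffix is elided.
const managedSuffix = ".flight.example.com"

// prefixFromSNI accepts only hostnames of the form <prefix><managedSuffix>
// with a single-label prefix. Hostnames compare case-insensitively, and a
// trailing root dot is ignored.
func prefixFromSNI(serverName string) (string, bool) {
	host := strings.ToLower(strings.TrimSuffix(serverName, "."))
	if !strings.HasSuffix(host, managedSuffix) {
		return "", false // unmanaged hostname
	}
	prefix := strings.TrimSuffix(host, managedSuffix)
	if prefix == "" || strings.Contains(prefix, ".") {
		return "", false
	}
	return prefix, true
}

// orgFromServerName chains prefix extraction with a lookup table.
// haveSNI == false models a connection with no gRPC peer / TLS info.
func orgFromServerName(serverName string, haveSNI bool, prefixToOrg map[string]string) (string, bool) {
	if !haveSNI {
		return "", false // missing peer: fail closed
	}
	prefix, ok := prefixFromSNI(serverName)
	if !ok {
		return "", false
	}
	orgID, ok := prefixToOrg[prefix] // unknown prefix: ok == false
	return orgID, ok
}
```

Because auth and session routing both call one shared extraction helper, the two paths cannot disagree about which hostname a connection presented.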

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fuziontech fuziontech requested a review from a team June 1, 2026 22:22
@fuziontech fuziontech merged commit 4822d31 into main Jun 1, 2026
22 checks passed
@fuziontech fuziontech deleted the fix/flight-sni-org-routing branch June 1, 2026 22:26