Database = catalog selection; serverless identity = hostname + username#651
Conversation
… + username Stop masking the connection database name onto a physical catalog, and stop using it to identify the org on serverless. Database is now catalog selection, not identity: - Connect with `ducklake`/`iceberg` to default into that catalog; empty selects the org's default attached catalog. Any other name fails 3D000. - Drop the logical->physical masking everywhere: the transpiler pins LogicalDatabaseName to the physical `ducklake` (keeps public->main, no rename); rewriteDirectQuery only expands bare `USE ducklake`/`USE iceberg`; sessionmeta / current_database() report the real catalog. Serverless identity = managed hostname (SNI) + username only: - ResolvePostgresConnection resolves the org from the SNI prefix and authenticates (org, user); the database name no longer routes. Post-session attachment probe fails closed (3D000) if the requested catalog isn't attached. - sni_routing_mode defaults to `enforce`; unresolvable hostnames are rejected. - Flight SQL: identity is now SNI-only; removed the FindAndValidateUser username-scan (a username could collide across orgs). BREAKING: existing serverless clients connecting with dbname=<org name> must switch to dbname=ducklake/iceberg (or empty). Coordinate with the enforce-SNI rollout. Tests: reworked configstore/control-plane/transpiler/direct-query unit tests (green); replaced the masking-only integration tests with catalog_demask_test.go; migrated the k8s harness to present a managed SNI + empty (default-catalog) dbname and flipped the kind manifest to enforce. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
catalog_demask_test.go sorts before catalog_test.go, so it ran first and left a `bill` schema in the shared DuckLake catalog, breaking TestCatalogPsqlCommands/ psql_dn's schema-count parity (3 vs 2). Use a uniquely-named schema dropped via t.Cleanup so the shared catalog is clean for later tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t attached Review follow-ups: - resolveEffectiveCatalog: when database is empty and a user's configured DefaultCatalog (iceberg) isn't attached, fail closed (3D000) instead of silently routing to ducklake. Restores the pre-rework fail-closed contract. - standalone conn.go: align c.database with the resolved real catalog so pg_stat_activity.datname/logs agree with current_database() (the control-plane path already does this via NewClientConn). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review notesRan an adversarial correctness + security pass over the diff. Two findings fixed in
Known pre-existing issue (follow-up, not addressed here)Flight SQL's Verified sound (no change needed)
|
EDsCODE
left a comment
There was a problem hiding this comment.
Blocker: Flight auth still collapses SNI-scoped identity back to a bare username before session creation.
The PR correctly resolves the org from SNI during auth, but then stores that result as userOrg[username] = orgID in controlplane/control.go. Later, orgRoutedSessionProvider.CreateSession in controlplane/flight_ingress.go routes by looking up userOrg[username].
That means the authenticated identity is not “this connection is alice in org-acme”; it becomes “the latest authenticated org for username alice.” Usernames are only unique inside an org, so two tenants can both have alice:
aliceauthenticates toacme.<managed-suffix>and storesuserOrg["alice"] = "org-acme".- another
aliceauthenticates tobeta.<managed-suffix>and overwrites it withuserOrg["alice"] = "org-beta". - the first connection creates a Flight session and can now be routed to Beta’s worker stack.
This preserves the exact username-collision class the SNI-only identity change is meant to remove. SNI resolves the org correctly during auth, but the auth-to-session handoff drops the org and reintroduces a mutable global username -> orgID side channel.
Can we restructure this so the authenticated principal carries both orgID and username through to session creation? A few acceptable shapes would be: return/pass an authenticated principal from the Flight auth layer, key the handoff by a per-auth/session nonce, or include org identity in the Flight session key/context. The important part is that session routing must not depend on a bare username lookup.
|
Addressed the Flight SNI-org-routing blocker in #652 — now re-resolves the org from the connection's SNI (deleting the username→org map entirely), with tests proving two same-username connections route to their own org. |
…username map (#652) * fix(controlplane): bind Flight sessions to the connection's SNI org, not a username map Addresses the blocker raised in review of #651: Flight auth resolved the org from SNI correctly, but stored the result in a process-global `orgRoutedSessionProvider.userOrg[username] -> orgID` map that `CreateSession` read back by bare username. Usernames are only unique within an org, so two tenants sharing a username could race: connection A (org-acme) writes `userOrg["alice"]=org-acme`, connection B (org-beta) overwrites it, and A's session then gets created against org-beta's worker stack. This reintroduced the exact username-collision class the SNI-only identity change was meant to remove. Fix: derive the org for a session from the connection's managed hostname (SNI) — the same immutable per-connection identity auth uses — re-resolved at session-create time, and delete the username map entirely: - orgRoutedSessionProvider gains an injected `resolveOrg(ctx) -> (orgID, ok)` and drops the `userOrg` map. CreateSession resolves the org from the request context's SNI and fails closed if it doesn't resolve. - Production wires resolveOrg to ControlPlane.flightOrgFromContext (extract SNI from the gRPC peer → resolveFlightOrgFromSNI). The Postgres-side auth resolution is unchanged; the Flight validator now only authenticates and stores no routing state. - Tests: prove two same-username connections route to their own org by context, and fail closed when the SNI doesn't resolve. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * review: share SNIFromContext between auth and routing; harden CreateSession Follow-up review hardening for the Flight org-routing fix: - Export flightsqlingress.SNIFromContext and use it from both the auth path and controlplane session routing, deleting the duplicated SNI-extraction helper so auth and routing can never silently diverge on a connection's hostname. - Fail closed with a clear error if an orgRoutedSessionProvider is ever constructed without an org resolver (defensive; the one production wiring always sets it). - Add TestFlightOrgFromContextResolvesViaSNI: drives the real peer→TLS ServerName→extractOrgFromSNI→ResolveSNIPrefix chain (managed resolves; unknown prefix, unmanaged hostname, and missing peer all fail closed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What & why
Two coupled changes the connection model has wanted for a while:
Stop masking the connection database name onto a physical catalog. Previously a client's
databasewas treated as a logical name masked over the physicalducklakecatalog (acurrent_database()macro, a transpiler catalog-rename,USE <logical>rewrites, a filteredSHOW DATABASES). Now the database name is catalog selection: connect withducklake/icebergto default into that catalog, empty selects the org's default attached catalog, anything else fails3D000.current_database()reports the real catalog.On serverless, stop using the database name to establish identity. The org is now resolved solely from the managed hostname (SNI) + username; the database name plays no part in routing.
Changes
De-masking (all modes)
server/conn.go::newTranspilerpinsLogicalDatabaseNameto the physicalducklake(gated on DuckLake mode) — keepspublic→mainfor three-partducklake.*refs, never renames; iceberg/arbitrary names untouched.rewriteDirectQuerytrimmed to just the bare-USE ducklake/USE icebergtwo-part expansion (DuckDB bare-catalog reliability); dropped theUSE <logical>arm and theSHOW DATABASESrewrite. Flag renamedlogicalCatalogMapping→catalogUseRewrite.sessionmeta.InitSessionDatabaseMetadatanow receives the real catalog; standalone honors the attached catalog.Serverless identity (control plane)
configstore.ResolvePostgresConnectionresolves org from the SNI prefix, authenticates(org, user), and returnsEffectiveCatalog/CatalogValid(validatingdatabase ∈ {ducklake, iceberg, ""}). DroppedDatabaseOrg-as-identity.control.gorejects unresolvable hostnames (08006), invalid catalogs (3D000), and adds a post-session attachment probe (resolveEffectiveCatalog) that fails closed if the requested catalog isn't actually attached.sni_routing_modedefaults toenforce.Flight SQL — identity is now SNI-only; removed the
FindAndValidateUserusername-scan (a username could collide across orgs).⚠ Breaking
Existing serverless clients connecting with
dbname=<org name>must switch todbname=ducklake/iceberg(or empty). Coordinate with theenforce-SNI rollout.Tests
go test ./controlplane/... ./server/... ./transpiler/... ./configresolve/...).catalog_demask_test.go(compiles).dbname, flippedk8s/kind/control-plane.yamltoenforce, rewrotesni_test.go. Binary compiles (go test -c -tags 'k8s_integration kubernetes').Local k8s validation note
I drove the kind e2e locally and confirmed the identity path works end-to-end: connections authenticated with
database=empty + managed SNIlocal.dw.test.local, resolving the org and reaching the worker-acquisition stage (no08006/3D000). The run couldn't complete because this dev host (fc44, kernel 7.0.10) hits a kindnet NetworkPolicy-egress bug where the control-plane pod's egress to the API ClusterIP is dropped (a plain pod with no policy reaches it fine), so no worker pods spawn. That's host infra, not this change — CI should exercise the full suite.🤖 Generated with Claude Code