Database = catalog selection; serverless identity = hostname + username #651

Merged
fuziontech merged 3 commits into main from feat/database-as-catalog-selection
Jun 1, 2026

@fuziontech
Member

What & why

Two coupled changes the connection model has wanted for a while:

  1. Stop masking the connection database name onto a physical catalog. Previously a client's database was treated as a logical name masked over the physical ducklake catalog (a current_database() macro, a transpiler catalog-rename, USE <logical> rewrites, a filtered SHOW DATABASES). Now the database name is catalog selection: connecting with ducklake or iceberg defaults the session into that catalog, an empty name selects the org's default attached catalog, and anything else fails with 3D000 (invalid_catalog_name). current_database() reports the real catalog.

  2. On serverless, stop using the database name to establish identity. The org is now resolved solely from the managed hostname (SNI) + username; the database name plays no part in routing.
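
The name-level rule in item 1 can be sketched in a few lines of Go. This is a minimal illustration, not the PR's code: `selectCatalog` and `sqlStateError` are hypothetical names, and the error message text is an assumption. Only the accepted names (ducklake, iceberg, empty) and the 3D000 code come from the description above.

```go
package main

import "fmt"

// sqlStateError is a hypothetical error type that carries a PostgreSQL
// SQLSTATE code alongside a human-readable message.
type sqlStateError struct {
	Code    string
	Message string
}

func (e *sqlStateError) Error() string { return e.Code + ": " + e.Message }

// selectCatalog maps the client-supplied database name to a catalog.
// An empty name selects orgDefault; "ducklake" and "iceberg" select
// themselves; any other name is rejected with 3D000.
func selectCatalog(database, orgDefault string) (string, error) {
	switch database {
	case "":
		return orgDefault, nil
	case "ducklake", "iceberg":
		return database, nil
	default:
		return "", &sqlStateError{
			Code:    "3D000",
			Message: fmt.Sprintf("database %q does not exist", database),
		}
	}
}
```

Under this sketch, a legacy client that still sends its org name as the database falls into the default branch and is rejected, which is the breaking behavior the PR calls out.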

Changes

De-masking (all modes)

  • server/conn.go::newTranspiler pins LogicalDatabaseName to the physical ducklake (gated on DuckLake mode) — keeps public→main for three-part ducklake.* refs, never renames; iceberg/arbitrary names untouched.
  • rewriteDirectQuery trimmed to just the bare-USE ducklake/USE iceberg two-part expansion (DuckDB bare-catalog reliability); dropped the USE <logical> arm and the SHOW DATABASES rewrite. Flag renamed logicalCatalogMapping → catalogUseRewrite.
  • sessionmeta.InitSessionDatabaseMetadata now receives the real catalog; standalone honors the attached catalog.

Serverless identity (control plane)

  • configstore.ResolvePostgresConnection resolves org from the SNI prefix, authenticates (org, user), and returns EffectiveCatalog/CatalogValid (validating database ∈ {ducklake, iceberg, ""}). Dropped DatabaseOrg-as-identity.
  • control.go rejects unresolvable hostnames (08006), invalid catalogs (3D000), and adds a post-session attachment probe (resolveEffectiveCatalog) that fails closed if the requested catalog isn't actually attached.
  • sni_routing_mode defaults to enforce.
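
As a rough illustration of resolving an org from the SNI prefix, the sketch below extracts the label in front of a managed suffix and rejects anything else. The function name `orgPrefixFromSNI`, the single-label restriction, and the case/trailing-dot normalization are all assumptions made for the example; the actual ResolvePostgresConnection logic is not shown in this PR text.

```go
package main

import (
	"errors"
	"strings"
)

// errUnresolvableHost stands in for the 08006 rejection described above.
var errUnresolvableHost = errors.New("08006: hostname does not resolve to an org")

// orgPrefixFromSNI returns the single DNS label that precedes managedSuffix
// in serverName. Hostnames outside the managed suffix, the bare suffix
// itself, and multi-label prefixes are all rejected (an assumption of this
// sketch, chosen so that ambiguous hostnames fail closed).
func orgPrefixFromSNI(serverName, managedSuffix string) (string, error) {
	host := strings.ToLower(strings.TrimSuffix(serverName, "."))
	suffix := "." + strings.ToLower(strings.Trim(managedSuffix, "."))
	if !strings.HasSuffix(host, suffix) {
		return "", errUnresolvableHost
	}
	prefix := strings.TrimSuffix(host, suffix)
	if prefix == "" || strings.Contains(prefix, ".") {
		return "", errUnresolvableHost
	}
	return prefix, nil
}
```

The returned prefix would then be looked up in the config store to find the org, and the (org, user) pair authenticated; the database name never enters that path.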

Flight SQL — identity is now SNI-only; removed the FindAndValidateUser username-scan (a username could collide across orgs).

⚠ Breaking

Existing serverless clients connecting with dbname=<org name> must switch to dbname=ducklake/iceberg (or empty). Coordinate with the enforce-SNI rollout.

Tests

  • Unit: reworked configstore / control-plane / transpiler / direct-query tests — green locally (go test ./controlplane/... ./server/... ./transpiler/... ./configresolve/...).
  • Integration: replaced the three masking-only files with catalog_demask_test.go (compiles).
  • k8s: migrated the harness to present a managed SNI per tenant + empty (default-catalog) dbname, flipped k8s/kind/control-plane.yaml to enforce, rewrote sni_test.go. Binary compiles (go test -c -tags 'k8s_integration kubernetes').

Local k8s validation note

I drove the kind e2e locally and confirmed the identity path works end-to-end: connections authenticated with database= empty + managed SNI local.dw.test.local, resolving the org and reaching the worker-acquisition stage (no 08006/3D000). The run couldn't complete because this dev host (fc44, kernel 7.0.10) hits a kindnet NetworkPolicy-egress bug where the control-plane pod's egress to the API ClusterIP is dropped (a plain pod with no policy reaches it fine), so no worker pods spawn. That's host infra, not this change — CI should exercise the full suite.

🤖 Generated with Claude Code

fuziontech and others added 3 commits June 1, 2026 11:58
… + username

Stop masking the connection database name onto a physical catalog, and stop
using it to identify the org on serverless.

Database is now catalog selection, not identity:
- Connect with `ducklake`/`iceberg` to default into that catalog; empty selects
  the org's default attached catalog. Any other name fails 3D000.
- Drop the logical->physical masking everywhere: the transpiler pins
  LogicalDatabaseName to the physical `ducklake` (keeps public->main, no rename);
  rewriteDirectQuery only expands bare `USE ducklake`/`USE iceberg`; sessionmeta /
  current_database() report the real catalog.

Serverless identity = managed hostname (SNI) + username only:
- ResolvePostgresConnection resolves the org from the SNI prefix and authenticates
  (org, user); the database name no longer routes. Post-session attachment probe
  fails closed (3D000) if the requested catalog isn't attached.
- sni_routing_mode defaults to `enforce`; unresolvable hostnames are rejected.
- Flight SQL: identity is now SNI-only; removed the FindAndValidateUser
  username-scan (a username could collide across orgs).

BREAKING: existing serverless clients connecting with dbname=<org name> must
switch to dbname=ducklake/iceberg (or empty). Coordinate with the enforce-SNI
rollout.

Tests: reworked configstore/control-plane/transpiler/direct-query unit tests
(green); replaced the masking-only integration tests with catalog_demask_test.go;
migrated the k8s harness to present a managed SNI + empty (default-catalog)
dbname and flipped the kind manifest to enforce.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
catalog_demask_test.go sorts before catalog_test.go, so it ran first and left a
`bill` schema in the shared DuckLake catalog, breaking TestCatalogPsqlCommands/
psql_dn's schema-count parity (3 vs 2). Use a uniquely-named schema dropped via
t.Cleanup so the shared catalog is clean for later tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
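
The cleanup pattern described in the commit above might look roughly like the helper below. It is a sketch under assumptions: `createTempSchema` is a hypothetical name, the `exec` callback stands in for whatever runs SQL against the shared catalog, and the timestamp-based suffix is just one way to get a unique name.

```go
package main

import (
	"fmt"
	"time"
)

// createTempSchema creates a uniquely named schema through exec and
// registers a drop with cleanup, so tests that share the catalog do not
// see leftovers. In a real test, pass t.Cleanup as the cleanup argument.
func createTempSchema(prefix string, exec func(sql string) error, cleanup func(func())) (string, error) {
	name := fmt.Sprintf("%s_%d", prefix, time.Now().UnixNano())
	if err := exec("CREATE SCHEMA " + name); err != nil {
		return "", err
	}
	cleanup(func() {
		// Best-effort drop; the error is ignored because the test is over.
		_ = exec("DROP SCHEMA IF EXISTS " + name + " CASCADE")
	})
	return name, nil
}
```

Registering the drop immediately after a successful create means the schema is removed even if the test fails partway through.
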
…t attached

Review follow-ups:
- resolveEffectiveCatalog: when database is empty and a user's configured
  DefaultCatalog (iceberg) isn't attached, fail closed (3D000) instead of
  silently routing to ducklake. Restores the pre-rework fail-closed contract.
- standalone conn.go: align c.database with the resolved real catalog so
  pg_stat_activity.datname/logs agree with current_database() (the control-plane
  path already does this via NewClientConn).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fuziontech
Member Author

Review notes

Ran an adversarial correctness + security pass over the diff. Two findings fixed in 74f6858:

  1. Fail-closed regression (introduced, fixed): resolveEffectiveCatalog silently routed to ducklake when database was empty and a user's configured DefaultCatalog=iceberg wasn't attached. On main this failed closed (the connect-time USE iceberg.public errored → rejected). Restored fail-closed (→ 3D000) + added a unit case.
  2. Standalone observability (fixed): aligned c.database with the resolved real catalog so pg_stat_activity.datname/logs agree with current_database() (the control-plane path already did this via NewClientConn).
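
The fail-closed contract restored in finding 1 can be illustrated with a small decision function. This is a sketch, not the body of resolveEffectiveCatalog: the name `effectiveCatalog`, the `attached` set representation, and the ducklake fallback when no default is configured are assumptions.

```go
package main

import "errors"

// errCatalogNotAttached stands in for the 3D000 rejection.
var errCatalogNotAttached = errors.New("3D000: requested catalog is not attached")

// effectiveCatalog picks the catalog for a session. An explicit request
// wins; otherwise the user's configured default applies; otherwise the
// sketch assumes ducklake. Whatever is chosen must actually be attached,
// or the connection is rejected rather than rerouted.
func effectiveCatalog(requested, userDefault string, attached map[string]bool) (string, error) {
	want := requested
	if want == "" {
		want = userDefault
	}
	if want == "" {
		want = "ducklake" // assumed fallback when no default is configured
	}
	if !attached[want] {
		return "", errCatalogNotAttached
	}
	return want, nil
}
```

The regression corresponds to the case `requested == ""`, `userDefault == "iceberg"`, with only ducklake attached: the function must return an error there instead of quietly returning ducklake.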

Known pre-existing issue (follow-up, not addressed here)

Flight SQL's orgRoutedSessionProvider.userOrg map is keyed by username alone (controlplane/flight_ingress.go:75,80). Two concurrent Flight clients with the same username from different managed hostnames can collide (last-writer-wins), so a session could be created against the wrong org's stack. This predates this PR; this PR actually reduces the exposure (auth is now strictly SNI-derived per-org instead of the old arbitrary first-match username scan). The proper fix is to key the map by (orgID, username) (or re-resolve the org from SNI in CreateSession), which touches the SessionProvider signature — better as its own change. Filing as a follow-up.

Verified sound (no change needed)

  • No cross-tenant bypass via the redesigned database→catalog path: org is taken only from the SNI hostname, auth is scoped to (orgID, username).
  • enforce-default flip is safe: the SNI block is gated on configStore != nil (remote backend only); process/standalone are unaffected.
  • bcrypt timing-leak guard preserved for unknown users / unresolved orgs.
  • Pre-auth 3D000 (invalid catalog name) before 28P01 only reveals that ducklake/iceberg are the valid names — public; entitlement/attachment is checked post-auth.
  • Post-session 3D000 teardown is covered by the existing defer DestroySession.

Contributor

@EDsCODE EDsCODE left a comment

Blocker: Flight auth still collapses SNI-scoped identity back to a bare username before session creation.

The PR correctly resolves the org from SNI during auth, but then stores that result as userOrg[username] = orgID in controlplane/control.go. Later, orgRoutedSessionProvider.CreateSession in controlplane/flight_ingress.go routes by looking up userOrg[username].

That means the authenticated identity is not “this connection is alice in org-acme”; it becomes “the latest authenticated org for username alice.” Usernames are only unique inside an org, so two tenants can both have alice:

  1. alice authenticates to acme.<managed-suffix> and stores userOrg["alice"] = "org-acme".
  2. another alice authenticates to beta.<managed-suffix> and overwrites it with userOrg["alice"] = "org-beta".
  3. the first connection creates a Flight session and can now be routed to Beta’s worker stack.

This preserves the exact username-collision class the SNI-only identity change is meant to remove. SNI resolves the org correctly during auth, but the auth-to-session handoff drops the org and reintroduces a mutable global username -> orgID side channel.

Can we restructure this so the authenticated principal carries both orgID and username through to session creation? A few acceptable shapes would be: return/pass an authenticated principal from the Flight auth layer, key the handoff by a per-auth/session nonce, or include org identity in the Flight session key/context. The important part is that session routing must not depend on a bare username lookup.

@fuziontech fuziontech merged commit b3d0b81 into main Jun 1, 2026
22 checks passed
@fuziontech fuziontech deleted the feat/database-as-catalog-selection branch June 1, 2026 21:38
@fuziontech
Member Author

Addressed the Flight SNI-org-routing blocker in #652 — now re-resolves the org from the connection's SNI (deleting the username→org map entirely), with tests proving two same-username connections route to their own org.

fuziontech added a commit that referenced this pull request Jun 1, 2026
…username map (#652)

* fix(controlplane): bind Flight sessions to the connection's SNI org, not a username map

Addresses the blocker raised in review of #651: Flight auth resolved the org
from SNI correctly, but stored the result in a process-global
`orgRoutedSessionProvider.userOrg[username] -> orgID` map that `CreateSession`
read back by bare username. Usernames are only unique within an org, so two
tenants sharing a username could race: connection A (org-acme) writes
`userOrg["alice"]=org-acme`, connection B (org-beta) overwrites it, and A's
session then gets created against org-beta's worker stack. This reintroduced the
exact username-collision class the SNI-only identity change was meant to remove.

Fix: derive the org for a session from the connection's managed hostname (SNI) —
the same immutable per-connection identity auth uses — re-resolved at
session-create time, and delete the username map entirely:

- orgRoutedSessionProvider gains an injected `resolveOrg(ctx) -> (orgID, ok)` and
  drops the `userOrg` map. CreateSession resolves the org from the request
  context's SNI and fails closed if it doesn't resolve.
- Production wires resolveOrg to ControlPlane.flightOrgFromContext (extract SNI
  from the gRPC peer → resolveFlightOrgFromSNI). The Postgres-side auth resolution
  is unchanged; the Flight validator now only authenticates and stores no routing
  state.
- Tests: prove two same-username connections route to their own org by context,
  and fail closed when the SNI doesn't resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* review: share SNIFromContext between auth and routing; harden CreateSession

Follow-up review hardening for the Flight org-routing fix:
- Export flightsqlingress.SNIFromContext and use it from both the auth path and
  controlplane session routing, deleting the duplicated SNI-extraction helper so
  auth and routing can never silently diverge on a connection's hostname.
- Fail closed with a clear error if an orgRoutedSessionProvider is ever
  constructed without an org resolver (defensive; the one production wiring always
  sets it).
- Add TestFlightOrgFromContextResolvesViaSNI: drives the real peer→TLS
  ServerName→extractOrgFromSNI→ResolveSNIPrefix chain (managed resolves; unknown
  prefix, unmanaged hostname, and missing peer all fail closed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2 participants