Experimental postgres query (PR 3/4): multi-input + error formatting #5138

Merged
simonfaltum merged 39 commits into main from simonfaltum/postgres-query-pr3-multi-input
May 5, 2026
Conversation

@simonfaltum

@simonfaltum simonfaltum commented Apr 30, 2026

PR Stack

  1. #5135 — PR 1: scaffold + autoscaling targeting + text output
  2. #5136 — PR 2: provisioned + JSON/CSV streaming + types + sqlcli.ResolveFormat
  3. PR 3 (this PR) — #5138 — multi-input + multi-statement rejection + error formatting + sqlcli.Collect
  4. #5143 — PR 4: cancellation + timeout + TUI

Stacked on PR 2.

Why

PR 2 shipped a single-statement, single-input command. Real workflows want multi-input (set-then-query, file-then-stdin), multi-statement rejection with a friendly hint, and rich pg error formatting.

This PR also extends experimental/libs/sqlcli with input-collection logic shared by aitools and postgres. Same architectural principle as PR 2: instead of postgres growing its own duplicate of aitools' resolveSQLs, both commands now call sqlcli.Collect.

Changes

Architectural: experimental/libs/sqlcli/input.go adds:

  • sqlcli.SQLFileExtension const (.sql).
  • sqlcli.Input{SQL, Source} type — Source is the human-readable origin label ("--file PATH", "argv[N]", "stdin").
  • sqlcli.CollectOptions{Cleaner func(string) string} — aitools passes its cleanSQL (strips comments+quotes); postgres passes the default TrimSpace because its multi-statement scanner needs comments preserved.
  • sqlcli.Collect — files-first then positionals, stdin only when neither is present, .sql autodetect on positionals.

aitools' resolveSQLs collapses to a thin wrapper around sqlcli.Collect (it keeps only the SQL strings and ignores Source). The "SQL statement #N is empty after removing comments" wording is replaced with sqlcli's "argv[N] is empty"; aitools tests updated.
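A minimal sketch of the input-collection behavior described above (files-first, then positionals, stdin only when neither is present, `.sql` autodetect on positionals). Names mirror the PR's `sqlcli.Input`/`CollectOptions`, but the function body and signature here are illustrative, not the real `sqlcli.Collect`:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Input pairs a SQL string with a human-readable origin label, mirroring
// the sqlcli.Input{SQL, Source} shape described above.
type Input struct {
	SQL    string
	Source string
}

// CollectOptions carries the per-command cleaning hook; nil means TrimSpace.
type CollectOptions struct {
	Cleaner func(string) string
}

// collect gathers inputs files-first, then positionals; stdin is consulted
// only when neither is present. A positional ending in ".sql" that exists
// on disk is read as a file.
func collect(files, args []string, stdin func() (string, error), opts CollectOptions) ([]Input, error) {
	clean := opts.Cleaner
	if clean == nil {
		clean = strings.TrimSpace
	}
	var out []Input
	add := func(sql, source string) error {
		if sql = clean(sql); sql == "" {
			return fmt.Errorf("%s is empty", source)
		}
		out = append(out, Input{SQL: sql, Source: source})
		return nil
	}
	for _, path := range files {
		b, err := os.ReadFile(path)
		if err != nil {
			return nil, err
		}
		if err := add(string(b), "--file "+path); err != nil {
			return nil, err
		}
	}
	for i, arg := range args {
		// .sql autodetect: an existing .sql path is treated as a file input.
		if strings.HasSuffix(arg, ".sql") {
			if b, err := os.ReadFile(arg); err == nil {
				if err := add(string(b), "--file "+arg); err != nil {
					return nil, err
				}
				continue
			}
		}
		if err := add(arg, fmt.Sprintf("argv[%d]", i+1)); err != nil {
			return nil, err
		}
	}
	if len(out) == 0 {
		s, err := stdin()
		if err != nil {
			return nil, err
		}
		return out, add(s, "stdin")
	}
	return out, nil
}

func main() {
	inputs, _ := collect(nil, []string{"SELECT 1"}, nil, CollectOptions{})
	fmt.Println(inputs[0].Source) // argv[1]
}
```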

User-facing changes for postgres query:

  • Variadic positionals + repeatable --file + stdin fallback.
  • Multi-statement strings rejected up front with a hint (the hand-written conservative scanner ignores ; inside string literals, identifiers, line/block comments, and dollar-quoted bodies; tag must be a valid unquoted identifier so $1 and $foo-bar$ are correctly NOT treated as tags).
  • Multi-input output: per-unit blocks for text; canonical-shape JSON array {"source","sql","kind","elapsed_ms",...} for json; csv rejected pre-flight when N>1.
  • Rich pg error formatting (SEVERITY: message (SQLSTATE XXXXX) with DETAIL/HINT lines), applied on both single-input and multi-input paths.

Single-input keeps streaming. runUnitBuffered is a thin wrapper around executeOne + a bufferSink, so the row-loop and error-wrapping logic stays in one place.

Test plan

  • go test ./experimental/... (multistatement scanner: 28 cases including dollar-tag punctuation rejection, sqlcli.Collect: 12 cases including a custom-cleaner test, error formatting, multi-input renderers including byte-equal canonical-shape and first-unit-fails framing)
  • go tool ... golangci-lint run ./experimental/... (0 issues)

Move the Postgres autoscaling and provisioned target-resolution helpers
out of cmd/psql/ into a shared package so a second consumer (the new
experimental postgres query command, in a follow-up commit) can reuse
the same SDK shapes. cmd/psql keeps its interactive UX by wrapping the
shared AutoSelect* helpers with errors.As fallbacks on AmbiguousError.

No behavior change for cmd/psql; existing acceptance tests pass.

Co-authored-by: Isaac
Scaffolds 'databricks experimental postgres query', a scriptable SQL
runner against a Lakebase Postgres autoscaling endpoint that does not
require a system psql binary.

This PR ships the smallest useful slice:

  - Single positional SQL statement.
  - --target (autoscaling resource path), --project, --branch,
    --endpoint targeting forms; provisioned-shaped targets return a
    pointer at 'databricks psql' for now.
  - Connect retry on idle/waking endpoints (08xxx SQLSTATE family,
    dial errors).
  - Text output (static table for rows-producing statements, command
    tag for command-only).

Provisioned support, JSON/CSV streaming output, multi-statement input,
cancellation, and integration tests come in follow-up PRs.

Driver: github.com/jackc/pgx/v5 v5.9.1 (MIT). Already a direct dep of
the universe monorepo's Lakebase services; aligning here keeps the SDK
surface consistent.

Co-authored-by: Isaac
- Replace exported PathSegmentProjects/ExtractID with focused helpers
  (ProjectIDFromName/BranchIDFromName/EndpointIDFromName); keeps SDK
  literals out of call sites.
- Type AmbiguousError.Kind as a typed enum (KindProject/Branch/Endpoint/Instance)
  so producers and the pluralisation switch stay in sync.
- Stop setting Choice.DisplayName when it equals the ID; Error() relies on
  empty-suppression rather than mixed empty/equal-to-ID checks.
- Add 57P03 (cannot_connect_now) to the connect-retry allow-list. Postgres
  emits this during server startup and Lakebase autoscaling can plausibly
  return it during the wake-up handshake. Tests exercise 57P03/57P01/57014
  to lock the boundary.
- Require --branch when --endpoint is set. The auto-select-then-look-up
  flow produces confusing errors when the auto-selected branch does not
  contain the requested endpoint, and this command is non-interactive so
  asking the user to be explicit is friendlier.
- Reject --max-retries < 1 explicitly instead of silently clamping. Help
  text already advertised the constraint; matching it at validation time
  is consistent with the repo's "reject incompatible inputs early" rule.
- Harmonise the "endpoint is not ready" error in cmd/psql to include the
  state, matching the experimental command and giving operators something
  to act on.
- Restore comments removed during the cmd/psql refactor and add a
  breadcrumb at the GetProvisioned call site about the Name patch.
- Add doc comments to AutoSelect* helpers documenting the returned
  string shape (trailing ID for autoscaling vs full name for provisioned).
- Reject trailing components after endpoint in ParseAutoscalingPath; new
  acceptance test in cmd/psql exercises this.
- Drop dead GroupID: "" assignment.

Co-authored-by: Isaac
- Fix selectAmbiguous: fall back to ID when DisplayName is empty.
  Round-1 fix to Choice semantics left producers emitting empty
  DisplayName for branches/endpoints/instances; the psql interactive
  selector passed that straight to cmdio.Tuple.Name and rendered
  blank rows. Add the documented fallback.
- Drop unused BranchIDFromName / EndpointIDFromName exports; only
  ProjectIDFromName has callers in this PR. Re-add when first consumed.
- Convert chained ifs in isRetryableConnectError to a switch.

Co-authored-by: Isaac
This is PR 2 of the experimental postgres query stack. Builds on PR 1's
scaffold to fill in the rest of the single-input output story.

Provisioned support: --target accepts both autoscaling resource paths
(starts with "projects/") and provisioned instance names (everything
else). Granular --project/--branch/--endpoint targeting stays
autoscaling-only. resolveProvisioned validates the instance is in the
AVAILABLE state and has read/write DNS before issuing a token.

Output renderers are now sinks fed by executeOne row-by-row instead
of buffering. textSink keeps buffering (tabwriter needs the widest
cell to align); jsonSink and csvSink stream. jsonSink uses
separator-before-element writing throughout so a mid-stream error can
close the array cleanly via OnError, leaving stdout as parseable JSON
with a partial result.

JSON value rendering follows the typed mapping: numbers stay numeric
inside ±2^53, become strings outside; NaN/Inf become "NaN"/"Infinity"/
"-Infinity"; timestamps render in RFC3339; jsonb passes through as
json.RawMessage so e.g. {"id": 9007199254740993} keeps its digits;
bytea base64-encodes; everything else falls back to canonical Postgres
text. CSV and text use Postgres' canonical text representation, with
NULL rendered as the literal "NULL" in text and as empty in CSV
(matches psql --csv).

Output mode auto-selection mirrors aitools query: --output text on a
non-TTY stdout falls back to JSON. DATABRICKS_OUTPUT_FORMAT is honoured
when --output is not explicitly set; invalid env values are silently
ignored. Duplicate column names are deterministically renamed (id,
id__2, id__3) with a stderr warning.

Acceptance: argument-errors loses the now-obsolete "provisioned not
yet supported" case; new provisioned-targeting test exercises
not-AVAILABLE / no-DNS / 404 paths via the SDK testserver mock.

Co-authored-by: Isaac
- Fix non-finite float text: textValue had no float branch and fell
  through to fmt.Sprintf, which emits +Inf/-Inf instead of Postgres'
  Infinity/-Infinity. Added the explicit float case + tests.
- Emit JSON object keys in column order, not alphabetical. The map
  approach inadvertently sorted keys; switched to manual ordered
  emission (write '{', encode key:value pairs in column order, write
  '}'). Added a regression test with non-alphabetical column names.
- Honor --output text on a pipe instead of silently rewriting to JSON.
  Repo rule says "reject incompatible inputs early; never silently
  ignore a flag the current mode can't honor". Auto-fallback now only
  fires when the flag was not explicitly set (or not pinned by env).
- Trim impossible Go types from jsonValue (pgx never decodes int8 /
  uint8/16/32 / uint64 from PG columns).
- Drop the redundant ReadWriteDns guard in resolveProvisioned; an
  AVAILABLE Lakebase instance is documented to have DNS, and cmd/psql
  doesn't carry the same guard.
- Build the unsupported-format error from allOutputFormats so the
  message stays in sync if a fourth format is added.
- Update execute.go's QueryExecModeExec doc to acknowledge that we now
  call rows.Values() (not RawValues), so all sinks see Go-typed input.
- Collapse empty rows-producing JSON to "[\n]\n" and matching OnError.
- Add stderr warning helper (commandTagRowCount now covered for
  MERGE/COPY/FETCH/MOVE).
- Test gaps: text +Inf, text finite float, JSON column order, OnError
  for csv/text sinks, CSV with embedded newline + quote.

Co-authored-by: Isaac
- Doc fix: textSink.OnError doc said "prints whatever rows have been
  collected" but text mode buffers everything to End. New doc states
  the buffered partial result is discarded on iteration error.
- Doc fix: textValue float comment overstated psql parity. Tightened
  to acknowledge Go's 'g' format may differ from psql in exponential
  vs fixed notation around the boundary.
- Tighten OnError contract: explicitly states it is NOT called when
  Begin itself errors.
- Replace switch-by-format in isKnownOutputFormat with slices.Contains
  on allOutputFormats so adding a fourth format is one edit.
- Tighten command-only JSON tests from JSONEq (key-order ignored) to
  byte-equal so a future field addition is caught.
- Tighten JSONSink_OnError tests to byte-equal; add the
  Begin-but-no-rows case which exercises the rowsWritten==0 branch.

Co-authored-by: Isaac
This is PR 3 of the experimental postgres query stack. Adds the rest
of the input ergonomics promised in the plan and the error-formatting
polish.

Inputs: positional args become variadic, --file is repeatable, stdin
is read when neither is present, and a positional ending in '.sql'
that exists on disk is treated as a SQL file. Execution order is
files-first then positionals (cobra/pflag does not preserve interleaved
spelling, documented in --help).

Each input unit must contain exactly one statement. checkSingleStatement
walks the SQL with a hand-written conservative scanner that ignores
';' inside single-quoted strings, double-quoted identifiers, line
comments, block comments, and dollar-quoted bodies. Multi-statement
strings are rejected before connect with a hint pointing at the
multi-input alternatives.
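The scanner described above can be sketched as follows. This is an illustrative reimplementation of the stated rules (skip `;` inside single-quoted strings, double-quoted identifiers, line/block comments, and dollar-quoted bodies; a dollar-quote tag must follow unquoted-identifier rules), not the repository's actual code:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// readDollarTag accepts $tag$ only when tag is empty or follows
// unquoted-identifier rules (letter/underscore first, then letters,
// digits, underscores) — so $1 and $foo-bar$ are NOT treated as openers.
// Returns the full "$tag$" delimiter and the index past it, or ("", start).
func readDollarTag(s string, start int) (string, int) {
	i := start + 1 // past the leading '$'
	for i < len(s) && s[i] != '$' {
		r, size := utf8.DecodeRuneInString(s[i:])
		first := i == start+1
		if !(unicode.IsLetter(r) || r == '_' || (!first && unicode.IsDigit(r))) {
			return "", start
		}
		i += size
	}
	if i >= len(s) {
		return "", start
	}
	return s[start : i+1], i + 1
}

// hasMultipleStatements reports whether sql contains a top-level ';'
// followed by more non-whitespace content.
func hasMultipleStatements(sql string) bool {
	for i, n := 0, len(sql); i < n; {
		switch c := sql[i]; {
		case c == '\'': // single-quoted string; '' is an escaped quote
			for i++; i < n; i++ {
				if sql[i] == '\'' {
					if i+1 < n && sql[i+1] == '\'' {
						i++ // skip the escaped quote pair
						continue
					}
					i++
					break
				}
			}
		case c == '"': // double-quoted identifier
			for i++; i < n && sql[i] != '"'; i++ {
			}
			i++
		case c == '-' && i+1 < n && sql[i+1] == '-': // line comment
			for ; i < n && sql[i] != '\n'; i++ {
			}
		case c == '/' && i+1 < n && sql[i+1] == '*': // block comment
			for i += 2; i+1 < n && !(sql[i] == '*' && sql[i+1] == '/'); i++ {
			}
			i += 2
		case c == '$':
			if tag, next := readDollarTag(sql, i); next > i {
				body := strings.Index(sql[next:], tag)
				if body < 0 {
					return false // unterminated: defer to the server
				}
				i = next + body + len(tag)
			} else {
				i++ // not a dollar-quote opener (e.g. a $1 parameter)
			}
		case c == ';':
			return strings.TrimSpace(sql[i+1:]) != ""
		default:
			i++
		}
	}
	return false
}

func main() {
	fmt.Println(hasMultipleStatements("SELECT $1 FROM t; SELECT 2")) // true
}
```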

Multi-input output:

  - text: each per-unit result rendered inline, separated by a blank
    line (mirrors psql's compact text shape).
  - json: top-level array of per-unit result objects with shape
    {"sql","kind","elapsed_ms",...}; rows-producing units carry a
    "rows":[...] array, command-only carry "command"+"rows_affected".
    Each per-unit object is buffered to completion before write; the
    outer array streams across units. The plan accepts this trade-off:
    huge SELECTs in multi-input invocations buffer.
  - csv: rejected pre-flight when N>1 (no sensible cross-schema shape).
    Single-input csv keeps streaming.

Per-unit errors render as a {"kind":"error", ...} entry in the JSON
shape so scripts can detect failure without checking exit code.
Sequential execution stops on the first failing unit; the successful
prefix is rendered.

formatPgError renders *pgconn.PgError with SEVERITY, SQLSTATE, DETAIL,
HINT inline. Non-PgError values pass through unchanged so connect-time
errors keep their original wording.
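The rendering shape described here is roughly the following; `pgError` is a local stand-in for the handful of `*pgconn.PgError` fields involved, so the sketch stays self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// pgError mirrors the *pgconn.PgError fields used by the formatter.
type pgError struct {
	Severity, Message, Code, Detail, Hint string
}

// formatPgError renders SEVERITY: message (SQLSTATE XXXXX), with DETAIL
// and HINT on their own lines when present.
func formatPgError(e pgError) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s: %s (SQLSTATE %s)", e.Severity, e.Message, e.Code)
	if e.Detail != "" {
		fmt.Fprintf(&b, "\nDETAIL: %s", e.Detail)
	}
	if e.Hint != "" {
		fmt.Fprintf(&b, "\nHINT: %s", e.Hint)
	}
	return b.String()
}

func main() {
	fmt.Println(formatPgError(pgError{
		Severity: "ERROR", Message: `relation "t" does not exist`, Code: "42P01",
	}))
}
```

In the real command the input is an `error`; a non-`*pgconn.PgError` value (e.g. a connect failure) would bypass this path entirely, matching the pass-through behavior above.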

Single-input keeps the streaming sinks from PR 2; only multi-input goes
through the buffered renderer. Session state (SET, temp tables) carries
across input units because they share one connection.

TUI for >30 rows is deferred to a follow-up. The current text path uses
the static tabwriter table for both single- and multi-input.

Co-authored-by: Isaac
MUSTs:
- Multi-input JSON error envelope: thread the failing *unitResult into
  renderJSONMulti so source/sql/elapsed_ms reflect the actual failing
  input instead of empty strings.
- Canonical key order for every per-unit object:
    {"source", "sql", "kind", "elapsed_ms", payload}
  Success and error envelopes now share the same shape so consumers
  don't have to special-case kind=="error" for missing fields.

SHOULDs:
- Single-input path now goes through formatPgError, so DETAIL/HINT
  surface consistently across single- and multi-input.
- runUnitBuffered reuses executeOne via a new bufferSink. The two
  query loops collapse to one; future error-handling changes auto-
  propagate.
- Scanner: reject `$<digit>...` as a dollar-quote tag (PG docs forbid
  digit-leading tags). Pinned with a test for `SELECT $1, $2 FROM t`
  and `SELECT $1 FROM t; SELECT 2`.
- Pin the E-string over-rejection behaviour with a test, so a future
  scanner improvement has to update the assertion.

CONSIDERs:
- Capture elapsed_ms on the error path too (was previously discarded).
- Promote multiStatementHint to a const.
- Drop jsonEscapeShort (was a fragile micro-opt for an always-ASCII
  domain); use marshalJSON for the command verb instead.
- Add TestRenderJSONMulti_FirstUnitFails to pin the empty-success-
  prefix framing.

Co-authored-by: Isaac
Round-2 reviewer noted jsonErrorObject's defensive branches around
writeJSONUnitHeader/marshalJSON are unreachable (encoding/json doesn't
error on string inputs), and the repo rule says drop "just in case"
fallbacks. Replace with panic-on-impossible helpers.

Co-authored-by: Isaac
…imental

Two reductions for an experimental command, per a maintainer comment:

- Move libs/lakebase/target into experimental/postgres/cmd/internal/target
  so the experiment is self-contained. cmd/psql is no longer touched (no
  refactor, no behavior change). When/if this command graduates from
  experimental, that's the right time to extract the shared package.
- Drop acceptance tests for the new command. Aitools (the other
  experimental command) has none either; locking down user-visible
  wording for an experimental surface is overinvestment. Unit tests
  still cover argument validation, retry classification, and rendering.
  Acceptance tests can be added when the command graduates.

Net diff on cmd/psql is now zero. The experiment lives entirely under
experimental/postgres/cmd/.

Co-authored-by: Isaac
…um/postgres-query-pr2-streaming

# Conflicts:
#	acceptance/cmd/experimental/postgres/query/argument-errors/output.txt
#	acceptance/cmd/experimental/postgres/query/argument-errors/script
…tum/postgres-query-pr3-multi-input

# Conflicts:
#	acceptance/cmd/experimental/postgres/query/argument-errors/output.txt
#	acceptance/cmd/experimental/postgres/query/argument-errors/script
Both aitools query and postgres query had near-identical output-mode
selection: same DATABRICKS_OUTPUT_FORMAT env var, same flag-vs-env
precedence, same staticTableThreshold=30, same Format type with
text/json/csv values.

Promote the shared bits to experimental/libs/sqlcli:

  - sqlcli.EnvOutputFormat, sqlcli.StaticTableThreshold consts
  - sqlcli.Format typedef + sqlcli.OutputText/JSON/CSV consts
  - sqlcli.AllFormats slice (canonical order for completions)
  - sqlcli.ResolveFormat: handles flag > env > default precedence with
    the explicit-text-on-pipe-is-honoured rule

Both consumers now import sqlcli. The package lives under
experimental/libs/ rather than libs/ so it inherits the experimental-
stability guarantee of its consumers; when both commands graduate, the
package can be promoted alongside.
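The precedence rule can be sketched as follows; the signature is illustrative (the real `sqlcli.ResolveFormat` may take different parameters), but the ordering matches the description: explicit flag wins even on a pipe, a valid env value pins the format, invalid env values are silently ignored, and only an unpinned text default auto-falls-back to JSON on a non-TTY stdout:

```go
package main

import "fmt"

// resolveFormat applies flag > env > default precedence.
func resolveFormat(flagVal string, flagSet bool, env string, stdoutTTY bool) string {
	valid := func(f string) bool { return f == "text" || f == "json" || f == "csv" }
	if flagSet && valid(flagVal) {
		return flagVal // explicit --output is honoured, even text on a pipe
	}
	if valid(env) {
		return env // DATABRICKS_OUTPUT_FORMAT pins the format too
	}
	if !stdoutTTY {
		return "json" // auto-fallback only when nothing pinned text
	}
	return "text"
}

func main() {
	fmt.Println(resolveFormat("", false, "", false)) // json
}
```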

The aitools migration is a pure refactor (no behavior change). The
postgres command's output.go and output_test.go are deleted; tests
moved to experimental/libs/sqlcli.

Co-authored-by: Isaac
…tum/postgres-query-pr3-multi-input

# Conflicts:
#	experimental/postgres/cmd/query.go
@arsenyinfo

Invalid dollar-quote tag character validation allows multi-statement SQL to bypass the single-statement guard

  • Priority: P2
  • Location: experimental/postgres/cmd/multistatement.go:136-153 (readDollarTag), called from multistatement.go:69-76 (checkSingleStatement), guard enforced at experimental/postgres/cmd/query.go:127-135
  • Scenario: User passes input such as SELECT $foo-bar$a;b$foo-bar$. readDollarTag accepts foo-bar as a valid tag because it only rejects a digit-starting first character or whitespace/; anywhere; it does not reject punctuation like -. checkSingleStatement then calls scanDollarBody and skips over the ; as if it were inside a dollar-quoted body. PostgreSQL does not allow - in a dollar-quote tag (tags must follow unquoted-identifier rules: letters, digits, underscores only), so PostgreSQL would actually parse this as two statements. The pre-flight guard passes, the SQL is sent to the server, and either two statements execute or a parse error is returned — violating the guard's contract.
  • Potential solution: After the digit-start check (line 144), validate every subsequent tag character against PostgreSQL identifier-continuation rules (Unicode letter, digit, or underscore); return ("", start) on the first character that fails. This prevents punctuation-containing pseudo-tags from being treated as real dollar-quote openers.

🔍 Reviewed by nitpicker

…sion, control-char escape

Three P2 findings from the nitpicker bot, all in code introduced or
strengthened in this PR:

- stdoutTTY now uses cmdio.IsOutputTTY (a new tiny public helper that
  wraps the existing private isTTY) instead of cmdio.SupportsColor.
  SupportsColor folds in NO_COLOR / TERM=dumb, which are colour
  preferences and have nothing to do with whether stdout is a pipe;
  using it for the auto-fall-back-to-JSON decision silently demoted
  interactive text output to JSON for users with NO_COLOR set on a
  real terminal. IsOutputTTY is the right primitive for this.
- jsonSink dup-column rename: the previous logic generated id__2 for
  the second `id` without checking whether id__2 was already taken by
  the original column list. A query returning ["id", "id__2", "id"]
  produced two id__2 keys. Now we keep bumping the suffix until unique.
- textSink escapes \t, \n, \r in cell values before tabwriter sees
  them. tabwriter uses \t as a column boundary and \n as a row
  boundary, so an embedded tab silently shifted subsequent columns and
  an embedded newline split a logical row across multiple output lines.
  psql does the same backslash-letter escape.

Co-authored-by: Isaac
…caffold' into simonfaltum/postgres-query-pr2-streaming
…treaming' into simonfaltum/postgres-query-pr3-multi-input
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 09:11 — with GitHub Actions Inactive
bufferSink.Begin stashed the FieldDescription slice it was handed. pgx
reuses that slice's backing array across queries on the same connection
(pgConn.fieldDescriptions is a fixed-size buffer that's re-sliced per
statement), so each buffered unit's Fields ended up pointing at the LAST
query's row description. The multi-input renderers then emitted the wrong
column names for every unit but the last.

Clone the slice so each buffered unit owns its column descriptions.
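The aliasing bug in miniature — a callee that stashes a slice it was handed shares the caller's backing array, so a later in-place rewrite (as pgx does with its per-statement field-description buffer) silently changes the stashed copy. `slices.Clone` gives each unit its own array:

```go
package main

import (
	"fmt"
	"slices"
)

// unit stands in for a buffered query result holding column descriptions.
type unit struct{ fields []string }

func (u *unit) begin(fields []string) {
	u.fields = slices.Clone(fields) // without Clone, u.fields aliases fields
}

func main() {
	shared := []string{"a", "b"} // stands in for pgx's reused buffer
	var first unit
	first.begin(shared)
	shared[0] = "x" // the next statement's row description overwrites it
	fmt.Println(first.fields[0])
}
```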

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 09:13 — with GitHub Actions Inactive
The golangci-lint nosprintfhostport check flags fmt.Sprintf with %s:%d
for host:port in URLs. Switch to net.JoinHostPort.

Co-authored-by: Isaac
…caffold' into simonfaltum/postgres-query-pr2-streaming
…treaming' into simonfaltum/postgres-query-pr3-multi-input
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 11:14 — with GitHub Actions Inactive
The previous "Connecting to ..." line went to stderr but stayed in the
terminal forever, even after results arrived. Use cmdio.NewSpinner so the
status disappears once the connection succeeds.

Co-authored-by: Isaac
…caffold' into simonfaltum/postgres-query-pr2-streaming
…treaming' into simonfaltum/postgres-query-pr3-multi-input
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 12:38 — with GitHub Actions Inactive
simonfaltum added a commit that referenced this pull request May 5, 2026
## PR Stack

1. **PR 1 (this PR)** — #5135 — scaffold + autoscaling targeting + text output
2. #5136 — PR 2: provisioned + JSON/CSV streaming + typed values
3. #5138 — PR 3: multi-input + multi-statement rejection + error formatting
4. #5143 — PR 4: cancellation + timeout + TUI

## Why

Talking to Lakebase Postgres from a script today goes through one of two
unpleasant paths:

1. **Shell out to `databricks psql -- -c "SQL"`.** Works on macOS /
Linux when psql is installed. Fails on Windows 11 by default and on
minimal containers / sandboxed CI. No JSON / CSV.
2. **Write Python with `psycopg`.** Forces every consumer to manage
OAuth tokens, SSL mode, autocommit, etc.

This series adds a third path: a native CLI command that runs SQL
against any Lakebase endpoint, returns results in text/JSON/CSV (later
PRs), and works without a system psql.

`databricks psql` keeps owning the interactive REPL surface; this PR
does **not** touch psql.

## Changes

**Before:** No CLI command runs SQL against Lakebase from Go. Users
either shell out to `psql` (requires the system binary) or write
`psycopg` glue.

**Now:** `databricks experimental postgres query --target
projects/foo/branches/main/endpoints/primary "SELECT 1"` returns a
text-rendered result. Provisioned instances and richer output formats
land in follow-up PRs.

The experimental command is fully contained under
`experimental/postgres/cmd/`:

- `experimental/postgres/cmd/cmd.go`, `query.go`, `targeting.go`,
`connect.go`, `execute.go`, `render.go` — command implementation.
- `experimental/postgres/cmd/internal/target/` — Lakebase target
resolution helpers (parsing, SDK wrappers,
auto-select-when-exactly-one). Internal sub-package so it can't
accidentally be imported from outside the experiment. When/if this
command graduates from experimental, that's the right time to consider
extracting to `libs/`.

Single positional SQL, autoscaling targeting only (`--target`,
`--project`, `--branch`, `--endpoint`), `--max-retries`,
`--connect-timeout`, `--database`. Driver is `github.com/jackc/pgx/v5
v5.9.1` (MIT). Connect retry uses a typed predicate (08xxx SQLSTATE
family + `57P03` cannot_connect_now + `net.OpError` with `Op ==
"dial"`); auth (28xxx) and permission (42501) errors do not retry. Text
rendering is buffered (no streaming yet); rows-producing vs command-only
is decided at runtime via `FieldDescriptions()`.

Outside the experimental tree, this PR only:
- Registers the command in `cmd/experimental/experimental.go` (2 lines).
- Adds the pgx direct dependency (`go.mod` SPDX annotation, `NOTICE`
entry, `NEXT_CHANGELOG.md` dependency-updates entry).

`pgx` is already a direct dep of the universe monorepo's Lakebase
services; aligning here keeps the SDK surface consistent.

## Test plan

- [x] `go test ./experimental/postgres/...` (target parsing,
validateTargeting, retry classification, render)
- [x] `go test ./internal/build/...` (license + NOTICE completeness)
- [x] `go tool ... golangci-lint run ./experimental/postgres/...` (0
issues)
- [x] `./task checks` (whitespace, links, deadcode)
…query-pr2-streaming

# Conflicts:
#	experimental/postgres/cmd/execute.go
#	experimental/postgres/cmd/query.go
#	experimental/postgres/cmd/render.go
#	experimental/postgres/cmd/render_test.go
#	experimental/postgres/cmd/targeting.go
…treaming' into simonfaltum/postgres-query-pr3-multi-input
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 15:03 — with GitHub Actions Inactive
simonfaltum added a commit that referenced this pull request May 5, 2026
…g + types (#5136)

## PR Stack

1. #5135 — PR 1: scaffold + autoscaling targeting + text output
2. **PR 2 (this PR)** — #5136 — provisioned + JSON/CSV streaming + typed values + `experimental/libs/sqlcli` for output handling
3. #5138 — PR 3: multi-input + multi-statement rejection + error formatting
4. #5143 — PR 4: cancellation + timeout + TUI

Stacked on PR 1.

## Why

Two things in this PR. The user-facing one: postgres query learns
JSON/CSV streaming and provisioned-instance support. The architectural
one: aitools query and postgres query had near-identical output-mode
handling (same env var, same flag/env precedence, same threshold).
Promote the duplication to a shared `experimental/libs/sqlcli` package
before the second consumer ossifies the divergence.

## Changes

**Architectural:** `experimental/libs/sqlcli/` is a new package under
`experimental/libs/` (not `libs/`) so it inherits the
experimental-stability guarantee of its consumers. Exposes:

- `sqlcli.EnvOutputFormat`, `sqlcli.StaticTableThreshold` constants.
- `sqlcli.Format` typedef + `sqlcli.OutputText/JSON/CSV` consts +
`sqlcli.AllFormats`.
- `sqlcli.ResolveFormat` — flag > env > default precedence with the
explicit-text-on-pipe-is-honoured rule.

aitools query migrates to use sqlcli (pure refactor, no behavior
change). postgres query was about to add its own copy of all of this;
instead it uses sqlcli from day one.

**User-facing changes for postgres query:**

- `--target my-instance` now resolves a provisioned instance.
- `--output json` streams typed values: numbers stay numeric, jsonb
stays nested, NaN/Inf/bigints-outside-2^53 become strings.
- `--output csv` streams (no buffering).
- `DATABRICKS_OUTPUT_FORMAT` honoured.
- Auto-fallback to JSON when stdout is piped.
- Duplicate column names get deterministic `__N` suffixes with a stderr
warning.

Also adds `cmdio.IsOutputTTY` (a small public wrapper around the
existing private `isTTY`) so commands can ask "is stdout a terminal?"
without folding in `NO_COLOR` / `TERM=dumb` (both of which
`cmdio.SupportsColor` ANDs in for the colour-rendering decision).

## Test plan

- [x] `go test ./experimental/aitools/... ./experimental/postgres/...
./experimental/libs/...` (unit, sinks, value mapping, format selection,
aitools tests still pass after migration)
- [x] `go tool ... golangci-lint run ./experimental/...` (0 issues)
Base automatically changed from simonfaltum/postgres-query-pr2-streaming to main May 5, 2026 15:39
…query-pr3-multi-input

# Conflicts:
#	experimental/postgres/cmd/query.go
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 15:42 — with GitHub Actions Inactive
@simonfaltum simonfaltum merged commit 67d4a5e into main May 5, 2026
23 of 24 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/postgres-query-pr3-multi-input branch May 5, 2026 18:54
simonfaltum added a commit that referenced this pull request May 6, 2026
## PR Stack

1. #5135 — PR 1: scaffold + autoscaling targeting + text output
2. #5136 — PR 2: provisioned + JSON/CSV streaming + types
3. #5138 — PR 3: multi-input + multi-statement rejection + error formatting
4. **PR 4 (this PR)** — #5143 — cancellation + timeout + TUI for >30 rows

Stacked on PR 3.

## Why

PR 3 finished the input ergonomics. The remaining commitments before
this command earns the "experimental" label:

- A long SELECT shouldn't survive Ctrl+C. Today the pgx default closes
the TCP socket but leaves the server-side query running.
- CI scripts want `--timeout` so a single statement can't pin a runner.
- Interactive users with >30 rows benefit from a scrollable browser
instead of a wall of text.

## Changes

**Before:** Ctrl+C tears down TCP but the query runs to completion
server-side. `--timeout` doesn't exist. >30 rows scroll past in the
terminal.

**Now:** Ctrl+C cancels the in-flight query at the server. `--timeout
30s` enforces a per-statement deadline. >30 rows on a TTY open the
libs/tableview viewer.

Specifically:

- **Cancellation watcher.** `buildPgxConfig` now installs
`CancelRequestContextWatcherHandler` with `CancelRequestDelay=0,
DeadlineDelay=5s`. Zero `DeadlineDelay` would race the cancel-request
and could leave the connection unusable; 5s gives the cancel round-trip
time to land.
- **Signal handler.** Per-invocation goroutine watches SIGINT and
SIGTERM. Calls cancel; defer'd stop drains the channel.
- **--timeout flag.** Per-statement `context.WithTimeout` child of the
signal-scoped ctx. `reportCancellation` distinguishes Ctrl+C ("Query
cancelled."), timeout ("Query timed out after Xs."), and plain query
errors. Returns `(msg, invocationScoped)` so the multi-input loop can
drop the source prefix on user-cancel.
- **TUI for >30 rows.** `textSink` now has an `interactive` mode;
`runQuery` enables it when stdout is a prompt-capable TTY. Static
tabwriter table for small results and pipes; libs/tableview for big
interactive ones. If `tableview.Run` fails (TUI startup, terminal resize
race) the sink falls through to the static tabwriter path so the user
still sees the rows.

Integration tests aren't included: aitools (the other experimental
command) doesn't have them either. Acceptance + unit tests cover
argument validation, targeting resolution (SDK-mocked), and output
shapes; cancellation/timeout are unit-tested via the seam in
`cancel_test.go`. Real-wire integration tests are the right addition
when this command graduates from experimental.

## Test plan

- [x] `go test ./experimental/postgres/...` (cancel/timeout/signal
helpers, race-precedence pinning)
- [x] `go test ./acceptance -run
TestAccept/cmd/(psql|experimental/postgres)` (no regressions)
- [x] `go tool ... golangci-lint run ./experimental/postgres/...` (0
issues)
- [x] Manual: Ctrl+C during `SELECT pg_sleep(60)` cancels the
server-side query within ~5s. (Smoked on
`chatbot-lakebase-dev-simon-faltum` (e2-dogfood): SIGINT to `SELECT
pg_sleep(60)` exited the client in 0.53s with `Error: Query cancelled.`;
subsequent `pg_stat_activity` query returned zero rows.)