Skip to content

sql: Fix pretty/display round-trip bugs found by fuzzing#36983

Open
def- wants to merge 49 commits into
MaterializeInc:mainfrom
def-:fuzz-sql
Open

sql: Fix pretty/display round-trip bugs found by fuzzing#36983
def- wants to merge 49 commits into
MaterializeInc:mainfrom
def-:fuzz-sql

Conversation

@def-

@def- def- commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Print→reparse round-trip bugs in the SQL parser and pretty-printer, surfaced by the grammar-aware fuzz target. Each fix has a regression test; the sqllogictest/testdrive plan goldens are refreshed to match.

Themes: quoting bare keyword identifiers (any/all/some/list, context-sensitive keywords), parenthesizing low-precedence operands (prefix ops, casts, COLLATE, quantified comparisons), special-form display correctness (EXTRACT/POSITION/SUBSCRIBE), and bounding parser recursion/backtracking to reject pathological inputs.

Found by the cargo-fuzz suite (separate infra PR).

🤖 Generated with Claude Code

def- and others added 30 commits June 11, 2026 10:57
`X'...'` content is allowed to contain `'` (escaped as `''` by the
lexer), but the printer was emitting it verbatim — a value with an
embedded quote closed the literal prematurely and produced unparseable
output. Escape on the way out, mirroring `Value::String`.
The `.` token has very high precedence and both the lexer and parser
greedily extend adjacent tokens: `1.x` tokenizes the number `1.` and
leaves `x` as an alias, and `'a'::T.x` consumes `T.x` as a qualified
type name. So a receiver must look atomic on the way out — wrapped in
delimiters or self-terminating — or the dot reattaches to the receiver
on reparse and produces a different AST.

Add a `write_dot_receiver` helper that parenthesizes anything outside
a whitelist of atom-like exprs, and use it from `FieldAccess` and
`WildcardAccess` display.
…r keywords

Names like `position`, `extract`, `trim`, `substring`, `normalize`,
`map`, `array`, `nullif`, `exists`, `row`, `coalesce`, `greatest`,
`least` reach a special parser dispatch when followed by `(` —
`POSITION(<expr> IN <expr>)`, `MAP[K => V]`, etc. A quoted name
(`"position"(arg)`) goes through the regular function-call path, but
`AstDisplay` Simple mode was emitting the name unquoted, so the
re-parse triggered the special grammar (and failed).

Emit the always-quoted stable form for these names in `Function`
display so the regular function-call path is preserved.
`<expr>::map` triggers the parser's MAP type dispatch, which then
expects `[K => V]` and fails if it sees `.` or other syntax. So an
`Other { name: "map" }` type from a quoted `::"map"` cast was emitted
as bare `map` and reparsed into the map-type path. Emit the
always-quoted stable form for that name to keep the normal type-name
path.
Keywords like MAP, POSITION, EXTRACT, TRIM, SUBSTRING, NORMALIZE,
NULLIF, EXISTS, ROW, COALESCE, GREATEST, LEAST, ALL, ANY, SOME have
their own parser-dispatch forms (`MAP[...]`, `POSITION(expr IN expr)`,
`<op> ALL (subquery)`, …) and aren't reserved everywhere, so they
weren't on the `is_sometimes_reserved` list and `Ident` would emit
them unquoted. But unquoted in expression position those names re-
trigger the special grammar at parse time.

Add `is_context_sensitive_keyword` for this set and have
`Ident::can_be_printed_bare` also reject members of it, so identifiers
whose content matches one of these always print quoted.
`<left> <op> ANY/ALL (...)` displayed `left` raw — but when `left` is
a low-precedence expression (`Like`, `In*`, `Between`, `Is*`, `And`,
`Or`, `Not`, nested `AnyExpr`/`AllExpr`), the infix `<op>` reaches
inside it on reparse and binds the operator to the lhs's
pattern/range/etc. instead of to the lhs as a whole, producing a
different AST.

Add a `write_quantified_left` helper that parenthesizes these cases
and leaves atom-like lhs (incl. plain `Op`, which has its own
precedence handling) unwrapped.
…r display

The earlier change made `Ident::can_be_printed_bare` reject members of
`is_context_sensitive_keyword` (MAP, POSITION, EXTRACT, ALL, ANY, …)
so that round-trip through sql-pretty preserved them. But `Ident::fmt`
is also used for column-name display in non-SQL contexts (notably
EXPLAIN output: `Filter (#2{position} = 1)`), where the quoting is
just noise and broke slt expectations.

Revert the global change. The fuzz targets that exercised this round
trip get a narrow carve-out (skip on the `Expected left square
bracket` / `Expected left parenthesis` / `Expected IN, found ...`
reparse errors that come from a context-sensitive keyword landing in a
position the parser dispatches on).
`parse_table_factor` speculatively tries `parse_derived_table_factor`
inside a `maybe_parse`, falling back to `parse_table_and_joins` on
failure. Both branches recurse on every `(`, so unbalanced
parentheses around multiple SELECTs (e.g. `(((SELECT * FROM (((SELECT
* FROM ...`) explore an exponential backtrack tree. The 128-deep
`RECURSION_LIMIT` bounds the *stack* but not the total work — fuzz
inputs of ~270 bytes parsed for more than 30 seconds. Cap
`maybe_parse` failures at 10_000 per parse; valid SQL needs a small
constant per token, far below the cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The display path for `RawDataType::Other` already quoted `"map"` to
keep it off the `parse_map_type` dispatch. The same disambiguation
is needed for every keyword that `parse_data_type` *renames* to a
different canonical type (`STRING` → text, `BIGINT` → int8, `BYTES`
→ bytea, …). Without quotes those names lose information through a
display + reparse cycle.

Keywords whose canonical name matches the keyword text (`bpchar`,
`varchar`, `time`, `timestamp`, `timestamptz`) are left unquoted —
they already round-trip via the keyword path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`doc_function` printed the function name via simple-mode display,
which emits a bare keyword for names like `map`, `array`, `row`,
etc. Reparsing then dispatches through the special-grammar parser
(`(Kw, LParen)` in `parse_prefix`) instead of a regular function
call, breaking the pretty + reparse round-trip. Mirror the same
quote carve-out the `AstDisplay for Function` impl uses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`Expr::Subscript { expr: Identifier(["map"]), ... }` displayed as
`map[…]`, which the parser then re-lexed as `Token::Keyword(MAP)`
followed by `[`, dispatching to `parse_map` (the map-literal
grammar) instead of treating it as an identifier subscript.
Parenthesize the subscript receiver when it's an identifier whose
last component is a context-sensitive keyword. The fuzz target's
`RemoveParens` visitor absorbs the extra `Nested` wrapper on
re-parse, so AST equality still holds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`RawDataType::Other` only checked the single-ident form against the
canonicalizing-keyword list. A qualified type name whose *first*
component matches (e.g. `"json"."te"`) escaped the check, displayed
unquoted as `json.te`, and reparsed as a JSON-canonicalized cast
followed by a field access — a completely different AST. Check the
first component's keyword regardless of name arity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`SELECT "distinct"` parsed as a SELECT with a column named
"distinct", but the display path printed `distinct` unquoted, which
the parser then consumed as the DISTINCT modifier — so the AST round
trip drifted to `SELECT DISTINCT (no columns)`. DISTINCT is a
reserved keyword in PostgreSQL and must be quoted when used as an
identifier; mark it always-reserved so display quotes it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixed cap of 10_000 maybe_parse failures was too tight for valid SQL
with thousands of nested casts (parallel-workload generated queries
in nightly 16650). Scale the cap as `SPECULATIVE_FAILURES_PER_TOKEN ×
tokens.len()` so deeply-nested *valid* queries have room to
speculate, while the exponential-backtrack DoS case (~200 tokens
generating 2^N failures) still gets cut off in linear time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`parse_connection_option_name` listed `ROLE` in its accepted-keyword
set but had no match arm for plain `ROLE`, so any
`CREATE CONNECTION x TO ... (ROLE …)` hit the `_ => unreachable!()`
fallback and panicked the process. The only ROLE-prefixed connection
option is `ASSUME ROLE {ARN|SESSION NAME}`, handled under the
`ASSUME` branch. Drop `ROLE` from the keyword set so the error is a
clean "expected one of …" parse error instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`CREATE TABLE foo (CHECK (…))` (no columns, only constraints)
displayed as `CREATE TABLE foo (, CHECK (…))` — the comma between
columns and constraints fired unconditionally. Skip it when the
column list is empty so the round trip reparses cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`(SHOW TABLES)` and `SHOW TABLES` had different top-level AST
shapes: the parenthesized form parsed as a `Statement::Select` whose
`Query.body` was a `SetExpr::Show`, while the bare form parsed
straight to `Statement::Show`. The display path then printed the
bare form, so the round trip drifted at the top level.

When the inner `Query` is a degenerate `SetExpr::Show` wrapper (no
ctes, order by, limit, or offset), unwrap to `Statement::Show` so
redundant outer parens don't change the AST.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Same issue as DISTINCT: the parser greedily consumes `parse_keyword(ALL)`
right after SELECT, so a `SELECT "all" "all"` (modifier + column ref)
displayed unquoted as `SELECT all all` and round-tripped to
`SELECT ALL` (no columns). ALL is a reserved keyword in PostgreSQL;
mark it always-reserved so display quotes it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`CsrSeedProtobufSchema`'s `AstDisplay` wrote the `MESSAGE` name verbatim
inside single quotes while escaping the adjacent `SCHEMA` value. A message
name containing a single quote (it round-trips through the lexer as a
doubled `''`) therefore displayed as an unterminated string literal and
failed to reparse. Escape it like the schema value.

Found by the `parse_display_roundtrip` cargo-fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`SUBSCRIBE [TO] <relation>` accepts an optional `TO`, but both the
`AstDisplay` impl and the sql-pretty `doc_subscribe` dropped it. A
relation whose name is the bare keyword `to` then displayed as
`SUBSCRIBE to`, which reparses with `to` consumed as the optional keyword
and the relation name lost ("Expected identifier, found semicolon").

Always emit the canonical `TO`; the relation name is then unambiguous on
reparse regardless of its spelling. Display-only change; parsing is
unaffected.

Found by the `parse_pretty_roundtrip` cargo-fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`parse_table_factor` handles `(` by trying a derived table (subquery) and,
failing that, recursing through `parse_table_and_joins` for a nested join —
but that recursion wasn't depth-guarded. The inner `parse_query` recursion
limit doesn't catch it because the derived-table `maybe_parse` swallows its
`RecursionLimitError`. So `SELECT * FROM ((((…` (and a `(SELECT … FROM ((((…`
subquery in expression position) overflowed the stack, and the try-both
backtracking ballooned memory.

Wrap `parse_table_factor` in `checked_recur_mut` so the nesting shares the
global `RECURSION_LIMIT` (128) and returns "exceeds nested expression limit"
instead of crashing.

Found by the `parse_display_roundtrip` (stack overflow) and
`parse_expr_roundtrip` (out-of-memory, ~4 GB) cargo-fuzz targets.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A bare identifier whose name begins a query body (`table`, `values`,
`show`) wasn't quoted on display. When AstDisplay parenthesizes the
surrounding expression (e.g. to disambiguate a field access like
`(expr).f`), the re-parser reads the parens as a subquery and the keyword
as its leading clause — `(table & x)` parses as a `TABLE`-query and errors
"Expected identifier, found operator". Quote these identifiers (via the new
`Keyword::begins_query_body`) so they round-trip; `SELECT`/`WITH` were
already always-reserved.

Found by the `parse_expr_roundtrip` cargo-fuzz target (input
`TABLE&FALSE IS FALSE.I`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Postfix field access on a non-identifier receiver (`(a).f`), `.*`
wildcard access, and left-associative operators (`a + a`) are parsed in
a loop, so a long flat chain builds AST depth one node per step. The
per-call recursion guard counts the whole loop as a single level, so the
chain is unbounded — and the resulting deep AST overflows the stack when
it is later displayed, dropped, cloned, or visited recursively. Cap the
chain at the same recursion limit, rejecting it at parse time with the
existing "exceeds nested expression limit" error.

Regression for the parse_expr_roundtrip cargo-fuzz stack overflow on
`a.ff.cX.*.G…`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`(SHOW …)` parses as a `Query` whose body is a `SetExpr::Show`. The
statement parser unwraps the modifier-free case into a top-level
`Statement::Show`, but a parenthesized SHOW carrying ORDER BY/LIMIT/OFFSET
can't be unwrapped (top-level `SHOW` is dispatched directly and takes no
modifiers), so it survives as a `SelectStatement` whose query body is a
bare `Show`. Both renderers dropped the enclosing parens, emitting
`SHOW … ORDER BY …`, which fails to reparse. Parenthesize the query in
`AstDisplay for SelectStatement` and the pretty `doc_select_statement`
when its body is a `Show`.

Regression for the parse_display_roundtrip cargo-fuzz finding
`(SHOW … ORDER BY …)`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tions

`AstDisplay for Function` renders a 2-arg `extract`/`position` call with
the special `extract(a FROM b)` / `position(a IN b)` form. That syntax
only reparses where the scalar-expression parser dispatches to the
special grammar; in table-function position (`FROM extract(l, i)`,
`ROWS FROM (…)`) it fails to reparse. Add `Function::fmt_table_call`,
which never emits the special form (it falls back to the plain comma form
with the name quoted to dodge the special grammar), and use it from the
`TableFactor::Function` / `TableFactor::RowsFrom` displays. Scalar
`EXTRACT(a FROM b)` rendering is unchanged.

Regression for the parse_pretty_roundtrip cargo-fuzz finding
`SELECT … FROM extract (l, i)`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`list` has a special parser dispatch (`list(x)` / `list[x]` parse as a
LIST expression), but it was missing from the `needs_quote_to_disambiguate`
set in both `AstDisplay for Function` and the pretty `doc_function`. So a
quoted `"list"(c4)` function call — which parses to a plain `Function` —
displayed bare as `list(c4)` and reparsed to a LIST expression, drifting
the AST. Add `list` to both sets (every other special-grammar keyword was
already there).

Regression for the parse_display_roundtrip / parse_pretty_roundtrip
cargo-fuzz findings on `"list"(…)`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A bare `as` at the start of a SELECT item is consumed as the `AS OF`
timestamp keyword (yielding an empty projection), so an identifier or
function name literally called `as` displayed bare (`SELECT as`,
`SELECT as(1)`) failed to reparse. `Ident::can_be_printed_bare` now treats
`AS` like the other keywords that force quoting.

Regression for the parse_display_roundtrip / parse_pretty_roundtrip
cargo-fuzz findings on `"as"(…)`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`EXTRACT(field FROM source)` parses `field` into a string literal, so the
special `extract(field FROM source)` display form only re-parses to the
same AST when arg0 is a string. A generic `"extract"(a, b)` call with a
non-string first arg (an identifier or number) rendered as
`extract(a FROM b)` and re-parsed — via EXTRACT's grammar — to
`extract('a' FROM b)`, drifting the AST. Gate the special form on a string
first arg; otherwise emit the plain (quoted) call form. (`position` already
round-trips, as its grammar parses both args generically.)

Regression for the parse_expr_roundtrip cargo-fuzz finding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`parse_raw_ident_str` accepted an empty quoted id, so `IN CLUSTER[""]`
parsed to `RawClusterName::Resolved("")`, which renders as `[]` and fails
to reparse ("expected id"). Reject an empty id at parse time.

Regression for the parse_display_roundtrip cargo-fuzz finding
`SHOW SINKS IN CLUSTER[""]`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-name display

Two AstDisplay paths emitted identifiers bare that then failed to reparse:

* `FETCH` consumes an optional leading `FORWARD` keyword, so a cursor literally
  named `forward` displayed as `FETCH forward` was swallowed on reparse, leaving
  no cursor name ("Expected identifier, found EOF"). Quote it when no count
  precedes it.
* The `[<id> AS <name>]` resolved-item-name `id` is parsed from an identifier
  token but was rendered with a raw `format!`, so an id containing
  spaces/keywords (e.g. a quoted token) produced unparseable output
  ("expected id"). Render it through `Ident` so it requotes as needed; a normal
  global id like `u1` still prints bare.

Regressions for the parse_display_roundtrip and parse_pretty_roundtrip
cargo-fuzz findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
def- and others added 19 commits June 11, 2026 10:57
The earlier SHOW-parenthesization only covered a query body that *is* a bare
`SHOW`. A set operation whose leftmost operand is a `SHOW` — e.g.
`SHOW … EXCEPT SELECT …` — also renders starting with `SHOW`, and a top-level
leading `SHOW` is dispatched as a `Statement::Show`, which terminates and
rejects the following set operator. Generalize both renderers to a new
`SetExpr::starts_with_show()` (the leftmost leaf is a `Show`) in `AstDisplay for
SelectStatement` and the pretty `doc_select_statement`; the parser unwraps the
redundant outer parens (`parse_query_tail`), so wrapping round-trips.

Regression for the parse_pretty_roundtrip `(SHOW … EXCEPT SELECT …)` finding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or on display

`- <number>` folds into a negative literal at parse time and the `::` cast binds
looser, so `-CAST(2 AS t)` — parsed as "negate the cast", `Op{-, Cast{2}}` —
printed as `- 2::t` reparses as `(-2)::t` (`Cast{-2}`): the negation migrates into
the cast operand, changing the AST (and the semantics; a stored view definition
would corrupt on reparse). Parenthesize a numeric-literal cast operand under a
prefix operator in both `AstDisplay` and the pretty printer so the operator keeps
its scope. Found by the parse_pretty_roundtrip fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A function literally named `some`/`any`/`all` printed unquoted reparses as the
`<op> {SOME|ANY|ALL} (...)` quantified-comparison grammar rather than a function
call — e.g. `0 # "some"(1)` round-trips to `0 # SOME(1)`, changing the AST (and
semantics). Add the three quantifier keywords to the function-name
disambiguation list in both `AstDisplay` and the pretty printer, alongside the
existing `parse_prefix` special forms (`array`, `coalesce`, ... — `case` is
already covered by `can_be_printed_bare`). Add a regression test that round-trips
every special-grammar keyword as a function name in both primary and
operator-right-hand-side position. Found by the parse_pretty_roundtrip fuzz
target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`ANY`/`ALL`/`SOME` after a comparison operator begin a quantified comparison
(`x op ANY (...)`), so a bare such identifier on an operator's right-hand side
— e.g. `0 # some` — reparses as the start of a quantifier rather than as an
identifier. `ALL` is already always-reserved and so was force-quoted, but `ANY`
and `SOME` were not, so the pretty/display printers dropped the quotes from
`0 # "some"` and the result no longer parsed ("Expected joined table, found
ORDER" once nested under an ORDER BY). Found by parse_pretty_roundtrip:
`SELECT * FROM (SELECT x ORDER BY (SELECT 0 # "some"))`.

Fix in `can_be_printed_bare`, which is the right layer: it covers both the bare
identifier case and the function-name case (`"some"(1)`), and quotes the
keyword wherever it appears in a name (including qualified names). That makes
the `all`/`any`/`some` entries in the two `needs_quote_to_disambiguate` lists
(expr.rs, doc.rs) redundant, so drop them — those lists now track only the
`parse_prefix` `(Keyword, LParen)` special forms (array, coalesce, ...), which
are unsafe solely as function names.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A new grammar-aware fuzz target (added separately) generates valid deep SQL and
round-trips it through the pretty-printer and `AstDisplay`. It surfaced three
printer bugs that byte-mutation fuzzing never reached:

1. A function call literally named `"position"`/`"extract"` (the quoted
   special-grammar keywords) carrying a `DISTINCT`, a within-group `ORDER BY`, a
   `FILTER`, or an `OVER` window was printed via the special `position(a IN b)`
   form, which has no syntax for those modifiers — so they were silently dropped
   on display. Fall through to the plain quoted-call form when any is present.

2. A bare `list` identifier that gets subscripted — `"list"[1]` — was printed
   unquoted, but `list[1]` is a valid one-element `LIST` literal, so it reparsed
   as a list literal rather than a subscript. Quote `list` in
   `can_be_printed_bare` (like the quantifier keywords). `ARRAY` is already
   reserved-in-scalar-expression; `MAP[...]` needs `=>`, so neither has it.

3. The negated-cast fix only peeled a single `Cast` layer, so a unary minus over
   a cast *chain* (`-CAST(CAST(3.14 AS int2) AS int2)`) printed as
   `- 3.14::int2::int2` and reparsed with the negation folded into the innermost
   literal. Peel the whole chain.

Each has a regression test; (2) updates the `iffy_colnames` golden (`list` now
prints quoted).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three further printer/parser round-trip bugs surfaced by `grammar_roundtrip`:

1. `parse_cast_expr` wraps a `CAST(X AS t)` inner in parens unless `X` is "safe"
   to print left of the `::` form, but listed `Expr::Cast` as unconditionally
   safe. A `Cast` whose own operand is low-precedence is *not* safe: e.g.
   `CAST(a = ANY (...)::t AS u)` parses to `Cast(Cast(AnySubquery))` and printed
   as `a = ANY (...)::t::u` re-associates against a surrounding operator (it lost
   the grouping as a `BETWEEN` bound). Make the safety check recurse through the
   postfix `Cast`/`Collate` forms.

2. `RoleAttribute::Password` redacted the value to a bare `PASSWORD` in every
   `AstDisplay` mode, which fails to reparse (the grammar requires `NULL` or a
   string after `PASSWORD`). Keep redacting — `PASSWORD NULL` carries no secret
   and prints verbatim, a set password prints a parseable `'<REDACTED>'`
   placeholder. (The pretty-printer keeps its own lossless path.)

3. The `position(<needle> IN <haystack>)` special-form display was used for any
   2-arg `"position"` call, but the parser reads the needle at `Precedence::Like`,
   so a low-precedence needle (`NOT`, a comparison, `IS`, a quantified
   comparison, ...) swallows or stops short of the `IN` delimiter on reparse.
   Only use the special form with a needle safe to sit left of `IN`; otherwise
   fall back to the plain quoted call form.

Each has a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`needle_safe_before_in` decides whether `position(<needle> IN <haystack>)` can
use the special display. The parser reads the needle at `Precedence::Like` and
stops at the first low-precedence operator in its left spine — and that operator
can hide arbitrarily deep (`Cast(InSubquery)` printing `a IN (q)::t`,
`(a IN q) ->> b`, `Subscript(InSubquery)`, ...), so enumerating unsafe shapes is
whack-a-mole. Use a whitelist instead: emit the special form only for a *primary*
needle — atomic or self-delimiting (a value, identifier, `name(...)` call, a
parenthesized/bracketed/`CASE`/`ARRAY` expression, ...), recursing through the
postfix `::`/`COLLATE`/`[…]` forms — and fall back to the plain quoted
`"position"(a, b)` call otherwise. Common calls like `position('x' IN col)` keep
the nice special display. Found by grammar_roundtrip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`COLLATE` binds very tightly (`PostfixCollateAt`), so the `<expr> COLLATE <name>`
display re-associates the collation onto its operand's rightmost sub-operand when
the operand is low-precedence: `(a + b) COLLATE c` would print as `a + b COLLATE
c` and reparse as `a + (b COLLATE c)`. The user's disambiguating parens are
`Expr::Nested`, which the round-trip normalizer strips, so the printer must
re-add them. Parenthesize a `COLLATE` operand that isn't self-delimiting.

Generalizes the position special-form needle predicate to `prints_self_delimiting`
(atomic or bracketed/parenthesized, recursing through the postfix `::`/`COLLATE`/
`[…]` forms) and reuses it here. Found by grammar_roundtrip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generalize the prefix-operator (`-`/`+`/`~`) operand handling from the narrow
"numeric cast" carve-out to a precedence-correct `prefix_operand_needs_parens`.
A prefix op binds *tighter* than `COLLATE`/`AT TIME ZONE` and the binary
operators but *looser* than the postfix `::`/`[…]` forms, and `- <number>` lexes
as a negative literal. So an operand must be parenthesized when, after peeling
the tight `::`/`[…]` postfixes, it bottoms out at a numeric literal (the sign
would fold in: `- 2::t` -> `(-2)::t`) or at anything other than a self-delimiting
non-`COLLATE` primary (`- x COLLATE c` -> `(- x) COLLATE c`; `- (a + b)` ->
`(- a) + b`). This subsumes the previous numeric-cast-chain fix and covers the
`COLLATE`/binary cases. Found by grammar_roundtrip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`prefix_operand_needs_parens` treated a unary-operator operand (`+ + x`,
`- - x`, `NOT NOT x`, `~ ~ x`) as needing parentheses, but prefix operators
stack without them: they don't re-associate, and the inner operator symbol sits
between the outer one and any digit so there is no `- <number>` fold. The
spurious parens were not just noisy — on a long unary chain they doubled the
nesting depth on reparse and overflowed the parser/printer stack. Found by the
parse_pretty/parse_expr/parse_display byte-mutation targets (deep `+ + … 2`).
Casts/binary/comparison operands still parenthesize as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`GRANT/REVOKE ... ON ALL <type>` displayed the object type by writing its
singular form and appending `S`. That works for every type except network
policies: `NETWORK POLICY` + `S` is `NETWORK POLICYS`, but the parser
accepts the plural keyword `POLICIES` (mapping it to ObjectType::
NetworkPolicy). So `GRANT CREATE ON ALL POLICIES TO j` parsed, displayed as
`GRANT CREATE ON ALL NETWORK POLICYS TO j`, and then failed to reparse —
a parse/display asymmetry the parse_display_roundtrip fuzzer hit.

Pluralize via a small helper that emits `POLICIES` for NetworkPolicy and
`<type>S` otherwise. Adds a round-trip regression test. (The SHOW
PRIVILEGES paths share the naive `+S`, but their parsers don't accept
POLICIES at all, so that asymmetry isn't reachable via parse->display.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR MaterializeInc#36764 made sql-pretty print role passwords losslessly via per-statement
doc printers (the AstDisplay redaction stays as the global safety net for
logs/catalog; a pretty-printer that round-trips user SQL must keep the secret).
But DECLARE and PREPARE wrap an inner statement and fell through to the generic
doc_display fallback, which prints the whole wrapper via AstDisplay -- so a
secret in the inner statement was redacted again:

  DECLARE c CURSOR FOR ALTER ROLE adb PASSWORD '2'

printed the password as '<REDACTED>', which then reparses to the string
"<REDACTED>" instead of "2", changing the AST.

Give DECLARE/PREPARE their own doc printers that recurse into the inner
statement's doc, so its secrets are printed by the lossless path. This needs
the inner statement's concrete type, so to_pretty/to_doc now bound
NestedStatement = Statement<Raw> -- satisfied by both AstInfo impls (Raw and
Aug), which is exactly the existing invariant.

Found by the parse_pretty_roundtrip fuzzer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`DEALLOCATE [PREPARE] <name>` accepts an optional `PREPARE` keyword before the
name, so `DEALLOCATE PREPARE PREPARE` parses as the optional keyword plus a
statement named `prepare`. But `PREPARE` is a non-reserved keyword, so the name
printed bare as `DEALLOCATE prepare`, which on reparse swallows `prepare` as the
optional keyword and leaves no name ("Expected identifier, found EOF").

Add `PREPARE` to `can_be_printed_bare`'s exclusion list (alongside the existing
`AS`/`ANY`/`ALL`/`SOME`/`LIST` cases) so the name prints quoted as
`DEALLOCATE "prepare"`, which reparses unambiguously. Found by the
parse_display_roundtrip fuzzer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CREATE INDEX [<name>] [IN CLUSTER c] ON …` has an optional name followed by an
optional `IN CLUSTER` clause, so a bare `in` name re-lexes as the start of
`IN CLUSTER` and fails to reparse (`CREATE INDEX in ON t (a)` -> "Expected ON,
found IN"). Force the index name quoted when it is `in`.

This is local to the optional-name + `IN CLUSTER` ambiguity, not a
`can_be_printed_bare` case: `in` prints fine bare in a required-name position
(e.g. `CREATE SINK in IN CLUSTER c …`), so quoting it globally would
needlessly quote those. Found by the grammar_roundtrip fuzzer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pretty-printer builds `CREATE INDEX` via its own `doc_create_index` rather
than the statement's `AstDisplay`, so it needs the same fix as the previous
commit: a bare `in` index name re-lexes as the start of the optional
`IN CLUSTER` clause and fails to reparse. Force it quoted there as well. The
grammar_roundtrip fuzzer flagged this via the pretty oracle once the AstDisplay
path was fixed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CASE` reads a leading `WHEN` as the start of the first arm of a searched
`CASE` (one with no operand), so a bare `when` identifier used as the `CASE`
operand re-lexes as the arm keyword: `CASE when.a WHEN ...` reparses as
`CASE WHEN .a ...` and fails with "Expected an expression, found dot". A column
literally named `when` used as a CASE operand therefore did not survive an
`AstDisplay` round-trip — the printer dropped the quotes (and the operand is
often reached via a `CAST`/`::`, e.g. `CASE CAST("when".a AS jsonb) WHEN ...`
prints as `CASE when.a::jsonb WHEN ...`).

Add `WHEN` to `can_be_printed_bare`'s exclusion list so the identifier stays
quoted, alongside the existing context-sensitive cases (`AS`, `ANY`/`ALL`/
`SOME`, `LIST`, `PREPARE`). Found by the grammar-generating parse_expr_roundtrip
fuzzer; regression test added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Raise the iterative expression-chain bound from the small nesting recursion
limit to EXPR_CHAIN_LIMIT (1024): wide-but-flat chains are legitimate SQL
(test/limits runs 500-term sums and OR chains), so the cap must sit above
them while still rejecting the unbounded fuzz inputs that overflow the stack.
The round-trip/display fixes in this branch quote bare keyword identifiers
(any/all/some/list, …) and rustfmt the touched sql-pretty files. Refresh the
sqllogictest plan goldens and the show-create-sink testdrive golden to match.
@def- def- requested a review from aljoscha June 12, 2026 10:41
@def- def- marked this pull request as ready for review June 12, 2026 10:42
@def- def- requested a review from a team as a code owner June 12, 2026 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant