sql: Fix pretty/display round-trip bugs found by fuzzing#36983
Open
def- wants to merge 49 commits into
Open
Conversation
`X'...'` content is allowed to contain `'` (escaped as `''` by the lexer), but the printer was emitting it verbatim — a value with an embedded quote closed the literal prematurely and produced unparseable output. Escape on the way out, mirroring `Value::String`.
The `.` token has very high precedence and both the lexer and parser greedily extend adjacent tokens: `1.x` tokenizes the number `1.` and leaves `x` as an alias, and `'a'::T.x` consumes `T.x` as a qualified type name. So a receiver must look atomic on the way out — wrapped in delimiters or self-terminating — or the dot reattaches to the receiver on reparse and produces a different AST. Add a `write_dot_receiver` helper that parenthesizes anything outside a whitelist of atom-like exprs, and use it from `FieldAccess` and `WildcardAccess` display.
…r keywords Names like `position`, `extract`, `trim`, `substring`, `normalize`, `map`, `array`, `nullif`, `exists`, `row`, `coalesce`, `greatest`, `least` reach a special parser dispatch when followed by `(` — `POSITION(<expr> IN <expr>)`, `MAP[K => V]`, etc. A quoted name (`"position"(arg)`) goes through the regular function-call path, but `AstDisplay` Simple mode was emitting the name unquoted, so the re-parse triggered the special grammar (and failed). Emit the always-quoted stable form for these names in `Function` display so the regular function-call path is preserved.
`<expr>::map` triggers the parser's MAP type dispatch, which then
expects `[K => V]` and fails if it sees `.` or other syntax. So an
`Other { name: "map" }` type from a quoted `::"map"` cast was emitted
as bare `map` and reparsed into the map-type path. Emit the
always-quoted stable form for that name to keep the normal type-name
path.
Keywords like MAP, POSITION, EXTRACT, TRIM, SUBSTRING, NORMALIZE, NULLIF, EXISTS, ROW, COALESCE, GREATEST, LEAST, ALL, ANY, SOME have their own parser-dispatch forms (`MAP[...]`, `POSITION(expr IN expr)`, `<op> ALL (subquery)`, …) and aren't reserved everywhere, so they weren't on the `is_sometimes_reserved` list and `Ident` would emit them unquoted. But unquoted in expression position those names re- trigger the special grammar at parse time. Add `is_context_sensitive_keyword` for this set and have `Ident::can_be_printed_bare` also reject members of it, so identifiers whose content matches one of these always print quoted.
`<left> <op> ANY/ALL (...)` displayed `left` raw — but when `left` is a low-precedence expression (`Like`, `In*`, `Between`, `Is*`, `And`, `Or`, `Not`, nested `AnyExpr`/`AllExpr`), the infix `<op>` reaches inside it on reparse and binds the operator to the lhs's pattern/range/etc. instead of to the lhs as a whole, producing a different AST. Add a `write_quantified_left` helper that parenthesizes these cases and leaves atom-like lhs (incl. plain `Op`, which has its own precedence handling) unwrapped.
…r display The earlier change made `Ident::can_be_printed_bare` reject members of `is_context_sensitive_keyword` (MAP, POSITION, EXTRACT, ALL, ANY, …) so that round-trip through sql-pretty preserved them. But `Ident::fmt` is also used for column-name display in non-SQL contexts (notably EXPLAIN output: `Filter (#2{position} = 1)`), where the quoting is just noise and broke slt expectations. Revert the global change. The fuzz targets that exercised this round trip get a narrow carve-out (skip on the `Expected left square bracket` / `Expected left parenthesis` / `Expected IN, found ...` reparse errors that come from a context-sensitive keyword landing in a position the parser dispatches on).
`parse_table_factor` speculatively tries `parse_derived_table_factor` inside a `maybe_parse`, falling back to `parse_table_and_joins` on failure. Both branches recurse on every `(`, so unbalanced parentheses around multiple SELECTs (e.g. `(((SELECT * FROM (((SELECT * FROM ...`) explore an exponential backtrack tree. The 128-deep `RECURSION_LIMIT` bounds the *stack* but not the total work — fuzz inputs of ~270 bytes parsed for more than 30 seconds. Cap `maybe_parse` failures at 10_000 per parse; valid SQL needs a small constant per token, far below the cap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The display path for `RawDataType::Other` already quoted `"map"` to keep it off the `parse_map_type` dispatch. The same disambiguation is needed for every keyword that `parse_data_type` *renames* to a different canonical type (`STRING` → text, `BIGINT` → int8, `BYTES` → bytea, …). Without quotes those names lose information through a display + reparse cycle. Keywords whose canonical name matches the keyword text (`bpchar`, `varchar`, `time`, `timestamp`, `timestamptz`) are left unquoted — they already round-trip via the keyword path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`doc_function` printed the function name via simple-mode display, which emits a bare keyword for names like `map`, `array`, `row`, etc. Reparsing then dispatches through the special-grammar parser (`(Kw, LParen)` in `parse_prefix`) instead of a regular function call, breaking the pretty + reparse round-trip. Mirror the same quote carve-out the `AstDisplay for Function` impl uses. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`Expr::Subscript { expr: Identifier(["map"]), ... }` displayed as
`map[…]`, which the parser then re-lexed as `Token::Keyword(MAP)`
followed by `[`, dispatching to `parse_map` (the map-literal
grammar) instead of treating it as an identifier subscript.
Parenthesize the subscript receiver when it's an identifier whose
last component is a context-sensitive keyword. The fuzz target's
`RemoveParens` visitor absorbs the extra `Nested` wrapper on
re-parse, so AST equality still holds.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`RawDataType::Other` only checked the single-ident form against the canonicalizing-keyword list. A qualified type name whose *first* component matches (e.g. `"json"."te"`) escaped the check, displayed unquoted as `json.te`, and reparsed as a JSON-canonicalized cast followed by a field access — a completely different AST. Check the first component's keyword regardless of name arity. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`SELECT "distinct"` parsed as a SELECT with a column named "distinct", but the display path printed `distinct` unquoted, which the parser then consumed as the DISTINCT modifier — so the AST round trip drifted to `SELECT DISTINCT (no columns)`. DISTINCT is a reserved keyword in PostgreSQL and must be quoted when used as an identifier; mark it always-reserved so display quotes it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixed cap of 10_000 maybe_parse failures was too tight for valid SQL with thousands of nested casts (parallel-workload generated queries in nightly 16650). Scale the cap as `SPECULATIVE_FAILURES_PER_TOKEN × tokens.len()` so deeply-nested *valid* queries have room to speculate, while the exponential-backtrack DoS case (~200 tokens generating 2^N failures) still gets cut off in linear time. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`parse_connection_option_name` listed `ROLE` in its accepted-keyword
set but had no match arm for plain `ROLE`, so any
`CREATE CONNECTION x TO ... (ROLE …)` hit the `_ => unreachable!()`
fallback and panicked the process. The only ROLE-prefixed connection
option is `ASSUME ROLE {ARN|SESSION NAME}`, handled under the
`ASSUME` branch. Drop `ROLE` from the keyword set so the error is a
clean "expected one of …" parse error instead.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`CREATE TABLE foo (CHECK (…))` (no columns, only constraints) displayed as `CREATE TABLE foo (, CHECK (…))` — the comma between columns and constraints fired unconditionally. Skip it when the column list is empty so the round trip reparses cleanly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`(SHOW TABLES)` and `SHOW TABLES` had different top-level AST shapes: the parenthesized form parsed as a `Statement::Select` whose `Query.body` was a `SetExpr::Show`, while the bare form parsed straight to `Statement::Show`. The display path then printed the bare form, so the round trip drifted at the top level. When the inner `Query` is a degenerate `SetExpr::Show` wrapper (no ctes, order by, limit, or offset), unwrap to `Statement::Show` so redundant outer parens don't change the AST. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Same issue as DISTINCT: the parser greedily consumes `parse_keyword(ALL)` right after SELECT, so a `SELECT "all" "all"` (modifier + column ref) displayed unquoted as `SELECT all all` and round-tripped to `SELECT ALL` (no columns). ALL is a reserved keyword in PostgreSQL; mark it always-reserved so display quotes it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`CsrSeedProtobufSchema`'s `AstDisplay` wrote the `MESSAGE` name verbatim inside single quotes while escaping the adjacent `SCHEMA` value. A message name containing a single quote (it round-trips through the lexer as a doubled `''`) therefore displayed as an unterminated string literal and failed to reparse. Escape it like the schema value. Found by the `parse_display_roundtrip` cargo-fuzz target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`SUBSCRIBE [TO] <relation>` accepts an optional `TO`, but both the
`AstDisplay` impl and the sql-pretty `doc_subscribe` dropped it. A
relation whose name is the bare keyword `to` then displayed as
`SUBSCRIBE to`, which reparses with `to` consumed as the optional keyword
and the relation name lost ("Expected identifier, found semicolon").
Always emit the canonical `TO`; the relation name is then unambiguous on
reparse regardless of its spelling. Display-only change; parsing is
unaffected.
Found by the `parse_pretty_roundtrip` cargo-fuzz target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`parse_table_factor` handles `(` by trying a derived table (subquery) and, failing that, recursing through `parse_table_and_joins` for a nested join — but that recursion wasn't depth-guarded. The inner `parse_query` recursion limit doesn't catch it because the derived-table `maybe_parse` swallows its `RecursionLimitError`. So `SELECT * FROM ((((…` (and a `(SELECT … FROM ((((…` subquery in expression position) overflowed the stack, and the try-both backtracking ballooned memory. Wrap `parse_table_factor` in `checked_recur_mut` so the nesting shares the global `RECURSION_LIMIT` (128) and returns "exceeds nested expression limit" instead of crashing. Found by the `parse_display_roundtrip` (stack overflow) and `parse_expr_roundtrip` (out-of-memory, ~4 GB) cargo-fuzz targets. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A bare identifier whose name begins a query body (`table`, `values`, `show`) wasn't quoted on display. When AstDisplay parenthesizes the surrounding expression (e.g. to disambiguate a field access like `(expr).f`), the re-parser reads the parens as a subquery and the keyword as its leading clause — `(table & x)` parses as a `TABLE`-query and errors "Expected identifier, found operator". Quote these identifiers (via the new `Keyword::begins_query_body`) so they round-trip; `SELECT`/`WITH` were already always-reserved. Found by the `parse_expr_roundtrip` cargo-fuzz target (input `TABLE&FALSE IS FALSE.I`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Postfix field access on a non-identifier receiver (`(a).f`), `.*` wildcard access, and left-associative operators (`a + a`) are parsed in a loop, so a long flat chain builds AST depth one node per step. The per-call recursion guard counts the whole loop as a single level, so the chain is unbounded — and the resulting deep AST overflows the stack when it is later displayed, dropped, cloned, or visited recursively. Cap the chain at the same recursion limit, rejecting it at parse time with the existing "exceeds nested expression limit" error. Regression for the parse_expr_roundtrip cargo-fuzz stack overflow on `a.ff.cX.*.G…`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`(SHOW …)` parses as a `Query` whose body is a `SetExpr::Show`. The statement parser unwraps the modifier-free case into a top-level `Statement::Show`, but a parenthesized SHOW carrying ORDER BY/LIMIT/OFFSET can't be unwrapped (top-level `SHOW` is dispatched directly and takes no modifiers), so it survives as a `SelectStatement` whose query body is a bare `Show`. Both renderers dropped the enclosing parens, emitting `SHOW … ORDER BY …`, which fails to reparse. Parenthesize the query in `AstDisplay for SelectStatement` and the pretty `doc_select_statement` when its body is a `Show`. Regression for the parse_display_roundtrip cargo-fuzz finding `(SHOW … ORDER BY …)`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tions `AstDisplay for Function` renders a 2-arg `extract`/`position` call with the special `extract(a FROM b)` / `position(a IN b)` form. That syntax only reparses where the scalar-expression parser dispatches to the special grammar; in table-function position (`FROM extract(l, i)`, `ROWS FROM (…)`) it fails to reparse. Add `Function::fmt_table_call`, which never emits the special form (it falls back to the plain comma form with the name quoted to dodge the special grammar), and use it from the `TableFactor::Function` / `TableFactor::RowsFrom` displays. Scalar `EXTRACT(a FROM b)` rendering is unchanged. Regression for the parse_pretty_roundtrip cargo-fuzz finding `SELECT … FROM extract (l, i)`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`list` has a special parser dispatch (`list(x)` / `list[x]` parse as a LIST expression), but it was missing from the `needs_quote_to_disambiguate` set in both `AstDisplay for Function` and the pretty `doc_function`. So a quoted `"list"(c4)` function call — which parses to a plain `Function` — displayed bare as `list(c4)` and reparsed to a LIST expression, drifting the AST. Add `list` to both sets (every other special-grammar keyword was already there). Regression for the parse_display_roundtrip / parse_pretty_roundtrip cargo-fuzz findings on `"list"(…)`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A bare `as` at the start of a SELECT item is consumed as the `AS OF` timestamp keyword (yielding an empty projection), so an identifier or function name literally called `as` displayed bare (`SELECT as`, `SELECT as(1)`) failed to reparse. `Ident::can_be_printed_bare` now treats `AS` like the other keywords that force quoting. Regression for the parse_display_roundtrip / parse_pretty_roundtrip cargo-fuzz findings on `"as"(…)`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`EXTRACT(field FROM source)` parses `field` into a string literal, so the
special `extract(field FROM source)` display form only re-parses to the
same AST when arg0 is a string. A generic `"extract"(a, b)` call with a
non-string first arg (an identifier or number) rendered as
`extract(a FROM b)` and re-parsed — via EXTRACT's grammar — to
`extract('a' FROM b)`, drifting the AST. Gate the special form on a string
first arg; otherwise emit the plain (quoted) call form. (`position` already
round-trips, as its grammar parses both args generically.)
Regression for the parse_expr_roundtrip cargo-fuzz finding.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`parse_raw_ident_str` accepted an empty quoted id, so `IN CLUSTER[""]`
parsed to `RawClusterName::Resolved("")`, which renders as `[]` and fails
to reparse ("expected id"). Reject an empty id at parse time.
Regression for the parse_display_roundtrip cargo-fuzz finding
`SHOW SINKS IN CLUSTER[""]`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-name display
Two AstDisplay paths emitted identifiers bare that then failed to reparse:
* `FETCH` consumes an optional leading `FORWARD` keyword, so a cursor literally
named `forward` displayed as `FETCH forward` was swallowed on reparse, leaving
no cursor name ("Expected identifier, found EOF"). Quote it when no count
precedes it.
* The `[<id> AS <name>]` resolved-item-name `id` is parsed from an identifier
token but was rendered with a raw `format!`, so an id containing
spaces/keywords (e.g. a quoted token) produced unparseable output
("expected id"). Render it through `Ident` so it requotes as needed; a normal
global id like `u1` still prints bare.
Regressions for the parse_display_roundtrip and parse_pretty_roundtrip
cargo-fuzz findings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The earlier SHOW-parenthesization only covered a query body that *is* a bare `SHOW`. A set operation whose leftmost operand is a `SHOW` — e.g. `SHOW … EXCEPT SELECT …` — also renders starting with `SHOW`, and a top-level leading `SHOW` is dispatched as a `Statement::Show`, which terminates and rejects the following set operator. Generalize both renderers to a new `SetExpr::starts_with_show()` (the leftmost leaf is a `Show`) in `AstDisplay for SelectStatement` and the pretty `doc_select_statement`; the parser unwraps the redundant outer parens (`parse_query_tail`), so wrapping round-trips. Regression for the parse_pretty_roundtrip `(SHOW … EXCEPT SELECT …)` finding. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or on display
`- <number>` folds into a negative literal at parse time and the `::` cast binds
looser, so `-CAST(2 AS t)` — parsed as "negate the cast", `Op{-, Cast{2}}` —
printed as `- 2::t` reparses as `(-2)::t` (`Cast{-2}`): the negation migrates into
the cast operand, changing the AST (and the semantics; a stored view definition
would corrupt on reparse). Parenthesize a numeric-literal cast operand under a
prefix operator in both `AstDisplay` and the pretty printer so the operator keeps
its scope. Found by the parse_pretty_roundtrip fuzz target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A function literally named `some`/`any`/`all` printed unquoted reparses as the
`<op> {SOME|ANY|ALL} (...)` quantified-comparison grammar rather than a function
call — e.g. `0 # "some"(1)` round-trips to `0 # SOME(1)`, changing the AST (and
semantics). Add the three quantifier keywords to the function-name
disambiguation list in both `AstDisplay` and the pretty printer, alongside the
existing `parse_prefix` special forms (`array`, `coalesce`, ... — `case` is
already covered by `can_be_printed_bare`). Add a regression test that round-trips
every special-grammar keyword as a function name in both primary and
operator-right-hand-side position. Found by the parse_pretty_roundtrip fuzz
target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`ANY`/`ALL`/`SOME` after a comparison operator begin a quantified comparison
(`x op ANY (...)`), so a bare such identifier on an operator's right-hand side
— e.g. `0 # some` — reparses as the start of a quantifier rather than as an
identifier. `ALL` is already always-reserved and so was force-quoted, but `ANY`
and `SOME` were not, so the pretty/display printers dropped the quotes from
`0 # "some"` and the result no longer parsed ("Expected joined table, found
ORDER" once nested under an ORDER BY). Found by parse_pretty_roundtrip:
`SELECT * FROM (SELECT x ORDER BY (SELECT 0 # "some"))`.
Fix in `can_be_printed_bare`, which is the right layer: it covers both the bare
identifier case and the function-name case (`"some"(1)`), and quotes the
keyword wherever it appears in a name (including qualified names). That makes
the `all`/`any`/`some` entries in the two `needs_quote_to_disambiguate` lists
(expr.rs, doc.rs) redundant, so drop them — those lists now track only the
`parse_prefix` `(Keyword, LParen)` special forms (array, coalesce, ...), which
are unsafe solely as function names.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A new grammar-aware fuzz target (added separately) generates valid deep SQL and round-trips it through the pretty-printer and `AstDisplay`. It surfaced three printer bugs that byte-mutation fuzzing never reached: 1. A function call literally named `"position"`/`"extract"` (the quoted special-grammar keywords) carrying a `DISTINCT`, a within-group `ORDER BY`, a `FILTER`, or an `OVER` window was printed via the special `position(a IN b)` form, which has no syntax for those modifiers — so they were silently dropped on display. Fall through to the plain quoted-call form when any is present. 2. A bare `list` identifier that gets subscripted — `"list"[1]` — was printed unquoted, but `list[1]` is a valid one-element `LIST` literal, so it reparsed as a list literal rather than a subscript. Quote `list` in `can_be_printed_bare` (like the quantifier keywords). `ARRAY` is already reserved-in-scalar-expression; `MAP[...]` needs `=>`, so neither has it. 3. The negated-cast fix only peeled a single `Cast` layer, so a unary minus over a cast *chain* (`-CAST(CAST(3.14 AS int2) AS int2)`) printed as `- 3.14::int2::int2` and reparsed with the negation folded into the innermost literal. Peel the whole chain. Each has a regression test; (2) updates the `iffy_colnames` golden (`list` now prints quoted). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three further printer/parser round-trip bugs surfaced by `grammar_roundtrip`: 1. `parse_cast_expr` wraps a `CAST(X AS t)` inner in parens unless `X` is "safe" to print left of the `::` form, but listed `Expr::Cast` as unconditionally safe. A `Cast` whose own operand is low-precedence is *not* safe: e.g. `CAST(a = ANY (...)::t AS u)` parses to `Cast(Cast(AnySubquery))` and printed as `a = ANY (...)::t::u` re-associates against a surrounding operator (it lost the grouping as a `BETWEEN` bound). Make the safety check recurse through the postfix `Cast`/`Collate` forms. 2. `RoleAttribute::Password` redacted the value to a bare `PASSWORD` in every `AstDisplay` mode, which fails to reparse (the grammar requires `NULL` or a string after `PASSWORD`). Keep redacting — `PASSWORD NULL` carries no secret and prints verbatim, a set password prints a parseable `'<REDACTED>'` placeholder. (The pretty-printer keeps its own lossless path.) 3. The `position(<needle> IN <haystack>)` special-form display was used for any 2-arg `"position"` call, but the parser reads the needle at `Precedence::Like`, so a low-precedence needle (`NOT`, a comparison, `IS`, a quantified comparison, ...) swallows or stops short of the `IN` delimiter on reparse. Only use the special form with a needle safe to sit left of `IN`; otherwise fall back to the plain quoted call form. Each has a regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`needle_safe_before_in` decides whether `position(<needle> IN <haystack>)` can
use the special display. The parser reads the needle at `Precedence::Like` and
stops at the first low-precedence operator in its left spine — and that operator
can hide arbitrarily deep (`Cast(InSubquery)` printing `a IN (q)::t`,
`(a IN q) ->> b`, `Subscript(InSubquery)`, ...), so enumerating unsafe shapes is
whack-a-mole. Use a whitelist instead: emit the special form only for a *primary*
needle — atomic or self-delimiting (a value, identifier, `name(...)` call, a
parenthesized/bracketed/`CASE`/`ARRAY` expression, ...), recursing through the
postfix `::`/`COLLATE`/`[…]` forms — and fall back to the plain quoted
`"position"(a, b)` call otherwise. Common calls like `position('x' IN col)` keep
the nice special display. Found by grammar_roundtrip.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`COLLATE` binds very tightly (`PostfixCollateAt`), so the `<expr> COLLATE <name>` display re-associates the collation onto its operand's rightmost sub-operand when the operand is low-precedence: `(a + b) COLLATE c` would print as `a + b COLLATE c` and reparse as `a + (b COLLATE c)`. The user's disambiguating parens are `Expr::Nested`, which the round-trip normalizer strips, so the printer must re-add them. Parenthesize a `COLLATE` operand that isn't self-delimiting. Generalizes the position special-form needle predicate to `prints_self_delimiting` (atomic or bracketed/parenthesized, recursing through the postfix `::`/`COLLATE`/ `[…]` forms) and reuses it here. Found by grammar_roundtrip. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generalize the prefix-operator (`-`/`+`/`~`) operand handling from the narrow "numeric cast" carve-out to a precedence-correct `prefix_operand_needs_parens`. A prefix op binds *tighter* than `COLLATE`/`AT TIME ZONE` and the binary operators but *looser* than the postfix `::`/`[…]` forms, and `- <number>` lexes as a negative literal. So an operand must be parenthesized when, after peeling the tight `::`/`[…]` postfixes, it bottoms out at a numeric literal (the sign would fold in: `- 2::t` -> `(-2)::t`) or at anything other than a self-delimiting non-`COLLATE` primary (`- x COLLATE c` -> `(- x) COLLATE c`; `- (a + b)` -> `(- a) + b`). This subsumes the previous numeric-cast-chain fix and covers the `COLLATE`/binary cases. Found by grammar_roundtrip. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`prefix_operand_needs_parens` treated a unary-operator operand (`+ + x`, `- - x`, `NOT NOT x`, `~ ~ x`) as needing parentheses, but prefix operators stack without them: they don't re-associate, and the inner operator symbol sits between the outer one and any digit so there is no `- <number>` fold. The spurious parens were not just noisy — on a long unary chain they doubled the nesting depth on reparse and overflowed the parser/printer stack. Found by the parse_pretty/parse_expr/parse_display byte-mutation targets (deep `+ + … 2`). Casts/binary/comparison operands still parenthesize as before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`GRANT/REVOKE ... ON ALL <type>` displayed the object type by writing its singular form and appending `S`. That works for every type except network policies: `NETWORK POLICY` + `S` is `NETWORK POLICYS`, but the parser accepts the plural keyword `POLICIES` (mapping it to ObjectType:: NetworkPolicy). So `GRANT CREATE ON ALL POLICIES TO j` parsed, displayed as `GRANT CREATE ON ALL NETWORK POLICYS TO j`, and then failed to reparse — a parse/display asymmetry the parse_display_roundtrip fuzzer hit. Pluralize via a small helper that emits `POLICIES` for NetworkPolicy and `<type>S` otherwise. Adds a round-trip regression test. (The SHOW PRIVILEGES paths share the naive `+S`, but their parsers don't accept POLICIES at all, so that asymmetry isn't reachable via parse->display.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR MaterializeInc#36764 made sql-pretty print role passwords losslessly via per-statement doc printers (the AstDisplay redaction stays as the global safety net for logs/catalog; a pretty-printer that round-trips user SQL must keep the secret). But DECLARE and PREPARE wrap an inner statement and fell through to the generic doc_display fallback, which prints the whole wrapper via AstDisplay -- so a secret in the inner statement was redacted again: DECLARE c CURSOR FOR ALTER ROLE adb PASSWORD '2' printed the password as '<REDACTED>', which then reparses to the string "<REDACTED>" instead of "2", changing the AST. Give DECLARE/PREPARE their own doc printers that recurse into the inner statement's doc, so its secrets are printed by the lossless path. This needs the inner statement's concrete type, so to_pretty/to_doc now bound NestedStatement = Statement<Raw> -- satisfied by both AstInfo impls (Raw and Aug), which is exactly the existing invariant. Found by the parse_pretty_roundtrip fuzzer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`DEALLOCATE [PREPARE] <name>` accepts an optional `PREPARE` keyword before the
name, so `DEALLOCATE PREPARE PREPARE` parses as the optional keyword plus a
statement named `prepare`. But `PREPARE` is a non-reserved keyword, so the name
printed bare as `DEALLOCATE prepare`, which on reparse swallows `prepare` as the
optional keyword and leaves no name ("Expected identifier, found EOF").
Add `PREPARE` to `can_be_printed_bare`'s exclusion list (alongside the existing
`AS`/`ANY`/`ALL`/`SOME`/`LIST` cases) so the name prints quoted as
`DEALLOCATE "prepare"`, which reparses unambiguously. Found by the
parse_display_roundtrip fuzzer.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CREATE INDEX [<name>] [IN CLUSTER c] ON …` has an optional name followed by an optional `IN CLUSTER` clause, so a bare `in` name re-lexes as the start of `IN CLUSTER` and fails to reparse (`CREATE INDEX in ON t (a)` -> "Expected ON, found IN"). Force the index name quoted when it is `in`. This is local to the optional-name + `IN CLUSTER` ambiguity, not a `can_be_printed_bare` case: `in` prints fine bare in a required-name position (e.g. `CREATE SINK in IN CLUSTER c …`), so quoting it globally would needlessly quote those. Found by the grammar_roundtrip fuzzer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pretty-printer builds `CREATE INDEX` via its own `doc_create_index` rather than the statement's `AstDisplay`, so it needs the same fix as the previous commit: a bare `in` index name re-lexes as the start of the optional `IN CLUSTER` clause and fails to reparse. Force it quoted there as well. The grammar_roundtrip fuzzer flagged this via the pretty oracle once the AstDisplay path was fixed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CASE` reads a leading `WHEN` as the start of the first arm of a searched
`CASE` (one with no operand), so a bare `when` identifier used as the `CASE`
operand re-lexes as the arm keyword: `CASE when.a WHEN ...` reparses as
`CASE WHEN .a ...` and fails with "Expected an expression, found dot". A column
literally named `when` used as a CASE operand therefore did not survive an
`AstDisplay` round-trip — the printer dropped the quotes (and the operand is
often reached via a `CAST`/`::`, e.g. `CASE CAST("when".a AS jsonb) WHEN ...`
prints as `CASE when.a::jsonb WHEN ...`).
Add `WHEN` to `can_be_printed_bare`'s exclusion list so the identifier stays
quoted, alongside the existing context-sensitive cases (`AS`, `ANY`/`ALL`/
`SOME`, `LIST`, `PREPARE`). Found by the grammar-generating parse_expr_roundtrip
fuzzer; regression test added.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Raise the iterative expression-chain bound from the small nesting recursion limit to EXPR_CHAIN_LIMIT (1024): wide-but-flat chains are legitimate SQL (test/limits runs 500-term sums and OR chains), so the cap must sit above them while still rejecting the unbounded fuzz inputs that overflow the stack.
The round-trip/display fixes in this branch quote bare keyword identifiers (any/all/some/list, …) and rustfmt the touched sql-pretty files. Refresh the sqllogictest plan goldens and the show-create-sink testdrive golden to match.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Print→reparse round-trip bugs in the SQL parser and pretty-printer, surfaced by the grammar-aware fuzz target. Each fix has a regression test; the sqllogictest/testdrive plan goldens are refreshed to match.
Themes: quoting bare keyword identifiers (any/all/some/list, context-sensitive keywords), parenthesizing low-precedence operands (prefix ops, casts, COLLATE, quantified comparisons), special-form display correctness (EXTRACT/POSITION/SUBSCRIBE), and bounding parser recursion/backtracking to reject pathological inputs.
Found by the cargo-fuzz suite (separate infra PR).
🤖 Generated with Claude Code