Skip to content

fix: prevent unparser stack overflow on deeply nested expressions#23058

Open
adriangb wants to merge 2 commits into
apache:mainfrom
pydantic:fix-unparser-recursion
Open

fix: prevent unparser stack overflow on deeply nested expressions#23058
adriangb wants to merge 2 commits into
apache:mainfrom
pydantic:fix-unparser-recursion

Conversation

@adriangb

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

Unparser::expr_to_sql overflows the OS stack on deeply nested expressions, and — surprisingly — it does so even with the recursive_protection feature enabled (the default for the datafusion crate).

While investigating I found the root cause is subtler than "a function is missing #[recursive]":

expr_to_sql_inner is already annotated and is re-entered on the function-argument / dialect-override paths, so the trampoline does fire. The real problem is that the recursive crate's default red zone is 128 KiB, but a single expr_to_sql_inner stack frame in debug builds is ~130 KiBlarger than the red zone. stacker::maybe_grow only grows when remaining < red_zone, so a frame can pass the check and then overflow before the next checkpoint. The unparser installed no StackGuard, unlike the planner, which already does exactly this in query.rs:

// query.rs
let _guard = StackGuard::new(256 * 1024);

with a comment noting the same debug-frame-size issue (PR #13310).

Secondary leaks: several internal recursion sites recursed through the public, un-annotated expr_to_sql (function arguments, make_array, array_element, named_struct, map, and the array_has / date_part dialect overrides), and remove_unnecessary_nesting (pretty mode) was not annotated at all — it would be the next overflow site once the first is fixed.

What changes are included in this PR?

The recursive design is kept (an iterative rewrite of the tree-building unparser would be a large, high-risk change for no benefit over the trampoline). The fix mirrors the planner:

  • Install StackGuard::new(256 * 1024) once at the top-level expr_to_sql.
  • Add an annotated pub(crate) expr_to_sql_with_nesting recursion core that all internal recursion sites call. This keeps pretty-mode behavior identical (each level still runs remove_unnecessary_nesting) while making every nesting level a stack-growth checkpoint.
  • Annotate remove_unnecessary_nesting with #[recursive].

The PR is split into two commits for review: the failing regression test first, then the fix.

Are these changes tested?

Yes. test_deeply_nested_expr_does_not_overflow_stack (gated on recursive_protection) unparses a depth-2000 array_has chain on PostgreSqlDialect and a depth-2000 binary chain on a 2 MiB thread. It aborts the process on main and passes with the fix. I verified empirically that with the default 128 KiB red zone the test still aborts, and that 256 KiB carries both paths past depth 5000 on a realistic thread.

Note: the guarantee holds on realistically-sized thread stacks (≥1–2 MiB). The 512 KiB-stack figure in the issue's repro is too small to bootstrap the heavier array_has override path even after the fix, but that is the same memory floor the planner side has and is not a configuration used by DataFusion in practice.

Are there any user-facing changes?

No public API changes. Deeply nested expressions that previously aborted the process now unparse successfully when recursive_protection is enabled.

🤖 Generated with Claude Code

https://claude.ai/code/session_01QyHc4z88pC8nQLpFx7z2vM

adriangb and others added 2 commits June 20, 2026 19:19
Adds `test_deeply_nested_expr_does_not_overflow_stack`, gated on the
`recursive_protection` feature, which unparses a deeply nested expression
through both a dialect scalar-function override (`array_has` on
`PostgreSqlDialect`) and a plain binary-operator chain.

Even with `recursive_protection` enabled this aborts the process with a
stack overflow today (issue apache#23056): the per-level stack cost of the
unparser exceeds the `recursive` crate's default red zone, so the
stack-growing trampoline never engages in time. The fix follows in the
next commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QyHc4z88pC8nQLpFx7z2vM
…d core

Makes the regression test from the previous commit pass.

Although `expr_to_sql_inner` is annotated with `#[recursive]`, two things
defeated the stack-growth protection:

1. The unparser installed no `StackGuard`, so the `recursive` crate used
   its default 128 KiB red zone. In debug builds a single
   `expr_to_sql_inner` frame is ~130 KiB, i.e. larger than the red zone,
   so `stacker::maybe_grow` could pass its check and then overflow before
   reaching the next checkpoint. The planner already installs
   `StackGuard::new(256 * 1024)` in `query.rs` for the same reason.
2. Several internal recursion sites (scalar-function arguments, arrays,
   maps, and the `array_has` / `date_part` dialect overrides) recursed
   through the public, un-annotated `expr_to_sql`, and
   `remove_unnecessary_nesting` (used in pretty mode) was not annotated.

The fix keeps the recursive design but:

- installs a `StackGuard` with a 256 KiB red zone once at the top-level
  `expr_to_sql`, matching the planner;
- adds an annotated `pub(crate) expr_to_sql_with_nesting` recursion core
  that all internal recursion sites call (preserving pretty-mode
  behavior), so every nesting level is a stack-growth checkpoint;
- annotates `remove_unnecessary_nesting`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QyHc4z88pC8nQLpFx7z2vM
@github-actions github-actions Bot added the sql SQL Planner label Jun 20, 2026
@adriangb adriangb marked this pull request as ready for review June 21, 2026 07:08
@adriangb adriangb requested a review from kosiew June 21, 2026 07:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unparser stack overflow: recursive_protection is bypassed for nested function arguments / dialect overrides (public expr_to_sql not #[recursive])

1 participant