@lichuang lichuang commented Jan 24, 2026

Which issue does this PR close?

This PR refactors ORDER BY planning to use the merged schema approach (similar to HAVING), unifying the SQL planning logic and simplifying the generated execution plans.

Currently, DataFusion has two different code paths for handling ORDER BY:

  1. add_missing_columns (in LogicalPlanBuilder::sort_with_limit): Traverses the plan tree looking for Projection nodes and adds missing columns to them
  2. Merged Schema approach (used by HAVING): Uses a merged schema (SELECT list + FROM clause) to resolve expressions and directly adds missing columns to the SELECT list

Having both paths coexist leads to:

• Complex and hard-to-maintain code
• Non-intuitive handling of simple queries like SELECT x FROM foo ORDER BY y
• Generated execution plans with unnecessary subquery wrapping

Solution

Implement the approach proposed in #10326: handle ORDER BY similarly to HAVING by using the merged schema and adding missing columns directly to the SELECT
list instead of traversing the plan tree.
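
To make the idea concrete, here is a minimal, self-contained sketch of merged-schema resolution for ORDER BY. It is a toy model, not DataFusion code: `SortPlan`, `plan_order_by`, and the string-based column handling are hypothetical names invented for illustration. The final projection that drops the helper columns is also an assumption, included because the query must still return only what the SELECT list asked for.

```rust
// Toy model only: these types and functions are hypothetical and are not
// part of DataFusion's API.

/// Simplified plan for `SELECT <select_list> FROM <table> ORDER BY <order_by>`.
#[derive(Debug, PartialEq)]
struct SortPlan {
    /// Columns the inner projection produces: the SELECT list plus any
    /// ORDER BY columns that were missing from it.
    inner_projection: Vec<String>,
    /// Sort keys, each resolved against the merged schema.
    sort_keys: Vec<String>,
    /// Final projection that drops the helper columns again.
    /// `None` when nothing had to be added.
    outer_projection: Option<Vec<String>>,
}

/// Resolve each ORDER BY column against the merged schema
/// (SELECT list first, then the FROM clause) and append any column found
/// only in the FROM clause directly to the SELECT list.
fn plan_order_by(
    select_list: &[&str],
    from_schema: &[&str],
    order_by: &[&str],
) -> Result<SortPlan, String> {
    let mut inner: Vec<String> = select_list.iter().map(|c| c.to_string()).collect();
    let mut added = false;

    for &key in order_by {
        if inner.iter().any(|c| c.as_str() == key) {
            // Already produced by the SELECT list (or added earlier).
            continue;
        }
        if from_schema.contains(&key) {
            // Missing from the SELECT list but available from the input:
            // add it to the projection instead of walking the plan tree.
            inner.push(key.to_string());
            added = true;
        } else {
            return Err(format!(
                "column '{}' not found in SELECT list or FROM clause",
                key
            ));
        }
    }

    let outer_projection = if added {
        Some(select_list.iter().map(|c| c.to_string()).collect())
    } else {
        None
    };

    Ok(SortPlan {
        inner_projection: inner,
        sort_keys: order_by.iter().map(|c| c.to_string()).collect(),
        outer_projection,
    })
}
```

For the query SELECT x FROM foo ORDER BY y, calling `plan_order_by(&["x"], &["x", "y"], &["y"])` in this model projects `x, y`, sorts on `y`, and then projects back down to `x`. When every sort key is already in the SELECT list, no helper column or outer projection is produced.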

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql SQL Planner label Jan 24, 2026
@lichuang lichuang force-pushed the issue-10326 branch 3 times, most recently from 618a289 to 6a0bfa1 Compare January 28, 2026 02:36
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 28, 2026
@lichuang lichuang marked this pull request as ready for review January 28, 2026 07:05
@lichuang
Author

@alamb @jonahgao

@alamb
Contributor

alamb commented Jan 28, 2026

This seems to add more code (rather than unify the code paths as the PR comment suggests) 🤔

@alamb
Contributor

alamb commented Jan 28, 2026

In other words, this PR seems to add more code paths, when the idea was to reduce the code / duplication

Labels

logical-expr Logical plan and expressions sql SQL Planner

Development

Successfully merging this pull request may close these issues.

Unify SQL planning for ORDER BY, HAVING, DISTINCT, etc

2 participants