refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc #19974
+426
−22
Which issue does this PR close?
ORDER BY, HAVING, DISTINCT, etc #10326
This PR refactors ORDER BY planning to use the merged schema approach (similar to HAVING), unifying the SQL planning logic and simplifying the generated execution plans.
Currently, DataFusion has two different code paths for handling ORDER BY:
• add_missing_columns (in LogicalPlanBuilder::sort_with_limit): Traverses the plan tree looking for Projection nodes and adds missing columns to them
Having both paths coexist leads to:
• Complex and hard-to-maintain code
• Non-intuitive handling of simple queries like SELECT x FROM foo ORDER BY y
• Generated execution plans with unnecessary subquery wrapping
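For illustration only, the tree-walking path described above can be modelled with a toy plan type. This is a minimal sketch, not DataFusion's actual code: the `Plan` enum and the `add_missing` function are hypothetical names invented here, and the real `add_missing_columns` works on `LogicalPlan` and handles many more cases.

```rust
/// Toy logical plan (hypothetical; not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Projection { columns: Vec<String>, input: Box<Plan> },
}

/// Walk the plan tree and append every column in `missing` to each
/// Projection node that does not already expose it.
fn add_missing(plan: Plan, missing: &[&str]) -> Plan {
    match plan {
        Plan::Projection { mut columns, input } => {
            for col in missing {
                if !columns.iter().any(|c| c == col) {
                    columns.push(col.to_string());
                }
            }
            Plan::Projection {
                columns,
                input: Box::new(add_missing(*input, missing)),
            }
        }
        // Non-projection nodes are passed through; only their input is visited.
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(add_missing(*input, missing)),
        },
        // Leaf node: nothing to rewrite.
        scan @ Plan::Scan { .. } => scan,
    }
}
```

Even in this reduced form, the rewrite has to know about every node kind it may pass through, which hints at why this path is harder to maintain than a change made once at the SELECT list.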
Solution
Implement the approach proposed in #10326: handle ORDER BY similarly to HAVING by using the merged schema and adding missing columns directly to the SELECT
list instead of traversing the plan tree.
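As a rough sketch of the idea (not the PR's implementation), a query such as SELECT x FROM foo ORDER BY y can be planned by extending the SELECT list with the missing sort column, sorting, and then projecting back to the original list. The `Row` type and the functions `extended_select_list` and `select_order_by` below are hypothetical and operate on in-memory rows rather than on DataFusion plans.

```rust
use std::collections::BTreeMap;

/// A row maps column names to integer values (toy stand-in for real data).
type Row = BTreeMap<String, i64>;

/// Return the SELECT list followed by any ORDER BY columns it lacks.
fn extended_select_list(select: &[&str], order_by: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = select.iter().map(|s| s.to_string()).collect();
    for col in order_by {
        if !out.iter().any(|c| c == col) {
            out.push(col.to_string());
        }
    }
    out
}

/// Evaluate `SELECT <select> FROM rows ORDER BY <order_by>` (ascending).
/// Assumes every referenced column exists in every row.
fn select_order_by(rows: &[Row], select: &[&str], order_by: &[&str]) -> Vec<Vec<i64>> {
    let extended = extended_select_list(select, order_by);

    // Inner projection: original SELECT columns plus the missing sort columns.
    let mut projected: Vec<Vec<i64>> = rows
        .iter()
        .map(|r| extended.iter().map(|c| r[c.as_str()]).collect())
        .collect();

    // Locate each sort key inside the extended projection, then sort on it.
    let key_idx: Vec<usize> = order_by
        .iter()
        .map(|k| extended.iter().position(|c| c == k).unwrap())
        .collect();
    projected.sort_by_key(|row| key_idx.iter().map(|&i| row[i]).collect::<Vec<i64>>());

    // Outer projection: drop the columns that were added only for sorting.
    projected
        .into_iter()
        .map(|mut row| {
            row.truncate(select.len());
            row
        })
        .collect()
}
```

Calling `select_order_by(&rows, &["x"], &["y"])` sorts on y even though only x is returned, and no traversal of an existing plan tree is needed.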
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?