-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-28911: Improve SEARCH expansion to exploit <> operator #6503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
827b879
2142444
7a73f3b
952a039
d31ba37
cbe2b37
55bb8cc
da58715
cdec829
2635139
8a9a283
0578182
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -628,14 +628,23 @@ private RexNode makeLiteral(C value) { | |
| private double compute() { | ||
| final List<RexNode> inLiterals = new ArrayList<>(); | ||
| final List<Double> rangeSelectivities = new ArrayList<>(); | ||
| for (Range<C> range : sarg.rangeSet.asRanges()) { | ||
| if (!range.hasLowerBound() && !range.hasUpperBound()) { | ||
| return 1.0; // "all" range | ||
| final List<Double> searchSelectivities = new ArrayList<>(); | ||
|
|
||
| if (sarg.isComplementedPoints()) { | ||
| // Generate 'ref <> value1 AND ... AND ref <> valueN' | ||
| List<RexNode> notEq = sarg.rangeSet.complement().asRanges().stream() | ||
| .map(range -> rexBuilder.makeCall(SqlStdOperatorTable.NOT_EQUALS, ref, makeLiteral(range.lowerEndpoint()))) | ||
| .toList(); | ||
| searchSelectivities.add(RexUtil.composeConjunction(rexBuilder, notEq).accept(FilterSelectivityEstimator.this)); | ||
| } else { | ||
|
Comment on lines
+633
to
+639
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this form lead to better selectivity estimates than what we had before? If not then we don't necessarily need to change it. The purpose of this code is to compute a good selectivity for the SEARCH predicate not to transform it to something else.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Uhmmm... you're right, the original version (which uses histograms) would have a more accurate estimation than the proposed one with NOT_EQUALS (which is estimated simply with ndv-1/ndv , which can be quite off). However, it might be possible that histograms are not available in general (so the original version would default to a hadcoded selectivity), whereas the sub-optimal optimization with NOT_EQUALS uses a more generally available ndv estimated value (and this estimation, although not perfect, would be better than the hardcoded value of the original version). Having said that, I guess we should try to aim for the better solution, and trust that statistics would be available, so I lean towards reverting the change in this file. |
||
| for (Range<C> range : sarg.rangeSet.asRanges()) { | ||
| if (!range.hasLowerBound() && !range.hasUpperBound()) { | ||
| return 1.0; // "all" range | ||
| } | ||
| processRangeSelectivity(range, rangeSelectivities, inLiterals); | ||
| } | ||
| processRangeSelectivity(range, rangeSelectivities, inLiterals); | ||
| } | ||
|
|
||
| final List<Double> searchSelectivities = new ArrayList<>(); | ||
| if (!rangeSelectivities.isEmpty() && rangeSelectivities.stream().noneMatch(Objects::isNull)) { | ||
| // Aggregate all ranges selectivity, respecting the max value of 1 | ||
| double total = Math.min(1.0, rangeSelectivities.stream().mapToDouble(Double::doubleValue).sum()); | ||
|
|
@@ -655,7 +664,8 @@ private double compute() { | |
| List<RexNode> operands = new ArrayList<>(inLiterals.size() + 1); | ||
| operands.add(ref); | ||
| operands.addAll(inLiterals); | ||
| searchSelectivities.add(rexBuilder.makeCall(HiveIn.INSTANCE, operands).accept(FilterSelectivityEstimator.this)); | ||
| searchSelectivities.add( | ||
| rexBuilder.makeCall(HiveIn.INSTANCE, operands).accept(FilterSelectivityEstimator.this)); | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -664,7 +674,9 @@ private double compute() { | |
| rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ref).accept(FilterSelectivityEstimator.this)); | ||
| } | ||
|
|
||
| return searchSelectivities.size() == 1 ? searchSelectivities.get(0) : computeDisjunctionSelectivity(searchSelectivities); | ||
| return searchSelectivities.size() == 1 | ||
| ? searchSelectivities.get(0) | ||
| : computeDisjunctionSelectivity(searchSelectivities); | ||
| } | ||
|
|
||
| private void processRangeSelectivity(Range<C> range, List<Double> rangeSelectivities, List<RexNode> inLiterals) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the sarg is not strictly a complement then I guess we cannot handle it. Examples:
One way to cover those would be to apply the existing
RangeConverteronsarg.rangeSet.complement()and invert the EQUALS and HiveIn operators in the switch. The challenge here is when to use thesarg.rangeSetand when its complement. A naive choice could be to base the decision onsarg.rangeSet.asRanges().size()or something along these lines.Anyways, the change here is fine as it is so we can log a another ticket and follow-up there if its worth to do it or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact we could negate the entire sarg (not only the rangeset) if that leads to a simpler more efficient expansion. This would evolve negating the operators (IS NULL, =, IN, etc.), and changing the
orListinto anandList.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What that would look like? Try systematically the "normal" sarg expansion and the negated one, and pick the "simpler" one?
I guess we could try... but could we consider that a follow-up, separate task, out of the scope of the current PR?