feat(stats): Set distinct_count to Exact(1) when filter narrows column to single value#21077

Open
asolimando wants to merge 1 commit into apache:main from asolimando:asolimando/ndv-single-value-filter

@asolimando asolimando commented Mar 20, 2026

Which issue does this PR close?

Related: #20789, which uses NDV for equality filter selectivity. The two are complementary: this PR improves the NDV output statistics, and that PR consumes them.

Rationale for this change

When a filter predicate collapses a column interval to a single value (e.g. d_qoy = 1), the output column can only have one distinct value. Currently distinct_count is always demoted to Inexact, losing this information.

This matters for downstream optimizers that rely on distinct_count, such as join cardinality estimation in estimate_inner_join_cardinality.

What changes are included in this PR?

In collect_new_statistics (filter.rs), when the post-filter interval has lower == upper (both non-null), set distinct_count to Precision::Exact(1) instead of demoting the input NDV to Inexact.
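The rule above can be sketched in isolation. This is a minimal illustration, not the code in filter.rs: `Precision` here is a simplified stand-in for DataFusion's type, `post_filter_ndv` is a hypothetical helper name, and the interval endpoints are modeled as `Option<i64>` with `None` standing for a null bound.

```rust
/// Simplified stand-in for DataFusion's `Precision` (assumption: only the
/// three variants needed for this sketch are modeled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical helper: derive the post-filter distinct count from the
/// post-filter interval bounds. `None` models a null endpoint.
fn post_filter_ndv(
    lower: Option<i64>,
    upper: Option<i64>,
    input_ndv: Precision,
) -> Precision {
    match (lower, upper) {
        // Both bounds are known and equal: the column holds a single value.
        (Some(lo), Some(hi)) if lo == hi => Precision::Exact(1),
        // Otherwise keep the input NDV, but only as an estimate.
        _ => match input_ndv {
            Precision::Exact(n) | Precision::Inexact(n) => Precision::Inexact(n),
            Precision::Absent => Precision::Absent,
        },
    }
}
```

With this sketch, an interval of `[42, 42]` yields `Exact(1)` regardless of the input NDV, while `[22, 42]` with an input of `Exact(100)` yields `Inexact(100)`.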

Are these changes tested?

Yes, 4 unit tests:

  • Equality predicate (a = 42) -> NDV becomes Exact(1)
  • OR predicate (a = 42 OR a = 22) -> interval does not collapse, NDV stays Inexact
  • AND with mixed predicates (a = 42 AND b > 10 AND c = 7) -> a and c get Exact(1), b stays Inexact
  • Equality with absent bounds (a = 42, no min/max) -> interval analysis still resolves to Exact(1)
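One way to see why the equality and OR cases differ is to model column ranges as closed integer intervals. The sketch below is an assumption-laden illustration, not DataFusion's interval arithmetic: the `Interval` struct and its methods are hypothetical, an AND is modeled as an intersection, and an OR is modeled as the hull of its branches (the tightest single interval covering both).

```rust
/// Hypothetical closed integer interval `[lower, upper]`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Interval {
    lower: i64,
    upper: i64,
}

impl Interval {
    /// The interval produced by an equality predicate such as `a = v`.
    fn point(v: i64) -> Self {
        Interval { lower: v, upper: v }
    }

    /// Smallest interval covering both inputs (models `p OR q`).
    fn hull(self, other: Interval) -> Interval {
        Interval {
            lower: self.lower.min(other.lower),
            upper: self.upper.max(other.upper),
        }
    }

    /// Overlap of both inputs (models `p AND q`); `None` if they are disjoint.
    fn intersect(self, other: Interval) -> Option<Interval> {
        let lower = self.lower.max(other.lower);
        let upper = self.upper.min(other.upper);
        if lower <= upper {
            Some(Interval { lower, upper })
        } else {
            None
        }
    }

    /// True when the interval has collapsed to one value.
    fn is_single_value(self) -> bool {
        self.lower == self.upper
    }
}
```

Under these assumptions, `a = 42` applied to a column ranging over `[0, 100]` intersects to `[42, 42]`, which is a single value, whereas `a = 42 OR a = 22` can only be bounded by `[22, 42]`, so the single-value check does not fire.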

Are there any user-facing changes?

No breaking changes. Statistics consumers will now see Exact(1) for distinct_count on columns constrained to a single value by filter predicates.

Disclaimer: I used AI to assist with code generation. I have manually reviewed the output, and it matches my intention and understanding.

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 20, 2026
When a filter predicate collapses a column's interval to a single
value (e.g. d_qoy = 1), the output has exactly 1 distinct value.
Previously the original Parquet NDV was propagated, inflating
GROUP BY output estimates for CTE self-join patterns like Q31.
@asolimando asolimando force-pushed the asolimando/ndv-single-value-filter branch from 061df5a to 0613f1a Compare March 20, 2026 12:24
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 20, 2026
@asolimando asolimando changed the title feat: Set distinct_count to Exact(1) when filter narrows column to single value feat(stats): Set distinct_count to Exact(1) when filter narrows column to single value Mar 20, 2026
@asolimando
Member Author

cc: @jonathanc-n

@@ -59,7 +59,7 @@ query TT
EXPLAIN SELECT * FROM test_table WHERE column1 = 1;
Contributor


Can we add some tests with other types, e.g. floating point numbers and strings? I think it should work just the same, but...
