Add any_value aggregate function #23043
query I
SELECT any_value(column2) FROM any_value_test;
----
10

query IIT rowsort
SELECT column1, any_value(column2), any_value(column3)
FROM any_value_test
GROUP BY column1;
----
1 10 first
2 NULL NULL
3 30 third
Are these two tests technically deterministic?
Thanks @Kevin-Li-2025 @Jefffrey, LGTM. One follow-up idea is to implement a native
Thanks, agreed that a native GroupsAccumulator is the right follow-up for grouped any_value. I’ll keep this PR scoped to adding the function and handle the grouped fast path separately, since it needs type-specific state handling plus targeted grouped-aggregation benchmarks to demonstrate the improvement without delaying this PR.
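To make the "type-specific state handling" concrete, here is a minimal sketch of per-group state for a grouped any_value, modelled in plain Python rather than with DataFusion's Rust API. The names `update_groups` and `merge_groups` are hypothetical and do not correspond to any real trait methods; the sketch assumes integer values, uses `None` for SQL NULL, and assumes that keeping the first non-null value seen per group is an acceptable "arbitrary" choice.

```python
from typing import List, Optional


def update_groups(
    state: List[Optional[int]],
    group_indices: List[int],
    values: List[Optional[int]],
) -> None:
    """Fill each group's slot with the first non-null value seen for it.

    `state[g]` is None until group `g` has seen a non-null value
    (hypothetical layout, one slot per group).
    """
    for g, v in zip(group_indices, values):
        if v is not None and state[g] is None:
            state[g] = v


def merge_groups(state: List[Optional[int]], other: List[Optional[int]]) -> None:
    """Merge another partial state: keep values already set, fill empty slots."""
    for g, v in enumerate(other):
        if state[g] is None and v is not None:
            state[g] = v


# Hypothetical input: three groups; group 1 only ever sees NULL.
state: List[Optional[int]] = [None, None, None]
update_groups(state, [0, 1, 2, 0], [10, None, 30, 20])
```

After the update, group 1 stays empty because it only saw NULL, which mirrors the `2 NULL NULL` row in the grouped test above. Merging a second partial state only fills slots that are still empty, so a value already chosen for a group is never replaced.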
(3, 30, 'third');

query B
SELECT any_value(column2) IN (10, 20) FROM any_value_test;
Suggested change:
- SELECT any_value(column2) IN (10, 20) FROM any_value_test;
+ SELECT any_value(column2) is not null FROM any_value_test;
or any_value(column2) IN (10, 20, 30) since we have value 30 too
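The reasoning behind this suggestion can be checked with a small plain-Python model. It is only a sketch: `any_value` here is a hypothetical stand-in that returns the first non-null value in whatever order the rows arrive, and the `column2` contents are assumed for illustration (the thread shows only part of the table). Trying every arrival order shows which assertions hold regardless of order.

```python
from itertools import permutations
from typing import Iterable, Optional


def any_value(values: Iterable[Optional[int]]) -> Optional[int]:
    """Hypothetical model: return the first non-null value in arrival order."""
    for v in values:
        if v is not None:
            return v
    return None


# Assumed column contents, for illustration only; None stands for SQL NULL.
column2 = [10, None, 20, 30]

# Collect the result for every possible row order.
results = {any_value(order) for order in permutations(column2)}

# Order-independent checks: the result is never NULL and is always
# one of the non-null inputs.
always_not_null = all(r is not None for r in results)
always_in_full_set = results <= {10, 20, 30}

# Order-dependent check: a membership list that omits 30 can fail.
always_in_short_set = results <= {10, 20}
```

Under these assumptions, both `is not null` and `IN (10, 20, 30)` hold for every row order, while `IN (10, 20)` fails whenever the row holding 30 arrives first.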
I implemented the native grouped fast path as the separate follow-up discussed here: #23065. It includes focused state/merge/filter/EmitTo tests and Criterion coverage against
Which issue does this PR close?
any_value aggregate function #22799.

Rationale for this change
any_value is a common aggregate in SQL engines for queries that need one representative non-null value from each group without imposing an ordering requirement. DataFusion currently has first_value, but that aggregate is order-sensitive, so exposing any_value gives users the intended arbitrary-value semantics directly.

What changes are included in this PR?
Adds an any_value(expression) aggregate UDF and registers it with the default aggregate functions.

Are these changes tested?
Yes. I ran:

Are there any user-facing changes?
Yes. This adds a new SQL aggregate function, any_value.

I used AI assistance to help inspect the codebase and run validation, and I reviewed the resulting implementation and tests.