feat: add native GroupsAccumulator for any_value #23065

Draft
Kevin-Li-2025 wants to merge 4 commits into apache:main from Kevin-Li-2025:kevin/any-value-groups-accumulator

Conversation

@Kevin-Li-2025

Which issue does this PR close?

Follow-up to #23043. This draft is stacked on that PR and should be reviewed after it merges.

Rationale for this change

Grouped any_value currently falls back to GroupsAccumulatorAdapter, which creates one boxed Accumulator per group, collects row indices, materializes per-group slices, and performs dynamic dispatch for every group. That overhead dominates high-cardinality GROUP BY workloads even though any_value only needs to retain one non-null value per group.
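To make the overhead described above concrete, here is a toy model of an adapter-style path. It is a sketch only: `RowAccumulator`, `AnyValueRow`, and `AdapterLike` are hypothetical names invented for illustration and are not DataFusion's actual types or signatures. Plain `Option<i64>` slices stand in for Arrow arrays.

```rust
// Toy model only: these types are hypothetical and do not mirror DataFusion's API.
trait RowAccumulator {
    fn update(&mut self, values: &[Option<i64>]);
    fn evaluate(&self) -> Option<i64>;
}

#[derive(Default)]
struct AnyValueRow {
    value: Option<i64>,
}

impl RowAccumulator for AnyValueRow {
    fn update(&mut self, values: &[Option<i64>]) {
        // Keep the first non-null value ever seen for this group.
        if self.value.is_none() {
            self.value = values.iter().flatten().next().copied();
        }
    }

    fn evaluate(&self) -> Option<i64> {
        self.value
    }
}

struct AdapterLike {
    // One heap-allocated, dynamically dispatched accumulator per group.
    accumulators: Vec<Box<dyn RowAccumulator>>,
}

impl AdapterLike {
    fn new() -> Self {
        Self { accumulators: Vec::new() }
    }

    fn update_batch(
        &mut self,
        values: &[Option<i64>],
        group_indices: &[usize],
        total_groups: usize,
    ) {
        while self.accumulators.len() < total_groups {
            self.accumulators.push(Box::new(AnyValueRow::default()));
        }
        // Step 1: collect the row indices that belong to each group.
        let mut rows_per_group: Vec<Vec<usize>> = vec![Vec::new(); total_groups];
        for (row, &group) in group_indices.iter().enumerate() {
            rows_per_group[group].push(row);
        }
        // Step 2: materialize a per-group slice, then make one virtual call per group.
        for (group, rows) in rows_per_group.iter().enumerate() {
            if rows.is_empty() {
                continue;
            }
            let slice: Vec<Option<i64>> = rows.iter().map(|&r| values[r]).collect();
            self.accumulators[group].update(&slice);
        }
    }

    fn evaluate(&self) -> Vec<Option<i64>> {
        self.accumulators.iter().map(|a| a.evaluate()).collect()
    }
}
```

In this model every batch pays for an index vector and a copied slice per group plus one virtual call per group, so the cost grows with the number of groups rather than with the amount of useful work.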

This PR adds a native GroupsAccumulator that:

  • stores one ScalarValue and one is_set bit per group;
  • scans each input batch once and stops updating a group after its first valid value;
  • preserves the existing two-column partial-state contract;
  • supports filters, state merging, EmitTo::First, and convert_to_state; and
  • works for every Arrow type accepted by any_value.
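The behaviour listed above can be sketched with a simplified model. This is an illustration under stated assumptions, not the PR's implementation: `FirstValidPerGroup` and its methods are hypothetical names, `i64` stands in for `ScalarValue`, slices stand in for Arrow arrays, and the two state columns are assumed to be the value and its `is_set` flag.

```rust
// Simplified stand-in for a native grouped accumulator; all names are hypothetical.
struct FirstValidPerGroup {
    values: Vec<i64>,  // one value slot per group (meaningful only if is_set)
    is_set: Vec<bool>, // one flag per group
}

impl FirstValidPerGroup {
    fn new() -> Self {
        Self { values: Vec::new(), is_set: Vec::new() }
    }

    fn resize(&mut self, total_groups: usize) {
        if self.values.len() < total_groups {
            self.values.resize(total_groups, 0);
            self.is_set.resize(total_groups, false);
        }
    }

    // Single pass over the batch; a group is skipped once it holds a value.
    fn update_batch(
        &mut self,
        values: &[Option<i64>],
        group_indices: &[usize],
        filter: Option<&[bool]>,
        total_groups: usize,
    ) {
        self.resize(total_groups);
        for (row, (&group, value)) in group_indices.iter().zip(values).enumerate() {
            if self.is_set[group] {
                continue;
            }
            // Rows rejected by the optional filter never set a value.
            if filter.map_or(false, |f| !f[row]) {
                continue;
            }
            if let Some(v) = value {
                self.values[group] = *v;
                self.is_set[group] = true;
            }
        }
    }

    // Two-column partial state: (value, is_set) for every group.
    fn state(&self) -> (Vec<i64>, Vec<bool>) {
        (self.values.clone(), self.is_set.clone())
    }

    // Merge partial state from another accumulator; existing values win.
    fn merge_batch(
        &mut self,
        state_values: &[i64],
        state_is_set: &[bool],
        group_indices: &[usize],
        total_groups: usize,
    ) {
        self.resize(total_groups);
        for ((&group, &v), &set) in group_indices.iter().zip(state_values).zip(state_is_set) {
            if set && !self.is_set[group] {
                self.values[group] = v;
                self.is_set[group] = true;
            }
        }
    }

    // Emit the first `n` groups; remaining groups shift down by `n`.
    // Panics if `n` exceeds the number of groups.
    fn emit_first(&mut self, n: usize) -> Vec<Option<i64>> {
        let values: Vec<i64> = self.values.drain(..n).collect();
        let flags: Vec<bool> = self.is_set.drain(..n).collect();
        values
            .into_iter()
            .zip(flags)
            .map(|(v, set)| set.then_some(v))
            .collect()
    }
}
```

Storing values and flags in two flat vectors means an update is an indexed read and, at most, one write per row, with no per-group allocation or dynamic dispatch. A group that never sees a valid, unfiltered value is emitted as null (`None` here).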

What changes are included?

  • Native grouped accumulator implementation and four focused unit tests.
  • Criterion benchmark comparing the native path with GroupsAccumulatorAdapter for Int64 and Utf8 at 8,192 rows / 4,096 groups.

Local Apple Silicon benchmark medians:

| Type  | Native   | Adapter  | Improvement |
|-------|----------|----------|-------------|
| Int64 | 0.245 ms | 4.92 ms  | ~20x        |
| Utf8  | 0.515 ms | 12-13 ms | >10x        |

The Utf8 adapter result is allocator-sensitive, so the claim is intentionally conservative.

Validation

  • cargo test -p datafusion-functions-aggregate --lib (146 passed)
  • cargo clippy -p datafusion-functions-aggregate --all-targets -- -D warnings
  • cargo fmt --all -- --check
  • cargo bench -p datafusion-functions-aggregate --bench any_value -- --noplot
  • git diff --check

Kevin-Li-2025 added 4 commits June 19, 2026 13:14
github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on Jun 21, 2026