perf: Optimize `array_sort()` by neilconway · Pull Request #21083 · apache/datafusion

neilconway · 2026-03-20T19:45:24Z

Which issue does this PR close?

Closes #21005 (Optimize array_sort).
Closes #21041 (Optimize array_sort with nested-list-aware sort kernel).

Rationale for this change

The previous array_sort implementation called the Arrow sort kernel once per row and then used concat to produce the final result, which was quite inefficient. Instead, we employ three different techniques depending on the input:

(1) For arrays of primitive types without null elements, we copy all values into a single Vec, sort each row's slice of the Vec in-place, and then wrap the Vec in a GenericListArray.

(2) For arrays of primitive types with null elements, we use a similar approach, but need some extra bookkeeping to place null elements in the right position and to construct the null buffer.

(3) For arrays of non-primitive types, we use RowConverter to convert the entire input into the row format in one call, sort row indices by comparing the encoded row values, and then use a single take() to construct the result of the sort.
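
A minimal sketch of techniques (1) and (2), using plain slices instead of Arrow buffers. The function names, the `Vec<bool>` validity representation, and the `nulls_first` parameter are illustrative assumptions, not the code in this PR; the sketch also assumes `offsets[0] == 0`.

```rust
/// Technique (1): sort every row of a list column in place (ascending).
/// `offsets` has `rows + 1` entries; row `i` owns `values[offsets[i]..offsets[i + 1]]`.
fn sort_rows_no_nulls(values: &mut [i32], offsets: &[usize]) {
    for w in offsets.windows(2) {
        values[w[0]..w[1]].sort_unstable();
    }
}

/// Technique (2): rows may contain null elements.
/// `validity[i]` is true when `values[i]` is non-null.
/// Returns the validity of the sorted output; null slots are zeroed.
fn sort_rows_with_nulls(
    values: &mut [i32],
    validity: &[bool],
    offsets: &[usize],
    nulls_first: bool,
) -> Vec<bool> {
    let mut out_validity = vec![false; values.len()];
    for w in offsets.windows(2) {
        let (start, end) = (w[0], w[1]);

        // Compact the non-null values to the front of the row's slice.
        let mut n_valid = 0;
        for i in start..end {
            if validity[i] {
                values[start + n_valid] = values[i];
                n_valid += 1;
            }
        }

        let row = &mut values[start..end];
        row[..n_valid].sort_unstable();
        let n_null = row.len() - n_valid;

        if nulls_first {
            // Shift the sorted values to the end so the nulls lead the row.
            row.rotate_right(n_null);
            row[..n_null].fill(0);
            out_validity[start + n_null..end].fill(true);
        } else {
            row[n_valid..].fill(0);
            out_validity[start..start + n_valid].fill(true);
        }
    }
    out_validity
}
```

In both functions the whole column is touched through one contiguous buffer, so no per-row array is allocated and no final concat is needed; only the row boundaries in `offsets` decide which slice gets sorted.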

Benchmarks (8192 rows, vs main):

int32/5 elements:          886 µs →  57 µs  (-94%)
int32/20 elements:        1.64 ms → 846 µs  (-48%)
int32/100 elements:       4.03 ms → 3.22 ms (-20%)
int32_null_elements/5:    1.17 ms → 168 µs  (-86%)
int32_null_elements/1000: 47.2 ms → 44.1 ms  (-7%)
string/5 elements:        2.12 ms → 727 µs  (-66%)
string/1000 elements:      405 ms → 293 ms  (-28%)

What changes are included in this PR?

New array_sort benchmark
Extended unit test coverage
Improve docs
Implement optimizations as described above

Are these changes tested?

Yes. This PR extends the unit test coverage for array_sort.

Are there any user-facing changes?

No.

neilconway · 2026-03-20T19:47:12Z

FYI @Dandandan -- thanks for the suggestion about avoiding the per-row sort kernel, seems quite effective.

Dandandan · 2026-03-20T21:34:41Z

datafusion/functions-nested/src/sort.rs

 }

-fn array_sort_inner(args: &[ArrayRef]) -> Result<ArrayRef> {
+pub fn array_sort_inner(args: &[ArrayRef]) -> Result<ArrayRef> {


Claude flagged this:
4. array_sort_inner made pub just for benchmarks

A solution which I think could work:

[...] restructuring the benchmark to use the public ArraySort UDF instead

pub(crate) should work, no?

Dandandan · 2026-03-20T21:42:01Z

datafusion/functions-nested/src/sort.rs

+    let values_start = offsets[0].as_usize();
+    let total_values = offsets[row_count].as_usize() - values_start;
+
+    let converter = RowConverter::new(vec![SortField::new_with_options(


Why using a RowConverter for this one? I think the path in arrow-rs is something like:

partition on nulls

create the indices (begin...end)

sort by the strings (using the indices)

Add the nulls if it starts with nulls

Add the values (using take)

Add the nulls if it ends with nulls
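
The steps above can be sketched for a single row as follows. This is a simplified illustration over `Option<&str>` slices, not the arrow-rs code path; `sort_indices` and `take` are hypothetical helper names.

```rust
/// Returns indices into `row` in sorted order: nulls grouped at one end,
/// non-null values ordered by string comparison.
fn sort_indices(row: &[Option<&str>], nulls_first: bool) -> Vec<usize> {
    // Partition the indices on nulls.
    let (mut valid, nulls): (Vec<usize>, Vec<usize>) =
        (0..row.len()).partition(|&i| row[i].is_some());

    // Sort the non-null indices by the strings they point at.
    valid.sort_by(|&a, &b| row[a].cmp(&row[b]));

    // Emit the nulls before or after the values.
    if nulls_first {
        nulls.into_iter().chain(valid).collect()
    } else {
        valid.into_iter().chain(nulls).collect()
    }
}

/// Gather elements by index, analogous to a `take` kernel.
fn take<'a>(row: &[Option<&'a str>], indices: &[usize]) -> Vec<Option<&'a str>> {
    indices.iter().map(|&i| row[i]).collect()
}
```

Each comparison in this variant dereferences two indices to reach the string data, which is the pointer chasing the reply below refers to.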

I briefly considered something like that, but I figured that all the pointer chasing would be pretty expensive. You're right that it's worth comparing though.

Here's a quick Claude-generated version -- lmk if you had something else in mind.

Benchmarking it against the RowConverter approach, RowConverter wins for medium-sized arrays (20 elements) and larger, and loses to the index-based comparison approach (make_comparator) for small arrays:

┌─────────────┬──────────┬─────────────────┬─────────────────┐
│ Benchmark   │ main     │ RowConverter    │ make_comparator │
├─────────────┼──────────┼─────────────────┼─────────────────┤
│ string/5    │ 2.12 ms  │ 727 µs (-66%)   │ 608 µs (-71%)   │
│ string/20   │ 5.94 ms  │ 4.42 ms (-26%)  │ 4.76 ms (-20%)  │
│ string/100  │ 26.8 ms  │ 22.6 ms (-16%)  │ 25.1 ms (-6%)   │
│ string/1000 │ 404.9 ms │ 293.1 ms (-28%) │ 403.9 ms (~0%)  │
└─────────────┴──────────┴─────────────────┴─────────────────┘

Not sure offhand what typical real-world workloads look like; lmk if you have a view.
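
For comparison, here is a rough sketch of the row-format idea behind the RowConverter column: encode every element once into bytes whose bytewise order matches the desired sort order, sort indices within each row by comparing those bytes, and gather the result in one pass. The `encode` function is a simplified stand-in for Arrow's row encoding (a null tag byte followed by the UTF-8 bytes), and all names are illustrative assumptions. The sketch assumes `offsets[0] == 0` and that the last offset equals `values.len()`.

```rust
/// Encode one element so that bytewise comparison gives the desired order:
/// a one-byte null tag followed by the UTF-8 bytes of the string.
fn encode(v: Option<&str>, nulls_first: bool) -> Vec<u8> {
    let (null_tag, valid_tag) = if nulls_first { (0u8, 1u8) } else { (1u8, 0u8) };
    match v {
        None => vec![null_tag],
        Some(s) => {
            let mut out = Vec::with_capacity(1 + s.len());
            out.push(valid_tag);
            out.extend_from_slice(s.as_bytes());
            out
        }
    }
}

/// Returns, for the whole column, the indices to gather so that every row
/// (delimited by `offsets`) comes out sorted.
fn sorted_take_indices(
    values: &[Option<&str>],
    offsets: &[usize],
    nulls_first: bool,
) -> Vec<usize> {
    // One encoding pass over the entire column.
    let encoded: Vec<Vec<u8>> = values.iter().map(|v| encode(*v, nulls_first)).collect();

    // Sort each row's index range by comparing the encoded bytes.
    let mut indices: Vec<usize> = (0..values.len()).collect();
    for w in offsets.windows(2) {
        indices[w[0]..w[1]].sort_by(|a, b| encoded[*a].cmp(&encoded[*b]));
    }
    indices
}
```

The encoding cost is paid once per element, after which every comparison is a plain byte-slice comparison with no null checks, which is one plausible reason this variant pulls ahead as rows get longer.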

Dandandan · 2026-03-20T21:47:26Z

Woah, great!

neilconway added 2 commits March 20, 2026 13:59

Add benchmark for array_sort()

7396c4c

Optimize array_sort

294d769

github-actions bot added the documentation, sqllogictest, and functions labels Mar 20, 2026

neilconway mentioned this pull request Mar 20, 2026

perf: Optimize array_sort for arrays of non-primitive types #21006

Closed

Merge branch 'main' into neilc/array-sort-custom-kernel

8326cb1

Dandandan reviewed Mar 20, 2026

neilconway wants to merge 3 commits into apache:main from neilconway:neilc/array-sort-custom-kernel
