Conversation
|
FYI @Dandandan -- thanks for the suggestion about avoiding the per-row sort kernel, seems quite effective. |
| } | ||
|
|
||
| fn array_sort_inner(args: &[ArrayRef]) -> Result<ArrayRef> { | ||
| pub fn array_sort_inner(args: &[ArrayRef]) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Claude flagged this:
4. array_sort_inner made pub just for benchmarks
Solution which could I think could work
- [...] restructuring the benchmark to use the public ArraySort UDF instead
There was a problem hiding this comment.
pub(crate) should work, no?
| let values_start = offsets[0].as_usize(); | ||
| let total_values = offsets[row_count].as_usize() - values_start; | ||
|
|
||
| let converter = RowConverter::new(vec![SortField::new_with_options( |
There was a problem hiding this comment.
Why using a RowConverter for this one? I think the path in arrow-rs is something like:
- partition on nulls
- create the indices (begin...end)
- sort by the strings (using the indices)
- Add the nulls if it starts with nulls
- Add the values (using take)
- Add the nulls if it ends with nulls
There was a problem hiding this comment.
I briefly considered something like that, but I figured that all the pointer chasing would be pretty expensive. You're right that it's worth comparing though.
Here's a quick Claude-generated version -- lmk if you had something else in mind.
Benchmarking it against the RowComparator approach, RowComparator wins for medium-sized arrays (20 elements) and larger, and loses to the index-based comparison approach for small arrays:
┌─────────────┬──────────┬─────────────────┬─────────────────┐
│ Benchmark │ main │ RowConverter │ make_comparator │
├─────────────┼──────────┼─────────────────┼─────────────────┤
│ string/5 │ 2.12 ms │ 727 µs (-66%) │ 608 µs (-71%) │
├─────────────┼──────────┼─────────────────┼─────────────────┤
│ string/20 │ 5.94 ms │ 4.42 ms (-26%) │ 4.76 ms (-20%) │
├─────────────┼──────────┼─────────────────┼─────────────────┤
│ string/100 │ 26.8 ms │ 22.6 ms (-16%) │ 25.1 ms (-6%) │
├─────────────┼──────────┼─────────────────┼─────────────────┤
│ string/1000 │ 404.9 ms │ 293.1 ms (-28%) │ 403.9 ms (~0%) │
└─────────────┴──────────┴─────────────────┴─────────────────┘
Not sure offhand which typical real-world workloads look like; lmk if you have a view.
|
Woah, great! |
Which issue does this PR close?
array_sort#21005.Rationale for this change
The previous
array_sortimplementation called the Arrow sort kernel for every row, and then usedconcatto produce the final results. This was quite inefficient. Instead, we employee three different techniques depending on the input:(1) For arrays of primitives types without null elements, we copy all values into a single
Vec, sort each row's slice of theVecin-place, and then wrap theVecin aGenericListArray.(2) For arrays of primitives types with null elements, we use a similar approach but we need to incur some more bookkeeping to place null elements in the right place and construct the null buffer.
(3) For arrays of non-primitive types, we use
RowConverterto convert the entire input into the row format in one call, sort row indices by comparing the encoded row values, and then use a singletake()to construct the result of the sort.Benchmarks (8192 rows, vs main):
What changes are included in this PR?
array_sortbenchmarkAre these changes tested?
No.
Are there any user-facing changes?
No.