Improve performance of elementwise ByteViewArray concatenation#10161
Improve performance of elementwise ByteViewArray concatenation#10161pepijnve wants to merge 5 commits into
Conversation
|
I'll fix the MSRV issue as soon as I can. |
|
@neilconway thought you might find this one interesting as well. I'm thinking of making a similar PR for the concat functions in DataFusion. The semantics wrt null handling are different compared to |
e718dee to
8b2a515
Compare
Jefffrey
left a comment
There was a problem hiding this comment.
could you post the full benchmark results for reference?
| if let Some(n) = &null_buffer { | ||
| if n.null_count() == 0 { | ||
| null_buffer = None | ||
| } | ||
| } |
There was a problem hiding this comment.
I wonder if its beneficial to push this logic inside NullBuffer::union
There was a problem hiding this comment.
I've gone ahead and made that change. null_count() > 0 check added in union and union_many.
| .map(|len| len as usize) | ||
| .sum(); | ||
|
|
||
| if data_size > u32::MAX as usize { |
There was a problem hiding this comment.
I think it might actually be i32::MAX
There was a problem hiding this comment.
Indeed, #6172 has the relevant spec citation for view:
All integers (length, buffer index, and offset) are signed.
So we should indeed limit to i32::MAX.
Of course. Here's what I'm seeing on a MBP M2.
Raw criterion output``` pepijn@Flandrien arrow-rs % cargo bench --bench concatenate_elements -- utf8_view Compiling arrow-string v59.0.0 (/Users/pepijn/RustroverProjects/arrow-rs/arrow-string) Compiling arrow v59.0.0 (/Users/pepijn/RustroverProjects/arrow-rs/arrow) Finished `bench` profile [optimized] target(s) in 5.62s Running benches/concatenate_elements.rs (target/release/deps/concatenate_elements-3ba9b7123e418511) concat utf8_view all_inline max_str_len=12 null_density=0 time: [169.57 µs 170.05 µs 170.53 µs] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mildconcat utf8_view max_str_len=20 null_density=0 concat utf8_view max_str_len=128 null_density=0 concat utf8_view all_inline max_str_len=12 null_density=0.2 concat utf8_view max_str_len=20 null_density=0.2 concat utf8_view max_str_len=128 null_density=0.2 pepijn@Flandrien arrow-rs % cargo bench --bench concatenate_elements -- utf8_view concat utf8_view max_str_len=20 null_density=0 concat utf8_view max_str_len=128 null_density=0 concat utf8_view all_inline max_str_len=12 null_density=0.2 concat utf8_view max_str_len=20 null_density=0.2 concat utf8_view max_str_len=128 null_density=0.2 |
1a3373a to
e4ddb27
Compare
| .map(|len| len as usize) | ||
| .sum(); | ||
|
|
||
| if data_size > u32::MAX as usize { |
There was a problem hiding this comment.
| if data_size > u32::MAX as usize { | |
| if data_size > i32::MAX as usize { |
There was a problem hiding this comment.
I just noticed I had missed that one as well. I've restructured the code a little bit to eliminate the duplicated check.
Which issue does this PR close?
None
Rationale for this change
During profiling of a DataFusion query String concatenation, in particular of two StringView arrays, proved to be a hotspot.
This MR proposes a revised version of
concat_elements_string_view_arraywhich eliminates some overhead that comes from using a fairly generic implementation strategy.Benchmarking shows improvement of 20-40%.
What changes are included in this PR?
StringViewBuilderbased concatenation implementation with one that directly writes the various buffers of the arrayNullBuffer::unionandNullBuffer::union_manyto treat buffers withnull_count == 0as equivalent toNoneso thatNonecan be returned in more cases.Are these changes tested?
NullBuffer::unionandNullBuffer::union_manyAre there any user-facing changes?
No