Improve performance of elementwise ByteViewArray concatenation #10161

Open
pepijnve wants to merge 5 commits into apache:main from pepijnve:concat_view

@pepijnve pepijnve commented Jun 20, 2026

Contributor

Which issue does this PR close?

None

Rationale for this change

While profiling a DataFusion query, string concatenation, in particular of two StringView arrays, proved to be a hotspot.
This PR proposes a revised version of concat_elements_string_view_array that eliminates some of the overhead of the fairly generic implementation strategy used so far.
Benchmarking shows an improvement of 20-40%.

What changes are included in this PR?

  • Replace StringViewBuilder based concatenation implementation with one that directly writes the various buffers of the array
  • Add checks in NullBuffer::union and NullBuffer::union_many to treat buffers with null_count == 0 as equivalent to None so that None can be returned in more cases.
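
To illustrate what writing the view and data buffers directly can look like, here is a minimal sketch. It is not the code from this PR. It assumes the Arrow view layout (a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix, buffer index and offset), and the names `concat_view` and `MAX_INLINE_LEN` are hypothetical.

```rust
/// Maximum number of bytes that fit inline in a 16-byte view (Arrow view layout).
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical helper: concatenates `left` and `right` and returns the
/// resulting 16-byte view as a little-endian `u128`.
/// Results longer than 12 bytes are appended to `data`, which stands in for
/// the data buffer with index `buffer_index`.
/// Assumes the combined length and `data.len()` stay within `i32::MAX`.
fn concat_view(left: &[u8], right: &[u8], data: &mut Vec<u8>, buffer_index: u32) -> u128 {
    let len = left.len() + right.len();
    let mut view = [0u8; 16];
    // Bytes 0..4: length of the concatenated value.
    view[0..4].copy_from_slice(&(len as u32).to_le_bytes());
    if len <= MAX_INLINE_LEN {
        // Short value: bytes are stored inline, zero padded.
        view[4..4 + left.len()].copy_from_slice(left);
        view[4 + left.len()..4 + len].copy_from_slice(right);
    } else {
        // Long value: bytes go to the data buffer, the view only refers to them.
        let offset = data.len();
        data.extend_from_slice(left);
        data.extend_from_slice(right);
        view[4..8].copy_from_slice(&data[offset..offset + 4]); // 4-byte prefix
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&(offset as u32).to_le_bytes());
    }
    u128::from_le_bytes(view)
}
```

In this sketch a short result such as `concat_view(b"foo", b"bar", &mut data, 0)` never touches `data`, which is consistent with the `all_inline` benchmark cases being the cheapest ones.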

Are these changes tested?

  • Covered by existing tests, and some additional test cases added to ensure newly added code is covered
  • Added additional test cases to cover the changes to NullBuffer::union and NullBuffer::union_many

Are there any user-facing changes?

No

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Jun 20, 2026
@pepijnve

Contributor Author

I'll fix the MSRV issue as soon as I can.

@pepijnve

Contributor Author

@neilconway thought you might find this one interesting as well. I'm thinking of making a similar PR for the concat functions in DataFusion. The semantics wrt null handling are different compared to ||, but I think the same optimisations will apply there as well.

@pepijnve pepijnve force-pushed the concat_view branch 4 times, most recently from e718dee to 8b2a515 Compare June 20, 2026 15:09

@Jefffrey Jefffrey left a comment

Contributor

could you post the full benchmark results for reference?

Comment thread arrow-string/src/concat_elements.rs Outdated
Comment on lines +253 to +257
if let Some(n) = &null_buffer {
    if n.null_count() == 0 {
        null_buffer = None
    }
}

Contributor

I wonder if it's beneficial to push this logic inside NullBuffer::union

Contributor Author

I've gone ahead and made that change. null_count() > 0 check added in union and union_many.
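
For readers following along, the idea can be sketched on a toy model. This is not the arrow-rs implementation: validity masks are plain `bool` slices here (`true` means valid) instead of `NullBuffer`, and the `union` and `null_count` functions below are hypothetical stand-ins.

```rust
/// Counts null rows in a toy validity mask (`false` means null).
fn null_count(mask: &[bool]) -> usize {
    mask.iter().filter(|valid| !**valid).count()
}

/// Toy version of a validity union: a row is valid only if it is valid in
/// both inputs. Masks without any nulls are treated like `None`, so `None`
/// is returned whenever neither side actually contains a null.
fn union(lhs: Option<&[bool]>, rhs: Option<&[bool]>) -> Option<Vec<bool>> {
    // Drop masks that carry no information (no nulls at all).
    let lhs = lhs.filter(|mask| null_count(mask) > 0);
    let rhs = rhs.filter(|mask| null_count(mask) > 0);
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(l.iter().zip(r).map(|(a, b)| *a && *b).collect()),
        (Some(mask), None) | (None, Some(mask)) => Some(mask.to_vec()),
        (None, None) => None,
    }
}
```

Callers can then skip their own `null_count() == 0` check, as in the snippet under review, because the union already returns `None` in that case.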

Comment thread arrow-string/src/concat_elements.rs Outdated
.map(|len| len as usize)
.sum();

if data_size > u32::MAX as usize {

Contributor

I think it might actually be i32::MAX

Contributor Author

Indeed, #6172 has the relevant spec citation for view:

All integers (length, buffer index, and offset) are signed.

So we should indeed limit to i32::MAX.
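
A minimal sketch of such a check, assuming per-row lengths are available as `u32` values. The function name `checked_data_size` and the `String` error type are hypothetical and chosen for illustration only.

```rust
/// Sums per-row byte lengths and rejects totals above `i32::MAX`, because
/// view offsets into a data buffer are signed 32-bit integers.
/// Assumes a 64-bit `usize`, so the sum itself cannot overflow in practice.
fn checked_data_size(lengths: &[u32]) -> Result<usize, String> {
    let data_size: usize = lengths.iter().map(|len| *len as usize).sum();
    if data_size > i32::MAX as usize {
        return Err(format!(
            "data buffer of {data_size} bytes exceeds the i32::MAX limit"
        ));
    }
    Ok(data_size)
}
```

Comparing against `u32::MAX` instead would accept sizes between `i32::MAX` and `u32::MAX`, which a signed offset cannot address.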

@pepijnve

Contributor Author

could you post the full benchmark results for reference?

Of course. Here's what I'm seeing on an MBP M2.

| Bench | main | concat_view | Difference |
|---|---|---|---|
| concat utf8_view all_inline max_str_len=12 null_density=0 | 170.05 µs | 93.549 µs | -45.475% |
| concat utf8_view max_str_len=20 null_density=0 | 181.60 µs | 113.11 µs | -37.650% |
| concat utf8_view max_str_len=128 null_density=0 | 229.82 µs | 158.62 µs | -30.546% |
| concat utf8_view all_inline max_str_len=12 null_density=0.2 | 160.90 µs | 90.85 µs | -43.547% |
| concat utf8_view max_str_len=20 null_density=0.2 | 178.43 µs | 104.13 µs | -41.637% |
| concat utf8_view max_str_len=128 null_density=0.2 | 161.06 µs | 147.52 µs | -8.3218% |

<details>
<summary>Raw criterion output</summary>

pepijn@Flandrien arrow-rs % cargo bench --bench concatenate_elements -- utf8_view
Compiling arrow-string v59.0.0 (/Users/pepijn/RustroverProjects/arrow-rs/arrow-string)
Compiling arrow v59.0.0 (/Users/pepijn/RustroverProjects/arrow-rs/arrow)
Finished `bench` profile [optimized] target(s) in 5.62s
Running benches/concatenate_elements.rs (target/release/deps/concatenate_elements-3ba9b7123e418511)
concat utf8_view all_inline max_str_len=12 null_density=0
time: [169.57 µs 170.05 µs 170.53 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

concat utf8_view max_str_len=20 null_density=0
time: [180.87 µs 181.60 µs 182.34 µs]

concat utf8_view max_str_len=128 null_density=0
time: [227.06 µs 229.82 µs 232.56 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

concat utf8_view all_inline max_str_len=12 null_density=0.2
time: [160.52 µs 160.90 µs 161.27 µs]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
3 (3.00%) high mild

concat utf8_view max_str_len=20 null_density=0.2
time: [177.61 µs 178.43 µs 179.28 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe

concat utf8_view max_str_len=128 null_density=0.2
time: [159.62 µs 161.06 µs 162.71 µs]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

pepijn@Flandrien arrow-rs % cargo bench --bench concatenate_elements -- utf8_view
Compiling arrow-string v59.0.0 (/Users/pepijn/RustroverProjects/arrow-rs/arrow-string)
Compiling arrow v59.0.0 (/Users/pepijn/RustroverProjects/arrow-rs/arrow)
Finished bench profile [optimized] target(s) in 5.24s
Running benches/concatenate_elements.rs (target/release/deps/concatenate_elements-3ba9b7123e418511)
concat utf8_view all_inline max_str_len=12 null_density=0
time: [92.899 µs 93.549 µs 94.453 µs]
change: [−45.860% −45.475% −45.079%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high severe

concat utf8_view max_str_len=20 null_density=0
time: [112.70 µs 113.11 µs 113.52 µs]
change: [−38.054% −37.650% −37.270%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
5 (5.00%) low mild

concat utf8_view max_str_len=128 null_density=0
time: [157.85 µs 158.62 µs 159.50 µs]
change: [−31.641% −30.546% −29.429%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

concat utf8_view all_inline max_str_len=12 null_density=0.2
time: [90.539 µs 90.850 µs 91.182 µs]
change: [−43.852% −43.547% −43.263%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

concat utf8_view max_str_len=20 null_density=0.2
time: [103.80 µs 104.13 µs 104.46 µs]
change: [−42.146% −41.637% −41.185%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
2 (2.00%) high mild

concat utf8_view max_str_len=128 null_density=0.2
time: [147.19 µs 147.52 µs 147.89 µs]
change: [−9.1648% −8.3218% −7.4702%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
3 (3.00%) high mild

</details>

@pepijnve pepijnve force-pushed the concat_view branch 2 times, most recently from 1a3373a to e4ddb27 Compare June 22, 2026 08:50
Comment thread arrow-string/src/concat_elements.rs Outdated
.map(|len| len as usize)
.sum();

if data_size > u32::MAX as usize {

Contributor

Suggested change
- if data_size > u32::MAX as usize {
+ if data_size > i32::MAX as usize {

Contributor Author

I just noticed I had missed that one as well. I've restructured the code a little bit to eliminate the duplicated check.
