Skip to content

Conversation

@Tushar7012
Copy link
Contributor

Rationale for this change

The ArrowBytesViewMap is a high-performance structure used for interning string and binary views in operators like GROUP BY and COUNT DISTINCT. This PR addresses performance bottlenecks identified during maintainer review:

  1. Redundant Memory Access: Previously, the input value (values.value(i)) was fetched and converted to a byte slice multiple times during hash collisions and insertion. This added unnecessary overhead to the interning loop.
  2. Suboptimal Comparison: Inlined values (strings ≤ 12 bytes) were being converted to byte slices for comparison rather than leveraging direct u128 bitwise equality.

What changes are included in this PR?

  • Direct u128 View Comparison: Updated the interning logic to perform bitwise equality checks on the u128 views for inlined values, providing a single-cycle fast path.
  • "Single Fetch" Optimization: Refactored insert_if_new_inner to fetch and convert the input value exactly once per row. This value is reused for both lookup and insertion branches.
  • Optimized Comparison Sequence:
    1. Direct hash match.
    2. Bitwise u128 view equality for inlined strings (≤12 bytes).
    3. Prefix comparison for larger strings before falling back to full byte-slice comparison.
  • Generic Type Resolution: Fixed compilation errors by correctly leveraging GenericByteViewArray<B> to provide specialized access for StringView/BinaryView types.

Are these changes tested?

Yes. I have verified the changes using the existing unit test suite:

  • cargo test -p datafusion-physical-expr-common --lib binary_view_map::tests
  • Result: 8 passed; 0 failed

Are there any user-facing changes?

No. This is a internal performance optimization.

- Use values.views() instead of values.iter() for direct u128 access
- Use is_valid(i) for efficient null checking via validity bitmap
- Avoid dereferencing overhead for inline strings
- No additional memory overhead in Entry struct

Closes apache#19961
Following review feedback, changed from usize to u32 for the builder index field. This saves 4 bytes per entry on 64-bit systems while still supporting up to 4 billion distinct values - more than enough for practical workloads.
Per reviewer feedback, replaced Vec<bool> with BooleanBufferBuilder for tracking null values. This uses bit-packed storage (1 bit per value) instead of byte-per-value, reducing memory usage by 8x for the null bitmap. Also fixed clippy warnings for mem_replace_with_default.
…etch

This change optimizes the interning process in ArrowBytesViewMap by:\n1. Performing direct u128 view comparisons for inlined values.\n2. Fetching the input value only once per row to avoid redundant work during hash collisions and insertion.\n3. Addresses feedback from @Dandandan regarding redundant value fetches.
Copilot AI review requested due to automatic review settings January 28, 2026 22:30
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jan 28, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the ArrowBytesViewMap by replacing the GenericByteViewBuilder with manual buffer management, enabling direct u128 view comparisons for inline strings and reducing redundant memory access during hash collisions.

Changes:

  • Replaced GenericByteViewBuilder with manual buffer management (views, in_progress, completed, nulls vectors)
  • Implemented direct u128 bitwise equality checks for inline strings (≤12 bytes) as a fast path
  • Refactored comparison logic to use view-based comparisons before falling back to byte-slice comparison
Comments suppressed due to low confidence (5)

datafusion/physical-expr-common/src/binary_view_map.rs:447

  • For accurate memory accounting, the nulls size calculation should use self.nulls.capacity() * size_of::<bool>() to match the pattern used for in_progress_size at line 445 and to account for actual allocated capacity rather than just used length. Similarly, views_size at line 444 should consider using capacity for consistency, though this depends on whether the intent is to track allocated or used memory.
        let views_size = self.views.len() * size_of::<u128>();
        let in_progress_size = self.in_progress.capacity();
        let completed_size: usize = self.completed.iter().map(|b| b.len()).sum();
        let nulls_size = self.nulls.len();

datafusion/physical-expr-common/src/binary_view_map.rs:117

  • The constant BYTE_VIEW_MAX_BLOCK_SIZE is defined in an unusual location between the documentation comment and the struct definition. For better code organization and readability, this constant should be moved to the module level (e.g., after the imports around line 30) or before the struct's documentation comment. This follows Rust conventions where constants are typically defined at the module level or clearly separated from type definitions.
/// Max size of the in-progress buffer before flushing to completed buffers
const BYTE_VIEW_MAX_BLOCK_SIZE: usize = 2 * 1024 * 1024;

datafusion/physical-expr-common/src/binary_view_map.rs:386

  • The unsafe call to BinaryViewArray::new_unchecked lacks a SAFETY comment explaining why it's safe. Consider adding a comment explaining that: (1) all views are correctly constructed via make_view to point to valid data in the buffers, (2) buffer indices and offsets are valid by construction through append_value, and (3) the null buffer has the correct length matching the views. This is separate from the SAFETY comment for to_string_view_unchecked at line 391.
        let array =
            unsafe { BinaryViewArray::new_unchecked(views, self.completed, null_buffer) };

datafusion/physical-expr-common/src/binary_view_map.rs:415

  • The cast to u32 at line 414 could theoretically overflow if self.completed.len() exceeds u32::MAX. While this is extremely unlikely in practice (would require billions of 2MB buffers), consider adding a debug assertion like debug_assert!(self.completed.len() <= u32::MAX as usize, "Too many completed buffers") to catch this during development. The same applies to the offset cast at line 415.
            let buffer_index = self.completed.len() as u32;
            let offset = self.in_progress.len() as u32;

datafusion/physical-expr-common/src/binary_view_map.rs:382

  • The null buffer construction iterates over self.nulls twice: once to check if any nulls exist, and once to invert the boolean values. Consider optimizing by inverting and collecting in a single pass, then checking if all values are true to decide whether to create a NullBuffer. However, this is a minor performance consideration and may not be worth the code complexity.
        let null_buffer = if self.nulls.iter().any(|&is_null| is_null) {
            Some(NullBuffer::from(
                self.nulls
                    .iter()
                    .map(|&is_null| !is_null)
                    .collect::<Vec<_>>(),
            ))
        } else {
            None
        };

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 329 to 330
} else {
&in_progress[offset..offset + stored_len]
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition checking which buffer to read from should be more defensive. If buffer_index > completed.len(), this would incorrectly try to access in_progress with potentially invalid offsets. While this shouldn't happen with the current append logic, consider adding an assertion or explicit error handling to catch any future bugs. For example: else if buffer_index == completed.len() followed by else { panic!("Invalid buffer_index") }.

Suggested change
} else {
&in_progress[offset..offset + stored_len]
} else if buffer_index == completed.len() {
&in_progress[offset..offset + stored_len]
} else {
panic!("Invalid buffer_index: {}", buffer_index);

Copilot uses AI. Check for mistakes.
@Dandandan
Copy link
Contributor

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (6a73668) to 36c0cda diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@Dandandan
Copy link
Contributor

Run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (f105b53) to 36c0cda diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2480.89 ms │                    2428.81 ms │     no change │
│ QQuery 1 │   944.90 ms │                     843.83 ms │ +1.12x faster │
│ QQuery 2 │  1889.05 ms │                    1725.28 ms │ +1.09x faster │
│ QQuery 3 │  1052.65 ms │                    1070.28 ms │     no change │
│ QQuery 4 │  2198.62 ms │                    2194.78 ms │     no change │
│ QQuery 5 │ 28637.17 ms │                   28719.42 ms │     no change │
│ QQuery 6 │  4076.90 ms │                    4049.42 ms │     no change │
│ QQuery 7 │  2797.94 ms │                    2927.17 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 44078.13ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 43958.99ms │
│ Average Time (HEAD)                          │  5509.77ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5494.87ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.62 ms │                       2.69 ms │     no change │
│ QQuery 1  │    52.73 ms │                      54.12 ms │     no change │
│ QQuery 2  │   133.34 ms │                     136.30 ms │     no change │
│ QQuery 3  │   140.26 ms │                     159.87 ms │  1.14x slower │
│ QQuery 4  │   971.39 ms │                    1078.28 ms │  1.11x slower │
│ QQuery 5  │  1254.86 ms │                    1345.20 ms │  1.07x slower │
│ QQuery 6  │     9.45 ms │                       8.82 ms │ +1.07x faster │
│ QQuery 7  │    57.41 ms │                      58.04 ms │     no change │
│ QQuery 8  │  1351.61 ms │                    1445.92 ms │  1.07x slower │
│ QQuery 9  │  1700.21 ms │                    1823.44 ms │  1.07x slower │
│ QQuery 10 │   346.95 ms │                     355.66 ms │     no change │
│ QQuery 11 │   392.14 ms │                     406.56 ms │     no change │
│ QQuery 12 │  1201.48 ms │                    1273.44 ms │  1.06x slower │
│ QQuery 13 │  1913.26 ms │                    2027.02 ms │  1.06x slower │
│ QQuery 14 │  1222.62 ms │                    1260.92 ms │     no change │
│ QQuery 15 │  1146.42 ms │                    1228.10 ms │  1.07x slower │
│ QQuery 16 │  2463.52 ms │                    2557.65 ms │     no change │
│ QQuery 17 │  2435.25 ms │                    2502.09 ms │     no change │
│ QQuery 18 │  6631.66 ms │                    5006.87 ms │ +1.32x faster │
│ QQuery 19 │   124.39 ms │                     126.71 ms │     no change │
│ QQuery 20 │  1961.87 ms │                    1927.78 ms │     no change │
│ QQuery 21 │  2252.49 ms │                    2241.97 ms │     no change │
│ QQuery 22 │  8076.86 ms │                    3900.91 ms │ +2.07x faster │
│ QQuery 23 │ 22553.78 ms │                   12479.56 ms │ +1.81x faster │
│ QQuery 24 │   225.99 ms │                     218.63 ms │     no change │
│ QQuery 25 │   472.84 ms │                     477.00 ms │     no change │
│ QQuery 26 │   226.54 ms │                     225.63 ms │     no change │
│ QQuery 27 │  2766.21 ms │                    2693.38 ms │     no change │
│ QQuery 28 │ 25519.10 ms │                   23768.97 ms │ +1.07x faster │
│ QQuery 29 │   988.07 ms │                     961.67 ms │     no change │
│ QQuery 30 │  1254.70 ms │                    1296.86 ms │     no change │
│ QQuery 31 │  1346.86 ms │                    1352.48 ms │     no change │
│ QQuery 32 │  5152.56 ms │                    4326.84 ms │ +1.19x faster │
│ QQuery 33 │  6259.69 ms │                    5556.50 ms │ +1.13x faster │
│ QQuery 34 │  5972.95 ms │                    5839.08 ms │     no change │
│ QQuery 35 │  1863.14 ms │                    1944.44 ms │     no change │
│ QQuery 36 │   195.38 ms │                     200.12 ms │     no change │
│ QQuery 37 │    82.91 ms │                      81.56 ms │     no change │
│ QQuery 38 │   121.32 ms │                     122.68 ms │     no change │
│ QQuery 39 │   358.20 ms │                     375.62 ms │     no change │
│ QQuery 40 │    48.93 ms │                      46.61 ms │     no change │
│ QQuery 41 │    40.01 ms │                      41.85 ms │     no change │
│ QQuery 42 │    36.00 ms │                      36.74 ms │     no change │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 111327.95ms │
│ Total Time (optimize-arrow-bytes-view-map)   │  92974.57ms │
│ Average Time (HEAD)                          │   2589.02ms │
│ Average Time (optimize-arrow-bytes-view-map) │   2162.20ms │
│ Queries Faster                               │           7 │
│ Queries Slower                               │           8 │
│ Queries with No Change                       │          28 │
│ Queries with Failure                         │           0 │
└──────────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 101.37 ms │                     102.68 ms │    no change │
│ QQuery 2  │  34.02 ms │                      33.83 ms │    no change │
│ QQuery 3  │  40.25 ms │                      40.76 ms │    no change │
│ QQuery 4  │  31.05 ms │                      32.22 ms │    no change │
│ QQuery 5  │  90.06 ms │                      93.52 ms │    no change │
│ QQuery 6  │  20.87 ms │                      21.16 ms │    no change │
│ QQuery 7  │ 160.64 ms │                     166.19 ms │    no change │
│ QQuery 8  │  39.34 ms │                      42.78 ms │ 1.09x slower │
│ QQuery 9  │ 107.65 ms │                     107.30 ms │    no change │
│ QQuery 10 │  68.20 ms │                      69.09 ms │    no change │
│ QQuery 11 │  19.65 ms │                      19.18 ms │    no change │
│ QQuery 12 │  52.30 ms │                      52.23 ms │    no change │
│ QQuery 13 │  49.46 ms │                      51.41 ms │    no change │
│ QQuery 14 │  15.77 ms │                      15.82 ms │    no change │
│ QQuery 15 │  31.03 ms │                      30.90 ms │    no change │
│ QQuery 16 │  27.88 ms │                      28.99 ms │    no change │
│ QQuery 17 │ 146.52 ms │                     146.41 ms │    no change │
│ QQuery 18 │ 296.43 ms │                     295.10 ms │    no change │
│ QQuery 19 │  39.65 ms │                      40.26 ms │    no change │
│ QQuery 20 │  56.79 ms │                      58.93 ms │    no change │
│ QQuery 21 │ 184.65 ms │                     194.60 ms │ 1.05x slower │
│ QQuery 22 │  22.56 ms │                      22.90 ms │    no change │
└───────────┴───────────┴───────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1636.14ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1666.27ms │
│ Average Time (HEAD)                          │   74.37ms │
│ Average Time (optimize-arrow-bytes-view-map) │   75.74ms │
│ Queries Faster                               │         0 │
│ Queries Slower                               │         2 │
│ Queries with No Change                       │        20 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (f105b53) to 36c0cda diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2445.04 ms │                    2536.19 ms │     no change │
│ QQuery 1 │   973.35 ms │                     882.22 ms │ +1.10x faster │
│ QQuery 2 │  1953.55 ms │                    1764.09 ms │ +1.11x faster │
│ QQuery 3 │  1041.14 ms │                    1072.32 ms │     no change │
│ QQuery 4 │  2199.48 ms │                    2265.90 ms │     no change │
│ QQuery 5 │ 29112.82 ms │                   29218.84 ms │     no change │
│ QQuery 6 │  4155.12 ms │                    4097.98 ms │     no change │
│ QQuery 7 │  2918.67 ms │                    2777.71 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 44799.16ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 44615.25ms │
│ Average Time (HEAD)                          │  5599.89ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5576.91ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.60 ms │                       2.99 ms │  1.15x slower │
│ QQuery 1  │    53.01 ms │                      55.05 ms │     no change │
│ QQuery 2  │   139.78 ms │                     140.74 ms │     no change │
│ QQuery 3  │   158.47 ms │                     159.48 ms │     no change │
│ QQuery 4  │  1040.17 ms │                    1121.20 ms │  1.08x slower │
│ QQuery 5  │  1306.75 ms │                    1380.96 ms │  1.06x slower │
│ QQuery 6  │     8.69 ms │                       9.23 ms │  1.06x slower │
│ QQuery 7  │    56.93 ms │                      63.44 ms │  1.11x slower │
│ QQuery 8  │  1402.00 ms │                    1461.28 ms │     no change │
│ QQuery 9  │  1717.86 ms │                    1838.35 ms │  1.07x slower │
│ QQuery 10 │   345.84 ms │                     369.34 ms │  1.07x slower │
│ QQuery 11 │   390.67 ms │                     414.89 ms │  1.06x slower │
│ QQuery 12 │  1239.75 ms │                    1308.45 ms │  1.06x slower │
│ QQuery 13 │  1956.47 ms │                    2049.41 ms │     no change │
│ QQuery 14 │  1254.50 ms │                    1311.13 ms │     no change │
│ QQuery 15 │  1204.39 ms │                    1238.96 ms │     no change │
│ QQuery 16 │  2518.90 ms │                    2594.79 ms │     no change │
│ QQuery 17 │  2510.70 ms │                    2566.38 ms │     no change │
│ QQuery 18 │  5043.63 ms │                    4989.71 ms │     no change │
│ QQuery 19 │   124.10 ms │                     127.82 ms │     no change │
│ QQuery 20 │  1951.70 ms │                    1966.30 ms │     no change │
│ QQuery 21 │  2263.21 ms │                    2237.19 ms │     no change │
│ QQuery 22 │  3866.46 ms │                    3877.37 ms │     no change │
│ QQuery 23 │ 20191.14 ms │                   12575.62 ms │ +1.61x faster │
│ QQuery 24 │   219.95 ms │                     240.60 ms │  1.09x slower │
│ QQuery 25 │   502.01 ms │                     512.03 ms │     no change │
│ QQuery 26 │   246.92 ms │                     223.60 ms │ +1.10x faster │
│ QQuery 27 │  3219.46 ms │                    2754.39 ms │ +1.17x faster │
│ QQuery 28 │ 25176.04 ms │                   23925.23 ms │     no change │
│ QQuery 29 │   984.53 ms │                     990.80 ms │     no change │
│ QQuery 30 │  1349.17 ms │                    1317.50 ms │     no change │
│ QQuery 31 │  1379.06 ms │                    1357.17 ms │     no change │
│ QQuery 32 │  3730.30 ms │                    4323.99 ms │  1.16x slower │
│ QQuery 33 │  5724.61 ms │                    5732.13 ms │     no change │
│ QQuery 34 │  6215.14 ms │                    6040.15 ms │     no change │
│ QQuery 35 │  1981.14 ms │                    1982.16 ms │     no change │
│ QQuery 36 │   210.71 ms │                     202.49 ms │     no change │
│ QQuery 37 │    82.66 ms │                      82.14 ms │     no change │
│ QQuery 38 │   131.44 ms │                     120.33 ms │ +1.09x faster │
│ QQuery 39 │   382.69 ms │                     368.06 ms │     no change │
│ QQuery 40 │    47.85 ms │                      46.00 ms │     no change │
│ QQuery 41 │    43.41 ms │                      41.78 ms │     no change │
│ QQuery 42 │    37.07 ms │                      36.12 ms │     no change │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 102411.87ms │
│ Total Time (optimize-arrow-bytes-view-map)   │  94156.74ms │
│ Average Time (HEAD)                          │   2381.67ms │
│ Average Time (optimize-arrow-bytes-view-map) │   2189.69ms │
│ Queries Faster                               │           4 │
│ Queries Slower                               │          11 │
│ Queries with No Change                       │          28 │
│ Queries with Failure                         │           0 │
└──────────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 108.85 ms │                     102.73 ms │ +1.06x faster │
│ QQuery 2  │  33.95 ms │                      32.87 ms │     no change │
│ QQuery 3  │  38.12 ms │                      41.74 ms │  1.09x slower │
│ QQuery 4  │  31.71 ms │                      32.29 ms │     no change │
│ QQuery 5  │  92.34 ms │                      93.83 ms │     no change │
│ QQuery 6  │  21.09 ms │                      20.95 ms │     no change │
│ QQuery 7  │ 167.40 ms │                     163.55 ms │     no change │
│ QQuery 8  │  41.36 ms │                      43.97 ms │  1.06x slower │
│ QQuery 9  │ 110.24 ms │                     111.75 ms │     no change │
│ QQuery 10 │  68.69 ms │                      71.41 ms │     no change │
│ QQuery 11 │  20.02 ms │                      27.77 ms │  1.39x slower │
│ QQuery 12 │  52.26 ms │                      66.88 ms │  1.28x slower │
│ QQuery 13 │  50.39 ms │                      59.94 ms │  1.19x slower │
│ QQuery 14 │  15.52 ms │                      18.91 ms │  1.22x slower │
│ QQuery 15 │  31.42 ms │                      36.88 ms │  1.17x slower │
│ QQuery 16 │  28.54 ms │                      34.12 ms │  1.20x slower │
│ QQuery 17 │ 149.16 ms │                     184.82 ms │  1.24x slower │
│ QQuery 18 │ 290.39 ms │                     324.05 ms │  1.12x slower │
│ QQuery 19 │  39.66 ms │                      50.56 ms │  1.27x slower │
│ QQuery 20 │  57.75 ms │                      56.14 ms │     no change │
│ QQuery 21 │ 194.32 ms │                     189.52 ms │     no change │
│ QQuery 22 │  23.37 ms │                      23.20 ms │     no change │
└───────────┴───────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1666.56ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1787.90ms │
│ Average Time (HEAD)                          │   75.75ms │
│ Average Time (optimize-arrow-bytes-view-map) │   81.27ms │
│ Queries Faster                               │         1 │
│ Queries Slower                               │        11 │
│ Queries with No Change                       │        10 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

@Dandandan
Copy link
Contributor

run benchmarks

let len = view_u128 as u32;

// Check if value already exists
let mut value: Option<&[u8]> = None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you’re right to flag this.

The intent here was to defer materializing the byte slice until we know it’s needed, but the current structure makes that unclear and introduces unnecessary indirection.

I’ll refactor this so the slice is explicitly materialized once (at the top of the loop) and then reused consistently for both lookup and insertion paths. That should make the logic clearer and align better with the original optimization goal.

} else {
// no existing value, make a new one
let value: &[u8] = values.value(i).as_ref();
let value = value.unwrap_or_else(|| values.value(i).as_ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still accesses it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you’re absolutely right — this still ends up calling values.value(i) in the insert path.

That contradicts the goal of fetching the value once per row. I’ll rework this to eagerly bind the slice to a local variable and thread that through both the lookup and insert branches so values.value(i) is only accessed a single time.

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize-arrow-bytes-view-map (f105b53) to 36c0cda diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and optimize-arrow-bytes-view-map
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2390.72 ms │                    2487.46 ms │     no change │
│ QQuery 1 │   951.77 ms │                     826.81 ms │ +1.15x faster │
│ QQuery 2 │  1832.70 ms │                    1697.49 ms │ +1.08x faster │
│ QQuery 3 │  1029.58 ms │                    1056.66 ms │     no change │
│ QQuery 4 │  2203.75 ms │                    2206.13 ms │     no change │
│ QQuery 5 │ 28798.52 ms │                   28670.86 ms │     no change │
│ QQuery 6 │  4084.26 ms │                    4081.19 ms │     no change │
│ QQuery 7 │  2770.67 ms │                    2774.05 ms │     no change │
└──────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 44061.96ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 43800.65ms │
│ Average Time (HEAD)                          │  5507.74ms │
│ Average Time (optimize-arrow-bytes-view-map) │  5475.08ms │
│ Queries Faster                               │          2 │
│ Queries Slower                               │          0 │
│ Queries with No Change                       │          6 │
│ Queries with Failure                         │          0 │
└──────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ optimize-arrow-bytes-view-map ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.64 ms │                       2.68 ms │     no change │
│ QQuery 1  │    54.68 ms │                      53.11 ms │     no change │
│ QQuery 2  │   135.68 ms │                     134.49 ms │     no change │
│ QQuery 3  │   153.43 ms │                     162.89 ms │  1.06x slower │
│ QQuery 4  │  1050.62 ms │                    1089.93 ms │     no change │
│ QQuery 5  │  1308.25 ms │                    1352.81 ms │     no change │
│ QQuery 6  │    10.90 ms │                       9.52 ms │ +1.15x faster │
│ QQuery 7  │    58.23 ms │                      56.99 ms │     no change │
│ QQuery 8  │  1439.89 ms │                    1433.85 ms │     no change │
│ QQuery 9  │  1846.96 ms │                    1831.40 ms │     no change │
│ QQuery 10 │   354.29 ms │                     361.44 ms │     no change │
│ QQuery 11 │   405.58 ms │                     404.62 ms │     no change │
│ QQuery 12 │  1246.32 ms │                    1287.55 ms │     no change │
│ QQuery 13 │  1941.48 ms │                    1975.75 ms │     no change │
│ QQuery 14 │  1268.29 ms │                    1256.32 ms │     no change │
│ QQuery 15 │  1221.27 ms │                    1218.46 ms │     no change │
│ QQuery 16 │  2592.15 ms │                    2582.85 ms │     no change │
│ QQuery 17 │  2548.07 ms │                    2531.45 ms │     no change │
│ QQuery 18 │  5594.86 ms │                    4971.89 ms │ +1.13x faster │
│ QQuery 19 │   125.75 ms │                     126.29 ms │     no change │
│ QQuery 20 │  1920.06 ms │                    1898.67 ms │     no change │
│ QQuery 21 │  2262.85 ms │                    2205.14 ms │     no change │
│ QQuery 22 │  3876.30 ms │                    3846.07 ms │     no change │
│ QQuery 23 │ 23526.78 ms │                   12329.21 ms │ +1.91x faster │
│ QQuery 24 │   233.06 ms │                     216.03 ms │ +1.08x faster │
│ QQuery 25 │   476.28 ms │                     487.90 ms │     no change │
│ QQuery 26 │   228.29 ms │                     222.29 ms │     no change │
│ QQuery 27 │  2825.10 ms │                    2698.31 ms │     no change │
│ QQuery 28 │ 25000.05 ms │                   23635.98 ms │ +1.06x faster │
│ QQuery 29 │   980.67 ms │                     964.77 ms │     no change │
│ QQuery 30 │  1303.02 ms │                    1255.09 ms │     no change │
│ QQuery 31 │  1384.04 ms │                    1345.53 ms │     no change │
│ QQuery 32 │  4856.95 ms │                    4440.78 ms │ +1.09x faster │
│ QQuery 33 │  5721.48 ms │                    5765.76 ms │     no change │
│ QQuery 34 │  6041.19 ms │                    5986.54 ms │     no change │
│ QQuery 35 │  1956.55 ms │                    1878.35 ms │     no change │
│ QQuery 36 │   203.02 ms │                     200.77 ms │     no change │
│ QQuery 37 │    81.99 ms │                      80.74 ms │     no change │
│ QQuery 38 │   122.96 ms │                     122.79 ms │     no change │
│ QQuery 39 │   368.81 ms │                     355.43 ms │     no change │
│ QQuery 40 │    46.15 ms │                      45.93 ms │     no change │
│ QQuery 41 │    40.06 ms │                      41.90 ms │     no change │
│ QQuery 42 │    36.27 ms │                      35.35 ms │     no change │
└───────────┴─────────────┴───────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 106851.28ms │
│ Total Time (optimize-arrow-bytes-view-map)   │  92903.62ms │
│ Average Time (HEAD)                          │   2484.91ms │
│ Average Time (optimize-arrow-bytes-view-map) │   2160.55ms │
│ Queries Faster                               │           6 │
│ Queries Slower                               │           1 │
│ Queries with No Change                       │          36 │
│ Queries with Failure                         │           0 │
└──────────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ optimize-arrow-bytes-view-map ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 102.07 ms │                     102.47 ms │    no change │
│ QQuery 2  │  34.22 ms │                      33.67 ms │    no change │
│ QQuery 3  │  40.92 ms │                      41.54 ms │    no change │
│ QQuery 4  │  30.93 ms │                      32.57 ms │ 1.05x slower │
│ QQuery 5  │  92.27 ms │                      92.95 ms │    no change │
│ QQuery 6  │  20.92 ms │                      20.86 ms │    no change │
│ QQuery 7  │ 161.52 ms │                     162.13 ms │    no change │
│ QQuery 8  │  42.01 ms │                      43.50 ms │    no change │
│ QQuery 9  │ 115.11 ms │                     110.37 ms │    no change │
│ QQuery 10 │  67.17 ms │                      71.33 ms │ 1.06x slower │
│ QQuery 11 │  19.91 ms │                      19.70 ms │    no change │
│ QQuery 12 │  52.96 ms │                      53.95 ms │    no change │
│ QQuery 13 │  49.58 ms │                      52.42 ms │ 1.06x slower │
│ QQuery 14 │  15.45 ms │                      15.77 ms │    no change │
│ QQuery 15 │  30.47 ms │                      32.21 ms │ 1.06x slower │
│ QQuery 16 │  29.10 ms │                      29.30 ms │    no change │
│ QQuery 17 │ 149.50 ms │                     149.86 ms │    no change │
│ QQuery 18 │ 287.92 ms │                     292.70 ms │    no change │
│ QQuery 19 │  39.68 ms │                      39.86 ms │    no change │
│ QQuery 20 │  58.49 ms │                      58.17 ms │    no change │
│ QQuery 21 │ 191.30 ms │                     192.44 ms │    no change │
│ QQuery 22 │  23.09 ms │                      23.06 ms │    no change │
└───────────┴───────────┴───────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                            │ 1654.61ms │
│ Total Time (optimize-arrow-bytes-view-map)   │ 1670.85ms │
│ Average Time (HEAD)                          │   75.21ms │
│ Average Time (optimize-arrow-bytes-view-map) │   75.95ms │
│ Queries Faster                               │         0 │
│ Queries Slower                               │         4 │
│ Queries with No Change                       │        18 │
│ Queries with Failure                         │         0 │
└──────────────────────────────────────────────┴───────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants