[claude] bench: measure FlatBuffers verification cost for Layout and Array #8014
joseph-isaacs wants to merge 2 commits into
Adds a divan benchmark in vortex-layout/benches/flatbuffer_verify.rs
that compares root::<T> (checked), root_with_opts::<T>, and
root_unchecked::<T> for representative Layout and Array shapes.
Results on this machine (medians, release):
Layout (per file open):
1 x 8 (2.0 KB) -> 726 ns checked, 2.6 ns unchecked
16 x 32 (34 KB) -> 33.3 us checked
128 x 32 (295 KB) -> 277 us checked
1024 x 32 (2.5 MB) -> 2.23 ms checked
Array (per SerializedArray decode and per buffer_lengths() call):
8 fields (2.0 KB) -> 636 ns checked
100 fields (6.0 KB) -> 5.8 us checked
1000 fields (44 KB) -> 56 us checked
Key findings:
- root_unchecked is ~3 ns regardless of size; the verifier IS the cost.
- root_with_opts is the same cost as root - the Vortex VerifierOptions
knob is a DoS bound, not a perf knob.
- Verifier walks roughly O(table count + buffer size); the medians above
work out to about 0.3 to 1.3 us/KB, i.e. on the order of 1 us/KB.
Motivates dropping the redundant root::<Array> re-verification inside
SerializedArray::buffer_lengths() (vortex-array/src/serde.rs:514), which
runs on an already-validated buffer.
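
The per-KB verification cost implied by the medians above can be recomputed directly. This is a minimal sketch, not part of the benchmark: the sizes and timings are copied from the results, `ns_per_kb` is a hypothetical helper, and 2.5 MB is assumed to mean 2560 KB.

```rust
/// Verification cost in ns per KB for one (size, checked median) sample.
/// Hypothetical helper for illustration only.
fn ns_per_kb(size_kb: f64, checked_ns: f64) -> f64 {
    checked_ns / size_kb
}

fn main() {
    // (label, size in KB, checked median in ns), copied from the results above.
    // Assumption: 2.5 MB is treated as 2560 KB.
    let samples = [
        ("layout 1 x 8", 2.0, 726.0),
        ("layout 16 x 32", 34.0, 33_300.0),
        ("layout 128 x 32", 295.0, 277_000.0),
        ("layout 1024 x 32", 2560.0, 2_230_000.0),
        ("array 8 fields", 2.0, 636.0),
        ("array 100 fields", 6.0, 5_800.0),
        ("array 1000 fields", 44.0, 56_000.0),
    ];
    for (label, kb, ns) in samples {
        println!("{label:<18} {:>7.0} ns/KB", ns_per_kb(kb, ns));
    }
}
```

The smallest shapes come out cheaper per KB than the larger ones, which fits a cost that depends on table count as well as raw buffer size.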
Signed-off-by: Claude <noreply@anthropic.com>
…uffer_lengths

`SerializedArray::buffer_lengths()` was calling `root::<fba::Array>()` on every invocation, which re-runs the full FlatBuffer verifier across the array tree.

The buffer is already verified once at construction time by `validate_array_tree` (line 527), and the rest of `SerializedArray` already exploits this invariant - see `from_flatbuffer_and_segment_with_overrides` at line 614, which uses `fba::root_as_array_unchecked` with the same safety justification. Switch `buffer_lengths` to `fba::root_as_array_unchecked` for consistency.

Measured on the same workload with the new buffer_lengths bench:

n_fields  fixed    legacy (root::<Array>)  speedup
1         26 ns    237 ns                  9x
8         35 ns    772 ns                  22x
32        59 ns    2.56 us                 43x
100       140 ns   7.75 us                 55x
1000      1.13 us  73.7 us                 65x

Post-fix cost grows with n_fields purely from the Vec<usize> allocation + iteration to extract buffer descriptor lengths; the verifier overhead is gone.

Signed-off-by: Claude <noreply@anthropic.com>
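
The verify-once invariant described above can be sketched with a toy format. Everything here is hypothetical and uses only the standard library: `Serialized`, `verify`, and `read_lengths` are illustrative names, and the wire format (a count byte followed by little-endian `u32` lengths) is not FlatBuffers and not Vortex's actual layout.

```rust
/// Toy wire format (hypothetical, not FlatBuffers): one count byte `n`,
/// followed by `n` little-endian u32 buffer lengths.
fn verify(bytes: &[u8]) -> Result<(), String> {
    let n = *bytes.first().ok_or("empty buffer")? as usize;
    if bytes.len() != 1 + 4 * n {
        return Err(format!("expected {} bytes, got {}", 1 + 4 * n, bytes.len()));
    }
    Ok(())
}

/// Reads the lengths without validating the format.
/// Callers must have run `verify` on `bytes` beforehand.
fn read_lengths(bytes: &[u8]) -> Vec<usize> {
    bytes[1..]
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]) as usize)
        .collect()
}

/// Holds bytes that passed `verify` at construction and are never mutated.
struct Serialized {
    bytes: Vec<u8>,
}

impl Serialized {
    /// Verifies once, at construction time.
    fn new(bytes: Vec<u8>) -> Result<Self, String> {
        verify(&bytes)?;
        Ok(Self { bytes })
    }

    /// Legacy shape: re-runs the verifier on every call.
    fn buffer_lengths_legacy(&self) -> Result<Vec<usize>, String> {
        verify(&self.bytes)?;
        Ok(read_lengths(&self.bytes))
    }

    /// Fixed shape: relies on the constructor invariant and skips the verifier.
    fn buffer_lengths(&self) -> Vec<usize> {
        read_lengths(&self.bytes)
    }
}

fn main() {
    let mut bytes = vec![2u8];
    bytes.extend_from_slice(&10u32.to_le_bytes());
    bytes.extend_from_slice(&20u32.to_le_bytes());

    let s = Serialized::new(bytes).expect("valid buffer");
    assert_eq!(s.buffer_lengths(), vec![10, 20]);
    assert_eq!(s.buffer_lengths_legacy().unwrap(), s.buffer_lengths());

    // A truncated buffer is rejected at construction, so accessors never see it.
    assert!(Serialized::new(vec![3u8]).is_err());
}
```

The skip is only sound because the bytes are private and immutable after `new`; any path that constructs the type without verifying would break the accessor's assumption.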
Merging this PR will not alter performance

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Simulation | buffer_lengths_fixed[1000] | N/A | 16.2 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[1] | N/A | 1.7 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[100] | N/A | 3 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[32] | N/A | 2.1 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[8] | N/A | 1.8 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[1000] | N/A | 336.3 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[100] | N/A | 39 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[1] | N/A | 6.4 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[32] | N/A | 16.6 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[8] | N/A | 8.8 µs | N/A |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 273.3 µs | 308 µs | -11.28% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 224.8 µs | 187.7 µs | +19.75% |
Comparing claude/flatbuffers-memory-safety-XKbWQ (172e0c7) with develop (7b47788)
Polar Signals Profiling Results

Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.949x ➖
datafusion / vortex-file-compressed (0.949x ➖, 2↑ 0↓)

File Sizes: PolarSignals Profiling
No file size changes detected.

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.958x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.959x ➖, 1↑ 0↓)
datafusion / parquet (0.932x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.935x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.976x ➖, 0↑ 0↓)
duckdb / parquet (0.941x ➖, 2↑ 0↓)

File Sizes: FineWeb NVMe
No file size changes detected.

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.934x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.945x ➖, 0↑ 0↓)
datafusion / parquet (0.961x ➖, 1↑ 0↓)
datafusion / arrow (0.926x ➖, 6↑ 0↓)
duckdb / vortex-file-compressed (0.964x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.952x ➖, 1↑ 0↓)
duckdb / parquet (0.965x ➖, 1↑ 0↓)
duckdb / duckdb (0.970x ➖, 0↑ 1↓)

File Sizes: TPC-H SF=1 on NVME
No file size changes detected.

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.018x ➖, 0↑ 5↓)
datafusion / vortex-compact (1.007x ➖, 0↑ 1↓)
datafusion / parquet (1.009x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.008x ➖, 1↑ 2↓)
duckdb / vortex-compact (1.011x ➖, 0↑ 2↓)
duckdb / parquet (1.009x ➖, 0↑ 2↓)
duckdb / duckdb (1.024x ➖, 0↑ 9↓)

File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.

Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.054x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.976x ➖, 0↑ 0↓)
datafusion / parquet (1.066x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.137x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.053x ➖, 0↑ 0↓)
duckdb / parquet (1.038x ➖, 0↑ 0↓)

Benchmarks: Statistical and Population Genetics
Verdict: Likely regression (medium confidence)
duckdb / vortex-file-compressed (1.292x ❌, 1↑ 7↓)
duckdb / vortex-compact (1.470x ❌, 1↑ 7↓)
duckdb / parquet (0.980x ➖, 0↑ 0↓)

File Sizes: Statistical and Population Genetics
No file size changes detected.

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.945x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.941x ➖, 0↑ 0↓)
datafusion / parquet (0.947x ➖, 0↑ 0↓)
datafusion / arrow (0.909x ➖, 8↑ 0↓)
duckdb / vortex-file-compressed (0.936x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.962x ➖, 0↑ 0↓)
duckdb / parquet (0.958x ➖, 0↑ 0↓)
duckdb / duckdb (0.973x ➖, 0↑ 0↓)

File Sizes: TPC-H SF=10 on NVME
No file size changes detected.

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.083x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.052x ➖, 0↑ 2↓)
datafusion / parquet (1.082x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.030x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.030x ➖, 0↑ 0↓)
duckdb / parquet (1.085x ➖, 0↑ 0↓)

Benchmarks: Random Access
Vortex (geomean): 0.895x ✅
unknown / unknown (1.035x ➖, 13↑ 5↓)

Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.003x ➖, 1↑ 0↓)
datafusion / parquet (0.997x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.980x ➖, 3↑ 0↓)
duckdb / parquet (1.007x ➖, 0↑ 0↓)
duckdb / duckdb (1.004x ➖, 1↑ 0↓)

File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)

Benchmarks: Compression
Vortex (geomean): 1.001x ➖
unknown / unknown (0.986x ➖, 3↑ 1↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.142x ➖, 0↑ 5↓)
datafusion / vortex-compact (1.044x ➖, 0↑ 2↓)
datafusion / parquet (1.048x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.026x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.995x ➖, 0↑ 0↓)
duckdb / parquet (1.042x ➖, 0↑ 0↓)