vortex-row: convert_columns + tests + bench scaffolding by joseph-isaacs · Pull Request #7993 · vortex-data/vortex

joseph-isaacs · 2026-05-18T16:04:42Z

Part 8 of 25 in the stacked PR series adding vortex-row.

This PR contains exactly one commit; review just that diff in isolation.

What this commit does

Wires the RowSize/RowEncode scalar functions to the user-facing API:

convert_columns accepts a slice of input arrays and per-column SortFields, constructs RowEncodeOptions + VecExecutionArgs, and returns the encoded ListViewArray<u8>.
compute_row_sizes returns just the per-row sizes (the Struct { fixed: u32, var: u32 } output of RowSize).
initialize() now registers RowSize and RowEncode on the given session so they are reachable via the expression layer.
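The output shapes named above can be pictured with a small standalone sketch. It does not use the vortex-row API: `RowSizeSketch`, `ListViewSketch`, and `layout_rows` are hypothetical names, and it assumes that a row's encoded length is `fixed + var` and that rows are laid out back to back in one byte buffer, which is one plausible way to satisfy the single-buffer property the tests check.

```rust
/// Hypothetical stand-in for one row of the `RowSize` output
/// (`Struct { fixed: u32, var: u32 }` in the PR description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSizeSketch {
    pub fixed: u32,
    pub var: u32,
}

/// Hypothetical stand-in for a list view over bytes: every row is an
/// (offset, size) window into one shared buffer.
#[derive(Debug, PartialEq, Eq)]
pub struct ListViewSketch {
    pub bytes: Vec<u8>,
    pub offsets: Vec<u32>,
    pub sizes: Vec<u32>,
}

impl ListViewSketch {
    /// Borrow the encoded bytes of row `i`.
    pub fn row(&self, i: usize) -> &[u8] {
        let start = self.offsets[i] as usize;
        &self.bytes[start..start + self.sizes[i] as usize]
    }
}

/// Assumption: a row occupies `fixed + var` bytes and row i starts where
/// row i-1 ended, so a single allocation holds every row.
pub fn layout_rows(row_sizes: &[RowSizeSketch]) -> ListViewSketch {
    let mut offsets = Vec::with_capacity(row_sizes.len());
    let mut sizes = Vec::with_capacity(row_sizes.len());
    let mut next: u32 = 0;
    for rs in row_sizes {
        let size = rs.fixed + rs.var;
        offsets.push(next);
        sizes.push(size);
        next += size;
    }
    ListViewSketch {
        // Zero-filled placeholder; an encoder would write row bytes here.
        bytes: vec![0u8; next as usize],
        offsets,
        sizes,
    }
}
```

Computing sizes first and allocating once is what makes a separate sizing pass useful in a layout like this: the encoder never has to grow or re-copy the buffer while writing rows.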

Tests cover sort-order round-trips for bool, primitive (i64 asc/desc, u32, f64), utf8, multi-column, nulls_first/last, struct sort-order, the single-buffer invariant of the ListView output, and the structural shape of RowSize. The bench file uses divan + mimalloc and reports throughput in GB/s for primitive_i64, utf8, and struct_mixed, each with an arrow-row baseline. Per-encoding fast-path scenarios gain their triplets in PR 3.
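The property behind the sort-order tests is that comparing encoded rows byte by byte gives the same result as comparing the original values under the requested ordering. A minimal sketch of one common way to get that for `i64` follows. It is not taken from this PR: `SortFieldSketch` and `encode_i64` are hypothetical names, and the byte layout (one validity byte, then the big-endian value with the sign bit flipped, inverted for descending order) is an assumption rather than the confirmed vortex-row format.

```rust
/// Hypothetical per-column ordering options.
#[derive(Debug, Clone, Copy)]
pub struct SortFieldSketch {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Encode an optional i64 into 9 bytes whose lexicographic order matches
/// the requested sort order. Assumed layout: [validity byte][8 value bytes].
pub fn encode_i64(value: Option<i64>, field: SortFieldSketch) -> [u8; 9] {
    let mut out = [0u8; 9];
    match value {
        None => {
            // Valid rows use 0x01, so 0x00 sorts before them and 0xFF after.
            out[0] = if field.nulls_first { 0x00 } else { 0xFF };
        }
        Some(v) => {
            out[0] = 0x01;
            // Flipping the sign bit maps i64 order onto unsigned order;
            // big-endian makes byte-wise comparison agree with numeric order.
            let mut bytes = ((v as u64) ^ (1u64 << 63)).to_be_bytes();
            if field.descending {
                // Inverting every value byte reverses the order.
                for b in &mut bytes {
                    *b = !*b;
                }
            }
            out[1..].copy_from_slice(&bytes);
        }
    }
    out
}
```

A check in the spirit of the tests described above encodes a sorted list of values and asserts that adjacent encodings compare the same way (or the opposite way for descending), with nulls landing at the configured end.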

Baseline measurements at this commit (sample-count=10):

primitive_i64_vortex ~1.97 GB/s (vs arrow-row 4.12 GB/s)
utf8_vortex ~0.87 GB/s (vs arrow-row 1.56 GB/s)
struct_mixed_vortex ~0.95 GB/s (vs arrow-row 1.19 GB/s)
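To read these numbers as a gap, the arithmetic can be written out as below. The helpers are hypothetical and not part of the bench file; the sketch assumes GB means 10^9 bytes and that throughput is bytes divided by wall-clock seconds.

```rust
use std::time::Duration;

/// Throughput in GB/s, assuming 1 GB = 1e9 bytes.
pub fn gb_per_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / 1e9
}

/// Measured throughput as a fraction of the baseline throughput.
pub fn fraction_of_baseline(measured_gbps: f64, baseline_gbps: f64) -> f64 {
    measured_gbps / baseline_gbps
}
```

Applied to the figures above, `fraction_of_baseline` gives about 0.48 for primitive_i64 (1.97 / 4.12), 0.56 for utf8 (0.87 / 1.56), and 0.80 for struct_mixed (0.95 / 1.19), so at this commit the vortex path runs at roughly half to four fifths of the arrow-row baseline.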

Stack

#	PR	Title	Branch
1	#7986	vortex-row: crate scaffolding	`claude/row-c01-crate-scaffolding`
2	#7987	vortex-row: add SortField and RowEncodeOptions	`claude/row-c02-sortfield-options`
3	#7988	vortex-row: codec for fixed-width canonical types	`claude/row-c03-codec-fixed-width`
4	#7989	vortex-row: codec for varlen canonical types	`claude/row-c04-codec-varlen`
5	#7990	vortex-row: codec for nested canonical types	`claude/row-c05-codec-nested`
6	#7991	vortex-row: compute_sizes helper and RowSize ScalarFn	`claude/row-c06-rowsize-scalarfn`
7	#7992	vortex-row: RowEncode ScalarFn	`claude/row-c07-rowencode-scalarfn`
8	#7993	vortex-row: convert_columns + tests + bench scaffolding	`claude/row-c08-convert-columns-tests-bench`
9	#7994	Skip ListView validation in row encoder output	`claude/row-c09-skip-listview-validation`
10	#7995	Add validity fast-path helper for the four pattern-matching encoders	`claude/row-c10-validity-fast-path`
11	#7996	Skip zero-init of output buffer	`claude/row-c11-skip-zero-init`
12	#7997	Auto-vectorize pure-fixed offsets construction	`claude/row-c12-vectorize-pure-fixed-offsets`
13	#7998	Auto-vectorize mixed-path offsets construction	`claude/row-c13-vectorize-mixed-offsets`
14	#7999	Rewrite varlen 32-byte block encoder with copy_nonoverlapping	`claude/row-c14-varlen-block-copy-nonoverlapping`
15	#8000	Walk VarBinView rows directly in row encoder hot loop	`claude/row-c15-walk-varbinview-directly`
16	#8001	Add arithmetic-write fast path for fixed-before-varlen columns	`claude/row-c16-arith-write-fast-path`
17	#8002	Specialize Constant for the arithmetic-write fast path	`claude/row-c17-specialize-constant-arith`
18	#8003	RowSizeKernel and RowEncodeKernel dispatch helpers	`claude/row-c18-kernel-dispatch-helpers`
19	#8004	Inventory-based registry for downstream encoding kernels	`claude/row-c19-inventory-registry`
20	#8005	Constant row-encode kernel	`claude/row-c20-constant-kernel`
21	#8006	Dict row-encode kernel	`claude/row-c21-dict-kernel`
22	#8007	Patched row-encode kernel	`claude/row-c22-patched-kernel`
23	#8008	RunEnd row-encode kernel (vortex-runend)	`claude/row-c23-runend-kernel`
24	#8009	BitPacked row-encode kernel (vortex-fastlanes)	`claude/row-c24-bitpacked-kernel`
25	#7985	FoR and Delta row-encode kernels (vortex-fastlanes)	`claude/row-pr3-kernels`

Base of this PR: #7992 (claude/row-c07-rowencode-scalarfn)
Next in stack: #7994 (claude/row-c09-skip-listview-validation)

Combined context

For the full design + rationale, see PR #7985 (top of stack).

Wire the RowSize/RowEncode scalar functions to the user-facing API:

- `convert_columns` accepts a slice of input arrays and per-column SortFields, constructs `RowEncodeOptions` + `VecExecutionArgs`, and returns the encoded `ListViewArray<u8>`.
- `compute_row_sizes` returns just the per-row sizes (the `Struct { fixed: u32, var: u32 }` output of `RowSize`).
- `initialize()` now registers `RowSize` and `RowEncode` on the given session so they are reachable via the expression layer.

Tests cover sort-order round-trips for bool, primitive (i64 asc/desc, u32, f64), utf8, multi-column, nulls_first/last, struct sort-order, the single-buffer invariant of the ListView output, and the structural shape of `RowSize`. Tests that exercise per-encoding fast paths (`constant_path_matches_canonical`, `dict_path_matches_canonical`) land together with their respective kernels in PR 3.

The bench file uses divan + mimalloc and reports throughput in GB/s of encoded output bytes for primitive_i64, utf8, and struct_mixed. Each has an `arrow_row` baseline and a `vortex` measurement. Per-encoding fast-path scenarios (constant/dict/patched/bitpacked/for/delta) gain their triplets in PR 3.

Baseline measurements at this commit (sample-count=10):

primitive_i64_vortex ~1.97 GB/s (vs arrow-row 4.12 GB/s)
utf8_vortex ~0.87 GB/s (vs arrow-row 1.56 GB/s)
struct_mixed_vortex ~0.95 GB/s (vs arrow-row 1.19 GB/s)

PR 2 closes most of the gap by replacing the validating `ListViewArray::try_new` with `new_unchecked`, skipping the buffer zero-init, auto-vectorizing the offsets and varlen-block paths, etc.

Signed-off-by: Claude <noreply@anthropic.com>

codspeed-hq · 2026-05-18T16:42:36Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 1219 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`chunked_varbinview_canonical_into[(1000, 10)]`	161.9 µs	197.6 µs	-18.08%
⚡	Simulation	`chunked_varbinview_opt_canonical_into[(1000, 10)]`	224.9 µs	187.6 µs	+19.85%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing claude/row-c08-convert-columns-tests-bench (87febfe) with claude/row-c07-rowencode-scalarfn (40783a6)

joseph-isaacs closed this May 18, 2026
