Skip to content

Allow writing Variant to files and test parquet-variant IO#7945

Open
AdamGS wants to merge 12 commits into
developfrom
adamg/variant-to-file
Open

Allow writing Variant to files and test parquet-variant IO#7945
AdamGS wants to merge 12 commits into
developfrom
adamg/variant-to-file

Conversation

@AdamGS
Copy link
Copy Markdown
Contributor

@AdamGS AdamGS commented May 15, 2026

Summary

Integrate Variant better into the compressor so that it can be used with the default write strategy.

Comment thread vortex-file/src/strategy.rs Outdated
@AdamGS AdamGS force-pushed the adamg/variant-to-file branch 2 times, most recently from f890749 to 040589f Compare May 15, 2026 15:48
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 15, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 1235 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_varbinview_canonical_into[(100, 100)] 273.1 µs 307.8 µs -11.27%
Simulation chunked_varbinview_opt_canonical_into[(1000, 10)] 224.9 µs 187.5 µs +19.9%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing adamg/variant-to-file (93cc893) with develop (19a1fb3)

Open in CodSpeed

@AdamGS AdamGS requested review from mprammer and robert3005 May 20, 2026 11:02
@AdamGS AdamGS changed the title [WIP] Actually write Variant to files with default strategy. Allow writing Variant to files and test parquet-variant IO May 20, 2026
@AdamGS AdamGS marked this pull request as ready for review May 20, 2026 11:03
AdamGS added 10 commits May 20, 2026 12:03
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS force-pushed the adamg/variant-to-file branch from 5d2525a to f5463e1 Compare May 20, 2026 11:03
robert3005
robert3005 previously approved these changes May 20, 2026
@robert3005 robert3005 dismissed their stale review May 20, 2026 11:17

hold on a sec

let shredded = variant_array
.shredded()
.map(|arr| self.compress_physical_slots(arr, exec_ctx))
.transpose()?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you explicitly enumare the slots here instead of allowing the array to decide? Shouldn't this be

self.compress(variant_array.core_storage(), exec_ctx)
variant_array
                    .shredded()
                    .map(|arr| self.compress(arr, exec_ctx))
                    .transpose()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC because core_storage is potentially recursive, you end up with a stack overflow here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canonicalization for variant does something similar? this structure introduces some complexity for this sort of stuff.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core_storage must be variant but I think shredded does not.

Copy link
Copy Markdown
Contributor Author

@AdamGS AdamGS May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed something, WDYT?

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS force-pushed the adamg/variant-to-file branch from f5463e1 to 5efb775 Compare May 20, 2026 14:52
@robert3005
Copy link
Copy Markdown
Contributor

I think you want similar fix in the canonical code path

@AdamGS
Copy link
Copy Markdown
Contributor Author

AdamGS commented May 20, 2026

I agree, I have some other fixes there in the works as part of a separate PR.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS
Copy link
Copy Markdown
Contributor Author

AdamGS commented May 20, 2026

@robert3005 I've fixed the canonicalization issue if you want to take another look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants