Allow writing Variant to files and test parquet-variant IO#7945
Allow writing Variant to files and test parquet-variant IO#7945AdamGS wants to merge 12 commits into
Conversation
25733a8 to
589413b
Compare
f890749 to
040589f
Compare
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
273.1 µs | 307.8 µs | -11.27% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
224.9 µs | 187.5 µs | +19.9% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing adamg/variant-to-file (93cc893) with develop (19a1fb3)
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
5d2525a to
f5463e1
Compare
| let shredded = variant_array | ||
| .shredded() | ||
| .map(|arr| self.compress_physical_slots(arr, exec_ctx)) | ||
| .transpose()?; |
There was a problem hiding this comment.
Why do you explicitly enumare the slots here instead of allowing the array to decide? Shouldn't this be
self.compress(variant_array.core_storage(), exec_ctx)
variant_array
.shredded()
.map(|arr| self.compress(arr, exec_ctx))
.transpose()?
There was a problem hiding this comment.
IIRC because core_storage is potentially recursive, you end up with a stack overflow here.
There was a problem hiding this comment.
canonicalization for variant does something similar? this structure introduces some complexity for this sort of stuff.
There was a problem hiding this comment.
core_storage must be variant but I think shredded does not.
There was a problem hiding this comment.
pushed something, WDYT?
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
f5463e1 to
5efb775
Compare
|
I think you want similar fix in the canonical code path |
|
I agree, I have some other fixes there in the works as part of a separate PR. |
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
|
@robert3005 I've fixed the canonicalization issue if you want to take another look. |
Summary
Integrate Variant better into the compressor so that it can be used with the default write strategy.