[ABA-13] fix(vortex-layout): gate is_truncated zonemap column on string dtype#8
Open
abnobdoss wants to merge 2 commits into
…r non-string columns

Add failing test `issue_aba13_zonemap_skips_is_truncated_for_non_string` that asserts an i64 column's stats schema must not contain `max_is_truncated` / `min_is_truncated` fields.

Confirms the bug: `StatNameArrayBuilder::finish` unconditionally emits constant-false truncation columns for every Max/Min stat regardless of dtype.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Abanoub Doss <abanoub.doss@gmail.com>
… dtype

Truncation flags (`max_is_truncated` / `min_is_truncated`) are only meaningful for variable-length Utf8 and Binary min/max stats, where the stored bound may be a shorter prefix of the actual value. Numeric and other fixed-width types can never be truncated, so emitting constant-false truncation columns for them needlessly inflated every zone-map footer.

Fix `stats_table_dtype` in schema.rs to guard the `*_is_truncated` fields on `matches!(dtype, DType::Utf8(_) | DType::Binary(_))`.

Fix `StatNameArrayBuilder::finish` in builder.rs (used for all non-string Max/Min stats) to omit the truncation column entirely instead of emitting a constant-false `ConstantArray`. The `TruncatedMax/MinBinaryStatsBuilder` paths for Utf8/Binary are unchanged.

Update four tests that encoded the old behavior:
- `schema.rs::stats_table_dtype_adds_truncation_flags` → renamed to `stats_table_dtype_no_truncation_flags_for_primitive` with inverted assertion.
- `schema.rs::stats_table_dtype_uses_storage_dtype_for_extensions` → updated; a Date (i32-backed) extension should no longer carry truncation flags.
- `builder.rs::always_adds_is_truncated_column` → renamed to `primitive_stats_do_not_include_truncation_columns` with inverted assertion.
- `zone_map.rs::test_zone_map_prunes` → removed hardcoded `*_is_truncated` fields from the i32 test zone-map StructArray.

Also add a new `stats_table_dtype_adds_truncation_flags_for_string` test to lock in the correct Utf8 path.

Fixes ABA-13 / upstream vortex-data#7235.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Abanoub Doss <abanoub.doss@gmail.com>
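The dtype gate described in this commit can be illustrated with a minimal, self-contained sketch. Everything here is a simplified stand-in rather than the actual vortex-layout code: the `DType` and `Stat` enums, the `stats_field_names` helper, and the field ordering are assumptions made for illustration (the real `DType::Utf8(_)` and `DType::Binary(_)` variants carry a payload, and the real `stats_table_dtype` builds a struct dtype rather than a list of names).

```rust
// Simplified stand-ins for illustration only; NOT the real vortex types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    I64,
    Utf8,
    Binary,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Stat {
    Max,
    Min,
    NullCount,
}

impl Stat {
    fn name(self) -> &'static str {
        match self {
            Stat::Max => "max",
            Stat::Min => "min",
            Stat::NullCount => "null_count",
        }
    }
}

/// Hypothetical analogue of `stats_table_dtype`: lists the field names a
/// zone-map stats table would carry for a column of the given dtype.
fn stats_field_names(dtype: DType, stats: &[Stat]) -> Vec<String> {
    // Only variable-length types can have a truncated min/max bound.
    let truncatable = matches!(dtype, DType::Utf8 | DType::Binary);
    let mut fields = Vec::new();
    for &stat in stats {
        fields.push(stat.name().to_string());
        // Emit the flag column only for Max/Min on truncatable dtypes.
        if truncatable && matches!(stat, Stat::Max | Stat::Min) {
            fields.push(format!("{}_is_truncated", stat.name()));
        }
    }
    fields
}

fn main() {
    let stats = [Stat::Max, Stat::Min, Stat::NullCount];
    for dtype in [DType::I64, DType::Utf8, DType::Binary] {
        println!("{:?}: {:?}", dtype, stats_field_names(dtype, &stats));
    }
}
```

Under these assumptions, the i64 column ends up with three fields while the Utf8 and Binary columns keep their two extra flag fields, which is the shape the new and inverted tests assert.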
Summary
- Zone maps emitted `max_is_truncated`/`min_is_truncated` bool columns for every Max/Min stat regardless of dtype, inflating every file footer with constant-false data.
- Truncation flags are only meaningful for `Utf8` and `Binary` min/max stats, where a stored bound may be a truncated prefix of the actual value.
- This PR gates `*_is_truncated` column emission on `matches!(dtype, DType::Utf8(_) | DType::Binary(_))` in both `schema.rs` (`stats_table_dtype`) and `builder.rs` (`StatNameArrayBuilder::finish`).
- The `test_zone_map_prunes` zone-map test is updated to use the correct i32 schema.

Linear
https://linear.app/abanoubdoss/issue/ABA-13
Validation
TDD followed:
- Wrote `issue_aba13_zonemap_skips_is_truncated_for_non_string` first and confirmed RED (the schema contained `max_is_truncated`/`min_is_truncated` for i64).
- Ran `cargo fmt` (stable). `cargo clippy -D warnings` was not run because CI is disabled on the fork.

Pre-existing flaky tests (`test_chunked_evaluator`, `test_struct_layout_nested`) fail intermittently due to async runtime races under parallel test execution. They pass when run in isolation or with `--test-threads=1` and are unrelated to this change.

Attribution
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>