Skip to content

[ABA-13] fix(vortex-layout): gate is_truncated zonemap column on string dtype#8

Open
abnobdoss wants to merge 2 commits into
developfrom
fix/aba-13-zonemap-is-truncated-string-only
Open

[ABA-13] fix(vortex-layout): gate is_truncated zonemap column on string dtype#8
abnobdoss wants to merge 2 commits into
developfrom
fix/aba-13-zonemap-is-truncated-string-only

Conversation

@abnobdoss
Copy link
Copy Markdown
Owner

Summary

  • Zone-map stats schemas for non-string/binary columns (integers, floats, dates, etc.) were unconditionally carrying max_is_truncated / min_is_truncated bool columns, inflating every file footer with constant-false data.
  • Truncation flags are only meaningful for Utf8 and Binary min/max stats, where a stored bound may be a truncated prefix of the actual value.
  • Fix gates the *_is_truncated column emission on matches!(dtype, DType::Utf8(_) | DType::Binary(_)) in both schema.rs (stats_table_dtype) and builder.rs (StatNameArrayBuilder::finish).
  • Three existing tests that locked the wrong behavior are renamed and rewritten; a new positive test for the string path is added; the test_zone_map_prunes zone-map test is updated to use the correct i32 schema.

Linear

https://linear.app/abanoubdoss/issue/ABA-13

Validation

cargo test -p vortex-layout -- --test-threads=1
# result: 91 passed; 0 failed

TDD followed:

  1. Wrote issue_aba13_zonemap_skips_is_truncated_for_non_string first — confirmed RED (schema contained max_is_truncated/min_is_truncated for i64).
  2. Applied fix — confirmed GREEN (all 91 tests pass).
  3. cargo fmt (stable) run; no cargo clippy -D warnings due to CI being disabled on fork.

Pre-existing flaky tests (test_chunked_evaluator, test_struct_layout_nested) fail intermittently due to async runtime races under parallel test execution; they pass when run in isolation or with --test-threads=1 and are unrelated to this change.

Attribution

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Abanoub Doss and others added 2 commits May 21, 2026 08:11
…r non-string columns

Add failing test `issue_aba13_zonemap_skips_is_truncated_for_non_string` that
asserts an i64 column's stats schema must not contain `max_is_truncated` /
`min_is_truncated` fields. Confirms the bug: `StatNameArrayBuilder::finish`
unconditionally emits constant-false truncation columns for every Max/Min stat
regardless of dtype.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Abanoub Doss <abanoub.doss@gmail.com>
… dtype

Truncation flags (`max_is_truncated` / `min_is_truncated`) are only meaningful
for variable-length Utf8 and Binary min/max stats, where the stored bound may
be a shorter prefix of the actual value.  Numeric and other fixed-width types
can never be truncated, so emitting constant-false truncation columns for them
inflated every zone-map footer needlessly.

Fix `stats_table_dtype` in schema.rs to guard the `*_is_truncated` fields on
`matches!(dtype, DType::Utf8(_) | DType::Binary(_))`.

Fix `StatNameArrayBuilder::finish` in builder.rs (used for all non-string Max/Min
stats) to omit the truncation column entirely instead of emitting a constant-false
`ConstantArray`.  The `TruncatedMax/MinBinaryStatsBuilder` paths for Utf8/Binary
are unchanged.

Update three tests that documented the wrong behavior:
- `schema.rs::stats_table_dtype_adds_truncation_flags` → renamed to
  `stats_table_dtype_no_truncation_flags_for_primitive` with inverted assertion.
- `schema.rs::stats_table_dtype_uses_storage_dtype_for_extensions` → updated; a
  Date (i32-backed) extension should no longer carry truncation flags.
- `builder.rs::always_adds_is_truncated_column` → renamed to
  `primitive_stats_do_not_include_truncation_columns` with inverted assertion.
- `zone_map.rs::test_zone_map_prunes` → removed hardcoded `*_is_truncated`
  fields from the i32 test zone-map StructArray.

Also adds a new `stats_table_dtype_adds_truncation_flags_for_string` test to
lock the correct Utf8 path.

Fixes ABA-13 / upstream vortex-data#7235.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Abanoub Doss <abanoub.doss@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant