[core] Fix DataEvolutionFileStoreScan schema-evolution filtering#8084
[core] Fix DataEvolutionFileStoreScan schema-evolution filtering#8084duanyyyyyyy wants to merge 1 commit into
Conversation
516c526 to
cdff3c1
Compare
JingsongLi
left a comment
There was a problem hiding this comment.
LGTM. I checked the pruning change against the data-evolution read path: using field ids here matches how DataEvolutionSplitRead merges fields, so renamed columns are kept while newly-added columns that are absent from old files can still be skipped.\n\nAlso verified the focused test locally:\n\n\nmvn -pl paimon-core -Pfast-build -Dtest=DataEvolutionFileStoreScanTest test\n\n\nTests run: 8, Failures: 0, Errors: 0, Skipped: 0.
Can hold on later? |
cdff3c1 to
5daa733
Compare
The "filter irrelevant column files" optimization (apache#6647) tried to skip DE-table files that carry none of a query's requested columns. The intent — saving I/O on columnar-split groups where a row's columns are split across multiple files — is correct, but the implementation lived in filterByStats(ManifestEntry), the single-entry layer that has no view of the row-id-range grouping and no access to inputFilter. Three concrete bugs fell out of that layer choice: 1) RENAME COLUMN silently dropped pre-rename files. The check ran on column names; the renamed field's latest name does not appear in the pre-rename file's writeCols, so the comparison evaluated to "no overlap" and dropped the file. 2) ADD COLUMN + a predicate on the new column silently dropped pre-ALTER files. The check has no view of inputFilter, so it cannot tell "WHERE new_col IS NULL must keep the file" from "WHERE new_col = 'x' can drop the file" — both look like "no requested column physically present". 3) ADD COLUMN + projection of only the new column silently dropped pre-ALTER files, losing rowCount rows that should have come back as NULL-filled. The check has no view of the row-id-range grouping, so it could not know that a file dropped from a row-disjoint group was the only carrier of its rows. Relocate the optimization to postFilterManifestEntries, where inputFilter and the row-id-range grouping are both available. The same column-pruning intent now reads: - Compare by field id, not by name. RENAME and TYPE CHANGE no longer matter because field id is the stable identity across schemas. - Drop a file from a group only when another file in the same group carries one of the requested columns. If no file in the group does, keep one as a row-count representative so the reader can emit rowCount NULL-filled rows for the projection. - Predicate-based stats pruning, when inputFilter is set, runs in the list-version filterByStats over the same groups (see below). Two supporting changes make the predicate path correct against all-NULL columns: - DataEvolutionArray gains an optional missingFieldLong fallback (default NO_MISSING_FIELD_FALLBACK preserves current behavior for non-stats consumers). When set, a position whose rowOffsets entry is -1 — meaning the field is absent from every file in the group — reports isNullAt=false and getLong=fallback. - evolutionStats passes groupRowCount as that fallback for nullCounts. LeafPredicate.test then sees nullCount == rowCount, short-circuits the "unknown stats -> keep" branch, and lets IsNull keep pre-ALTER files while IsNotNull / Equal / GreaterThan / etc. drop them. The sentinel rowOffsets entry -2 ("field exists in a file but its stats were not captured") keeps the previous "unknown" semantic. Effect: - WHERE new_col IS NULL: pre-ALTER files kept, reader NULL-fills, correct. - WHERE new_col IS NOT NULL / = / > / etc.: stats prune pre-ALTER files via the "all NULL" encoding. - WHERE renamed_col <predicate>: evolutionStats matches by field id, LeafPredicate sees real min/max, stats prune correctly. - SELECT new_col with no predicate: row-disjoint groups keep their single file (NULL-filled rows preserved); columnar-split groups drop files that carry no requested column. Coverage: DataEvolutionTableTest gains ten cases — - testAddColumnIsNullKeepsPreAlterRows - testAddColumnEqualityPredicatePrunesPreAlterFiles - testRenameColumnPredicateKeepsPreRenameRows - testNoFilterProjectionPrunesColumnarSplitFiles - testNoFilterProjectionKeepsRowDisjointFiles - testColumnarSplitWithPredicateOnFileAColumn - testColumnarSplitWithPredicateOnFileBColumn - testThreeWayColumnarSplitPruning - testMixedColumnarSplitAndRowDisjoint - testSystemFieldOnlyProjectionIsNotPruned covering the three central bug repros, single-column projection, mixed projections, three-way column splits, mixed columnar-split / row-disjoint topologies, predicate composition, and the system-field-only fallthrough. Repro for the central bug: CREATE TABLE t (id INT) WITH ('data-evolution.enabled'='true'); INSERT INTO t VALUES (1); ALTER TABLE t ADD COLUMN new_col STRING; INSERT INTO t VALUES (2, 'v'); SELECT COUNT(*) FROM t WHERE new_col IS NULL; -- before: 0 (pre-ALTER file dropped at single filterByStats) -- after: 1 Repro for the rename regression: CREATE TABLE t (id INT, score INT) WITH ('data-evolution.enabled'='true'); INSERT INTO t VALUES (1, 10); ALTER TABLE t RENAME COLUMN score TO grade; INSERT INTO t VALUES (2, 100); SELECT COUNT(*) FROM t WHERE grade < 60; -- before: 0 (pre-rename file dropped at single filterByStats) -- after: 1 Signed-off-by: duanyyyyyyy <yan.duan9759@gmail.com>
5daa733 to
0e2dd89
Compare
JingsongLi
left a comment
There was a problem hiding this comment.
LGTM after another pass. I rechecked the grouped pruning path for the field-id based matching, ADD COLUMN all-NULL stats fallback, projection-only post-filtering, and the row-tracking assumptions for data-evolution tables. I did not find any blocking issue.
I also verified the focused PR cases from a clean worktree:
mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=DataEvolutionTableTest#testAddColumnIsNullKeepsPreAlterRows+testAddColumnEqualityPredicatePrunesPreAlterFiles+testRenameColumnPredicateKeepsPreRenameRows+testNoFilterProjectionPrunesColumnarSplitFiles+testNoFilterProjectionKeepsRowDisjointFiles+testColumnarSplitWithPredicateOnFileAColumn+testColumnarSplitWithPredicateOnFileBColumn+testThreeWayColumnarSplitPruning+testMixedColumnarSplitAndRowDisjoint+testSystemFieldOnlyProjectionIsNotPruned testResult: Tests run: 10, Failures: 0, Errors: 0, Skipped: 0.
Purpose
The
filterByStats(ManifestEntry)per-entry "filter irrelevant column files"optimization for DE tables (#6647) lives in the wrong layer. It cannot see the
row-id-range grouping and has no access to
inputFilter, so it inevitablymakes wrong decisions that break correctness for several common queries:
RENAME COLUMN silently drops pre-rename files. The check compares by
column name; the renamed field's latest name does not appear in the
pre-rename file's
writeCols, so the file is dropped at manifest pruning.ADD COLUMN + a predicate on the new column silently drops pre-ALTER
files. Without
inputFilterin scope, the check cannot tellWHERE new_col IS NULL(must keep) fromWHERE new_col = 'x'(can drop) —both look like "no requested column physically present".
ADD COLUMN + projection only of the new column silently drops pre-ALTER
files, losing
rowCountNULL-fill rows that should have been emitted.Without the row-id-range grouping in scope, the single-entry layer cannot
know that the dropped file was the only carrier of its rows.
This PR relocates the optimization into
postFilterManifestEntries, whereboth
inputFilterand the row-id-range grouping are available. The samecolumn-pruning intent is preserved, but now expressed correctly:
matter because field id is the stable identity across schemas.
carries one of the requested columns. If no file in the group does, keep
one as a row-count representative so the reader can emit
rowCountNULL-filled rows for the projection.
inputFilteris set, runs in thelist-version
filterByStatsover the same groups.Two supporting changes make the predicate path correct against all-NULL
columns:
DataEvolutionArraygains an optionalmissingFieldLongfallback(default
NO_MISSING_FIELD_FALLBACKpreserves current behavior fornon-stats consumers). When set, a position whose
rowOffsets[pos] == -1— the field is absent from every file in the group — reports
isNullAt = falseandgetLong = fallback.evolutionStatspassesgroupRowCountas that fallback fornullCounts.LeafPredicate.testthen seesnullCount == rowCount, short-circuitsthe "unknown stats → keep" branch, and lets
IsNullkeep pre-ALTER fileswhile
IsNotNull/Equal/GreaterThan/ etc. drop them. The sentinelrowOffsets[pos] == -2("field exists in a file but its stats were notcaptured") keeps the previous "unknown" semantic.
Effect:
|---|---|---|
|
WHERE new_col IS NULL| pre-ALTER rows lost | correct, NULL-filled ||
WHERE new_col = 'x'| pre-ALTER rows lost; pre-ALTER files dropped at wrong layer | post-ALTER row(s) returned; pre-ALTER files pruned by stats ||
WHERE renamed_col < x| pre-rename rows lost | correct, stats-pruned by field id ||
SELECT new_colno predicate | pre-ALTER rows lost | row-disjoint files kept (NULL-fill); columnar-split files pruned per-group |Tests
DataEvolutionTableTestgains ten cases:testAddColumnIsNullKeepsPreAlterRows— central repro for bug (2)testAddColumnEqualityPredicatePrunesPreAlterFiles— verifies the "all NULL"stats encoding lets
Equaldrop pre-ALTER filestestRenameColumnPredicateKeepsPreRenameRows— central repro for bug (1)testNoFilterProjectionPrunesColumnarSplitFiles— columnar-split columnpruning
testNoFilterProjectionKeepsRowDisjointFiles— central repro for bug (3):row-disjoint single-file groups stay even when their carrier is the only
source of
rowCounttestColumnarSplitWithPredicateOnFileAColumn/testColumnarSplitWithPredicateOnFileBColumn— predicate-based grouppruning composes with per-file column pruning
testThreeWayColumnarSplitPruning— three files per row id range, onecolumn each
testMixedColumnarSplitAndRowDisjoint— both group topologies in one tabletestSystemFieldOnlyProjectionIsNotPruned— system-field-onlyreadTypeis filtered out of pruning, files flow through unchanged
All 35 cases in
DataEvolutionTableTestand 199 cases across DE / blob /global-index / append-only / commit suites pass with no pre-existing UT
regressions.