fix: small fixes for parquet reader including RowRanges, PreBuffer, ColumnIndexFilter and code cleanup#330
Open
zhf999 wants to merge 7 commits into
Open
fix: small fixes for parquet reader including RowRanges, PreBuffer, ColumnIndexFilter and code cleanup#330zhf999 wants to merge 7 commits into
zhf999 wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR makes several improvements to the Parquet reader path: it deduplicates a shared exception-to-Status conversion macro into a common header, fixes a missed WhenBuffered await in the row-group pre-buffer fallback path, simplifies empty-literals handling in ColumnIndexFilter via a single early return (with new tests), moves a column-name-to-index map build under its actual usage guard, and adds a move-aware constructor to RowRanges.
Changes:
- Centralize
PAIMON_PARQUET_CATCH_AND_RETURN_STATUSmacro inparquet_format_defs.hand remove duplicates. - Consolidate empty-literals handling in
ColumnIndexFilter::VisitLeafPredicatewith a conservative early return; add tests. - Add
WhenBufferedawait to the row-group pre-buffer fallback; add rvalue ctor toRowRanges; scopecolumn_name_to_indexto its use site.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/paimon/format/parquet/row_ranges.h | New rvalue-reference RowRanges constructor. |
| src/paimon/format/parquet/row_ranges.cpp | Use the new constructor with std::move in Union/Intersection. |
| src/paimon/format/parquet/parquet_format_defs.h | Hosts the shared catch-and-return-status macro. |
| src/paimon/format/parquet/parquet_file_batch_reader.cpp | Removes duplicated macro; scopes column-name map to page-index branch. |
| src/paimon/format/parquet/page_filtered_row_group_reader.cpp | Awaits WhenBuffered in the pre-buffer fallback path. |
| src/paimon/format/parquet/file_reader_wrapper.cpp | Removes duplicated macro and includes shared header. |
| src/paimon/format/parquet/column_index_filter.cpp | Early-return all rows for empty-literals comparison/IN cases. |
| src/paimon/format/parquet/column_index_filter_test.cpp | New tests covering empty-literals fallback for EQUAL and IN. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+29
to
+31
| catch (...) { \ | ||
| return Status::UnknownError((context), ": unknown error"); \ | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
This PR includes several small fixes and improvements for the parquet reader module:
RowRangeslacked a move constructor, causing unnecessary copies.WhenBufferedis invoked afterPreBufferinPageFilteredRowGroupReaderto correctly wait for async I/O.ColumnIndexFilterto return all rows instead of incorrectly filtering when the literal value is empty.parquet_format_defs.h.column_name_to_indexmap to avoid unnecessary overhead when it's not used.Tests
ColumnIndexFilterto verify the behavior when literal is empty (column_index_filter_test.cpp).API and Format
No. This change does not affect any public API in the include directory, storage format, or protocol.
Documentation
No.
Generative AI tooling
Claude opus 4.7