Skip to content

fix: small fixes for parquet reader including RowRanges, PreBuffer, ColumnIndexFilter and code cleanup#330

Open
zhf999 wants to merge 7 commits into
alibaba:mainfrom
zhf999:zhf-small-fix
Open

fix: small fixes for parquet reader including RowRanges, PreBuffer, ColumnIndexFilter and code cleanup#330
zhf999 wants to merge 7 commits into
alibaba:mainfrom
zhf999:zhf-small-fix

Conversation

@zhf999
Copy link
Copy Markdown
Contributor

@zhf999 zhf999 commented Jun 2, 2026

Purpose

This PR includes several small fixes and improvements for the parquet reader module:

  1. fix: add move constructor for RowRangesRowRanges lacked a move constructor, causing unnecessary copies.
  2. fix: add WhenBuffered after PreBuffer is called — Ensured WhenBuffered is invoked after PreBuffer in PageFilteredRowGroupReader to correctly wait for async I/O.
  3. fix(ColumnIndexFilter): return all rows when literal is empty — Fixed ColumnIndexFilter to return all rows instead of incorrectly filtering when the literal value is empty.
  4. style: delete duplicated macro and move it to common header files — Removed duplicated macro definitions from multiple cpp files and consolidated them into parquet_format_defs.h.
  5. fix: compute 'column_name_to_index' only when needed — Deferred the computation of column_name_to_index map to avoid unnecessary overhead when it's not used.

Tests

  • Added unit tests for ColumnIndexFilter to verify the behavior when literal is empty (column_index_filter_test.cpp).
  • Other changes are minor fixes and refactoring with existing test coverage.

API and Format

No. This change does not affect any public API in the include directory, storage format, or protocol.

Documentation

No.

Generative AI tooling

Claude opus 4.7

Copilot AI review requested due to automatic review settings June 2, 2026 06:19
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR makes several improvements to the Parquet reader path: it deduplicates a shared exception-to-Status conversion macro into a common header, fixes a missed WhenBuffered await in the row-group pre-buffer fallback path, simplifies empty-literals handling in ColumnIndexFilter via a single early return (with new tests), moves a column-name-to-index map build under its actual usage guard, and adds a move-aware constructor to RowRanges.

Changes:

  • Centralize PAIMON_PARQUET_CATCH_AND_RETURN_STATUS macro in parquet_format_defs.h and remove duplicates.
  • Consolidate empty-literals handling in ColumnIndexFilter::VisitLeafPredicate with a conservative early return; add tests.
  • Add WhenBuffered await to the row-group pre-buffer fallback; add rvalue ctor to RowRanges; scope column_name_to_index to its use site.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/paimon/format/parquet/row_ranges.h New rvalue-reference RowRanges constructor.
src/paimon/format/parquet/row_ranges.cpp Use the new constructor with std::move in Union/Intersection.
src/paimon/format/parquet/parquet_format_defs.h Hosts the shared catch-and-return-status macro.
src/paimon/format/parquet/parquet_file_batch_reader.cpp Removes duplicated macro; scopes column-name map to page-index branch.
src/paimon/format/parquet/page_filtered_row_group_reader.cpp Awaits WhenBuffered in the pre-buffer fallback path.
src/paimon/format/parquet/file_reader_wrapper.cpp Removes duplicated macro and includes shared header.
src/paimon/format/parquet/column_index_filter.cpp Early-return all rows for empty-literals comparison/IN cases.
src/paimon/format/parquet/column_index_filter_test.cpp New tests covering empty-literals fallback for EQUAL and IN.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/paimon/format/parquet/row_ranges.h Outdated
Comment thread src/paimon/format/parquet/row_ranges.cpp Outdated
Comment on lines +29 to +31
catch (...) { \
return Status::UnknownError((context), ": unknown error"); \
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants