@callmepandey
Contributor

Summary

Add support for projecting Arrow LargeListArray (64-bit offsets) in addition to the existing ListArray (32-bit offsets) support. This enables handling of lists with more than 2^31-1 total child elements.
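For context, the limit comes from the offsets buffer: a list array addresses its flattened child array through offsets, so the largest offset (the total child count) must fit in the offset type. A minimal stdlib-only sketch of that arithmetic follows; the helper name `FitsInListOffsets` is hypothetical and is not part of Arrow or of this PR.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (illustration only): reports whether a total
// child-element count can be addressed by list offsets of type OffsetT.
template <typename OffsetT>
constexpr bool FitsInListOffsets(std::int64_t total_child_elements) {
  return total_child_elements >= 0 &&
         total_child_elements <=
             static_cast<std::int64_t>(std::numeric_limits<OffsetT>::max());
}

// 32-bit offsets top out at 2^31-1 child elements; 64-bit offsets do not.
static_assert(FitsInListOffsets<std::int32_t>(2147483647LL));
static_assert(!FitsInListOffsets<std::int32_t>(2147483648LL));
static_assert(FitsInListOffsets<std::int64_t>(2147483648LL));
```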

Changes

  • Add templated ProjectListArrayImpl<> function for code reuse between list types
  • Add ProjectLargeListArray wrapper function for 64-bit offset lists
  • Update ProjectNestedArray to dispatch to appropriate handler based on Arrow type
  • Add test cases for LargeListArray projection
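The code-reuse idea in the list above can be sketched without Arrow: one template parameterized on the offset type, plus thin wrappers for the 32-bit and 64-bit cases. Everything below is an assumption made for illustration (`SimpleListArray`, `ProjectListImpl`, `ProjectList`, `ProjectLargeList`, and the int32-to-int64 widening that stands in for projecting the child array); it is not the PR's actual code and does not use the Arrow API.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for an Arrow list array: row i owns the child values
// in [offsets[i], offsets[i + 1]). OffsetT is int32_t for a ListArray-like
// layout and int64_t for a LargeListArray-like layout.
template <typename OffsetT, typename ValueT>
struct SimpleListArray {
  std::vector<OffsetT> offsets;  // length = number of rows + 1
  std::vector<ValueT> values;    // flattened child elements
};

// Shared implementation: "project" the child values (here, widen int32 to
// int64) and carry the offsets over unchanged, whatever their width.
template <typename OffsetT>
SimpleListArray<OffsetT, std::int64_t> ProjectListImpl(
    const SimpleListArray<OffsetT, std::int32_t>& input) {
  SimpleListArray<OffsetT, std::int64_t> output;
  output.offsets = input.offsets;
  output.values.assign(input.values.begin(), input.values.end());
  return output;
}

// Thin wrapper for 32-bit offsets.
inline SimpleListArray<std::int32_t, std::int64_t> ProjectList(
    const SimpleListArray<std::int32_t, std::int32_t>& input) {
  return ProjectListImpl<std::int32_t>(input);
}

// Thin wrapper for 64-bit offsets.
inline SimpleListArray<std::int64_t, std::int64_t> ProjectLargeList(
    const SimpleListArray<std::int64_t, std::int32_t>& input) {
  return ProjectListImpl<std::int64_t>(input);
}
```

In this sketch the two wrappers differ only in the template argument, which is the property that lets a dispatcher pick a handler from the input's type without duplicating the projection logic.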

Test Results

[ RUN      ] ProjectRecordBatchTest.LargeListOfIntegers
[       OK ] ProjectRecordBatchTest.LargeListOfIntegers (0 ms)
[ RUN      ] ProjectRecordBatchTest.LargeListOfStructs
[       OK ] ProjectRecordBatchTest.LargeListOfStructs (0 ms)

All 59 tests pass.

Test Plan

  • Added LargeListOfIntegers test - verifies projection of LargeListArray<int32>
  • Added LargeListOfStructs test - verifies projection of LargeListArray with nested struct elements
  • Verified all existing tests still pass

Closes #502

Member

@wgtmac left a comment

LGTM. Thanks @callmepandey! This is just the data conversion. Do you want to create a separate PR to enable it in the Parquet reader implementation?

@wgtmac merged commit deec370 into apache:main on Jan 15, 2026
10 checks passed
Development

Successfully merging this pull request may close these issues.

Add support for Arrow LargeListArray in Parquet data projection