fix(java): add diagnostics for imported Arrow streams#6875

Open
beinan wants to merge 1 commit into lance-format:main from beinan:user/beinan/core-arrow-reader-diagnostics

@beinan beinan commented May 20, 2026

Summary

Add Lance Java diagnostics around Arrow C Data stream readers, plus a scanner regression test for the nested struct case that motivated the lance-spark fix.

Findings

The original VectorLoader failure can occur when a consumer closes or mutates the reader-owned VectorSchemaRoot between loadNextBatch() calls. When the damaged vector is a StructVector, the schema still lists its child fields but the reused vector has zero children, so Arrow Java fails with "should have as many children as in the schema: found 0 expected N".
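
The failure mode described above can be illustrated with a toy model. This is only a sketch: ToyStructRoot and ToyStructReader are hypothetical stand-ins written for this example, not Arrow or Lance classes, and the check in loadNextBatch() merely imitates the child-count comparison that produces the error message quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these classes are Arrow or Lance APIs. */
class ReusedRootDemo {
  public static void main(String[] args) {
    ToyStructReader reader = new ToyStructReader(6);
    reader.loadNextBatch();   // first batch loads into the reader-owned root
    reader.root().close();    // consumer wrongly closes the root it does not own
    try {
      reader.loadNextBatch(); // the reused root no longer matches the schema
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}

/** Stand-in for a struct vector whose children are created once and then reused. */
class ToyStructRoot {
  final List<String> children = new ArrayList<>();

  /** Closing drops the children, mimicking a consumer that releases the vector. */
  void close() {
    children.clear();
  }
}

/** Stand-in for a stream reader that owns one root and reloads it for every batch. */
class ToyStructReader {
  private final int schemaChildren;
  private final ToyStructRoot root = new ToyStructRoot();

  ToyStructReader(int schemaChildren) {
    this.schemaChildren = schemaChildren;
    for (int i = 0; i < schemaChildren; i++) {
      root.children.add("f" + i);
    }
  }

  ToyStructRoot root() {
    return root;
  }

  /** Imitates the loader's check that the vector still has every schema child. */
  void loadNextBatch() {
    int found = root.children.size();
    if (found != schemaChildren) {
      throw new IllegalArgumentException(
          "should have as many children as in the schema: found "
              + found + " expected " + schemaChildren);
    }
  }
}
```

Running the demo prints "should have as many children as in the schema: found 0 expected 6". The practical takeaway is that only the reader should close the root it hands out; a consumer that needs a batch beyond the next loadNextBatch() call should copy the data instead.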

Changes

  • Add LanceArrowReaders, a small wrapper for Arrow readers imported from Lance Arrow C Data streams.
  • Use the wrapper for Java APIs that call Data.importArrayStream(...), including LanceScanner, AsyncScanner, SQL query output, delta output, and file reads.
  • Preserve the underlying reader behavior while adding Lance context to batch-load failures, including the child-count mismatch diagnostic.
  • Add a Java scanner regression test that writes a six-field all-null struct and scans it across multiple batches.
  • Add a unit test for the diagnostic message.
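
The wrapper idea in the list above might look roughly like the following. This is a minimal sketch under stated assumptions, not the LanceArrowReaders code from this PR: BatchSource is a hypothetical one-method interface standing in for Arrow's reader, and the class name, the origin label, and the message wording are all invented for illustration.

```java
import java.io.IOException;

/** Demo entry point; every type in this file is hypothetical. */
class DiagnosticWrapperDemo {
  public static void main(String[] args) throws IOException {
    BatchSource broken = () -> {
      throw new IllegalArgumentException(
          "should have as many children as in the schema: found 0 expected 6");
    };
    BatchSource wrapped = new DiagnosticBatchSource(broken, "scanner");
    try {
      wrapped.loadNextBatch();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}

/** Hypothetical minimal reader interface used in place of Arrow's reader type. */
interface BatchSource {
  boolean loadNextBatch() throws IOException;
}

/** Delegates to the underlying source and adds context when a batch load fails. */
class DiagnosticBatchSource implements BatchSource {
  private static final String CHILD_MISMATCH = "should have as many children as in the schema";

  private final BatchSource delegate;
  private final String origin;
  private int batchesLoaded = 0;

  DiagnosticBatchSource(BatchSource delegate, String origin) {
    this.delegate = delegate;
    this.origin = origin;
  }

  @Override
  public boolean loadNextBatch() throws IOException {
    try {
      boolean loaded = delegate.loadNextBatch();
      if (loaded) {
        batchesLoaded++;
      }
      return loaded; // successful loads pass through unchanged
    } catch (IllegalArgumentException e) {
      throw new IllegalStateException(describe(e), e); // keep the original as the cause
    }
  }

  private String describe(IllegalArgumentException e) {
    StringBuilder sb = new StringBuilder()
        .append("Failed to load batch ").append(batchesLoaded)
        .append(" from ").append(origin).append(": ").append(e.getMessage());
    if (e.getMessage() != null && e.getMessage().contains(CHILD_MISMATCH)) {
      sb.append(". The reader-owned root may have been closed or mutated between batches.");
    }
    return sb.toString();
  }
}
```

Keeping the original exception as the cause preserves the underlying stack trace, while the added prefix tells the reader which stream and which batch index were involved.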

Testing

cd java
./mvnw -Dtest=org.lance.ScannerTest#testDatasetScannerAllNullStructColumnMultipleBatches,org.lance.ipc.LanceArrowReadersTest test

This run passed Java checkstyle, Spotless, both targeted Java tests, the JNI build, and the rust-maven-plugin JNI test.

@github-actions github-actions Bot added the bug and java labels May 20, 2026
@beinan beinan marked this pull request as ready for review May 21, 2026 20:35

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.
