Support column stats for Lance base files #18902

Closed
puneetdixit200 wants to merge 1 commit into apache:master from puneetdixit200:fix/18758-lance-column-stats

@puneetdixit200

Describe the issue this Pull Request addresses

Closes #18758.

Summary and Changelog

This adds column range stats support for Lance base files.

  • Implement LanceUtils.readColumnStatsFromMetadata by reading projected Lance columns and collecting Hudi column range metadata from Spark records.
  • Route .lance base files through HoodieTableMetadataUtil.readColumnRangeMetadataFrom, so column stats and partition stats generation can use Lance files.
  • Add a Spark Lance reader test covering min, max, null count, value count, and ignored missing columns.
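
For readers unfamiliar with column range metadata, the collection step described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `ColumnRangeSketch`, `Range`, and `collect` are hypothetical names, rows are modeled as plain maps instead of Spark records, and the sketch assumes that value count includes null values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these are not Hudi or Lance classes.
final class ColumnRangeSketch {

    /** Per-column accumulator for min, max, null count, and value count. */
    static final class Range {
        Comparable<Object> min;
        Comparable<Object> max;
        long nullCount;
        long valueCount;

        @SuppressWarnings("unchecked")
        void accept(Object value) {
            valueCount++;              // assumption: every row counts, null or not
            if (value == null) {
                nullCount++;           // nulls never affect min or max
                return;
            }
            Comparable<Object> v = (Comparable<Object>) value;
            if (min == null || v.compareTo(min) < 0) {
                min = v;
            }
            if (max == null || v.compareTo(max) > 0) {
                max = v;
            }
        }
    }

    /**
     * Collects stats for the requested columns that exist in the file schema.
     * Requested columns missing from the schema are skipped, not reported.
     */
    static Map<String, Range> collect(List<Map<String, Object>> rows,
                                      Set<String> fileSchema,
                                      List<String> requestedColumns) {
        Map<String, Range> stats = new LinkedHashMap<>();
        for (String column : requestedColumns) {
            if (fileSchema.contains(column)) {
                stats.put(column, new Range());
            }
        }
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, Range> entry : stats.entrySet()) {
                entry.getValue().accept(row.get(entry.getKey()));
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        Map<String, Object> r1 = new HashMap<>();
        r1.put("id", 3);
        r1.put("name", "b");
        Map<String, Object> r2 = new HashMap<>();
        r2.put("id", 1);
        r2.put("name", null);

        Map<String, Range> stats = collect(
            Arrays.asList(r1, r2),
            new HashSet<>(Arrays.asList("id", "name")),
            Arrays.asList("id", "name", "not_in_file"));

        for (Map.Entry<String, Range> e : stats.entrySet()) {
            Range r = e.getValue();
            System.out.println(e.getKey() + ": min=" + r.min + " max=" + r.max
                + " nulls=" + r.nullCount + " values=" + r.valueCount);
        }
    }
}
```

Only the requested columns are tracked, which mirrors the idea of reading a projection of the file instead of every column.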

Impact

Enables metadata column stats and partition stats generation for Lance base files. No new public config or API.

Risk Level

Low. The change is scoped to Lance file-format stats reads and uses the existing Hudi column range metadata collector.

Local verification:

  • git diff --check
  • JAVA_HOME=/opt/homebrew/Cellar/openjdk@17/17.0.19/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/Cellar/openjdk@17/17.0.19/libexec/openjdk.jdk/Contents/Home/bin:$PATH mvn -pl hudi-common -am -DskipTests compile

Documentation Update

None.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@puneetdixit200
Author

Closing this duplicate in favor of #18901, which has the fuller Lance stats implementation and focused test coverage.

@puneetdixit200 deleted the fix/18758-lance-column-stats branch on June 2, 2026 21:07
Development

Successfully merging this pull request may close these issues.

Support column and partition stats with Lance file format

1 participant