feat: add MemWAL sharding evaluator by jackye1995 · Pull Request #6854 · lance-format/lance

jackye1995 · 2026-05-20T02:33:00Z

Adds an Arrow-native MemWAL sharding evaluator and exposes it through the Java API/JNI.

Evaluates MemWAL sharding specs against Arrow RecordBatch values for bucket, identity, and unsharded fields.
Resolves sharding source IDs through a Java-provided source-id-to-column map.
Adds Java-facing ShardingEvaluator returning an Arrow reader for the evaluated sharding key batch.
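The description above does not spell out the transform semantics. As a rough illustration only, the sketch below models the three field kinds in plain Python. A dict of columns stands in for an Arrow RecordBatch, and a second dict stands in for the Java-provided source-id-to-column map. Several pieces are assumptions and do not come from Lance's implementation: the function name `evaluate_sharding_fields`, the `num_buckets` parameter key, the plain-modulo stand-in for the bucket hash, and the choice to model unsharded fields as null keys.

```python
from typing import Dict, List, Optional


def evaluate_sharding_fields(
    columns: Dict[str, List[int]],
    source_id_to_column: Dict[int, str],
    fields: List[dict],
) -> Dict[str, List[Optional[int]]]:
    """Hypothetical evaluator: one output column per sharding field, keyed by field_id."""
    num_rows = len(next(iter(columns.values()))) if columns else 0
    out: Dict[str, List[Optional[int]]] = {}
    for field in fields:
        transform = field.get("transform")
        if transform is None:
            # Unsharded field (assumed encoding): no routing key, so emit nulls.
            out[field["field_id"]] = [None] * num_rows
            continue
        # Resolve the single source id to a column via the caller-provided map.
        (source_id,) = field["source_ids"]
        values = columns[source_id_to_column[source_id]]
        if transform == "identity":
            # Identity: the shard key is the source value itself.
            out[field["field_id"]] = list(values)
        elif transform == "bucket":
            num_buckets = int(field["parameters"]["num_buckets"])
            # Placeholder hash: plain modulo. Lance's real bucket hash is not
            # described in the PR, so these bucket numbers are illustrative only.
            out[field["field_id"]] = [v % num_buckets for v in values]
        else:
            raise ValueError(f"unsupported transform: {transform!r}")
    return out
```

The map is passed in by the caller rather than looked up inside the evaluator. This mirrors the description, where Java supplies the source-id-to-column mapping.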

This is needed by lance-spark to route writes using Lance's sharding semantics instead of duplicating Spark-side bucket logic.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

codecov · 2026-05-20T03:57:19Z

Codecov Report

❌ Patch coverage is 78.84131% with 84 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance/src/dataset/mem_wal/sharding.rs	79.16%	68 Missing and 12 partials ⚠️
rust/lance/src/dataset/mem_wal/api.rs	50.00%	3 Missing and 1 partial ⚠️

Xuanwo

I think the Python sharding spec round-trip needs one fix before this lands.

Xuanwo · 2026-05-20T07:18:31Z

+        field_id: get_py_value(field, "field_id")?.extract::<String>()?,
+        source_ids: get_py_value(field, "source_ids")?.extract::<Vec<i32>>()?,
+        transform: optional_string(get_py_value(field, "transform")?)?,
+        expression: optional_string(get_py_value(field, "expression")?)?,


This makes dict specs returned by Dataset.mem_wal_index_details() unusable with the new evaluator.

mem_wal_index_details() currently serializes each sharding field with field_id, source_ids, transform, result_type, and parameters, but it does not include expression. Because this parser now requires expression to be present, the natural flow below fails with the error "Missing sharding spec field 'expression'":

spec = ds.mem_wal_index_details()["sharding_specs"][0]
evaluate_sharding_spec(batch, spec, LanceSchema.from_pyarrow(batch.schema))

Could we either include expression in the dict returned by mem_wal_index_details() or treat a missing expression key as None here? I think adding it to mem_wal_index_details() is cleaner because that keeps the exported spec shape complete and round-trippable.
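The fallback option mentioned here, treating a missing `expression` key as `None`, could look like the following Python sketch. It works on the dict shape that `mem_wal_index_details()` is described as returning. `parse_sharding_field` and its error messages are hypothetical stand-ins for the PyO3 parsing in the diff, not the PR's code, and the field values in `exported` are made up. The preferred fix, adding `expression` to the exported dict, would make this fallback unnecessary for specs that round-trip through `mem_wal_index_details()`.

```python
from typing import Any, Dict


def parse_sharding_field(field: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical lenient parser: required keys must exist, optional strings may be absent."""
    for key in ("field_id", "source_ids"):
        if key not in field:
            raise KeyError(f"Missing sharding spec field {key!r}")
    parsed: Dict[str, Any] = {
        "field_id": str(field["field_id"]),
        "source_ids": [int(s) for s in field["source_ids"]],
    }
    for key in ("transform", "expression"):
        # An absent key is treated exactly like an explicit None.
        value = field.get(key)
        if value is not None and not isinstance(value, str):
            raise TypeError(f"sharding spec field {key!r} must be a string or None")
        parsed[key] = value
    return parsed


# A spec shaped like the exported dict described above: no "expression" key.
exported = {
    "field_id": "0",
    "source_ids": [1],
    "transform": "bucket",
    "result_type": "int32",
    "parameters": {},
}
print(parse_sharding_field(exported)["expression"])  # prints None
```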

Submitted as changes requested by mistake; intended as a non-blocking review comment.

feat: add memwal sharding evaluator

147b9ca

claude Bot reviewed May 20, 2026

github-actions Bot added the enhancement and java labels May 20, 2026

feat: expose memwal sharding evaluator to python

043c0a8

Lift bucket sharding initialization to persist the configured shard field independently from primary-key metadata.

github-actions Bot added the python label May 20, 2026

feat: expose shard-only python memwal api

7eb4a29

Remove deprecated Region compatibility aliases from the Python MemWAL API and align raw bindings with Shard naming.

feat: derive memwal sharding sources from schema

e35a92a

jackye1995 mentioned this pull request May 20, 2026

feat: add bucket table support for bucketed writes and SPJ lance-format/lance-spark#519 (Open, 9 tasks)

Xuanwo previously requested changes May 20, 2026

jackye1995 wants to merge 4 commits into lance-format:main from jackye1995:jack/arrow-native-memwal-sharding

jackye1995 commented May 20, 2026

Uh oh!

claude Bot left a comment

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Xuanwo left a comment

Uh oh!

Xuanwo May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jackye1995 commented May 20, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Xuanwo left a comment

Choose a reason for hiding this comment

Uh oh!

Xuanwo May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 20, 2026 •

edited

Loading