Skip to content

feat: expose multi-base config to Python and Java write_fragments API#6855

Open
zhangyue19921010 wants to merge 3 commits into
lance-format:mainfrom
zhangyue19921010:support-multi-base-config-write-fragments
Open

feat: expose multi-base config to Python and Java write_fragments API#6855
zhangyue19921010 wants to merge 3 commits into
lance-format:mainfrom
zhangyue19921010:support-multi-base-config-write-fragments

Conversation

@zhangyue19921010
Copy link
Copy Markdown
Contributor

For Python

fragments = write_fragments(
    pa.table({"id": range(10, 20)}),
    dataset,
    mode="append",
    target_bases=["base2"],
    base_store_params=base_store_params,
)

For Java

  List<FragmentMetadata> fragments =
      Fragment.write()
          .allocator(allocator)
          .datasetUri(primary)
          .data(root2)
          .mode(WriteParams.WriteMode.APPEND)
          .targetBases(Arrays.asList("base2"))
          .baseStoreParams(baseStoreParams)
          .execute();

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added enhancement New feature or request python java labels May 20, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 86.66667% with 8 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/fragment/write.rs 86.66% 5 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

@yanghua
Copy link
Copy Markdown
Collaborator

yanghua commented May 20, 2026

@claude review

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-language feature plumbing that mirrors the existing dataset write pattern; tests cover the new path in all three languages, but worth a human eye on the Rust fragment write changes since they introduce an existing-dataset load and target-base resolution on a core write path.

Extended reasoning...

Overview

This PR exposes the multi-base configuration parameters (target_bases, initial_bases, base_store_params) on the fragment-level write API in both Python (write_fragments) and Java (Fragment.write() builder). It plumbs them through the Java JNI layer to the existing Rust extract_write_params machinery (which already supported them), and updates Rust FragmentCreateBuilder::write_fragments_v2_impl to load the existing dataset and resolve target bases when any of the three new params are supplied. The Python write_fragments also now inherits base_store_params from a passed LanceDataset instance if not explicitly provided.

Security risks

None observed. The new parameters control where fragment files are written and which object-store options are used; no auth, crypto, or permissions logic is touched. All new arguments are optional with backward-compatible defaults.

Level of scrutiny

This is a feature PR touching a core write path (rust/lance/src/dataset/fragment/write.rs) and JNI boundaries, so it warrants a human review even though the wiring is mechanical. The Rust change is small and closely mirrors the pattern already used in the dataset-level do_write_fragments call site (validate_and_resolve_target_bases + load existing dataset). The Java JNI changes are pass-through additions to existing signatures.

Other factors

  • The bug-hunting system found no issues.
  • Codecov reports 81% patch coverage, with 11 missing lines concentrated in rust/lance/src/dataset/fragment/write.rs.
  • Tests added: Rust unit test (test_write_fragments_with_target_base), two Java tests (testFragmentCreateWithMultiBaseParams, testFragmentWriteWithMultiBaseParams), and two Python tests for both the new base_store_params inheritance and the existing write_fragments + target_bases flow.
  • The PR does not touch any CODEOWNER-protected files.

dataset_uri = str(dataset_uri)
elif isinstance(dataset_uri, LanceDataset):
if base_store_params is None:
base_store_params = dataset_uri._base_store_params
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a fallback logic when there is a base path not in the _base_store_params so that it can fall back to the default _storage_options?

if storage_options is None:
        storage_options = dataset_uri._storage_options

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request java python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants