feat: expose multi-base config to Python and Java write_fragments API#6855
feat: expose multi-base config to Python and Java write_fragments API#6855zhangyue19921010 wants to merge 3 commits into
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
@claude review |
There was a problem hiding this comment.
Multi-language feature plumbing that mirrors the existing dataset write pattern; tests cover the new path in all three languages, but worth a human eye on the Rust fragment write changes since they introduce an existing-dataset load and target-base resolution on a core write path.
Extended reasoning...
Overview
This PR exposes the multi-base configuration parameters (target_bases, initial_bases, base_store_params) on the fragment-level write API in both Python (write_fragments) and Java (Fragment.write() builder). It plumbs them through the Java JNI layer to the existing Rust extract_write_params machinery (which already supported them), and updates Rust FragmentCreateBuilder::write_fragments_v2_impl to load the existing dataset and resolve target bases when any of the three new params are supplied. The Python write_fragments also now inherits base_store_params from a passed LanceDataset instance if not explicitly provided.
Security risks
None observed. The new parameters control where fragment files are written and which object-store options are used; no auth, crypto, or permissions logic is touched. All new arguments are optional with backward-compatible defaults.
Level of scrutiny
This is a feature PR touching a core write path (rust/lance/src/dataset/fragment/write.rs) and JNI boundaries, so it warrants a human review even though the wiring is mechanical. The Rust change is small and closely mirrors the pattern already used in the dataset-level do_write_fragments call site (validate_and_resolve_target_bases + load existing dataset). The Java JNI changes are pass-through additions to existing signatures.
Other factors
- The bug-hunting system found no issues.
- Codecov reports 81% patch coverage, with 11 missing lines concentrated in
rust/lance/src/dataset/fragment/write.rs. - Tests added: Rust unit test (
test_write_fragments_with_target_base), two Java tests (testFragmentCreateWithMultiBaseParams,testFragmentWriteWithMultiBaseParams), and two Python tests for both the newbase_store_paramsinheritance and the existingwrite_fragments+target_basesflow. - The PR does not touch any CODEOWNER-protected files.
| dataset_uri = str(dataset_uri) | ||
| elif isinstance(dataset_uri, LanceDataset): | ||
| if base_store_params is None: | ||
| base_store_params = dataset_uri._base_store_params |
There was a problem hiding this comment.
Do we need to add a fallback logic when there is a base path not in the _base_store_params so that it can fall back to the default _storage_options?
if storage_options is None:
storage_options = dataset_uri._storage_options
For Python
For Java