feat: add proto serialization for ANNIvfPartitionExec and ANNIvfSubIndexExec #6349
Open
LuQQiu wants to merge 7 commits into lance-format:main
…dexExec

Add protobuf definitions and encode/decode functions for distributed execution of ANN IVF plan nodes, following the FilteredReadExec pattern.

Key design choices:
- VectorQueryProto round-trips ALL Query fields using Arrow IPC for the key array (supports Float16/Float32/Float64/UInt8, not just Float32)
- DistanceType uses Display/TryFrom<&str> instead of manual match
- from_proto functions take Option<Arc<Dataset>> so callers can pass a dataset from cache or let it be opened from storage (same as FilteredReadExec)
- ANNIvfSubIndexExec from_proto takes input + prefilter_source as params; the codec on the caller side handles child extraction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
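The `Display`/`TryFrom<&str>` choice mentioned in this commit can be illustrated with a minimal sketch. The enum below is a hypothetical stand-in, not the actual Lance `DistanceType`; its variants, string spellings, and the helper names `encode_distance_type` and `decode_distance_type` are assumptions made only for illustration. The idea is that the proto field carries a plain string, and the enum's own conversions do the mapping in both directions.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical stand-in for a distance/metric enum (NOT the Lance type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DistanceType {
    L2,
    Cosine,
    Dot,
}

impl fmt::Display for DistanceType {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Assumed spellings; the real implementation may differ.
        let name = match self {
            DistanceType::L2 => "l2",
            DistanceType::Cosine => "cosine",
            DistanceType::Dot => "dot",
        };
        f.write_str(name)
    }
}

impl TryFrom<&str> for DistanceType {
    type Error = String;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        match s.to_ascii_lowercase().as_str() {
            "l2" => Ok(DistanceType::L2),
            "cosine" => Ok(DistanceType::Cosine),
            "dot" => Ok(DistanceType::Dot),
            other => Err(format!("unknown distance type: {}", other)),
        }
    }
}

/// Hypothetical encoder: the value stored in a proto `string` field.
fn encode_distance_type(dt: DistanceType) -> String {
    dt.to_string()
}

/// Hypothetical decoder: parse the proto string back into the enum.
fn decode_distance_type(s: &str) -> Result<DistanceType, String> {
    DistanceType::try_from(s)
}
```

Because encoding and decoding both go through the enum's own conversions, adding a variant means updating two places on the type itself rather than a separate hand-written match in the serialization code.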
- Move table_identifier_from_dataset, table_identifier_from_dataset_with_manifest, and open_dataset_from_table_identifier into a shared table_identifier module so both filtered_read_proto and ann_ivf_proto use the same code
- Re-export from filtered_read_proto for backwards compatibility
- Remove PE-specific comments from ann_ivf.proto (open source doesn't need to know PE)
- Add test_ann_ivf_sub_index_proto_roundtrip with a real IVF index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
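How the shared table-identifier helpers might combine with a `from_proto` that accepts `Option<Arc<Dataset>>` can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: `Dataset` and `TableIdentifier` are hypothetical stand-in structs, the signatures of `table_identifier_from_dataset` and `open_dataset_from_table_identifier` are guessed rather than taken from Lance, and `resolve_dataset` is a made-up helper name. The actual functions in the PR may be async and carry more state.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a dataset handle (NOT the Lance `Dataset`).
#[derive(Debug, PartialEq)]
struct Dataset {
    uri: String,
    version: u64,
}

/// Hypothetical stand-in for the `TableIdentifier` proto message.
#[derive(Debug, Clone, PartialEq)]
struct TableIdentifier {
    uri: String,
    version: u64,
}

/// Assumed signature: describe a dataset so a remote worker can find it.
fn table_identifier_from_dataset(ds: &Dataset) -> TableIdentifier {
    TableIdentifier {
        uri: ds.uri.clone(),
        version: ds.version,
    }
}

/// Assumed signature: "open" a dataset from its identifier.
/// Here this only builds the stand-in struct; real code would hit storage.
fn open_dataset_from_table_identifier(id: &TableIdentifier) -> Result<Arc<Dataset>, String> {
    if id.uri.is_empty() {
        return Err("table identifier has an empty uri".to_string());
    }
    Ok(Arc::new(Dataset {
        uri: id.uri.clone(),
        version: id.version,
    }))
}

/// Hypothetical helper: prefer the caller's cached dataset, otherwise
/// fall back to opening it from the identifier carried in the proto.
fn resolve_dataset(
    cached: Option<Arc<Dataset>>,
    id: &TableIdentifier,
) -> Result<Arc<Dataset>, String> {
    match cached {
        Some(ds) => Ok(ds),
        None => open_dataset_from_table_identifier(id),
    }
}
```

Passing `Some(dataset)` reuses the caller's handle without touching storage, while `None` leaves the decoder to open the dataset itself, which is the choice the `Option` parameter is meant to give callers.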
- Rename ann_ivf_proto.rs to knn_proto.rs to match knn.rs convention
- Remove re-exports from filtered_read_proto, import from table_identifier directly
- Add roundtrip tests for table_identifier (proto, dataset, manifest)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wkalt
reviewed
Mar 30, 2026
protos/ann_ivf.proto
Outdated
VectorQueryProto query = 1;
TableIdentifier table = 2;
string index_name = 3;
repeated string segment_uuids = 4;
Contributor
Are these called "segment_uuids" and "index_uuids" on the Lance side? It would be better to call them "segment_ids"/"index_ids" if possible, so we don't need to put the encoding in the name. But if this matches the Lance side, OK.
Contributor
Author
Good catch; I'll make sure they align with the original exec/struct names.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…full IndexMetadata

- Rename `distance_type` to `metric_type` in VectorQueryProto (matches Query.metric_type)
- Replace `index_name` + `segment_uuids` with `repeated bytes indices` in ANNIvfSubIndexExecProto, serializing full IndexMetadata via prost-encoded bytes from lance.table package (avoids lossy UUID-only serialization and removes load_indices_by_name roundtrip on deserialization)
- Fix unused variable warning in make_indexed_dataset test helper
- Move test imports to top of test module
- Remove stale doc comment in table_identifier.rs
- Strengthen sub-index roundtrip test to verify IndexMetadata fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add proto serialization for ANNIvfPartitionExec and ANNIvfSubIndexExec
so that the plan can be serialized and executed by remote distributed workers.