Skip to content

feat: add proto serialization for ANNIvfPartitionExec and ANNIvfSubIndexExec#6349

Open
LuQQiu wants to merge 7 commits intolance-format:mainfrom
LuQQiu:lu/ann-ivf-proto
Open

feat: add proto serialization for ANNIvfPartitionExec and ANNIvfSubIndexExec#6349
LuQQiu wants to merge 7 commits intolance-format:mainfrom
LuQQiu:lu/ann-ivf-proto

Conversation

@LuQQiu
Copy link
Copy Markdown
Contributor

@LuQQiu LuQQiu commented Mar 30, 2026

add proto serialization for ANNIvfPartitionExec and ANNIvfSubIndexExec
so the plan can be serialize and executed by remote distributed worker

LuQQiu and others added 3 commits March 30, 2026 11:53
…dexExec

Add protobuf definitions and encode/decode functions for distributed
execution of ANN IVF plan nodes, following the FilteredReadExec pattern.

Key design choices:
- VectorQueryProto round-trips ALL Query fields using Arrow IPC for the
  key array (supports Float16/Float32/Float64/UInt8, not just Float32)
- DistanceType uses Display/TryFrom<&str> instead of manual match
- from_proto functions take Option<Arc<Dataset>> so callers can pass
  from cache or let it open from storage (same as FilteredReadExec)
- ANNIvfSubIndexExec from_proto takes input + prefilter_source as
  params — codec on the caller side handles child extraction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move table_identifier_from_dataset, table_identifier_from_dataset_with_manifest,
  and open_dataset_from_table_identifier into a shared table_identifier module
  so both filtered_read_proto and ann_ivf_proto use the same code
- Re-export from filtered_read_proto for backwards compatibility
- Remove PE-specific comments from ann_ivf.proto (open source doesn't need to know PE)
- Add test_ann_ivf_sub_index_proto_roundtrip with a real IVF index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename ann_ivf_proto.rs to knn_proto.rs to match knn.rs convention
- Remove re-exports from filtered_read_proto, import from table_identifier directly
- Add roundtrip tests for table_identifier (proto, dataset, manifest)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LuQQiu LuQQiu requested review from Xuanwo and westonpace March 30, 2026 19:54
@github-actions github-actions bot added the enhancement New feature or request label Mar 30, 2026
VectorQueryProto query = 1;
TableIdentifier table = 2;
string index_name = 3;
repeated string segment_uuids = 4;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these called "segment_uuids" and "index_uuids" on the lance side? It would be better to call them "segment_ids"/"index_ids" if possible so we don't need to put the encoding in the name. But if this matches the lance side, ok.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, make sure align with the original exec /struct names

LuQQiu and others added 4 commits March 30, 2026 13:05
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…full IndexMetadata

- Rename `distance_type` to `metric_type` in VectorQueryProto (matches Query.metric_type)
- Replace `index_name` + `segment_uuids` with `repeated bytes indices` in
  ANNIvfSubIndexExecProto, serializing full IndexMetadata via prost-encoded bytes
  from lance.table package (avoids lossy UUID-only serialization and removes
  load_indices_by_name roundtrip on deserialization)
- Fix unused variable warning in make_indexed_dataset test helper
- Move test imports to top of test module
- Remove stale doc comment in table_identifier.rs
- Strengthen sub-index roundtrip test to verify IndexMetadata fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 90.42553% with 45 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/io/exec/knn_proto.rs 89.97% 14 Missing and 21 partials ⚠️
rust/lance/src/io/exec/table_identifier.rs 93.57% 1 Missing and 6 partials ⚠️
rust/lance/src/io/exec/knn.rs 75.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants