feat: support batch flat vector queries #6828
Add a flat KNN batch query path so callers can submit multiple query vectors and share scan work while preserving per-query top-k results. Co-authored-by: Cursor <cursoragent@cursor.com>
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
Fold batch flat KNN into the existing nearest and KNN execution paths so the public API and plan nodes stay consistent with reviewer feedback. Co-authored-by: Cursor <cursoragent@cursor.com>
Updated based on review feedback:
Local disk benchmark result for 8 queries, 50k rows, dim=4: separate mean 3.8834 ms vs batch mean 3.3045 ms, about 1.17x speedup. The gain is modest on local disk because the repeated reads are served from OS page cache. |
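For reference, the speedup figure above is just the ratio of the two quoted means. A minimal check, using only the numbers from this comment (the variable names are illustrative):

```python
# Mean latencies quoted in the comment above, in milliseconds.
separate_ms = 3.8834  # 8 separate single-vector queries
batch_ms = 3.3045     # one batched query carrying the same 8 vectors

speedup = separate_ms / batch_ms   # relative speedup
saved_ms = separate_ms - batch_ms  # absolute time saved per run

print(f"speedup: {speedup:.3f}x, saved: {saved_ms:.4f} ms")
```

The absolute saving is well under a millisecond here, which is consistent with the page-cache explanation: the repeated scans being avoided were already cheap.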
Use a larger local-disk dataset and stream benchmark data generation so batch query gains are measured under a more realistic scan workload. Co-authored-by: Cursor <cursoragent@cursor.com>
Updated the benchmark scale per feedback:
Local result with OS cache accepted:
This is meaningfully higher than the previous small-data local run (~1.17x), which matches the expectation that larger scan workloads show more benefit from sharing read/decode work across queries. |
Allow the local-disk batch KNN benchmark to vary row count, dimensionality, and query count so PR results can show scaling trends. Co-authored-by: Cursor <cursoragent@cursor.com>
Added a controlled benchmark matrix to make the trend clearer. Query-count scaling at 1M rows x 512d:
Dataset-size scaling at m=10, 512d:
So the relative speedup clearly increases with m. For dataset size, the absolute time saved grows from ~71 ms to ~600 ms while relative speedup stays above 2x on local disk with OS cache effects accepted. The benchmark is now parameterized with env vars so these rows can be reproduced without editing source. |
let row_id = row_ids
    .as_ref()
    .map(|row_ids| row_ids.value(row_index))
    .unwrap_or(fallback_row_id + row_index as u64);
I don't think this would happen
The _rowid fallback was removed; batch mode now requires _rowid.
    );
}

fn bench_batch_flat_knn(c: &mut Criterion) {
Can we port this to Python?
It has been ported to python/python/benchmarks/test_search.py:227
DataType::List(_) | DataType::FixedSizeList(_, _) => {
    if !matches!(vector_type, DataType::List(_)) {
        return Err(Error::invalid_input(format!(
            "Query is multivector but column {}({})is not multivector",
Can you explain more how this distinguishes between multivector query and query batch?
Batch-vs-multivector is distinguished by the vector column type: list-like q + List column means one multivector query; list-like q + FixedSizeList column means a batch of single-vector queries.
Batch-vs-multivector is decided by the vector column type; comments explaining this were added in Scanner::nearest, lines 1467-1475.
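To make the rule in this thread concrete, here is a minimal sketch of the dispatch in Python. It is not the code in Scanner::nearest: `ColumnKind` and `classify_query` are hypothetical names, and the sketch only models the rule as stated in the reply (list-like query + List column is one multivector query; list-like query + FixedSizeList column is a batch).

```python
from enum import Enum


class ColumnKind(Enum):
    """Hypothetical stand-in for the Arrow type of the vector column."""

    FIXED_SIZE_LIST = "FixedSizeList"  # one vector per row
    LIST = "List"                      # multivector: several vectors per row


def classify_query(query_is_list_like: bool, column: ColumnKind) -> str:
    """Classify a query using only the column type, as described above."""
    if not query_is_list_like:
        # A plain 1-D query vector is always a single query.
        return "single"
    if column is ColumnKind.LIST:
        # List-like query against a multivector column: one multivector query.
        return "multivector"
    # List-like query against a fixed-size vector column: a batch of queries.
    return "batch"


print(classify_query(True, ColumnKind.FIXED_SIZE_LIST))  # batch
print(classify_query(True, ColumnKind.LIST))             # multivector
```

The consequence of keying on the column type is that the same 2-D input means different things for different columns, so callers cannot submit a batch of multivector queries through this path.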
Use the LanceDB-compatible query_index result column and move the batch flat KNN benchmark to Python so benchmark scaling can be reproduced from the binding API. Co-authored-by: Cursor <cursoragent@cursor.com>
Apply rustfmt output expected by CI for the batch query binding change. Co-authored-by: Cursor <cursoragent@cursor.com>
Move batch flat KNN benchmark configuration into pytest parameters so review and reproduction do not rely on environment variables. Co-authored-by: Cursor <cursoragent@cursor.com>
BubbleCal left a comment:
- The distance_range param is lost if it's a batch query.
- This forces the query to be executed by flat KNN even when there's an index. We still need to use the index if there is one (just query the index for each query vector).
Please add tests verifying that both are actually fixed.
Route batched queries through vector indices when available and apply distance range bounds before per-query top-k selection on the flat path. Co-authored-by: Cursor <cursoragent@cursor.com>
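The ordering in this commit (range bounds first, then top-k) matters for correctness. A minimal sketch of why, in plain Python: the function names are hypothetical, and the half-open range `[lower, upper)` is an assumption made for illustration rather than something stated in this PR.

```python
import heapq


def in_range(distance, lower=None, upper=None):
    """Assumed half-open range check: lower <= distance < upper."""
    return (lower is None or distance >= lower) and (
        upper is None or distance < upper
    )


def filter_then_top_k(pairs, k, lower=None, upper=None):
    """Apply the distance range first, then keep the k closest survivors."""
    kept = [(row_id, d) for row_id, d in pairs if in_range(d, lower, upper)]
    return heapq.nsmallest(k, kept, key=lambda pair: pair[1])


def top_k_then_filter(pairs, k, lower=None, upper=None):
    """Wrong order, for comparison: top-k first, range applied afterwards."""
    top = heapq.nsmallest(k, pairs, key=lambda pair: pair[1])
    return [(row_id, d) for row_id, d in top if in_range(d, lower, upper)]


# (row_id, distance) pairs for one query vector.
pairs = [(0, 0.1), (1, 0.2), (2, 0.5), (3, 0.6)]

print(filter_then_top_k(pairs, k=2, lower=0.3))  # rows 2 and 3
print(top_k_then_filter(pairs, k=2, lower=0.3))  # empty: both top-2 rows are out of range
```

With the wrong order, a query can return fewer than k rows (here, none) even though enough in-range rows exist, which is the kind of regression a test for this fix should catch.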
Co-authored-by: Cursor <cursoragent@cursor.com>
if the query is with:
it's expected to return an empty result, but the schema should still contain |
In that case Lance runs a flat batch KNN query, returns up to ``k`` rows
for each query vector, and adds ``query_index`` to identify the source
query for each result row. Indexed/ANN batch search is not used in this
first implementation.
This comment doesn't look correct.
q: QueryVectorLike
    The query vector.
    The query vector. For fixed-size vector columns, this may be a 2-D
    array-like batch of query vectors. Batch queries run flat KNN, apply
Summary
- Extend the Scanner::nearest API to accept batched query vectors for fixed-size vector columns.
- Run batched queries through KNNVectorDistanceExec, returning one stream with up to m * k rows and a query_index column to identify each query's results.

Closes #6821.
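As an illustration of the result shape described in the summary, here is a minimal pure-Python sketch of a shared-scan batch flat KNN. It is not the KNNVectorDistanceExec implementation: `batch_flat_knn` is a hypothetical name, plain Euclidean distance is assumed, and rows are modelled as `(row_id, vector)` tuples.

```python
import heapq
import math


def batch_flat_knn(rows, queries, k):
    """Return (query_index, row_id, distance) tuples, up to k per query.

    rows: iterable of (row_id, vector); queries: list of query vectors.
    """
    # One bounded heap per query. heapq is a min-heap, so distances are
    # negated to keep the current worst candidate at heap[0].
    heaps = [[] for _ in queries]
    for row_id, vector in rows:  # single pass over the data, shared by all queries
        for query_index, query in enumerate(queries):
            item = (-math.dist(query, vector), row_id)
            heap = heaps[query_index]
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                # Closer than the current worst candidate: replace it.
                heapq.heapreplace(heap, item)
    results = []
    for query_index, heap in enumerate(heaps):
        # Emit each query's hits from nearest to farthest.
        for neg_distance, row_id in sorted(heap, reverse=True):
            results.append((query_index, row_id, -neg_distance))
    return results


rows = [(0, [0.0, 0.0]), (1, [1.0, 0.0]), (2, [5.0, 5.0])]
queries = [[0.0, 0.0], [5.0, 5.0]]
for hit in batch_flat_knn(rows, queries, k=2):
    print(hit)
```

Each row is read once and scored against every query, so the output is a single stream of at most m * k tuples in which query_index tells the caller which query a hit belongs to.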
Benchmark
Python benchmark command:
Dataset size, dimensionality, query count, batch size, and rounds are declared in the benchmark's @pytest.mark.parametrize values. Adjust those parameters in python/benchmarks/test_search.py to reproduce the scaling rows below.

Dataset: random float32 vectors written to a real local .lance dataset. No memory:// dataset and no throttled/simulated object store latency. OS page cache effects are accepted.

Query Count Scaling
Fixed dataset: 1,000,000 rows, dim=512, k=10. This is about 1.9 GiB of raw vector values.
This shows the expected trend that batching becomes more valuable as m increases: the shared scan/decode work is amortized over more query vectors.

Dataset Size Scaling
Fixed query count: m=10, dim=512, k=10.
On local disk with OS page cache, relative speedup is not strictly monotonic with row count because both plans become increasingly dominated by the same cached vector decoding and distance-compute work. The robust trend in this setup is absolute time saved, which grows from ~71 ms to ~600 ms as dataset size grows.
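A toy cost model makes both trends easier to reason about. This is an assumption for illustration, not something measured in this PR: separate queries each pay the scan/decode cost, the batch pays it once, and every query pays its own distance-compute cost. The function names and the unit costs are hypothetical.

```python
def separate_cost(m, scan, compute):
    """m independent queries: each one pays scan/decode plus distance compute."""
    return m * (scan + compute)


def batch_cost(m, scan, compute):
    """One batched query: scan/decode is paid once, compute once per query."""
    return scan + m * compute


def speedup(m, scan, compute):
    return separate_cost(m, scan, compute) / batch_cost(m, scan, compute)


def saved(m, scan, compute):
    # Algebraically this is (m - 1) * scan.
    return separate_cost(m, scan, compute) - batch_cost(m, scan, compute)


# Arbitrary unit costs for a "small" dataset, then a dataset 10x larger,
# assuming both costs scale linearly with row count.
small = dict(scan=2.0, compute=1.0)
large = dict(scan=20.0, compute=10.0)

print(speedup(2, **small), speedup(10, **small))    # grows with m
print(speedup(10, **small), speedup(10, **large))   # unchanged by dataset size
print(saved(10, **small), saved(10, **large))       # grows with dataset size
```

Under these assumptions relative speedup depends only on m and the scan-to-compute ratio, while absolute time saved scales with the scan cost. That is one way to read why the absolute saving is the more robust signal in the dataset-size rows.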
Test plan
- cargo test -p lance test_batch_knn_flat_results_include_query_index
- cargo clippy -p lance --tests --benches -- -D warnings
- ALL_FEATURES=... cargo clippy --profile ci --locked --features ${ALL_FEATURES} --tests -- -D warnings
- uv run pytest python/tests/test_vector_index.py::test_batch_flat_query_matches_repeated_single_queries
- uv run --extra benchmarks pytest --collect-only python/benchmarks/test_search.py::test_batch_flat_knn
- cargo fmt --all -- --check
- uv run ruff format --check --diff python/benchmarks/test_search.py python/lance/dataset.py python/tests/test_vector_index.py && uv run ruff check python/benchmarks/test_search.py python/lance/dataset.py python/tests/test_vector_index.py