perf: make HNSW cheaper to load #6798
Open
wombatu-kun wants to merge 2 commits into
Xuanwo (Collaborator) reviewed on May 15, 2026 and left a comment:
Thank you for working on this!
    match &self.inner.graph {
        HnswGraph::Built(nodes) => nodes.clone(),
        HnswGraph::Loaded(_) => {
            panic!("HNSW::nodes() is only available on a freshly built graph, not a loaded one")
Contributor
Author
Good question. The census found no in-tree callers of nodes() anywhere (rust/, python/, java/), and nodes() was already being modified in this PR. Rather than rely on the panic being unreachable, I changed the signature to Option<Arc<Vec<GraphBuilderNode>>> — it now returns None for a disk-loaded (Arrow-backed) graph instead of panicking. Done in 719460f.
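The signature change described in this reply can be pictured with a minimal sketch. The enum and accessor names follow the thread, but the struct fields and the `LoadedGraph` stand-in are hypothetical placeholders, not the actual types in `builder.rs`:

```rust
use std::sync::Arc;

/// Stand-in for the real per-node build structure (field is hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct GraphBuilderNode {
    pub id: u32,
}

/// Stand-in for the Arrow-backed, search-only representation (hypothetical).
pub struct LoadedGraph {
    pub num_rows: usize,
}

/// Two backing stores for the same graph.
pub enum HnswGraph {
    Built(Arc<Vec<GraphBuilderNode>>),
    Loaded(LoadedGraph),
}

impl HnswGraph {
    /// Returns the builder nodes for a freshly built graph,
    /// or `None` for a disk-loaded one instead of panicking.
    pub fn nodes(&self) -> Option<Arc<Vec<GraphBuilderNode>>> {
        match self {
            // Cloning the Arc bumps a refcount; the node vector is not copied.
            HnswGraph::Built(nodes) => Some(Arc::clone(nodes)),
            HnswGraph::Loaded(_) => None,
        }
    }
}
```

With this shape, a caller has to handle the loaded case explicitly, for example with `if let Some(nodes) = graph.nodes() { ... }`, rather than relying on the panic being unreachable.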
    // backends; only the view types differ. A local macro keeps the
    // search loop single-sourced without a generic helper whose return
    // type would have to name the view lifetime.
    macro_rules! run_search {
Collaborator
I prefer not to use macro_rules.
    impl HNSW {
        /// Whether the graph is the Arrow-backed (disk-loaded) representation
        /// rather than the in-memory `Vec<GraphBuilderNode>` build representation.
        pub(crate) fn is_loaded_graph(&self) -> bool {
Collaborator
Do we really need such function?
de746e7 to 719460f
a5385e1 to f0cded9
Back the loaded HNSW graph directly by the on-disk Arrow buffers instead of reconstructing per-node Vec<GraphBuilderNode>/Vec<OrderedNode> on every load. Search is unchanged (same Graph/BorrowingGraph seam), the build path is untouched, and to_batch() on a loaded graph is a verbatim passthrough so the IVF partition-cache round-trip is preserved. This unblocks a future zero-copy CacheCodec (lance-format#6745).

Add a load_hnsw / search_hnsw_loaded benchmark (load latency plus search on the Arrow-backed graph, directly comparable to the existing built-graph search bench) and a test_loaded_level_offsets_misalignment_invariant regression test that pins the entry-point-written-at-every-level surplus the loaded path's Sparse upper-level lookup depends on.

Closes lance-format#6746

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f0cded9 to 24d7099
Summary
Closes #6746. Loading an HNSW partition no longer reconstructs a per-node `Vec<GraphBuilderNode>`/`Vec<OrderedNode>` graph. The loaded graph is now backed directly by the on-disk Arrow buffers, with neighbor adjacency served as zero-copy `&[u32]` slices straight out of the `__neighbors` `ListArray` value buffer. This unblocks a future zero-copy `CacheCodec` (#6745).

Motivation
Per #6746, loading an HNSW partition required expensive per-node reconstruction, which made a zero-copy IPC `CacheCodec` (#6745) infeasible. The fix is to keep the Arrow data and offsets as the graph's backing store while preserving current search behavior and performance.

What changed
- `HnswCore` now holds an `HnswGraph` enum instead of `Arc<Vec<GraphBuilderNode>>`: `Built` (in-memory, produced by the online builder / `index_vectors`; build path untouched) or `Loaded` (Arrow-backed, search-only).
- `LoadedHnswGraph` retains the full `RecordBatch` plus per-level zero-copy `ListArray` neighbor views and a tiny per-upper-level `id -> row` lookup; the geometrically-shrinking upper levels keep these maps negligible.
- Level 0 uses a `Dense` lookup (`row == __vector_id`, asserted in debug); upper levels use a `Sparse` map keyed by `__vector_id` value, exactly mirroring the old per-node `load`, including the known `level_offsets` quirk where the entry-point node is written by `to_batch` at every level but counted only at level 0, so upper-level slices are off-by-one and duplicate ids resolve last-write-wins.
- Both backends sit behind the same `Graph`/`BorrowingGraph` seam; search is unchanged.
- `to_batch()` on a loaded graph is a verbatim passthrough (re-stamped metadata only), so the IVF partition cache (`ivf/partition_serde.rs`, which re-serializes loaded indices) round-trips losslessly and #6745 can write/read it through `lance_arrow::ipc` without rebuilding the graph.

Correctness & compatibility
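To make the duplicate-id behavior concrete, here is a minimal sketch of a `Dense`/`Sparse` id-to-row lookup. The type name `LevelLookup`, its fields, and the helper methods are illustrative assumptions; only the `Dense`/`Sparse` split and the last-write-wins rule come from the description above.

```rust
use std::collections::HashMap;

/// Maps a node id to its row within one level's slice of the batch.
pub enum LevelLookup {
    /// Level 0: the row index equals the id, so no map is needed.
    Dense { len: usize },
    /// Upper levels: an explicit id -> row map.
    Sparse(HashMap<u32, usize>),
}

impl LevelLookup {
    /// Build a sparse lookup from the ids stored in a level's rows.
    /// `HashMap::insert` overwrites an existing key, so when an id appears
    /// more than once the last row wins.
    pub fn sparse(ids: &[u32]) -> Self {
        let mut map = HashMap::with_capacity(ids.len());
        for (row, &id) in ids.iter().enumerate() {
            map.insert(id, row);
        }
        LevelLookup::Sparse(map)
    }

    /// Resolve an id to a row, or `None` if the id is not present at this level.
    pub fn row(&self, id: u32) -> Option<usize> {
        match self {
            LevelLookup::Dense { len } => {
                let row = id as usize;
                (row < *len).then_some(row)
            }
            LevelLookup::Sparse(map) => map.get(&id).copied(),
        }
    }
}
```

For ids `[5, 9, 5]`, the sparse lookup resolves id 5 to row 2 rather than row 0, which is the last-write-wins resolution the loaded path has to reproduce when a slice boundary is misaligned.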
- `load` semantics are preserved bit-for-bit, including duplicate-id last-write-wins across a misaligned slice boundary; `build -> to_batch -> load -> to_batch` is byte-stable (`b1 == b2`).
- `HNSW::nodes()` now panics on a loaded graph (documented; `GraphBuilderNode` is internal API and there are no in-tree callers).

Benchmarks
criterion `--quick`, 100000×128, L2, k=100, ef=300 (`rust/lance-index/benches/hnsw.rs`). The "before" `load_hnsw` was measured by running this same bench against the parent commit's reconstruction-based `builder.rs` (only that file swapped), so it is a like-for-like `HNSW::load` comparison.

- `load_hnsw(100000x128)`
- `search_hnsw100000x128` (built, baseline)
- `search_hnsw_loaded100000x128`

Load drops from ~127 ms (allocating 100k `GraphBuilderNode`s + per-node `OrderedNode` adjacency) to ~91 µs (batch slice + tiny upper-level sparse maps), while search on the Arrow-backed graph stays on par with the in-memory build. Numbers are `--quick`/indicative; the ~3-orders-of-magnitude load delta is well outside noise. Re-run a full `cargo bench` before merge for headline figures.

Tests
All in `rust/lance-index/src/vector/hnsw/builder.rs`:

- `test_loaded_search_parity_and_recall` (rstest: L2 single / L2 pair / L2 2048 / Dot 2048): built vs loaded parity plus recall ≥ 0.5.
- `test_loaded_level_offsets_misalignment_invariant`: pins the entry-point-written-at-every-level surplus (`batch.num_rows() > sum(level_count)`), the Dense level-0 precondition, and loaded↔built search parity despite the misalignment.
- `test_loaded_empty_index`: 0-row `to_batch` → `load` → empty graph round-trip.
- `test_to_batch_roundtrip_loaded`: the IVF partition-cache path; `to_batch` on a loaded index is byte-stable and reloads/searches identically.
- `test_loaded_graph_is_arrow_backed`: loaded graph is strictly lighter than the built representation.
- `test_builder_write_load` (2048, L2, file round-trip) and `test_builder_write_load_binary_hamming` (256, Hamming) continue to pass unchanged.
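For readers unfamiliar with the list layout behind the zero-copy `&[u32]` neighbor slices mentioned in the Summary, the following is a plain-Rust model of it. It deliberately avoids the `arrow` crate: `NeighborLists` and its methods are hypothetical names, and the sketch only assumes the standard offsets-plus-values layout in which row `i` spans `offsets[i]..offsets[i + 1]` of a shared values buffer.

```rust
/// Plain-Rust model of a list column: one shared values buffer plus
/// offsets, where `offsets[i]..offsets[i + 1]` delimits row i's values.
pub struct NeighborLists {
    offsets: Vec<i32>,
    values: Vec<u32>,
}

impl NeighborLists {
    /// Validate the layout once so that per-row access needs no further checks
    /// beyond the usual slice bounds.
    pub fn new(offsets: Vec<i32>, values: Vec<u32>) -> Self {
        assert!(!offsets.is_empty(), "offsets needs at least one entry");
        assert!(offsets[0] >= 0, "offsets must be non-negative");
        assert!(
            offsets.windows(2).all(|w| w[0] <= w[1]),
            "offsets must be non-decreasing"
        );
        assert_eq!(
            *offsets.last().unwrap() as usize,
            values.len(),
            "last offset must equal the values length"
        );
        Self { offsets, values }
    }

    /// Number of rows (nodes) described by the offsets.
    pub fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow the neighbors of `row` directly from the values buffer.
    /// No per-node vector is allocated; the slice points into `values`.
    pub fn neighbors(&self, row: usize) -> &[u32] {
        let start = self.offsets[row] as usize;
        let end = self.offsets[row + 1] as usize;
        &self.values[start..end]
    }
}
```

Because `neighbors` returns a borrow into the existing buffer, constructing the structure costs nothing per node, which is the property the load benchmark above is measuring.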