Skip to content

perf: make HNSW cheaper to load#6798

Open
wombatu-kun wants to merge 2 commits into
lance-format:mainfrom
wombatu-kun:perf/hnsw-cheaper-load
Open

perf: make HNSW cheaper to load#6798
wombatu-kun wants to merge 2 commits into
lance-format:mainfrom
wombatu-kun:perf/hnsw-cheaper-load

Conversation

@wombatu-kun
Copy link
Copy Markdown
Contributor

@wombatu-kun wombatu-kun commented May 15, 2026

Summary

Closes #6746. Loading an HNSW partition no longer reconstructs a per-node Vec<GraphBuilderNode> / Vec<OrderedNode> graph. The loaded graph is now backed directly by the on-disk Arrow buffers, with neighbor adjacency served as zero-copy &[u32] slices straight out of the __neighbors ListArray value buffer. This unblocks a future zero-copy CacheCodec (#6745).

Motivation

Per #6746, loading an HNSW partition required expensive per-node reconstruction, which makes a zero-copy IPC CacheCodec (#6745) infeasible. The fix is to keep the Arrow data and offsets as the graph's backing store while preserving current search behavior and performance.

What changed

  • HnswCore now holds an HnswGraph enum instead of Arc<Vec<GraphBuilderNode>>: Built (in-memory, produced by the online builder / index_vectors — build path untouched) or Loaded (Arrow-backed, search-only).
  • LoadedHnswGraph retains the full RecordBatch plus per-level zero-copy ListArray neighbor views and a tiny per-upper-level id -> row lookup; the geometrically-shrinking upper levels keep these maps negligible.
  • Level 0 uses a Dense lookup (row == __vector_id, asserted in debug); upper levels use a Sparse map keyed by __vector_id value, exactly mirroring the old per-node load — including the known level_offsets quirk where the entry-point node is written by to_batch at every level but counted only at level 0, so upper-level slices are off-by-one and duplicate ids resolve last-write-wins.
  • The search loop is single-sourced across both backends via a local macro, keeping the existing Graph / BorrowingGraph seam; search is unchanged.
  • to_batch() on a loaded graph is a verbatim passthrough (re-stamped metadata only), so the IVF partition cache (ivf/partition_serde.rs, which re-serializes loaded indices) round-trips losslessly and Implement CacheCode for HNSW indices #6745 can write/read it through lance_arrow::ipc without rebuilding the graph.

Correctness & compatibility

  • Loaded-graph search is bit-identical to the in-memory build across L2 / Dot / Hamming and graph sizes (single node, pair, multi-level 2048).
  • Old load semantics are preserved bit-for-bit, including duplicate-id last-write-wins across a misaligned slice boundary; build -> to_batch -> load -> to_batch is byte-stable (b1 == b2).
  • No public API signature change. HNSW::nodes() now panics on a loaded graph (documented; GraphBuilderNode is internal API and there are no in-tree callers).

Benchmarks

criterion --quick, 100000×128, L2, k=100, ef=300 (rust/lance-index/benches/hnsw.rs). The "before" load_hnsw was measured by running this same bench against the parent commit's reconstruction-based builder.rs (only that file swapped), so it is a like-for-like HNSW::load comparison.

Benchmark Before (reconstruction load) After (Arrow-backed) Δ
load_hnsw(100000x128) ~127 ms ~90.8 µs ~1,400× faster
search_hnsw100000x128 (built, baseline) ~700.7 µs ~700.7 µs unchanged
search_hnsw_loaded100000x128 n/a ~690.4 µs on par with built (within noise)

Load drops from ~127 ms (allocating 100k GraphBuilderNodes + per-node OrderedNode adjacency) to ~91 µs (batch slice + tiny upper-level sparse maps), while search on the Arrow-backed graph stays on par with the in-memory build. Numbers are --quick/indicative; the ~3-orders-of-magnitude load delta is well outside noise. Re-run a full cargo bench before merge for headline figures.

Tests

All in rust/lance-index/src/vector/hnsw/builder.rs:

  • test_loaded_search_parity_and_recall (rstest: L2 single / L2 pair / L2 2048 / Dot 2048) — built vs loaded parity plus recall ≥ 0.5.
  • test_loaded_level_offsets_misalignment_invariant — pins the entry-point-written-at-every-level surplus (batch.num_rows() > sum(level_count)), the Dense level-0 precondition, and loaded↔built search parity despite the misalignment.
  • test_loaded_empty_index — 0-row to_batchload → empty graph round-trip.
  • test_to_batch_roundtrip_loaded — the IVF partition-cache path: to_batch on a loaded index is byte-stable and reloads/searches identically.
  • test_loaded_graph_is_arrow_backed — loaded graph is strictly lighter than the built representation.
  • Pre-existing test_builder_write_load (2048, L2, file round-trip) and test_builder_write_load_binary_hamming (256, Hamming) continue to pass unchanged.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 88.96797% with 31 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-index/src/vector/hnsw/builder.rs 88.96% 24 Missing and 7 partials ⚠️

📢 Thoughts on this report? Let us know!

@wjones127 wjones127 self-assigned this May 15, 2026
Copy link
Copy Markdown
Collaborator

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

match &self.inner.graph {
HnswGraph::Built(nodes) => nodes.clone(),
HnswGraph::Loaded(_) => {
panic!("HNSW::nodes() is only available on a freshly built graph, not a loaded one")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to panic here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The census found no in-tree callers of nodes() anywhere (rust/, python/, java/), and nodes() was already being modified in this PR. Rather than rely on the panic being unreachable, I changed the signature to Option<Arc<Vec<GraphBuilderNode>>> — it now returns None for a disk-loaded (Arrow-backed) graph instead of panicking. Done in 719460f.

// backends; only the view types differ. A local macro keeps the
// search loop single-sourced without a generic helper whose return
// type would have to name the view lifetime.
macro_rules! run_search {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to use macro_rules.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 719460f

impl HNSW {
/// Whether the graph is the Arrow-backed (disk-loaded) representation
/// rather than the in-memory `Vec<GraphBuilderNode>` build representation.
pub(crate) fn is_loaded_graph(&self) -> bool {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need such function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Removed it in 719460f

@wombatu-kun wombatu-kun force-pushed the perf/hnsw-cheaper-load branch from de746e7 to 719460f Compare May 16, 2026 00:44
@wombatu-kun wombatu-kun requested a review from Xuanwo May 16, 2026 01:09
@wombatu-kun wombatu-kun force-pushed the perf/hnsw-cheaper-load branch 2 times, most recently from a5385e1 to f0cded9 Compare May 19, 2026 02:46
Vova Kolmakov and others added 2 commits May 20, 2026 14:46
Back the loaded HNSW graph directly by the on-disk Arrow buffers instead of reconstructing per-node Vec<GraphBuilderNode>/Vec<OrderedNode> on every load. Search is unchanged (same Graph/BorrowingGraph seam), the build path is untouched, and to_batch() on a loaded graph is a verbatim passthrough so the IVF partition-cache round-trip is preserved. This unblocks a future zero-copy CacheCodec (lance-format#6745).

Add a load_hnsw / search_hnsw_loaded benchmark (load latency plus search on the Arrow-backed graph, directly comparable to the existing built-graph search bench) and a test_loaded_level_offsets_misalignment_invariant regression test that pins the entry-point-written-at-every-level surplus the loaded path's Sparse upper-level lookup depends on.

Closes lance-format#6746

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wombatu-kun wombatu-kun force-pushed the perf/hnsw-cheaper-load branch from f0cded9 to 24d7099 Compare May 20, 2026 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make HNSW cheaper to load

3 participants