feat(mem_wal): cache opened L0 flushed-generation datasets #6816
hamersaw wants to merge 3 commits into
In the LSM scanner, every query against an L0 flushed generation re-opened that generation's Lance dataset from object storage at three identical sites (scan, point lookup, vector search), paying manifest read + metadata decode + index load each time.

Add two opt-in, non-breaking pieces:

- `with_session` plumbing on the scanner/planners so the first open of each generation populates and reuses the shared index/metadata caches (defaults to the base table's session).
- `FlushedDatasetCache`: a moka-backed, single-flight cache of `Arc<Dataset>` keyed by resolved flushed path, injected by the consumer. After the first open, subsequent queries are a pure `Arc::clone` with zero object-store I/O.

Flushed generations are written once to a globally-unique immutable path, so cached entries are never stale and need no TTL; `retain_paths` pruning at compaction is memory-only and correctness never depends on it.

A single shared `open_flushed_dataset` helper covers all three sites; `None`/`None` reproduces the original cold-open exactly, so all existing tests pass untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cache-l0-reads change added `moka` to rust/lance and updated the root Cargo.lock, but python/ is a separate cargo workspace with its own lock. CI's "Lint Rust" step runs `cargo clippy --locked` from python/ and failed at lock resolution before clippy could run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/// The key is the resolved absolute flushed path
/// (`{base}/_mem_wal/{shard}/{folder}`), which is globally unique, so a single
/// cache can safely span multiple tables.
pub struct FlushedDatasetCache {
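To illustrate the doc comment above, here is a minimal sketch of a single-flight cache keyed by resolved path. It is not the PR's implementation: the PR describes a `moka`-backed cache, while this sketch swaps in `std::sync::OnceLock` so it has no external dependencies. The names `PathCache` and `OpenedDataset` are hypothetical, and the open is modeled as a synchronous, infallible closure, whereas a real dataset open is asynchronous and can fail.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for an opened dataset; not a Lance type.
#[derive(Debug)]
pub struct OpenedDataset {
    pub path: String,
}

/// Std-only sketch of a single-flight cache keyed by resolved path.
/// (Assumption: the real cache uses `moka`; this only mirrors the idea.)
#[derive(Default)]
pub struct PathCache {
    slots: Mutex<HashMap<String, Arc<OnceLock<Arc<OpenedDataset>>>>>,
}

impl PathCache {
    /// Returns the cached handle, running `open` at most once per path.
    pub fn get_or_open<F>(&self, path: &str, open: F) -> Arc<OpenedDataset>
    where
        F: FnOnce() -> OpenedDataset,
    {
        // Hold the map lock only long enough to find or create the slot,
        // so a slow open of one path does not block lookups of another.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(path.to_string()).or_default().clone()
        };
        // `OnceLock::get_or_init` blocks concurrent callers until the first
        // initializer finishes, so only one caller performs the open.
        Arc::clone(slot.get_or_init(|| Arc::new(open())))
    }

    /// Drops every entry whose path is not in `live`. This only frees
    /// memory; paths are immutable, so stale hits are not a concern.
    pub fn retain_paths(&self, live: &HashSet<String>) {
        self.slots
            .lock()
            .unwrap()
            .retain(|path, _| live.contains(path));
    }

    /// Reports whether a slot exists for `path`.
    pub fn contains(&self, path: &str) -> bool {
        self.slots.lock().unwrap().contains_key(path)
    }
}
```

Because a pruned entry is simply re-opened on the next request, forgetting to call `retain_paths` costs memory but never correctness, which matches the invariant stated in the PR.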
Thanks for working on this. I think the Session plumbing is absolutely right, but I’m less convinced we should add FlushedDatasetCache to the Lance SDK.
My mental model is that each flushed memtable is already naturally a standalone Lance dataset. When that dataset is opened with the same Lance Session, the SDK-level caches should already cover the Lance-internal reusable state: object store registry/store reuse, file metadata cache, index cache, and index extensions. That part feels like the right SDK responsibility.
Caching the opened Dataset object itself feels like an application-level concern. The right owner of that cache is the calling service/application that knows the lifecycle of the L0 generations, compaction timing, memory budget, tenant/table boundaries, and whether a cache should be per-process, per-table, per-session, or scoped in some other way. I’d prefer not to make Lance SDK own that policy.
What do you think?
Problem

In the LSM scanner, every query against an L0 (frozen/flushed) generation re-opens that generation's Lance dataset from object storage. There are three identical cold-open sites: scan (`planner.rs`), point lookup (`point_lookup.rs`), and vector search (`vector_search.rs`), each doing `DatasetBuilder::from_uri(path).load()` with no session. Per query, per flushed generation, this pays: manifest version discovery + manifest read + decode, file-metadata decode, and scalar/vector index load. For an LSM tree, frozen generations are the single best caching target, yet they were the only data source paying full cold-open cost on every query.

Key invariant
Flush writes each generation once to a globally-unique, content-addressed path (`memtable/flush.rs`). Same path ⟹ same bytes, forever, so a cache hit can never be stale. This is the rare cache that needs no TTL and no invalidation for correctness; pruning is desirable only to reclaim memory.

Changes (OSS `lance`)

Two complementary, independently-useful, opt-in pieces; defaults preserve existing behavior exactly:

- `with_session` plumbing: thread an existing `Arc<Session>` into the scanner/planners so the first open of each generation populates and reuses the shared index + file-metadata caches. `LsmScanner::new` defaults this to the base table's session; `without_base_table` defaults to `None`.
- `FlushedDatasetCache`: a `moka`-backed, single-flight cache of `Arc<Dataset>` keyed by resolved flushed path, owned and sized by the consumer and injected per-request. After the first open, every subsequent query for that generation is a pure `Arc::clone` with zero object-store I/O. `retain_paths(live_paths)` prunes retired generations at compaction (memory-only; correctness never depends on it).

A single shared `open_flushed_dataset(path, session, cache)` helper replaces all three cold-open sites (repo rule: dedupe logic in 2+ places). `None`/`None` reproduces the original behavior precisely, so no existing test changes. `data_source.rs`/`collector.rs` are untouched: opening stays lazy inside the planner, preserving bloom-filter pruning on point lookups. Planner wiring uses chainable `with_session`/`with_flushed_cache` builder methods rather than constructor changes, keeping `new()` signatures (and every existing test/bench) untouched.

Testing
- `FlushedDatasetCache`: miss opens once; hit returns the same `Arc` (pointer eq); 16-way concurrent `get_or_open` opens exactly once (single-flight); `retain_paths` drops the right keys; no-cache path cold-opens each call.
- `mem_wal::scanner` suite (78 tests) passes untouched.
- `cargo clippy -p lance --tests --benches` clean; `cargo fmt` clean.

Notes

The `sophon` consumer side (process-bootstrap cache ownership, scanner wiring, compaction `retain_paths`) is out of scope for this PR. Phase 1 (`with_session`) is independently shippable ahead of the cache.

🤖 Generated with Claude Code
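
The chainable wiring described under Changes can be sketched as follows. This is a toy model, not the PR's code: `Planner`, `Session`, `Cache`, `OpenMode`, and `decide_open` are hypothetical stand-ins, and the rule that a supplied cache takes precedence over a bare session is an assumption made for illustration. The point it shows is that optional builder methods leave `new()` unchanged, so a caller that sets neither option still gets the original cold open.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins; none of these are Lance types.
pub struct Session;
pub struct Cache;

/// Which open strategy a planner would use for a flushed generation.
#[derive(Debug, PartialEq)]
pub enum OpenMode {
    /// No session and no cache: the original cold open.
    Cold,
    /// Open through a shared session so metadata/index caches are reused.
    WithSession,
    /// Go through the dataset cache (assumed to take precedence).
    Cached,
}

pub struct Planner {
    path: String,
    session: Option<Arc<Session>>,
    cache: Option<Arc<Cache>>,
}

impl Planner {
    /// Constructor signature is untouched: both options default to `None`.
    pub fn new(path: impl Into<String>) -> Self {
        Self { path: path.into(), session: None, cache: None }
    }

    /// Opt in to a shared session.
    pub fn with_session(mut self, session: Arc<Session>) -> Self {
        self.session = Some(session);
        self
    }

    /// Opt in to a consumer-owned dataset cache.
    pub fn with_flushed_cache(mut self, cache: Arc<Cache>) -> Self {
        self.cache = Some(cache);
        self
    }

    pub fn path(&self) -> &str {
        &self.path
    }

    /// Every call site delegates to the one shared decision helper.
    pub fn open_mode(&self) -> OpenMode {
        decide_open(self.session.as_deref(), self.cache.as_deref())
    }
}

/// Single shared helper: `None`/`None` falls through to the cold open.
pub fn decide_open(session: Option<&Session>, cache: Option<&Cache>) -> OpenMode {
    match (cache, session) {
        (Some(_), _) => OpenMode::Cached,
        (None, Some(_)) => OpenMode::WithSession,
        (None, None) => OpenMode::Cold,
    }
}
```

Under these assumptions, `Planner::new("p")` alone resolves to `OpenMode::Cold`, which is why existing callers and tests would not need to change.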