Conversation
Pull request overview
This PR introduces memory-based eviction across the revision cache and (when enabled) the delta cache, shifting eviction orchestration into RevisionCacheOrchestrator via a shared per-shard CacheMemoryController. This aims to keep combined cache memory usage within RevisionCacheOptions.MaxBytes and adds/updates tests around eviction and memory accounting.
Changes:
- Add a new `CacheMemoryController` to track combined rev+delta cache memory and drive memory-based eviction.
- Refactor `RevisionCacheOrchestrator` to construct and coordinate sub-caches and perform cross-cache LRU eviction.
- Update revision-cache APIs/tests to accommodate new return values and add new eviction/accounting test coverage.
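Based on the description above, a minimal sketch of what the per-shard memory controller could look like. `CacheMemoryController`, `incrementBytesCount`, and `decrementBytesCount` appear in the diff; the fields and the `overCapacity` helper are assumptions for illustration only:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CacheMemoryController tracks the combined byte count of the revision and
// delta caches for a single shard. Fields and overCapacity are illustrative;
// the type and increment/decrement method names come from the PR.
type CacheMemoryController struct {
	currentBytes atomic.Int64 // combined rev + delta cache bytes for this shard
	maxBytes     int64        // this shard's slice of RevisionCacheOptions.MaxBytes
}

func (c *CacheMemoryController) incrementBytesCount(n int64) { c.currentBytes.Add(n) }
func (c *CacheMemoryController) decrementBytesCount(n int64) { c.currentBytes.Add(-n) }

// overCapacity reports whether the orchestrator should run its eviction loop.
func (c *CacheMemoryController) overCapacity() bool {
	return c.currentBytes.Load() > c.maxBytes
}

func main() {
	c := &CacheMemoryController{maxBytes: 100}
	c.incrementBytesCount(150)
	fmt.Println(c.overCapacity()) // true: 150 > 100
	c.decrementBytesCount(80)
	fmt.Println(c.overCapacity()) // false: 70 <= 100
}
```

The controller only counts; the eviction decision itself lives in the orchestrator, per the design note in the PR description.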
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| db/cache_memory_controller.go | Adds shared per-shard memory controller and global access ordering for cross-cache eviction decisions. |
| db/revision_cache_orchestrator.go | Refactors orchestrator construction and adds memory-eviction loop across rev+delta caches. |
| db/revision_cache_lru.go | Reworks revision cache memory accounting with lifecycle states and integrates memory controller. |
| db/delta_cache_lru.go | Integrates delta cache with shared memory controller and exposes cross-cache eviction helpers. |
| db/revision_cache_interface.go | Updates the RevisionCache interface signatures and revises cache construction. |
| db/revision_cache_bypass.go | Updates bypass implementation to match the new interface signature. |
| db/revision_cache_test.go | Updates existing tests and adds extensive new tests for memory-based eviction/accounting. |
| db/delta_cache_test.go | Updates tests for new Get signature and adds memory accounting/eviction tests. |
| db/util_testing.go | Updates a test helper (PutRevEntry) to remove direct rev-cache eviction/memory calls (now orchestrator-driven). |
Comments suppressed due to low confidence (1)
db/delta_cache_lru.go:121
- Delta cache entries also don’t refresh accessOrder on access: addDelta’s duplicate-key path and getCachedDelta both MoveToFront but leave accessOrder unchanged. Since the orchestrator compares accessOrder across caches, this can select the wrong eviction candidate. Update the entry’s accessOrder whenever it’s moved to the front (and in the duplicate-key early return).
dc.lock.Lock()
defer dc.lock.Unlock()
if elem := dc.cache[key]; elem != nil {
dc.lruList.MoveToFront(elem)
return
}
value := &deltaCacheValue{delta: &toDelta, itemKey: key}
elem := dc.lruList.PushFront(value)
dc.cache[key] = elem
dc.cacheNumDeltas.Add(1)
value.accessOrder.Store(nextAccessOrder()) // stamp on insert
numItemsRemoved, bytesEvicted := dc._numberCapacityEviction()
if numItemsRemoved > 0 {
dc.cacheNumDeltas.Add(-numItemsRemoved)
}
if dc.memoryController != nil {
dc.memoryController.incrementBytesCount(value.delta.totalDeltaBytes)
if bytesEvicted > 0 {
dc.memoryController.decrementBytesCount(bytesEvicted)
}
}
}
// _numberCapacityEviction will iterate removing the last element in the cache until we fall below the maximum
// number-of-items threshold for this shard, returning the number of items and bytes evicted
func (dc *LRUDeltaCache) _numberCapacityEviction() (numItemsEvicted int64, bytesEvicted int64) {
for dc.lruList.Len() > int(dc.capacity) {
value := dc.lruList.Back()
if value == nil {
// no items to remove
break
}
deltaValue := value.Value.(*deltaCacheValue)
dc.lruList.Remove(value)
// delete item from lookup map
delete(dc.cache, deltaValue.itemKey)
numItemsEvicted++
bytesEvicted += deltaValue.delta.totalDeltaBytes
}
return numItemsEvicted, bytesEvicted
}
// getCachedDelta will retrieve a delta if cached
func (dc *LRUDeltaCache) getCachedDelta(ctx context.Context, docID, fromVersionString, toVersionString string, collectionID uint32) *RevisionDelta {
if docID == "" || fromVersionString == "" || toVersionString == "" {
return nil
}
key := createDeltaCacheKey(docID, fromVersionString, toVersionString, collectionID)
dc.lock.Lock()
defer dc.lock.Unlock()
var deltaValue *RevisionDelta
if elem := dc.cache[key]; elem != nil {
dc.lruList.MoveToFront(elem)
cachedDelta := elem.Value.(*deltaCacheValue)
deltaValue = cachedDelta.delta
}
if elem := rc.cache[key]; elem != nil {
	rc.lruList.MoveToFront(elem)
	value = elem.Value.(*revCacheValue)
} else if create {
	value = &revCacheValue{id: docID, collectionID: collectionID, itemKey: key}
	if currentVersion, parseErr := ParseVersion(docVersionString); parseErr != nil {
		value.revID = docVersionString
	} else {
		value.cv = currentVersion
	}
	value.itemLoaded.Store(false)
	value.accessOrder.Store(nextAccessOrder()) // stamp on insert
	rc.cache[key] = rc.lruList.PushFront(value)
	rc.cacheNumItems.Add(1)
accessOrder is stamped on insert (nextAccessOrder), but it is never refreshed when entries are accessed (the cache-hit path calls MoveToFront without updating accessOrder). Because the orchestrator compares accessOrder across the revision and delta caches to choose the globally oldest LRU tail, stale accessOrder can cause eviction of the wrong cache’s item under memory pressure. Consider updating accessOrder whenever an element is moved to the front (cache hit / Peek).
This is by design: I don't want to be adding an extra atomic increment for each operation at the cache. It does mean that when eviction fires it could technically evict a more recently used item, but that's a worthwhile trade-off for one less atomic increment on every cache hit.
bbrks
left a comment
I am conscious we're trying to simplify the cache generally, but this seems more complicated conceptually to wrap my head around the safety of.
With the complexity of the memory tracking/eviction, LRU/monotonic counter in particular, I'd potentially like a walkthrough of the design to properly understand this.
// revision and delta caches for cross-cache LRU eviction decisions.
// Using a counter rather than time.Now() keeps the hot-path cost to a single
// atomic increment (~1–2 ns vs ~25 ns for a clock read).
var globalCacheAccessCounter atomic.Uint64
This is process-global - should be scoped to a database!
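One way to address this comment would be to hang the counter off the database context rather than the package, so two databases on the same node cannot skew each other's cross-cache LRU comparisons. A sketch, where `DatabaseContext` stands in for the real struct and the field name is illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// DatabaseContext is a stand-in for the real per-database struct; the point
// is that each database owns its own monotonic access counter instead of
// sharing a process-global one.
type DatabaseContext struct {
	cacheAccessCounter atomic.Uint64
}

func (db *DatabaseContext) nextAccessOrder() uint64 {
	return db.cacheAccessCounter.Add(1)
}

func main() {
	db1, db2 := &DatabaseContext{}, &DatabaseContext{}
	db1.nextAccessOrder()
	db1.nextAccessOrder()
	fmt.Println(db1.nextAccessOrder()) // 3: db1 has its own sequence
	fmt.Println(db2.nextAccessOrder()) // 1: db2 is unaffected by db1
}
```

The hot-path cost stays a single atomic increment; only the counter's ownership changes.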
// check for memory based eviction
c.triggerMemoryEviction()
This doesn't need to be called if getCachedDelta is not creating an entry right? Similarly if revisionCache.Get doesn't insert anything
I know there's an early return inside but it's still doing the work to acquire the lock to check.
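The suggestion above can be sketched as follows: only trigger the eviction check when an insert actually happened, and do a lock-free capacity check before acquiring the eviction lock. All names here are illustrative stand-ins except `triggerMemoryEviction`:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// orchestrator is a simplified stand-in for RevisionCacheOrchestrator.
type orchestrator struct {
	currentBytes atomic.Int64
	maxBytes     int64
	evictionLock sync.Mutex
	evictions    int
}

func (o *orchestrator) triggerMemoryEviction() {
	// Lock-free fast path: skip the mutex entirely when under capacity,
	// addressing the "still doing the work to acquire the lock" concern.
	if o.currentBytes.Load() <= o.maxBytes {
		return
	}
	o.evictionLock.Lock()
	defer o.evictionLock.Unlock()
	// Re-check under the lock; another goroutine may have evicted already.
	for o.currentBytes.Load() > o.maxBytes {
		o.currentBytes.Add(-10) // stand-in for evicting one LRU tail entry
		o.evictions++
	}
}

func (o *orchestrator) addDelta(bytes int64, inserted bool) {
	if !inserted {
		return // duplicate key: nothing was added, no need to check eviction
	}
	o.currentBytes.Add(bytes)
	o.triggerMemoryEviction()
}

func main() {
	o := &orchestrator{maxBytes: 25}
	o.addDelta(20, true)
	o.addDelta(20, true) // pushes to 40 bytes, evicts back under the cap
	fmt.Println(o.currentBytes.Load() <= 25, o.evictions) // true 2
}
```

On a cache hit or a no-op `Get`, nothing is inserted and the mutex is never touched.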
if revCandidateOrder == 0 && deltaCandidateOrder == 0 {
	// Both caches are empty — nothing more to evict.
	break
}
evictLRUTail() already tolerates the empty cache case. I don't think this is needed?
rc.bypassStat.Add(1)
- return docRev, nil
+ return docRev, checkForMemoryEviction, nil
This function always returns false for checkForMemoryEviction and I think it reads clearer if it isn't using the variable name without a set.
revisionCache *LRURevisionCache // holds document revisions
deltaCache *LRUDeltaCache // holds computed deltas, only initialized when delta sync is enabled
memoryController *CacheMemoryController // used to control memory usage of revision cache and delta cache combined
evictionLock sync.Mutex // This is to synchronise the eviction process so we don;t have multiple goroutines fighting to evict
Suggested change:
- evictionLock sync.Mutex // This is to synchronise the eviction process so we don;t have multiple goroutines fighting to evict
+ evictionLock sync.Mutex // This is to synchronise the eviction process so we don't have multiple goroutines fighting to evict
// Only count bytes if they were already accounted in the stat (memStateSized).
// If the item was still in memStateLoading the in-flight CAS will fail and skip
// the increment, so there is nothing to decrement.
if value.memState.Swap(memStateRemoved) == memStateSized {
	numBytesEvicted += value.getItemBytes()
}
Not specific to this code, but the whole eviction, size tracking, LRU/monotonic counter approach seems complicated, and I'd potentially like a walkthrough of the design to properly understand this.
CBG-5232
Memory based eviction for delta cache and revision cache.
Slight difference from the original design: the original design had a memory controller control the eviction process, but when implementing I felt it was cleaner for the orchestrator to do this, given its job is to orchestrate between the caches. So the memory controller simply sits on each shard, counts memory for that shard, and lets the orchestrator know when it's over capacity.