
CBG-5232: cache memory management #8151

Open
gregns1 wants to merge 7 commits intomainfrom
CBG-5232

Conversation

Contributor

@gregns1 gregns1 commented Apr 7, 2026

CBG-5232

Memory-based eviction for the delta cache and revision cache.

This differs slightly from the original design. Originally, a memory controller was going to control the eviction process; when implementing it, I felt it was cleaner for the orchestrator to do this, given its job is to orchestrate between the caches. So the memory controller simply sits on each shard, counts memory for that shard, and lets the orchestrator know when it is over capacity.
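The split described above can be sketched as follows. This is a minimal illustration of the per-shard byte counter, not the PR's actual code; the type and method names mirror identifiers mentioned later in the review, but the structure here is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheMemoryController is an illustrative per-shard byte counter: it only
// tracks memory and reports over-capacity; the orchestrator drives eviction.
type cacheMemoryController struct {
	currentBytes atomic.Int64
	maxBytes     int64
}

func (c *cacheMemoryController) incrementBytesCount(n int64) { c.currentBytes.Add(n) }
func (c *cacheMemoryController) decrementBytesCount(n int64) { c.currentBytes.Add(-n) }

// overCapacity reports whether this shard has exceeded its memory budget.
func (c *cacheMemoryController) overCapacity() bool {
	return c.currentBytes.Load() > c.maxBytes
}

func main() {
	mc := &cacheMemoryController{maxBytes: 100}
	mc.incrementBytesCount(120)
	fmt.Println(mc.overCapacity()) // over budget: orchestrator would evict
	mc.decrementBytesCount(40)
	fmt.Println(mc.overCapacity()) // back under budget
}
```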

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

Copilot AI review requested due to automatic review settings April 7, 2026 10:00
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces memory-based eviction across the revision cache and (when enabled) the delta cache, shifting eviction orchestration into RevisionCacheOrchestrator via a shared per-shard CacheMemoryController. This aims to keep combined cache memory usage within RevisionCacheOptions.MaxBytes and adds/updates tests around eviction and memory accounting.

Changes:

  • Add a new CacheMemoryController to track combined rev+delta cache memory and drive memory-based eviction.
  • Refactor RevisionCacheOrchestrator to construct and coordinate sub-caches and perform cross-cache LRU eviction.
  • Update revision-cache APIs/tests to accommodate new return values and add new eviction/accounting test coverage.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
db/cache_memory_controller.go Adds shared per-shard memory controller and global access ordering for cross-cache eviction decisions.
db/revision_cache_orchestrator.go Refactors orchestrator construction and adds memory-eviction loop across rev+delta caches.
db/revision_cache_lru.go Reworks revision cache memory accounting with lifecycle states and integrates memory controller.
db/delta_cache_lru.go Integrates delta cache with shared memory controller and exposes cross-cache eviction helpers.
db/revision_cache_interface.go Updates the RevisionCache interface signatures and revises cache construction.
db/revision_cache_bypass.go Updates bypass implementation to match the new interface signature.
db/revision_cache_test.go Updates existing tests and adds extensive new tests for memory-based eviction/accounting.
db/delta_cache_test.go Updates tests for new Get signature and adds memory accounting/eviction tests.
db/util_testing.go Updates a test helper (PutRevEntry) to remove direct rev-cache eviction/memory calls (now orchestrator-driven).
Comments suppressed due to low confidence (1)

db/delta_cache_lru.go:121

  • Delta cache entries also don’t refresh accessOrder on access: addDelta’s duplicate-key path and getCachedDelta both MoveToFront but leave accessOrder unchanged. Since the orchestrator compares accessOrder across caches, this can select the wrong eviction candidate. Update the entry’s accessOrder whenever it’s moved to the front (and in the duplicate-key early return).
	dc.lock.Lock()
	defer dc.lock.Unlock()
	if elem := dc.cache[key]; elem != nil {
		dc.lruList.MoveToFront(elem)
		return
	}
	value := &deltaCacheValue{delta: &toDelta, itemKey: key}
	elem := dc.lruList.PushFront(value)
	dc.cache[key] = elem
	dc.cacheNumDeltas.Add(1)
	value.accessOrder.Store(nextAccessOrder()) // stamp on insert
	numItemsRemoved, bytesEvicted := dc._numberCapacityEviction()
	if numItemsRemoved > 0 {
		dc.cacheNumDeltas.Add(-numItemsRemoved)
	}
	if dc.memoryController != nil {
		dc.memoryController.incrementBytesCount(value.delta.totalDeltaBytes)
		if bytesEvicted > 0 {
			dc.memoryController.decrementBytesCount(bytesEvicted)
		}
	}
}

// _numberCapacityEviction will iterate, removing the last element in the cache until we fall below the maximum
// number of items threshold for this shard, returning the number of items and bytes evicted
func (dc *LRUDeltaCache) _numberCapacityEviction() (numItemsEvicted int64, bytesEvicted int64) {
	for dc.lruList.Len() > int(dc.capacity) {
		value := dc.lruList.Back()
		if value == nil {
			// no items to remove
			break
		}
		deltaValue := value.Value.(*deltaCacheValue)
		dc.lruList.Remove(value)
		// delete item from lookup map
		delete(dc.cache, deltaValue.itemKey)
		numItemsEvicted++
		bytesEvicted += deltaValue.delta.totalDeltaBytes
	}
	return numItemsEvicted, bytesEvicted
}

// getCachedDelta will retrieve a delta if cached
func (dc *LRUDeltaCache) getCachedDelta(ctx context.Context, docID, fromVersionString, toVersionString string, collectionID uint32) *RevisionDelta {
	if docID == "" || fromVersionString == "" || toVersionString == "" {
		return nil
	}
	key := createDeltaCacheKey(docID, fromVersionString, toVersionString, collectionID)
	dc.lock.Lock()
	defer dc.lock.Unlock()
	var deltaValue *RevisionDelta
	if elem := dc.cache[key]; elem != nil {
		dc.lruList.MoveToFront(elem)
		cachedDelta := elem.Value.(*deltaCacheValue)
		deltaValue = cachedDelta.delta
	}

Comment on lines 359 to 371
if elem := rc.cache[key]; elem != nil {
rc.lruList.MoveToFront(elem)
value = elem.Value.(*revCacheValue)
} else if create {
value = &revCacheValue{id: docID, collectionID: collectionID, itemKey: key}
if currentVersion, parseErr := ParseVersion(docVersionString); parseErr != nil {
value.revID = docVersionString
} else {
value.cv = currentVersion
}
value.itemLoaded.Store(false)
value.accessOrder.Store(nextAccessOrder()) // stamp on insert
rc.cache[key] = rc.lruList.PushFront(value)
rc.cacheNumItems.Add(1)

Copilot AI Apr 7, 2026


accessOrder is stamped on insert (nextAccessOrder), but it is never refreshed when entries are accessed (the cache-hit path calls MoveToFront without updating accessOrder). Because the orchestrator compares accessOrder across the revision and delta caches to choose the globally oldest LRU tail, stale accessOrder can cause eviction of the wrong cache’s item under memory pressure. Consider updating accessOrder whenever an element is moved to the front (cache hit / Peek).

Contributor Author


This is by design: I don't want to add an extra atomic increment for every operation on the cache. It does mean that when eviction fires it could technically evict a more recently used item, but that's a worthwhile trade-off for one less atomic increment on every cache hit.
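For comparison, the refresh-on-hit variant the earlier comment suggests is a small change: stamp the entry's access order whenever it is moved to the front, not just on insert. The sketch below is self-contained with illustrative names, not the PR's code, and shows the extra atomic store per hit this trade-off is about.

```go
package main

import (
	"container/list"
	"fmt"
	"sync/atomic"
)

// Process-wide monotonic access counter (illustrative stand-in for
// globalCacheAccessCounter in the PR).
var accessCounter atomic.Uint64

func nextAccessOrder() uint64 { return accessCounter.Add(1) }

type entry struct {
	key         string
	accessOrder atomic.Uint64
}

func main() {
	lru := list.New()
	index := map[string]*list.Element{}
	for _, k := range []string{"a", "b"} {
		e := &entry{key: k}
		e.accessOrder.Store(nextAccessOrder()) // stamp on insert
		index[k] = lru.PushFront(e)
	}
	// Refresh-on-hit: stamp as well as MoveToFront, so cross-cache
	// comparisons see true recency at the cost of one atomic Add per hit.
	if el := index["a"]; el != nil {
		lru.MoveToFront(el)
		el.Value.(*entry).accessOrder.Store(nextAccessOrder())
	}
	tail := lru.Back().Value.(*entry)
	fmt.Println(tail.key) // "b" is now the correct eviction candidate
}
```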

@gregns1 gregns1 requested a review from bbrks April 7, 2026 11:30
Member

@bbrks bbrks left a comment


I am conscious we're trying to simplify the cache generally, but this seems more complicated conceptually to wrap my head around the safety of.

With the complexity of the memory tracking/eviction, LRU/monotonic counter in particular, I'd potentially like a walkthrough of the design to properly understand this.

// revision and delta caches for cross-cache LRU eviction decisions.
// Using a counter rather than time.Now() keeps the hot-path cost to a single
// atomic increment (~1–2 ns vs ~25 ns for a clock read).
var globalCacheAccessCounter atomic.Uint64
Member


This is process-global - should be scoped to a database!
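One way to address this (an assumption, not the PR's fix) is to hang the counter off per-database cache state so sequences from different databases never interleave:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// databaseCacheState is a hypothetical per-database home for the access
// counter, replacing the process-global atomic. Counter values are then
// only compared between the caches of one database.
type databaseCacheState struct {
	accessCounter atomic.Uint64
}

func (d *databaseCacheState) nextAccessOrder() uint64 {
	return d.accessCounter.Add(1)
}

func main() {
	db1, db2 := &databaseCacheState{}, &databaseCacheState{}
	fmt.Println(db1.nextAccessOrder()) // 1
	fmt.Println(db1.nextAccessOrder()) // 2
	fmt.Println(db2.nextAccessOrder()) // 1: each database gets its own sequence
}
```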

Comment on lines +93 to +94
// check for memory based eviction
c.triggerMemoryEviction()
Member


This doesn't need to be called if getCachedDelta is not creating an entry right? Similarly if revisionCache.Get doesn't insert anything

I know there's an early return inside but it's still doing the work to acquire the lock to check.
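A cheap way to avoid that lock acquisition (sketched here with illustrative names, not the PR's implementation) is to front the eviction path with a single atomic load, so hits that insert nothing never touch the mutex:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// orchestrator is a stand-in for the part of RevisionCacheOrchestrator
// that guards eviction; the eviction body here just decrements a counter
// to represent removing one LRU tail item.
type orchestrator struct {
	currentBytes atomic.Int64
	maxBytes     int64
	evictionLock sync.Mutex
	evictions    int
}

func (o *orchestrator) triggerMemoryEviction() {
	if o.currentBytes.Load() <= o.maxBytes {
		return // lock-free fast path: nothing was inserted or we're under budget
	}
	o.evictionLock.Lock()
	defer o.evictionLock.Unlock()
	for o.currentBytes.Load() > o.maxBytes {
		o.currentBytes.Add(-10) // stand-in for evicting one LRU tail item
		o.evictions++
	}
}

func main() {
	o := &orchestrator{maxBytes: 50}
	o.triggerMemoryEviction() // under budget: no lock taken, no evictions
	o.currentBytes.Store(75)
	o.triggerMemoryEviction()
	fmt.Println(o.evictions) // three stand-in evictions bring us back to 45
}
```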

if revCandidateOrder == 0 && deltaCandidateOrder == 0 {
// Both caches are empty — nothing more to evict.
break
}
Member


evictLRUTail() already tolerates the empty cache case. I don't think this is needed?

rc.bypassStat.Add(1)

return docRev, nil
return docRev, checkForMemoryEviction, nil
Member


This function always returns false for checkForMemoryEviction and I think it reads clearer if it isn't using the variable name without a set.

revisionCache *LRURevisionCache // holds document revisions
deltaCache *LRUDeltaCache // holds computed deltas, only initialized when delta sync is enabled
memoryController *CacheMemoryController // used to control memory usage of revision cache and delta cache combined
evictionLock sync.Mutex // This is to synchronise the eviction process so we don;t have multiple goroutines fighting to evict
Member


Suggested change
evictionLock sync.Mutex // This is to synchronise the eviction process so we don;t have multiple goroutines fighting to evict
evictionLock sync.Mutex // This is to synchronise the eviction process so we don't have multiple goroutines fighting to evict

Comment on lines +441 to +446
// Only count bytes if they were already accounted in the stat (memStateSized).
// If the item was still in memStateLoading the in-flight CAS will fail and skip
// the increment, so there is nothing to decrement.
if value.memState.Swap(memStateRemoved) == memStateSized {
numBytesEvicted += value.getItemBytes()
}
Member


Not specific to this code, but the whole eviction, size tracking, LRU/monotonic counter approach seems complicated, and I'd potentially like a walkthrough of the design to properly understand this.
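For readers trying to follow the snippet above, the lifecycle it relies on can be reconstructed roughly as below. The constant names mirror the snippet, but the surrounding types and the evict helper are assumptions for illustration: an entry moves loading → sized once its bytes are counted into the stat, and eviction swaps in removed; only a swap that observes sized may decrement the byte stat.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative memory-accounting states for a cache entry.
const (
	memStateLoading int32 = iota // bytes not yet counted into the stat
	memStateSized                // bytes counted; eviction must decrement
	memStateRemoved              // entry evicted
)

type revCacheValue struct {
	memState  atomic.Int32
	itemBytes int64
}

// evict marks the entry removed and returns the bytes to subtract from the
// shared stat. A still-loading entry never had its bytes added, so its
// in-flight loading->sized CAS will fail and there is nothing to decrement.
func (v *revCacheValue) evict() int64 {
	if v.memState.Swap(memStateRemoved) == memStateSized {
		return v.itemBytes
	}
	return 0
}

func main() {
	sized := &revCacheValue{itemBytes: 64}
	sized.memState.Store(memStateSized)
	loading := &revCacheValue{itemBytes: 64} // still memStateLoading
	fmt.Println(sized.evict())   // releases its 64 counted bytes
	fmt.Println(loading.evict()) // releases nothing
}
```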
