
CBG-5232: cache memory management #8151

Open
gregns1 wants to merge 7 commits intomainfrom
CBG-5232

Conversation

Contributor

@gregns1 gregns1 commented Apr 7, 2026

CBG-5232

Memory-based eviction for the delta cache and revision cache.

This differs slightly from the original design. Originally, a memory controller was going to control the eviction process; when implementing it, I felt it was cleaner for the orchestrator to do this, given its job is to orchestrate between the caches. So the memory controller simply sits on each shard, counts memory for that shard, and lets the orchestrator know when it is over capacity.
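The split described above can be sketched as follows. This is a minimal illustration of the per-shard byte counter, not the PR's actual code; the type and method names mirror identifiers mentioned later in the review, but the structure here is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheMemoryController is an illustrative per-shard byte counter: it only
// tracks memory and reports over-capacity; the orchestrator drives eviction.
type cacheMemoryController struct {
	currentBytes atomic.Int64
	maxBytes     int64
}

func (c *cacheMemoryController) incrementBytesCount(n int64) { c.currentBytes.Add(n) }
func (c *cacheMemoryController) decrementBytesCount(n int64) { c.currentBytes.Add(-n) }

// overCapacity reports whether this shard has exceeded its memory budget.
func (c *cacheMemoryController) overCapacity() bool {
	return c.currentBytes.Load() > c.maxBytes
}

func main() {
	mc := &cacheMemoryController{maxBytes: 100}
	mc.incrementBytesCount(120)
	fmt.Println(mc.overCapacity()) // over budget: orchestrator would evict
	mc.decrementBytesCount(40)
	fmt.Println(mc.overCapacity()) // back under budget
}
```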

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

Copilot AI review requested due to automatic review settings April 7, 2026 10:00
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces memory-based eviction across the revision cache and (when enabled) the delta cache, shifting eviction orchestration into RevisionCacheOrchestrator via a shared per-shard CacheMemoryController. This aims to keep combined cache memory usage within RevisionCacheOptions.MaxBytes and adds/updates tests around eviction and memory accounting.

Changes:

  • Add a new CacheMemoryController to track combined rev+delta cache memory and drive memory-based eviction.
  • Refactor RevisionCacheOrchestrator to construct and coordinate sub-caches and perform cross-cache LRU eviction.
  • Update revision-cache APIs/tests to accommodate new return values and add new eviction/accounting test coverage.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
db/cache_memory_controller.go Adds shared per-shard memory controller and global access ordering for cross-cache eviction decisions.
db/revision_cache_orchestrator.go Refactors orchestrator construction and adds memory-eviction loop across rev+delta caches.
db/revision_cache_lru.go Reworks revision cache memory accounting with lifecycle states and integrates memory controller.
db/delta_cache_lru.go Integrates delta cache with shared memory controller and exposes cross-cache eviction helpers.
db/revision_cache_interface.go Updates the RevisionCache interface signatures and revises cache construction.
db/revision_cache_bypass.go Updates bypass implementation to match the new interface signature.
db/revision_cache_test.go Updates existing tests and adds extensive new tests for memory-based eviction/accounting.
db/delta_cache_test.go Updates tests for new Get signature and adds memory accounting/eviction tests.
db/util_testing.go Updates a test helper (PutRevEntry) to remove direct rev-cache eviction/memory calls (now orchestrator-driven).
Comments suppressed due to low confidence (1)

db/delta_cache_lru.go:121

  • Delta cache entries also don’t refresh accessOrder on access: addDelta’s duplicate-key path and getCachedDelta both MoveToFront but leave accessOrder unchanged. Since the orchestrator compares accessOrder across caches, this can select the wrong eviction candidate. Update the entry’s accessOrder whenever it’s moved to the front (and in the duplicate-key early return).
	dc.lock.Lock()
	defer dc.lock.Unlock()
	if elem := dc.cache[key]; elem != nil {
		dc.lruList.MoveToFront(elem)
		return
	}
	value := &deltaCacheValue{delta: &toDelta, itemKey: key}
	elem := dc.lruList.PushFront(value)
	dc.cache[key] = elem
	dc.cacheNumDeltas.Add(1)
	value.accessOrder.Store(nextAccessOrder()) // stamp on insert
	numItemsRemoved, bytesEvicted := dc._numberCapacityEviction()
	if numItemsRemoved > 0 {
		dc.cacheNumDeltas.Add(-numItemsRemoved)
	}
	if dc.memoryController != nil {
		dc.memoryController.incrementBytesCount(value.delta.totalDeltaBytes)
		if bytesEvicted > 0 {
			dc.memoryController.decrementBytesCount(bytesEvicted)
		}
	}
}

// _numberCapacityEviction will iterate, removing the last element in the cache until we fall below the maximum
// number of items threshold for this shard, returning the number of items and bytes evicted
func (dc *LRUDeltaCache) _numberCapacityEviction() (numItemsEvicted int64, bytesEvicted int64) {
	for dc.lruList.Len() > int(dc.capacity) {
		value := dc.lruList.Back()
		if value == nil {
			// no items to remove
			break
		}
		deltaValue := value.Value.(*deltaCacheValue)
		dc.lruList.Remove(value)
		// delete item from lookup map
		delete(dc.cache, deltaValue.itemKey)
		numItemsEvicted++
		bytesEvicted += deltaValue.delta.totalDeltaBytes
	}
	return numItemsEvicted, bytesEvicted
}

// getCachedDelta will retrieve a delta if cached
func (dc *LRUDeltaCache) getCachedDelta(ctx context.Context, docID, fromVersionString, toVersionString string, collectionID uint32) *RevisionDelta {
	if docID == "" || fromVersionString == "" || toVersionString == "" {
		return nil
	}
	key := createDeltaCacheKey(docID, fromVersionString, toVersionString, collectionID)
	dc.lock.Lock()
	defer dc.lock.Unlock()
	var deltaValue *RevisionDelta
	if elem := dc.cache[key]; elem != nil {
		dc.lruList.MoveToFront(elem)
		cachedDelta := elem.Value.(*deltaCacheValue)
		deltaValue = cachedDelta.delta
	}

Comment on lines 359 to 371
if elem := rc.cache[key]; elem != nil {
rc.lruList.MoveToFront(elem)
value = elem.Value.(*revCacheValue)
} else if create {
value = &revCacheValue{id: docID, collectionID: collectionID, itemKey: key}
if currentVersion, parseErr := ParseVersion(docVersionString); parseErr != nil {
value.revID = docVersionString
} else {
value.cv = currentVersion
}
value.itemLoaded.Store(false)
value.accessOrder.Store(nextAccessOrder()) // stamp on insert
rc.cache[key] = rc.lruList.PushFront(value)
rc.cacheNumItems.Add(1)

Copilot AI Apr 7, 2026


accessOrder is stamped on insert (nextAccessOrder), but it is never refreshed when entries are accessed (the cache-hit path calls MoveToFront without updating accessOrder). Because the orchestrator compares accessOrder across the revision and delta caches to choose the globally oldest LRU tail, stale accessOrder can cause eviction of the wrong cache’s item under memory pressure. Consider updating accessOrder whenever an element is moved to the front (cache hit / Peek).

Contributor Author


This is by design: I don't want to add an extra atomic increment for every operation on the cache. It does mean that when eviction fires it could technically evict a more recently used item, but that's a worthwhile trade-off for one less atomic increment on every cache hit.
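For comparison, the refresh-on-hit variant the earlier comment suggests is a small change: stamp the entry's access order whenever it is moved to the front, not just on insert. The sketch below is self-contained with illustrative names, not the PR's code, and shows the extra atomic store per hit this trade-off is about.

```go
package main

import (
	"container/list"
	"fmt"
	"sync/atomic"
)

// Process-wide monotonic access counter (illustrative stand-in for
// globalCacheAccessCounter in the PR).
var accessCounter atomic.Uint64

func nextAccessOrder() uint64 { return accessCounter.Add(1) }

type entry struct {
	key         string
	accessOrder atomic.Uint64
}

func main() {
	lru := list.New()
	index := map[string]*list.Element{}
	for _, k := range []string{"a", "b"} {
		e := &entry{key: k}
		e.accessOrder.Store(nextAccessOrder()) // stamp on insert
		index[k] = lru.PushFront(e)
	}
	// Refresh-on-hit: stamp as well as MoveToFront, so cross-cache
	// comparisons see true recency at the cost of one atomic Add per hit.
	if el := index["a"]; el != nil {
		lru.MoveToFront(el)
		el.Value.(*entry).accessOrder.Store(nextAccessOrder())
	}
	tail := lru.Back().Value.(*entry)
	fmt.Println(tail.key) // "b" is now the correct eviction candidate
}
```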

@gregns1 gregns1 requested a review from bbrks April 7, 2026 11:30
Member

@bbrks bbrks left a comment


I am conscious we're trying to simplify the cache generally, but this seems more complicated conceptually to wrap my head around the safety of.

With the complexity of the memory tracking/eviction, LRU/monotonic counter in particular, I'd potentially like a walkthrough of the design to properly understand this.

// revision and delta caches for cross-cache LRU eviction decisions.
// Using a counter rather than time.Now() keeps the hot-path cost to a single
// atomic increment (~1–2 ns vs ~25 ns for a clock read).
var globalCacheAccessCounter atomic.Uint64
Member


This is process-global - should be scoped to a database!
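One way to address this (an assumption, not the PR's fix) is to hang the counter off per-database cache state so sequences from different databases never interleave:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// databaseCacheState is a hypothetical per-database home for the access
// counter, replacing the process-global atomic. Counter values are then
// only compared between the caches of one database.
type databaseCacheState struct {
	accessCounter atomic.Uint64
}

func (d *databaseCacheState) nextAccessOrder() uint64 {
	return d.accessCounter.Add(1)
}

func main() {
	db1, db2 := &databaseCacheState{}, &databaseCacheState{}
	fmt.Println(db1.nextAccessOrder()) // 1
	fmt.Println(db1.nextAccessOrder()) // 2
	fmt.Println(db2.nextAccessOrder()) // 1: each database gets its own sequence
}
```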

Comment on lines +93 to +94
// check for memory based eviction
c.triggerMemoryEviction()
Member


This doesn't need to be called if getCachedDelta is not creating an entry right? Similarly if revisionCache.Get doesn't insert anything

I know there's an early return inside but it's still doing the work to acquire the lock to check.
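A cheap way to avoid that lock acquisition (sketched here with illustrative names, not the PR's implementation) is to front the eviction path with a single atomic load, so hits that insert nothing never touch the mutex:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// orchestrator is a stand-in for the part of RevisionCacheOrchestrator
// that guards eviction; the eviction body here just decrements a counter
// to represent removing one LRU tail item.
type orchestrator struct {
	currentBytes atomic.Int64
	maxBytes     int64
	evictionLock sync.Mutex
	evictions    int
}

func (o *orchestrator) triggerMemoryEviction() {
	if o.currentBytes.Load() <= o.maxBytes {
		return // lock-free fast path: nothing was inserted or we're under budget
	}
	o.evictionLock.Lock()
	defer o.evictionLock.Unlock()
	for o.currentBytes.Load() > o.maxBytes {
		o.currentBytes.Add(-10) // stand-in for evicting one LRU tail item
		o.evictions++
	}
}

func main() {
	o := &orchestrator{maxBytes: 50}
	o.triggerMemoryEviction() // under budget: no lock taken, no evictions
	o.currentBytes.Store(75)
	o.triggerMemoryEviction()
	fmt.Println(o.evictions) // three stand-in evictions bring us back to 45
}
```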

if revCandidateOrder == 0 && deltaCandidateOrder == 0 {
// Both caches are empty — nothing more to evict.
break
}
Member


evictLRUTail() already tolerates the empty cache case. I don't think this is needed?

rc.bypassStat.Add(1)

return docRev, nil
return docRev, checkForMemoryEviction, nil
Member


This function always returns false for checkForMemoryEviction and I think it reads clearer if it isn't using the variable name without a set.

revisionCache *LRURevisionCache // holds document revisions
deltaCache *LRUDeltaCache // holds computed deltas, only initialized when delta sync is enabled
memoryController *CacheMemoryController // used to control memory usage of revision cache and delta cache combined
evictionLock sync.Mutex // This is to synchronise the eviction process so we don;t have multiple goroutines fighting to evict
Member


Suggested change
evictionLock sync.Mutex // This is to synchronise the eviction process so we don;t have multiple goroutines fighting to evict
evictionLock sync.Mutex // This is to synchronise the eviction process so we don't have multiple goroutines fighting to evict

Comment on lines +441 to +446
// Only count bytes if they were already accounted in the stat (memStateSized).
// If the item was still in memStateLoading the in-flight CAS will fail and skip
// the increment, so there is nothing to decrement.
if value.memState.Swap(memStateRemoved) == memStateSized {
numBytesEvicted += value.getItemBytes()
}
Member


Not specific to this code, but the whole eviction, size tracking, LRU/monotonic counter approach seems complicated, and I'd potentially like a walkthrough of the design to properly understand this.
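For readers trying to follow the snippet above, the lifecycle it relies on can be reconstructed roughly as below. The constant names mirror the snippet, but the surrounding types and the evict helper are assumptions for illustration: an entry moves loading → sized once its bytes are counted into the stat, and eviction swaps in removed; only a swap that observes sized may decrement the byte stat.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative memory-accounting states for a cache entry.
const (
	memStateLoading int32 = iota // bytes not yet counted into the stat
	memStateSized                // bytes counted; eviction must decrement
	memStateRemoved              // entry evicted
)

type revCacheValue struct {
	memState  atomic.Int32
	itemBytes int64
}

// evict marks the entry removed and returns the bytes to subtract from the
// shared stat. A still-loading entry never had its bytes added, so its
// in-flight loading->sized CAS will fail and there is nothing to decrement.
func (v *revCacheValue) evict() int64 {
	if v.memState.Swap(memStateRemoved) == memStateSized {
		return v.itemBytes
	}
	return 0
}

func main() {
	sized := &revCacheValue{itemBytes: 64}
	sized.memState.Store(memStateSized)
	loading := &revCacheValue{itemBytes: 64} // still memStateLoading
	fmt.Println(sized.evict())   // releases its 64 counted bytes
	fmt.Println(loading.evict()) // releases nothing
}
```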
