perf(mpool): #2/#7 prototype — measured that false-sharing isn't the cap by gburd · Pull Request #19 · berkeleydb/libdb

gburd · 2026-06-17T09:45:33Z

Prototypes and measures the first candidate fix for the read-scaling ceiling identified in docs/design/scaling-findings.md.

Hypothesis

struct __bh packs the write-hot fields (the pin ref count and the LRU priority, both written on every __memp_fget/__memp_fput) into the same cache line as the read-mostly identity fields (pgno/mf_offset/flags/hq). Every concurrent hash-chain walk that reaches a hot buffer, such as a btree root, reads those identity fields. If the hypothesis holds, each pin invalidates the line that every other reader needs just to traverse the chain and match the page.
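
To make the hypothesis concrete, here is a minimal C sketch that assumes 64-byte cache lines and a line-aligned header. `struct bh_packed` is a hypothetical, simplified stand-in for `struct __bh`, not its real definition. Only the field names follow the text above, and `hq` is reduced to two link words. The compile-time check shows that in a header this compact, the write-hot and identity fields necessarily share one line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical on x86-64). */
#define CACHE_LINE 64

/*
 * Hypothetical, simplified stand-in for struct __bh; NOT the real BDB
 * definition. Field names mirror the ones discussed in the hypothesis.
 */
struct bh_packed {
	uint32_t ref;        /* write-hot: pin count, bumped on every fget/fput */
	uint32_t priority;   /* write-hot: LRU priority */
	uint32_t pgno;       /* read-mostly identity, compared during chain walks */
	uint32_t mf_offset;  /* read-mostly identity */
	uint16_t flags;      /* read-mostly */
	uint64_t hq_next;    /* simplified hash-queue linkage (stands in for hq) */
	uint64_t hq_prev;
};

/* Cache line (0-based) on which a field starts, for a line-aligned header. */
#define LINE_OF(type, field) (offsetof(type, field) / CACHE_LINE)

/* The whole header fits in one line, so hot and identity fields share it. */
static_assert(sizeof(struct bh_packed) <= CACHE_LINE,
    "packed header should fit in a single cache line");
```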

Change

Isolate the write-hot fields on their own cache line behind MPOOL_HOTFIELDS_ISOLATED, so the A/B is a one-line switch. Off by default.
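
One way the guarded switch could look, sketched on a hypothetical simplified header rather than the real `struct __bh`, again assuming 64-byte lines. Defining `MPOOL_HOTFIELDS_ISOLATED` (the macro named above) uses C11 `alignas` to push `ref` and `priority` onto the next cache line. The `static_assert` checks the intended layout in each build.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64

/*
 * Hypothetical sketch of the one-line A/B switch: compile with
 * -DMPOOL_HOTFIELDS_ISOLATED to move the write-hot fields onto their
 * own line. Simplified stand-in, NOT the real struct __bh.
 */
struct bh_sketch {
	/* Read-mostly identity/traversal fields: what chain walks read. */
	uint32_t pgno;
	uint32_t mf_offset;
	uint16_t flags;
	uint64_t hq_next;
	uint64_t hq_prev;
#ifdef MPOOL_HOTFIELDS_ISOLATED
	/* Start the write-hot fields on a fresh cache line. */
	alignas(CACHE_LINE) uint32_t ref;
#else
	uint32_t ref;
#endif
	uint32_t priority;
};

#define LINE_OF(type, field) (offsetof(type, field) / CACHE_LINE)

#ifdef MPOOL_HOTFIELDS_ISOLATED
static_assert(LINE_OF(struct bh_sketch, ref) != LINE_OF(struct bh_sketch, pgno),
    "isolated: ref must not share pgno's line");
#else
static_assert(LINE_OF(struct bh_sketch, ref) == LINE_OF(struct bh_sketch, pgno),
    "packed: ref shares pgno's line");
#endif
```

On a typical x86-64 build, this sketch's header grows from 40 to 128 bytes when isolated. That is the kind of per-buffer memory cost that argues for leaving the switch off unless it measurably helps.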

Measured (controlled interleaved A/B, medians, 12-core)

workload	threads	packed	isolated	delta
rrand	8	486,745	489,564	+0.6%
rrand	12	390,927	390,416	-0.1%
snap	8	518,422	514,260	-0.8%
snap	12	408,213	409,415	+0.3%
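
The delta column is (isolated - packed) / packed, expressed as a percentage. The small C sketch below recomputes it from the medians in the table. The largest magnitude is snap at 8 threads (about -0.80%), and the signs do not agree across runs.

```c
#include <assert.h>

/* Percent change of the isolated layout relative to the packed baseline. */
static double delta_pct(double packed, double isolated)
{
	assert(packed > 0.0);
	return (isolated - packed) / packed * 100.0;
}

/* Medians copied from the table: rrand@8, rrand@12, snap@8, snap@12. */
static const double packed_med[]   = { 486745, 390927, 518422, 408213 };
static const double isolated_med[] = { 489564, 390416, 514260, 409415 };

/* Largest absolute delta (in percent) across the four runs. */
static double max_abs_delta(void)
{
	double worst = 0.0;

	for (int i = 0; i < 4; i++) {
		double d = delta_pct(packed_med[i], isolated_med[i]);
		if (d < 0)
			d = -d;
		if (d > worst)
			worst = d;
	}
	return worst;
}
```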

No effect: every delta is within ±0.8%, with no consistent sign. The cap is true sharing of the atomic counters (bhp->ref and the shared-latch share counts), not false sharing. Every reader writes those words itself, so relocating them cannot help.
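
A simplified read path shows the true-sharing argument. This is a hypothetical sketch, not BDB's `__memp_fget`: the struct, the chain layout, and the function names are invented for illustration. The shared latch that the real path acquires is also omitted; its share count is a second counter with the same property. The point is that the pin is an atomic read-modify-write performed by the reader itself. As a result, every concurrent reader of a hot page writes the same `ref` word, whichever cache line it lives on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified buffer header; NOT BDB's struct __bh. */
struct bh {
	uint32_t pgno;        /* identity, read during the walk */
	uint32_t mf_offset;   /* identity, read during the walk */
	struct bh *next;      /* simplified hash-chain link */
	atomic_uint ref;      /* pin count: wherever it lives, all readers write it */
};

/* Walk one hash chain and pin the matching buffer (latching omitted). */
static struct bh *fget_sketch(struct bh *chain, uint32_t mf_offset, uint32_t pgno)
{
	for (struct bh *bhp = chain; bhp != NULL; bhp = bhp->next) {
		if (bhp->pgno == pgno && bhp->mf_offset == mf_offset) {
			/* The RMW every reader of this page executes: true sharing. */
			atomic_fetch_add_explicit(&bhp->ref, 1, memory_order_acq_rel);
			return bhp;
		}
	}
	return NULL;
}

/* Unpin: another RMW on the same shared word. */
static void fput_sketch(struct bh *bhp)
{
	unsigned prev = atomic_fetch_sub_explicit(&bhp->ref, 1, memory_order_acq_rel);

	assert(prev > 0);	/* unpinning an unpinned buffer is a bug */
	(void)prev;
}
```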

Why it's still useful

It rules out the cheap fix with data and refines #2/#7: the per-read RMW on a shared counter must be removed, not relocated. The options are optimistic/versioned access (which needs epoch-based reclamation that BDB lacks) or a sharded pin count. The change is kept guarded and off so it can be re-A/B'd on the 24-core Linux box (currently unreachable), where the futex-dominated ceiling was characterized. The write and MVCC-freeze paths were smoke-tested; a full TCL regression run is required before any default-on change.
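
The following hedged sketch shows one shape the sharded-pin-count option could take. The shard count, the caller-chosen shard index, and all names are hypothetical, and 64-byte lines are assumed. Each reader increments only its own shard, so concurrent pins of a hot buffer write different cache lines instead of one shared word. Whoever needs the real count (for example, eviction) sums the shards.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

/* Assumptions: 64-byte cache lines, a fixed shard count, and a caller-chosen
 * shard index (e.g. derived from a CPU or thread id). Names are hypothetical. */
#define CACHE_LINE 64
#define PIN_SHARDS 8

struct pin_shard {
	alignas(CACHE_LINE) atomic_long count;	/* one counter per cache line */
};

struct sharded_pin {
	struct pin_shard shard[PIN_SHARDS];
};

static void sharded_pin_init(struct sharded_pin *sp)
{
	for (int i = 0; i < PIN_SHARDS; i++)
		atomic_init(&sp->shard[i].count, 0);
}

/* Readers touch only their own shard, so concurrent pins hit different lines. */
static void sharded_pin(struct sharded_pin *sp, unsigned who)
{
	atomic_fetch_add(&sp->shard[who % PIN_SHARDS].count, 1);
}

static void sharded_unpin(struct sharded_pin *sp, unsigned who)
{
	atomic_fetch_sub(&sp->shard[who % PIN_SHARDS].count, 1);
}

/*
 * Slow path: sum all shards. A single shard may go negative if a thread
 * unpins on a different shard than it pinned on; only the sum is meaningful.
 * The sum is not an atomic snapshot under concurrent pins.
 */
static long sharded_pin_total(struct sharded_pin *sp)
{
	long total = 0;

	for (int i = 0; i < PIN_SHARDS; i++)
		total += atomic_load(&sp->shard[i].count);
	return total;
}
```

The trade-off is memory and a slower slow path. This sketch spends PIN_SHARDS cache lines per header, and reading the count costs a PIN_SHARDS-way sum instead of one load. Because the sum is not a snapshot, an evictor would still need a way to block new pins before trusting a zero.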

Builds cleanly in the default (off) configuration. See docs/design/scaling-findings.md → Prototype 1.

Prototype the #2/#7 hypothesis that the buffer-header write-hot fields (pin ref count + LRU priority) false-share a cache line with the read-mostly identity/traversal fields every hash-chain walk reads. Isolate them on their own line behind MPOOL_HOTFIELDS_ISOLATED (off by default). Controlled interleaved A/B (packed vs isolated, medians, 12-core box): no effect (all deltas within +/-0.8%). The read-path cap is TRUE sharing of the atomic counters (bhp->ref and the shared-latch share-counts), not false sharing -- relocating the words cannot help. Left off by default (it only adds per-buffer memory); kept guarded to re-A/B on the 24-core Linux box. Refines the #2/#7 direction: the per-read shared-counter RMW must be removed (optimistic/versioned access needing epoch reclamation, or a sharded pin count), not relocated. Documented in docs/design/scaling-findings.md.

gburd wants to merge 1 commit into master from perf/mpool-pin
