perf(mpool): #2/#7 prototype -- measured that false-sharing isn't the cap #19
Open
gburd wants to merge 1 commit into
Prototype the #2/#7 hypothesis that the buffer-header write-hot fields (pin ref count + LRU priority) false-share a cache line with the read-mostly identity/traversal fields every hash-chain walk reads. Isolate them on their own line behind MPOOL_HOTFIELDS_ISOLATED (off by default).

Controlled interleaved A/B (packed vs isolated, medians, 12-core box): no effect (+/-0.6%). The read-path cap is TRUE sharing of the atomic counters (bhp->ref and the shared-latch share-counts), not false sharing -- relocating the words cannot help.

Left off by default (only adds per-buffer memory); kept guarded to re-A/B on the 24-core Linux box. Refines the #2/#7 direction: the per-read shared-counter RMW must be removed (optimistic/versioned access needing epoch reclamation, or a sharded pin count), not relocated.

Documented in docs/design/scaling-findings.md.
Prototypes and measures the first candidate fix for the read-scaling ceiling identified in `docs/design/scaling-findings.md`.

## Hypothesis
`struct __bh` packs the write-hot fields (pin `ref` count, LRU `priority`, both written on every `__memp_fget`/`__memp_fput`) into the same cache line as the read-mostly identity fields (`pgno`/`mf_offset`/`flags`/`hq`) that every concurrent hash-chain walk of a hot (btree root) buffer reads. So each pin would invalidate the line all readers need just to traverse/match.

## Change
Isolate the write-hot fields on their own cache line, behind `MPOOL_HOTFIELDS_ISOLATED` (one-line A/B). Off by default.

## Measured (controlled interleaved A/B, medians, 12-core)
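No effect (±0.6%). The cap is true sharing of the atomic counters (`bhp->ref` + the shared-latch share-counts), not false sharing: every reader still performs an atomic RMW on those same words, so the line holding them bounces between cores wherever it sits. Relocating the words can't help.

## Why it's still useful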
It rules out the cheap fix with data and refines #2/#7: the per-read shared-counter RMW must be removed (optimistic/versioned access, which needs epoch reclamation that BDB lacks, or a sharded pin count), not relocated. Kept guarded and off so the A/B can be re-run on the 24-core Linux box (currently unreachable) where the futex-dominated ceiling was characterized. Smoke-tested write + MVCC-freeze paths; a full TCL regression is required before any default-on change.
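To make the "sharded pin count" option concrete, here is a minimal C11 sketch of the general idea. It is not code from this PR or from BDB: `sharded_pin`, `pin_acquire`, `pin_release`, `pin_total`, the shard count, and the 64-byte line size are all hypothetical, chosen only for illustration. Each pinner increments the shard selected by a caller-supplied slot (for example a per-thread index), so concurrent readers mostly touch different cache lines instead of one shared word.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define PIN_SHARDS 8    /* hypothetical shard count */

/* One counter per cache line so shards do not share a line with each other. */
struct pin_shard {
    alignas(CACHE_LINE) atomic_int count;
};

static_assert(sizeof(struct pin_shard) == CACHE_LINE,
              "each shard should occupy exactly one cache line");

struct sharded_pin {
    struct pin_shard shard[PIN_SHARDS];
};

static void pin_init(struct sharded_pin *p)
{
    for (int i = 0; i < PIN_SHARDS; i++)
        atomic_init(&p->shard[i].count, 0);
}

/* Pin: bump only the shard selected by the caller's slot. */
static void pin_acquire(struct sharded_pin *p, unsigned slot)
{
    atomic_fetch_add(&p->shard[slot % PIN_SHARDS].count, 1);
}

/* Unpin: a shard may go negative if the slot differs from the one used
 * to pin; only the sum across shards is meaningful. */
static void pin_release(struct sharded_pin *p, unsigned slot)
{
    atomic_fetch_sub(&p->shard[slot % PIN_SHARDS].count, 1);
}

/* Sum of all shards. This is not an atomic snapshot across shards. */
static int pin_total(struct sharded_pin *p)
{
    int sum = 0;
    for (int i = 0; i < PIN_SHARDS; i++)
        sum += atomic_load(&p->shard[i].count);
    return sum;
}
```

The trade-off is that the "is this buffer unpinned?" check, which eviction needs, becomes a scan over all shards that can race with new pins, so a real design would need an additional protocol to block or detect concurrent pinners. It also multiplies the per-buffer memory cost by the shard count.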
Builds clean with the default (off). See `docs/design/scaling-findings.md` → Prototype 1.
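For readers without the diff at hand, the packed and isolated layouts can be sketched roughly as follows. This is an illustrative stand-in, not the real `struct __bh`: the field types, the plain-pointer `hq`, the 64-byte line size, and the two separate struct names are assumptions, and the actual patch selects the layout with `MPOOL_HOTFIELDS_ISOLATED` in a way not shown here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical on x86-64; other CPUs may differ). */
#define CACHE_LINE 64

/* Packed layout: write-hot and read-mostly fields sit close together. */
struct bh_packed {
    uint32_t pgno;            /* read-mostly: page identity */
    uint32_t mf_offset;       /* read-mostly: owning file */
    uint16_t flags;           /* read-mostly */
    struct bh_packed *hq;     /* read-mostly: hash-chain link (simplified) */
    uint32_t ref;             /* write-hot: pin count */
    uint32_t priority;        /* write-hot: LRU priority */
};

/* Isolated layout: alignas pushes the write-hot fields onto the next line. */
struct bh_isolated {
    uint32_t pgno;
    uint32_t mf_offset;
    uint16_t flags;
    struct bh_isolated *hq;
    alignas(CACHE_LINE) uint32_t ref;
    uint32_t priority;
};

/* True when two byte offsets fall in the same cache line, assuming the
 * enclosing object itself starts on a cache-line boundary. */
static int same_line(size_t a, size_t b)
{
    return a / CACHE_LINE == b / CACHE_LINE;
}

/* Compile-time checks of the layout property the prototype relies on. */
static_assert(offsetof(struct bh_isolated, ref) % CACHE_LINE == 0,
              "write-hot fields must start a cache line");
static_assert(sizeof(struct bh_isolated) % CACHE_LINE == 0,
              "isolation pads every buffer header to whole lines");
```

The padding that the second `static_assert` checks is the per-buffer memory cost mentioned in the commit message. Note that `ref` and `priority` still share a line with each other in the isolated layout, and every reader still writes `ref`, which is consistent with the measured result that moving the words does not lift the cap.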