Skip to content

fix(rscthrottler): avoid double-counting RSS and reserved in S3 write throttler (4.0-dev)#24931

Open
aunjgr wants to merge 1 commit into
matrixorigin:4.0-devfrom
aunjgr:cp/s3-throttler-fix-4.0-dev
Open

fix(rscthrottler): avoid double-counting RSS and reserved in S3 write throttler (4.0-dev)#24931
aunjgr wants to merge 1 commit into
matrixorigin:4.0-devfrom
aunjgr:cp/s3-throttler-fix-4.0-dev

Conversation

@aunjgr

@aunjgr aunjgr commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Cherry-pick of #24892 (f964de9) to 4.0-dev.

Fixes #24881
Fixes #24888

What this PR does

The S3 write throttler's acquireWithinRSSLimit had two gates:

  1. Pool check via Available() — double-counts S3 write buffers (in RSS as mmap'd mpool allocations + in reserved as throttler tokens). Under ForceRefresh() this returned 0 available, permanently rejecting all writes ([Bug]: Failed to load vector data during regression testing, but it works when isolation #24881).
  2. RSS gate rss + reserved + ask > total × limitRate — same double-counting. The RSS gate's original purpose (preventing OOM from non-S3 memory pressure, PR fix: throttle CN S3 writes under memory pressure #24268) is now handled by RSS scavenging (85%/92% thresholds with graduated cache eviction + FreeOSMemory).

Fix:

  • Pool check: use limit - reserved directly instead of Available(). The pool only needs to bound forward-looking reservations against the throttler limit.
  • RSS gate: removed. Physical memory protected by pinnedRate >= 0.80 ceiling + RSS scavenging.

Available() is unchanged — other throttler consumers rely on its semantics.

… throttler (matrixorigin#24892)

The S3 write throttler's `acquireWithinRSSLimit` had two gates:

1. **Pool check** via `Available()` — double-counts S3 write buffers (in RSS as mmap'd mpool allocations + in reserved as throttler tokens). Under `ForceRefresh()` this returned 0 available, permanently rejecting all writes (matrixorigin#24881).
2. **RSS gate** `rss + reserved + ask > total × limitRate` — same double-counting. Even with stale RSS the sum `rss + reserved` pushes over the cap, causing false rejects. And the RSS gate's original purpose (preventing OOM from non-S3 memory pressure, PR matrixorigin#24268) is now handled by RSS scavenging (85%/92% thresholds with graduated cache eviction + FreeOSMemory).

**Fix**:

- **Pool check**: use `limit - reserved` directly instead of `Available()`. The pool only needs to bound forward-looking reservations against the throttler limit.
- **RSS gate**: removed. Physical memory is protected by:
- `pinnedRate >= 0.80` hard ceiling at `AcquirePolicyForCNFlushS3`
- RSS scavenging (85%/92%) with cache eviction + FreeOSMemory

`Available()` is unchanged — other throttler consumers (workspace, merge) rely on its `rss + reserved` semantics.

Approved by: @XuPeng-SH
@aunjgr aunjgr requested a review from XuPeng-SH as a code owner June 11, 2026 04:15
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the double-counting bug this PR is fixing, but I think the current backport swings too far in the other direction: it removes RSS from the CNFlushS3 admission decision entirely.

That means if the process is already near the cgroup/host limit because of non-S3 memory, AcquirePolicyForCNFlushS3 can still admit more S3 reservations as long as reserved is below the pool limit. In other words, the policy now protects the forward-looking reservation pool, but no longer protects physical memory headroom from unrelated RSS pressure.

A concrete example from the new test shape is: actualTotalMemory=100, limit=90, reserved=0, rss=91. With the new code, pinnedRate() is still 0 because it only looks at reserved, so the request falls through to acquireWithinRSSLimit() and can still be granted. That is exactly the kind of non-S3 RSS pressure case the old gate was preventing. RSS scavenging is useful, but it is not an equivalent guarantee here: it is asynchronous / best-effort and cannot reclaim live heap/query memory.

So my current view is that this is not just a coverage question. The fix removes the double-counting bug, but it also removes the only hard backpressure for high RSS caused by non-S3 memory. I think the right fix needs to discount the overlapping S3 portion from RSS without dropping physical-memory enforcement altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants