Skip to content

*: enhance mem tracker exclude rss_file & check limit for columnar#10867

Open
yongman wants to merge 10 commits into
pingcap:masterfrom
yongman:fix-mem-tracker-columnar
Open

*: enhance mem tracker exclude rss_file & check limit for columnar#10867
yongman wants to merge 10 commits into
pingcap:masterfrom
yongman:fix-mem-tracker-columnar

Conversation

@yongman

@yongman yongman commented May 25, 2026

Copy link
Copy Markdown
Member

What problem does this PR solve?

Issue Number: ref #10844

Problem Summary:

  1. Exclude rss_file result to query failed with memory exceeded exception.
  2. For columnar read path, the memory usage can not counted for columnar hub in Rust.

What is changed and how it works?

  1. Exclude rss_file part when check the memory limitation. rss_file part can be auto freed by OS.
  2. Actively check the memory limitation after read block from columnar hub in Rust to avoid OOM.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Bug Fixes

    • Improved memory-limit enforcement by using effective RSS (RSS minus file-backed RSS) instead of raw RSS.
    • Enhanced background memory collection to track more process-level metrics.
    • Refreshed jemalloc sampling more reliably before collecting allocation statistics.
  • Chores

    • Added RSS limit checks during disaggregated columnar read operations to better prevent memory pressure issues.
    • Updated the description of the testing memory tracker accuracy setting to reference effective RSS.

yongman added 5 commits May 25, 2026 14:41
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit d54c26a4afa31b362c538394190f29299b94970c)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit ec4bf8011fa46f8c9750594a01f7303f7bd96689)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit aeb3bf1e0300d92c6fa12ef21505b052a5fd6b41)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit d946399942e70bf0d8aa1b7567ad5e498108f2e2)
Signed-off-by: yongman <yming0221@gmail.com>
(cherry picked from commit 66c3e5a377acb05cbb9cd819e06b0fc94d8ada9a)
@ti-chi-bot

ti-chi-bot Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 25, 2026
@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c2c483e8-ecfa-42d0-9080-91a7e3ad222c

📥 Commits

Reviewing files that changed from the base of the PR and between dd94f94 and 577f54b.

📒 Files selected for processing (1)
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp

📝 Walkthrough

Walkthrough

The PR separates file-backed RSS from anonymous RSS, records rss_file via process metrics, computes effective RSS = max(real_rss - real_rss_file, 0), centralizes RSS-limit checks, integrates the check into allocation and storage read paths, and advances jemalloc epoch before metrics collection.

Changes

RSS-aware memory tracking and limit enforcement

Layer / File(s) Summary
RSS file-backed metrics data model
dbms/src/Common/MemoryTracker.h, dbms/src/Common/MemoryTracker.cpp
Global atomic real_rss_file is added alongside real_rss to separately track RSS from memory-mapped files.
Process metrics collection from background task
dbms/src/Common/BackgroundTask.cpp
Background memory collection loop now calls both get_process_mem_usage() for RSS and get_process_metrics() for file-backed RSS metrics, updating real_rss and real_rss_file atomics.
Allocation path: rollback, effective_rss, and RSS checks
dbms/src/Common/MemoryTracker.cpp
Allocation-time logic refactored to use a rollback lambda; fault-injection accuracy checks compute effective_rss = max(real_rss - real_rss_file, 0) and report rss_file/total rss; explicit RSS validation call is integrated before quota checks with exception handling.
Centralized RSS limit enforcement
dbms/src/Common/MemoryTracker.h, dbms/src/Common/MemoryTracker.cpp
New private checkRssLimitImpl(...) centralizes RSS validation and exception throwing; public MemoryTracker::checkRssLimit() and CurrentMemoryTracker::checkRssLimit() forward the check to the current tracker.
Storage integration and metrics collection updates
dbms/src/Storages/StorageDisaggregatedColumnar.cpp, dbms/src/Interpreters/AsynchronousMetrics.cpp, dbms/src/Interpreters/Settings.h
Storage layer calls CurrentMemoryTracker::checkRssLimit() after proxy block read; JEMalloc stats epoch is advanced before metric collection; memory_tracker_accuracy_diff_for_test setting description now references effective RSS (excluding file-backed pages).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • CalvinNeo
  • JinheLin
  • JaySon-Huang

Poem

🐰 I hop where pages come and go,
Counting bytes both high and low,
I split the RSS into two,
Track the file-backed and the true,
So memory limits stay in tow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the two main changes: excluding rss_file from memory tracker checks and adding memory limit checks for the columnar read path.
Description check ✅ Passed The description includes the issue reference, clear problem summary, changes explanation, and required checklist with manual test marked. The commit message section is empty but non-critical.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: yongman <yming0221@gmail.com>
@yongman yongman marked this pull request as ready for review May 25, 2026 08:04
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2026
@pantheon-ai

pantheon-ai Bot commented May 25, 2026

Copy link
Copy Markdown

@yongman I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Common/MemoryTracker.cpp`:
- Line 190: The call to checkRssLimitImpl(...) (e.g., checkRssLimitImpl(/*
require_tracked_growth */ true, will_be, size)) can throw after tracked bytes
were already incremented, so you must roll back the tracked increase on
exception; wrap the checkRssLimitImpl call in a try/catch and on catch subtract
the previously added amount (e.g., call the corresponding decrease/undo method
or subtract from tracked_bytes by size) before rethrowing; apply the same
rollback pattern to the other allocation paths referenced (the block around
lines 235–290) so any early abort always undoes the tracked increment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fd99563b-e41a-4731-b09c-3a308a1afd56

📥 Commits

Reviewing files that changed from the base of the PR and between f0766c2 and a60789f.

📒 Files selected for processing (6)
  • dbms/src/Common/BackgroundTask.cpp
  • dbms/src/Common/MemoryTracker.cpp
  • dbms/src/Common/MemoryTracker.h
  • dbms/src/Interpreters/AsynchronousMetrics.cpp
  • dbms/src/Interpreters/Settings.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp

Comment thread dbms/src/Common/MemoryTracker.cpp Outdated
yongman added 3 commits May 25, 2026 16:40
Signed-off-by: yongman <yming0221@gmail.com>
Signed-off-by: yongman <yming0221@gmail.com>
@ti-chi-bot

ti-chi-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Lloyd-Pottiger
Once this PR has been reviewed and has the lgtm label, please assign likidu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 17, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

[LGTM Timeline notifier]

Timeline:

  • 2026-06-17 14:50:54.978888334 +0000 UTC m=+1576356.049205724: ☑️ agreed by Lloyd-Pottiger.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Line 850: The Exception constructor at the throw statement with "lock error"
message and ErrorCodes::COLUMNAR_SNAPSHOT_ERROR is using incorrect argument
order. Reorder the arguments so that the error code
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR comes first as the first parameter, followed
by the message string "lock error" as the second parameter, to follow the
fmt-style Exception constructor pattern required by the coding guidelines.
- Line 1465: The Exception throw statement is using incorrect parameter order.
According to coding guidelines, the fmt-style Exception constructor requires the
error code as the first parameter followed by the message string. Locate the
Exception throw statement containing "read_block failed in columnar" and swap
the parameter order so that ErrorCodes::LOGICAL_ERROR appears first and the
message string appears second.
- Line 1635: The `likely` builtin macro at line 1635 has incorrect syntax. The
condition check currently reads `if likely (block && block.rows() > 0)` but
needs to properly wrap the condition within the `likely` macro's parentheses.
Fix the syntax by changing it to `if (likely(block && block.rows() > 0))` to
ensure the likely macro receives its argument with correct parentheses syntax.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c2c483e8-ecfa-42d0-9080-91a7e3ad222c

📥 Commits

Reviewing files that changed from the base of the PR and between dd94f94 and 577f54b.

📒 Files selected for processing (1)
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Line 850: The Exception constructor at the throw statement with "lock error"
message and ErrorCodes::COLUMNAR_SNAPSHOT_ERROR is using incorrect argument
order. Reorder the arguments so that the error code
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR comes first as the first parameter, followed
by the message string "lock error" as the second parameter, to follow the
fmt-style Exception constructor pattern required by the coding guidelines.
- Line 1465: The Exception throw statement is using incorrect parameter order.
According to coding guidelines, the fmt-style Exception constructor requires the
error code as the first parameter followed by the message string. Locate the
Exception throw statement containing "read_block failed in columnar" and swap
the parameter order so that ErrorCodes::LOGICAL_ERROR appears first and the
message string appears second.
- Line 1635: The `likely` builtin macro at line 1635 has incorrect syntax. The
condition check currently reads `if likely (block && block.rows() > 0)` but
needs to properly wrap the condition within the `likely` macro's parentheses.
Fix the syntax by changing it to `if (likely(block && block.rows() > 0))` to
ensure the likely macro receives its argument with correct parentheses syntax.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c2c483e8-ecfa-42d0-9080-91a7e3ad222c

📥 Commits

Reviewing files that changed from the base of the PR and between dd94f94 and 577f54b.

📒 Files selected for processing (1)
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
🛑 Comments failed to post (3)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (3)

850-850: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use fmt-style Exception constructor.

As per coding guidelines, Exception should use fmt-style constructor with error code first: throw Exception(ErrorCodes::SOME_CODE, "message");

🔧 Proposed fix
-        throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 850, The
Exception constructor at the throw statement with "lock error" message and
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR is using incorrect argument order. Reorder
the arguments so that the error code ErrorCodes::COLUMNAR_SNAPSHOT_ERROR comes
first as the first parameter, followed by the message string "lock error" as the
second parameter, to follow the fmt-style Exception constructor pattern required
by the coding guidelines.

Source: Coding guidelines


1465-1465: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use fmt-style Exception constructor.

As per coding guidelines, Exception should use fmt-style constructor with error code first: throw Exception(ErrorCodes::SOME_CODE, "message");

🔧 Proposed fix
-            throw Exception("read_block failed in columnar", ErrorCodes::LOGICAL_ERROR);
+            throw Exception(ErrorCodes::LOGICAL_ERROR, "read_block failed in columnar");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

            throw Exception(ErrorCodes::LOGICAL_ERROR, "read_block failed in columnar");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 1465, The
Exception throw statement is using incorrect parameter order. According to
coding guidelines, the fmt-style Exception constructor requires the error code
as the first parameter followed by the message string. Locate the Exception
throw statement containing "read_block failed in columnar" and swap the
parameter order so that ErrorCodes::LOGICAL_ERROR appears first and the message
string appears second.

Source: Coding guidelines


1635-1635: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix syntax error with likely macro.

The likely builtin requires proper parentheses syntax. The current code will fail to compile.

🐛 Proposed fix
-    if likely (block && block.rows() > 0)
+    if (likely(block && block.rows() > 0))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if (likely(block && block.rows() > 0))
🧰 Tools
🪛 Cppcheck (2.21.0)

[error] 1635-1635: syntax error

(syntaxError)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 1635, The
`likely` builtin macro at line 1635 has incorrect syntax. The condition check
currently reads `if likely (block && block.rows() > 0)` but needs to properly
wrap the condition within the `likely` macro's parentheses. Fix the syntax by
changing it to `if (likely(block && block.rows() > 0))` to ensure the likely
macro receives its argument with correct parentheses syntax.

Source: Linters/SAST tools

@yongman

yongman commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

/retest-required

@ti-chi-bot

ti-chi-bot Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@yongman: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-sanitizer-tsan 577f54b link false /test pull-sanitizer-tsan

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@yongman

yongman commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

/retest-required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants