Skip to content

Restore resource_cast memory limit check#1231

Open
bdice wants to merge 1 commit into
NVIDIA:mainfrom
bdice:restore-resource-cast-memory-limit
Open

Restore resource_cast memory limit check#1231
bdice wants to merge 1 commit into
NVIDIA:mainfrom
bdice:restore-resource-cast-memory-limit

Conversation

@bdice
Copy link
Copy Markdown
Contributor

@bdice bdice commented May 17, 2026

Description

Restore the memory-limit-aware branch in get_device_memory_size() that was temporarily removed during the CCCL/RMM memory-resource migration in #1035.

The function now uses cuda::mr::resource_cast on rmm::mr::get_current_device_resource_ref() to detect an active rmm::mr::limiting_resource_adaptor. When present, it reports the limit/usage/free values and returns the smaller of CUDA total memory and the adaptor allocation limit, matching the previous behavior.

@bdice bdice requested a review from a team as a code owner May 17, 2026 16:23
@bdice bdice requested review from Kh4ster and rg20 May 17, 2026 16:23
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 17, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR enhances CUDA device memory reporting by adding support for RMM memory limiting adaptors. The header gains new dependencies for memory resource types, and get_device_memory_size() now detects when a limiting adaptor is active, emits debug diagnostics, and returns the effective allocation limit instead of the full device memory.

Changes

Device memory reporting with RMM limiting adaptors

Layer / File(s) Summary
RMM resource headers and dependencies
cpp/src/utilities/cuda_helpers.cuh
CUDA memory resource and RMM adaptor headers are added to enable limiting adaptor type detection and casting in the file.
Device memory size query with limiting adaptor detection
cpp/src/utilities/cuda_helpers.cuh
get_device_memory_size() now reads the current RMM device resource and attempts to cast it to a limiting adaptor; if present, debug output is printed with allocation limits and usage, and the function returns the minimum of CUDA-reported total memory and the adaptor's allocation limit; otherwise it uses the existing path.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Restore resource_cast memory limit check' directly and specifically describes the main change: restoring a memory-limit-aware check that was previously removed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description clearly explains the restoration of memory-limit-aware functionality in get_device_memory_size() and references the related migration PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/utilities/cuda_helpers.cuh (1)

254-259: ⚡ Quick win

Gate debug prints behind a debug flag.

Unconditional printf in get_device_memory_size() can spam stdout and add sync overhead when this helper is called frequently. Consider guarding these lines with a compile-time debug flag.

Suggested patch
-    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
-           limiting_adaptor->get_allocation_limit() / (double)1e6);
-    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
-    printf("free_mem: %fMiB\n",
-           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
-             (double)1e6);
+#ifdef CUOPT_ENABLE_RESOURCE_LIMIT_DEBUG
+    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
+           limiting_adaptor->get_allocation_limit() / (double)1e6);
+    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
+    printf("free_mem: %fMiB\n",
+           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
+             (double)1e6);
+#endif
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/utilities/cuda_helpers.cuh` around lines 254 - 259, The three
unconditional printf calls inside get_device_memory_size() (printing
limiting_adaptor->get_allocation_limit(), get_allocated_bytes(), and free
memory) should be wrapped in a compile-time debug macro so they don't run in
production; add a macro guard (e.g., CUDA_HELPERS_DEBUG or similar) around those
prints in cpp/src/utilities/cuda_helpers.cuh and only emit the diagnostics when
the macro is defined, keeping the rest of get_device_memory_size() unchanged.
Ensure the macro name is documented in the header comment so callers can enable
it at build time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/utilities/cuda_helpers.cuh`:
- Around line 254-259: The printed memory diagnostics use the label "MiB" but
divide by 1e6 (decimal MB); update the three printf calls that reference
limiting_adaptor->get_allocation_limit() and
limiting_adaptor->get_allocated_bytes() to divide by 1024.0 * 1024.0 (or use a
named constant like BYTES_PER_MIB) so the units correctly reflect mebibytes, and
keep the "MiB" labels unchanged.

---

Nitpick comments:
In `@cpp/src/utilities/cuda_helpers.cuh`:
- Around line 254-259: The three unconditional printf calls inside
get_device_memory_size() (printing limiting_adaptor->get_allocation_limit(),
get_allocated_bytes(), and free memory) should be wrapped in a compile-time
debug macro so they don't run in production; add a macro guard (e.g.,
CUDA_HELPERS_DEBUG or similar) around those prints in
cpp/src/utilities/cuda_helpers.cuh and only emit the diagnostics when the macro
is defined, keeping the rest of get_device_memory_size() unchanged. Ensure the
macro name is documented in the header comment so callers can enable it at build
time.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 430867ae-8a10-47b2-948f-2993d59e032c

📥 Commits

Reviewing files that changed from the base of the PR and between 3e60a3a and 7d66043.

📒 Files selected for processing (1)
  • cpp/src/utilities/cuda_helpers.cuh

Comment on lines +254 to +259
printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
limiting_adaptor->get_allocation_limit() / (double)1e6);
printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
printf("free_mem: %fMiB\n",
(limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
(double)1e6);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix unit mismatch in memory diagnostics.

Line 254, Line 256, and Line 257 label values as MiB but divide by 1e6 (MB). Use 1024.0 * 1024.0 or relabel to MB.

Suggested patch
-    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
-           limiting_adaptor->get_allocation_limit() / (double)1e6);
-    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
+    constexpr double bytes_per_mib = 1024.0 * 1024.0;
+    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
+           limiting_adaptor->get_allocation_limit() / bytes_per_mib);
+    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / bytes_per_mib);
     printf("free_mem: %fMiB\n",
-           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
-             (double)1e6);
+           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
+             bytes_per_mib);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
limiting_adaptor->get_allocation_limit() / (double)1e6);
printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
printf("free_mem: %fMiB\n",
(limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
(double)1e6);
constexpr double bytes_per_mib = 1024.0 * 1024.0;
printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
limiting_adaptor->get_allocation_limit() / bytes_per_mib);
printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / bytes_per_mib);
printf("free_mem: %fMiB\n",
(limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
bytes_per_mib);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/utilities/cuda_helpers.cuh` around lines 254 - 259, The printed
memory diagnostics use the label "MiB" but divide by 1e6 (decimal MB); update
the three printf calls that reference limiting_adaptor->get_allocation_limit()
and limiting_adaptor->get_allocated_bytes() to divide by 1024.0 * 1024.0 (or use
a named constant like BYTES_PER_MIB) so the units correctly reflect mebibytes,
and keep the "MiB" labels unchanged.

@bdice bdice self-assigned this May 17, 2026
@bdice bdice added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels May 17, 2026
@anandhkb anandhkb added this to the 26.06 milestone May 18, 2026
@anandhkb
Copy link
Copy Markdown
Contributor

Requesting @rg20 's review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants