[QDP] Fix invalid CUDA kernel launch when num_samples exceeds grid dimension limit #968

viiccwen · 2026-01-28T17:33:27Z

Purpose of PR

Fixes a bug in launch_l2_norm_batch (f64) where processing more samples than the CUDA 1D grid dimension limit resulted in an invalid CUDA kernel launch. The fix adds early validation that returns an error when num_samples exceeds that limit.
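A minimal Rust sketch of the early-validation idea follows. It assumes the batch launch maps one block to each sample, so grid.x equals num_samples. The function grid_dim_x_for and the GridError type are hypothetical. The actual check lives in the host code of qdp/qdp-kernels/src/amplitude.cu and may be structured differently.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch (not the crate's real error enum).
#[derive(Debug, PartialEq, Eq)]
enum GridError {
    EmptyBatch,
    TooManySamples { num_samples: usize, max_grid_dim_x: u32 },
}

/// Returns grid.x for a one-block-per-sample launch (an assumption), or an
/// error before any launch is attempted if the batch cannot fit the grid.
fn grid_dim_x_for(num_samples: usize, max_grid_dim_x: u32) -> Result<u32, GridError> {
    match u32::try_from(num_samples) {
        // A zero-sized grid is itself an invalid launch configuration.
        Ok(0) => Err(GridError::EmptyBatch),
        // Fits in u32 and within the device's grid.x limit: safe to launch.
        Ok(n) if n <= max_grid_dim_x => Ok(n),
        // Too large for u32, or above the device limit: reject up front.
        _ => Err(GridError::TooManySamples { num_samples, max_grid_dim_x }),
    }
}
```

Returning an error here surfaces the problem to the caller directly, instead of relying on a launch-time failure that is easy to miss.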

Related Issues or PRs

closes #967

Changes Made

Breaking Changes

Yes
No

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes
Successfully built and ran all unit tests or manual tests locally
PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
Code follows ASF guidelines

viiccwen · 2026-01-28T17:57:13Z

cc @rich7420, @ryankert01

ryankert01

By using 65,535, you might be relying on a limit inherited from the Fermi architecture (pre-2012).

On any modern GPU (Compute Capability 3.0 or higher, which is virtually everything in use today), the 1D grid limit for the X dimension is much higher: 2^31 - 1.

That limit may or may not be reached before running out of memory (but it's still good to add a check).
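The out-of-memory point can be made concrete with some arithmetic. This sketch assumes the f64 input is a dense buffer of num_samples * sample_len values and ignores any output or scratch buffers. The helper name is hypothetical.

```rust
/// Size in bytes of a dense f64 input batch, or None on u64 overflow.
/// Assumption: only the input buffer is counted.
fn input_bytes_f64(num_samples: u64, sample_len: u64) -> Option<u64> {
    num_samples
        .checked_mul(sample_len)?
        .checked_mul(std::mem::size_of::<f64>() as u64)
}

/// Fermi-era (compute capability 2.x) grid.x limit.
const FERMI_MAX_GRID_DIM_X: u64 = 65_535;
/// grid.x limit for compute capability 3.0 and later: 2^31 - 1.
const CC30_MAX_GRID_DIM_X: u64 = (1u64 << 31) - 1;
```

Exceeding the Fermi-era limit with 256-value samples needs only 128 MiB of input. Exceeding 2^31 - 1 needs at least 16 GiB, even with single-value samples.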

viiccwen · 2026-01-31T08:36:26Z

@ryankert01 thanks for the suggestion! What if we use cudaDeviceGetAttribute(...) to query cudaDevAttrMaxGridDimX from the user's device? That would avoid hard-coding CUDA_MAX_GRID_DIM_1D.
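Below is a sketch of the query-once-and-cache approach, written in Rust for consistency with the other examples. The closure passed to max_grid_dim_x stands in for a CUDA runtime call such as cudaDeviceGetAttribute(&value, cudaDevAttrMaxGridDimX, device). Several details are assumptions for illustration: the fallback to 2^31 - 1 when the query fails, the single process-wide cache, and all the names. A per-device cache would be needed if different devices could report different limits.

```rust
use std::sync::OnceLock;

/// Hypothetical fallback: the grid.x limit for compute capability 3.0+ (2^31 - 1).
const FALLBACK_MAX_GRID_DIM_X: u32 = i32::MAX as u32;

/// Process-wide cache so the device attribute is queried at most once.
static CACHED_MAX_GRID_DIM_X: OnceLock<u32> = OnceLock::new();

/// Returns the max grid X dimension. `query` stands in for a
/// cudaDeviceGetAttribute(cudaDevAttrMaxGridDimX) call and runs only
/// on the first invocation; later calls return the cached value.
fn max_grid_dim_x<F>(query: F) -> u32
where
    F: FnOnce() -> Option<i32>,
{
    *CACHED_MAX_GRID_DIM_X.get_or_init(|| match query() {
        // Use the reported value only if it is a sensible positive limit.
        Some(v) if v > 0 => v as u32,
        // Query failed or returned nonsense: fall back to the known limit.
        _ => FALLBACK_MAX_GRID_DIM_X,
    })
}
```

Caching keeps the attribute lookup off the per-launch path, which matters when the batch function is called many times.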

ryankert01

Thanks for the update! Some comments

qdp/qdp-kernels/src/amplitude.cu

qdp/qdp-kernels/tests/amplitude_encode.rs

rich7420

@viiccwen thanks for the patch!
I think we could add tests for num_samples == 0 and sample_len == 0.
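A sketch of what such tests might check follows. According to commit 41f2fbb, the PR's own tests exercise the real launch functions for float32 and float64. This version tests only a hypothetical host-side shape check (check_batch_shape and BatchError are illustrative names), so it runs without a GPU.

```rust
/// Hypothetical shape check mirroring the rejections the review asked to test.
#[derive(Debug, PartialEq, Eq)]
enum BatchError {
    ZeroSamples,
    ZeroSampleLen,
}

fn check_batch_shape(num_samples: usize, sample_len: usize) -> Result<(), BatchError> {
    // Zero samples would mean a zero-sized grid, an invalid CUDA launch configuration.
    if num_samples == 0 {
        return Err(BatchError::ZeroSamples);
    }
    // An empty sample has an L2 norm of 0, so normalizing would divide by zero.
    if sample_len == 0 {
        return Err(BatchError::ZeroSampleLen);
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rejects_zero_samples() {
        assert_eq!(check_batch_shape(0, 8), Err(BatchError::ZeroSamples));
    }

    #[test]
    fn rejects_zero_sample_len() {
        assert_eq!(check_batch_shape(4, 0), Err(BatchError::ZeroSampleLen));
    }

    #[test]
    fn accepts_non_empty_batch() {
        assert_eq!(check_batch_shape(4, 8), Ok(()));
    }
}
```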

qdp/qdp-kernels/src/amplitude.cu

rich7420

LGTM

rich7420 · 2026-02-05T08:09:43Z

@ryankert01 do you want to take another look?

ryankert01 · 2026-02-06T08:19:41Z

@viiccwen lg, ty for the contribution

guan404ming added this to the Qumat 0.5.1 milestone Jan 29, 2026

ryankert01 requested changes Jan 29, 2026

fix: Fix invalid CUDA kernel launch when num_samples exceeds grid dimension limit (562f1a9)

feat: Enhance grid dimension handling in L2 norm kernels (1f60d55)

viiccwen force-pushed the fix-cuda-kernel-num_samples-exceeds-grid-dimension-limit branch from 76021d0 to 1f60d55 Compare January 31, 2026 08:42

ryankert01 requested changes Jan 31, 2026

qdp/qdp-kernels/src/amplitude.cu (outdated, resolved)

qdp/qdp-kernels/src/amplitude.cu (outdated, resolved)

fix: Update CUDA grid dimension limit to reflect signed 32-bit int max (ce313eb)

ryankert01 reviewed Feb 1, 2026

qdp/qdp-kernels/tests/amplitude_encode.rs (outdated, resolved)

ryankert01 self-assigned this Feb 1, 2026

rich7420 reviewed Feb 2, 2026

qdp/qdp-kernels/src/amplitude.cu (outdated, resolved)

viiccwen added 2 commits February 2, 2026 11:26

refactor: Rename cached variable for max grid dimension and remove unused tests (5c97e0d)

test: Add tests for L2 norm batch kernel rejection of zero samples and sample length for float32 and float64 (41f2fbb)

rich7420 approved these changes Feb 5, 2026

ryankert01 merged commit 4438a9e into apache:main Feb 6, 2026
2 checks passed

4 participants
