[Release/3.3] Fix out-of-bounds memory access in SetKernel for 0-size tensor #78536

Merged
sneaxiy merged 6 commits into PaddlePaddle:release/3.3 from
hushenwei2000:release33_fix/set-kernel-zero-size-oob
Apr 3, 2026

Conversation

@hushenwei2000
Contributor

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Dev PR: #78486

Fixes the error raised for 0-size tensors.

paddle.Tensor.set_ accuracy: CPU results are incorrect for the following cases:
paddle.Tensor.set_(Tensor([20],"complex64"), Tensor([0, 3],"complex64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float32"), Tensor([0, 3],"float32"), list[20,], list[2,], 0, )

Summary

  • Fix GPU out-of-bounds memory access (detected by compute-sanitizer) when calling Tensor.set_(source, shape, stride, offset) with a 0-size source tensor and non-zero target shape.
  • When source.numel() == 0, the output tensor now inherits the source's 0-size dims/strides instead of using user-specified shape/stride, preventing invalid memory access.
  • Updated existing TestSet_API_ZeroSize and added 5 new test cases covering various 0-size scenarios.

Root Cause

In SetKernel (paddle/phi/kernels/set_kernel.cc), the conditional logic for handling 0-size tensors had a missing branch: when source.numel() == 0 and x.numel() != 0, no branch was executed. The output tensor retained its original data holder, but SetInferMeta had already set its meta to the user-specified shape/stride (e.g., shape=[20], stride=[2]). When ContiguousKernel later attempted to read 20 elements via stride=2 (requiring storage for indices 0–38), the underlying storage was empty (nullptr), causing CUDA illegal memory access.
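The index range quoted above (0 to 38 for shape=[20], stride=[2]) can be checked with a small calculation. This is only an illustrative sketch: the helper name `required_storage_elems` is hypothetical, and it assumes non-negative strides and an offset counted in elements, which may not match how Paddle counts the offset.

```python
def required_storage_elems(shape, stride, offset=0):
    """Number of storage elements a strided view can touch.

    Assumptions (not taken from Paddle): strides are non-negative and
    `offset` is counted in elements.
    """
    # A view with any zero-length dimension touches no storage at all.
    if any(dim == 0 for dim in shape):
        return 0
    # The largest reachable index uses the last position along every axis.
    max_index = offset + sum((dim - 1) * st for dim, st in zip(shape, stride))
    return max_index + 1


# The case from this PR: 20 elements read with stride 2.
needed = required_storage_elems([20], [2])
print(needed)  # -> 39 (indices 0..38)

# A 0-size source provides no storage, so any non-empty view over it is invalid.
available = required_storage_elems([0, 3], [3, 1])
print(available)  # -> 0
```

Because the view needs 39 elements while the 0-size source provides none, any read through the user-specified meta falls outside the allocation.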

Before fix (missing branch):

if source.numel() != 0:    # False (source is 0-size)
    ...
elif x.numel() == 0:        # False (x has 20 elements)
    ...

Neither branch executes → out keeps stale holder with mismatched meta
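To see which input combination slips through the two conditions above, the branch structure can be enumerated in a few lines. This is a toy restatement of the pseudocode, not the actual kernel source; the function name `branch_taken` and the returned labels are hypothetical.

```python
def branch_taken(source_numel, x_numel):
    """Return a label for the branch the pre-fix logic would run, or None."""
    if source_numel != 0:
        return "source-nonempty"
    elif x_numel == 0:
        return "x-empty"
    # No else branch: this is the gap described in the root cause.
    return None


# Enumerate the four empty/non-empty combinations of source and x.
for source_numel in (0, 6):
    for x_numel in (0, 20):
        label = branch_taken(source_numel, x_numel)
        print(f"source.numel()={source_numel}, x.numel()={x_numel}: {label}")
```

Only the combination of an empty source and a non-empty x returns None, which matches the failing call in the description.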

Fix

When source.numel() == 0, force the output tensor's dims/strides to match the source's 0-size shape, and rebind the holder to the source's (empty) storage. This ensures numel == 0, so downstream kernels (e.g., ContiguousKernel) skip safely via their numel <= 0 early-return guards.
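The behavior described in this fix can be modeled with a toy tensor in pure Python. This is a sketch under stated assumptions, not the code in set_kernel.cc: `ToyTensor`, `toy_set`, and `toy_contiguous` are hypothetical names, the holder is a plain list standing in for device storage, and the offset argument is omitted.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyTensor:
    dims: list
    strides: list
    holder: list = field(default_factory=list)  # stand-in for the storage

    def numel(self):
        n = 1
        for d in self.dims:
            n *= d
        return n


def toy_set(x, source, shape, stride):
    """Toy model of set_ with the 0-size guard described above."""
    out = x
    if source.numel() == 0:
        # Guard: inherit the source's 0-size meta and its (empty) storage,
        # ignoring the user-specified shape/stride.
        out.dims = list(source.dims)
        out.strides = list(source.strides)
    else:
        out.dims = list(shape)
        out.strides = list(stride)
    out.holder = source.holder
    return out


def toy_contiguous(t):
    """Gather the view into a flat list, with a numel <= 0 early return."""
    if t.numel() <= 0:
        return []
    return [
        t.holder[sum(i * s for i, s in zip(idx, t.strides))]
        for idx in itertools.product(*(range(d) for d in t.dims))
    ]


x = ToyTensor([20], [1], list(range(20)))
empty_source = ToyTensor([0, 3], [3, 1], [])
out = toy_set(x, empty_source, [20], [2])
print(out.dims, out.numel(), toy_contiguous(out))  # -> [0, 3] 0 []
```

Without the guard, `out` would carry dims=[20] and strides=[2] over an empty holder, and the gather in `toy_contiguous` would index past the end of the list, which is the toy analogue of the illegal memory access.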

Test Plan

  • Verified fix with compute-sanitizer: ERROR SUMMARY: 0 errors (was 11 errors before fix)
    • paddle.Tensor.set_(Tensor([20],"float32"), Tensor([0, 3],"float32"), list[20,], list[2,], 0, )
  • Added 5 new test cases in TestSet_API_ZeroSize:
    • test_zero_size_source_with_nonzero_shape — 0-size source + explicit non-zero shape
    • test_zero_size_source_default_args — 0-size source with default shape/stride
    • test_zero_size_x_nonzero_source — 0-size x with non-zero source
    • test_both_zero_size — both x and source are 0-size
    • test_zero_size_source_no_crash_on_contiguous — no crash on .contiguous() after set_
  • All existing set_ tests in APITest:
paddle.Tensor.set_(Tensor([20],"float64"), Tensor([0, 3],"float64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float64"), Tensor([15, 0],"float64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([3, 0],"float16"), Tensor([6, 0],"float16"), list[3,8,], list[2,2,], 0, )
paddle.Tensor.set_(Tensor([3, 8],"float16"), Tensor([0, 3],"float16"), list[3,8,], list[2,2,], 0, )
paddle.Tensor.set_(Tensor([3, 8],"float16"), Tensor([6, 0],"float16"), list[3,8,], list[2,2,], 0, )
paddle.Tensor.set_(Tensor([20],"complex64"), Tensor([0, 3],"complex64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float32"), Tensor([0, 3],"float32"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"complex64"), Tensor([15, 0],"complex64"), list[20,], list[2,], 0, )
paddle.Tensor.set_(Tensor([20],"float32"), Tensor([15, 0],"float32"), list[20,], list[2,], 0, )

Does this cause precision changes?

hushenwei2000 and others added 6 commits March 31, 2026 18:51
When calling Tensor.set_(source, shape, stride, offset) with a 0-size source
tensor and non-zero target shape, the original code had a missing branch in
the conditional logic: when source.numel()==0 and x.numel()!=0, no branch
was executed, leaving `out` with its original data holder but with the
user-specified meta (shape/stride). This caused ContiguousKernel to read
beyond allocated memory when converting the strided tensor to contiguous.

The fix forces the output tensor to inherit the source's 0-size dims/strides
when source has no elements, preventing out-of-bounds access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@paddle-bot

paddle-bot Bot commented Mar 31, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@codecov-commenter

Codecov Report

❌ Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/3.3@88e1968). Learn more about missing BASE report.

Files with missing lines Patch % Lines
paddle/phi/kernels/set_kernel.cc 94.59% 2 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             release/3.3   #78536   +/-   ##
==============================================
  Coverage               ?   94.59%           
==============================================
  Files                  ?        1           
  Lines                  ?       37           
  Branches               ?        0           
==============================================
  Hits                   ?       35           
  Misses                 ?        2           
  Partials               ?        0           

@hushenwei2000
Contributor Author

/re-run all-failed

Contributor

@wanghuancoder wanghuancoder left a comment

LGTM

@sneaxiy sneaxiy merged commit 67d72e1 into PaddlePaddle:release/3.3 Apr 3, 2026
142 of 153 checks passed