Use caller CUDA stream for D2H and H2D copies (#20498) by Conarnar · Pull Request #20498 · pytorch/executorch

Conarnar · 2026-06-24T22:51:11Z

Summary:

CudaAllocator memory copies now support async copy on a caller-provided CUDA stream. When a caller stream is available (via `getCallerStream()`), `copy_host_to_device` and `copy_device_to_host` use `cudaMemcpyAsync` and synchronize the stream before returning. This preserves the blocking API contract while allowing work to be issued on the caller's stream. When no caller stream is set, the synchronous `cudaMemcpy` path is used as before.

Additionally:

- Added null pointer and zero-byte validation: null `dst`/`src` return `Error::InvalidArgument` instead of aborting in `cudaMemcpy`, and zero-byte copies return `Error::Ok` early.
- Assert single-GPU case (index 0 or -1) until multi-GPU stream validation is added.
- Wired `//executorch/extension/cuda:caller_stream` dependency in TARGETS.
- Added `test_cuda_allocator` with coverage for sync/async paths and error handling.
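
The control flow described in this summary can be sketched roughly as follows. This is not the code from the PR: it is a minimal model that runs without a GPU, so the CUDA runtime calls (`cudaMemcpy`, `cudaMemcpyAsync`, `cudaStreamSynchronize`) are replaced by hypothetical `fake_*` stand-ins that record which path ran. `FakeStream`, `copy_sketch`, `g_calls`, and the local `Error` enum are illustrative names, and checking for null pointers before the zero-byte case is an assumption. The single-GPU device-index assertion is not modeled.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for ExecuTorch's Error type (only the two values
// mentioned in the summary).
enum class Error { Ok, InvalidArgument };

// Hypothetical stand-in for a CUDA stream handle. A null pointer to it
// models "no caller stream is set".
struct FakeStream { int id; };

// Log of which stand-in calls ran, so the chosen path can be inspected.
std::vector<std::string> g_calls;

// Stand-in for the synchronous cudaMemcpy path.
void fake_memcpy(void* dst, const void* src, std::size_t n) {
  g_calls.push_back("memcpy");
  std::memcpy(dst, src, n);
}

// Stand-in for cudaMemcpyAsync issued on the caller's stream.
void fake_memcpy_async(void* dst, const void* src, std::size_t n,
                       const FakeStream*) {
  g_calls.push_back("memcpy_async");
  std::memcpy(dst, src, n);
}

// Stand-in for cudaStreamSynchronize.
void fake_stream_synchronize(const FakeStream*) {
  g_calls.push_back("stream_sync");
}

// Sketch of the copy flow: validate, then pick the async or sync path.
Error copy_sketch(void* dst, const void* src, std::size_t nbytes,
                  const FakeStream* caller_stream) {
  // Assumed order: null check first, then the zero-byte early return.
  if (dst == nullptr || src == nullptr) {
    return Error::InvalidArgument;
  }
  if (nbytes == 0) {
    return Error::Ok;
  }
  if (caller_stream != nullptr) {
    // Issue the copy on the caller's stream, then wait for it so the
    // function still blocks until the data has arrived.
    fake_memcpy_async(dst, src, nbytes, caller_stream);
    fake_stream_synchronize(caller_stream);
  } else {
    // No caller stream: fall back to the synchronous copy.
    fake_memcpy(dst, src, nbytes);
  }
  return Error::Ok;
}
```

In this model, calling `copy_sketch` with a non-null stream logs `memcpy_async` followed by `stream_sync`, while passing `nullptr` for the stream logs a single `memcpy`. Invalid or empty requests return before any copy call is made.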

Differential Revision: D109590531

pytorch-bot · 2026-06-24T22:51:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20498

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

❌ 3 New Failures, 1 Unrelated Failure

As of commit 3d8da75 with merge base 45a14b9:

NEW FAILURES - The following jobs have failed:

Lint / lintrunner (gh)
>>> Lint for backends/cuda/CMakeLists.txt:
pull / unittest-buck / linux / linux-job (gh)
RuntimeError: Command docker exec -t 53fd3bdbd1cb742e07ca658dafb16785263edd8aaeaccd6563651265393f6324 /exec failed with exit code 3
pull / unittest-buck / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 3

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-06-24T22:51:21Z

@Conarnar has exported this pull request. If you are a Meta employee, you can view the originating Diff in D109590531.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

github-actions · 2026-06-24T22:52:05Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI review requested due to automatic review settings June 24, 2026 22:51

Conarnar requested review from kirklandsign and larryliu0820 as code owners June 24, 2026 22:51

meta-cla Bot added the CLA Signed label Jun 24, 2026

meta-codesync Bot added the meta-exported label Jun 24, 2026

meta-codesync Bot temporarily deployed to cadence June 24, 2026 22:51 Inactive

Copilot started reviewing on behalf of Conarnar June 24, 2026 22:51

Copilot AI reviewed Jun 24, 2026

meta-codesync Bot changed the title ~~Use caller CUDA stream for D2H and H2D copies~~ Use caller CUDA stream for D2H and H2D copies (#20498) Jun 24, 2026

Conarnar force-pushed the export-D109590531 branch from 3ac4dc3 to 3d8da75 Compare June 24, 2026 23:12

Conarnar wants to merge 1 commit into pytorch:main from Conarnar:export-D109590531
