[ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing | GEN-14003#42
Closed
Leguark wants to merge 4 commits into octree_improvement from
Conversation
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.
- Updated tensor creation to use pinned memory and non-blocking transfers for improved GPU performance.
- Introduced `_zeros`, `_ones`, and `_eye` wrapper functions for consistent tensor initialization on the specified device.
- Refined the `_wrap_pytorch_functions` method to streamline tensor operations and ensure compatibility with the device settings.
- Enabled stricter CUDA checks by updating the conditions for GPU availability.
[ENH] Add `keops_enabled` parameter to improve kernel constructor modularity and enhance batch processing support

- Introduced the `keops_enabled` parameter across various modules to enable conditional use of PyKeOps for optimized computations.
- Added `_interpolate_stack_batched.py` for GPU-accelerated batched interpolation with CUDA streams, minimizing memory overhead and improving throughput.
- Updated tensor creation logic in `backend_tensor.py` to include `pykeops_eval_enabled` for more flexible method selection.
- Refactored multiple constructor methods to propagate `keops_enabled`, ensuring consistent conditional logic for tensor handling and backend compatibility.
- Improved fault data initialization and dependency handling in interpolation pipelines for better parallel computation.
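The stream-per-stack pattern behind `_interpolate_stack_batched.py` can be sketched as below. This is a minimal illustration, assuming each stack's interpolation is independent; the function and argument names are assumptions, not the module's real API.

```python
import torch

def interpolate_stacks_batched(stacks, kernel_fn):
    """Run one interpolation per stack, each on its own CUDA stream, so
    independent kernel launches can overlap on the GPU. Falls back to a
    plain serial loop when no GPU is available."""
    if not torch.cuda.is_available():
        return [kernel_fn(s) for s in stacks]

    streams = [torch.cuda.Stream() for _ in stacks]
    results = [None] * len(stacks)
    for i, (stack, stream) in enumerate(zip(stacks, streams)):
        # Work issued inside this context is queued on `stream`, not the
        # default stream, so stacks do not serialize against each other.
        with torch.cuda.stream(stream):
            results[i] = kernel_fn(stack)
    # Block until every stream has drained before handing results back.
    torch.cuda.synchronize()
    return results
```

The memory-overhead benefit comes from processing stacks as separate launches instead of materializing one giant concatenated batch; the streams recover the lost parallelism.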
This was referenced Jan 23, 2026

[ENH] Improve tensor handling and device management in backend_tensor.py
- Introduced `_zeros`, `_ones`, and `_eye` wrapper functions for consistent tensor initialization on the specified device.
- Refined the `_wrap_pytorch_functions` method to streamline tensor operations and ensure compatibility with the device settings.

[ENH] Add `keops_enabled` parameter to improve kernel constructor modularity and enhance batch processing support
- Introduced the `keops_enabled` parameter across various modules to enable conditional use of PyKeOps for optimized computations.
- Added `_interpolate_stack_batched.py` for GPU-accelerated batched interpolation with CUDA streams, minimizing memory overhead and improving throughput.
- Updated tensor creation in `backend_tensor.py` to include `pykeops_eval_enabled` for more flexible method selection.
- Refactored constructor methods to propagate `keops_enabled`, ensuring consistent conditional logic for tensor handling and backend compatibility.

[WIP] Towards batching
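A `keops_enabled` flag typically gates between a PyKeOps `LazyTensor` reduction and a dense PyTorch fallback. The sketch below shows that dispatch pattern under stated assumptions: the flag name is from this PR, but the function, its body, and the reduction it computes are illustrative.

```python
import torch

def pairwise_sq_dist_sum(x, y, keops_enabled: bool = False):
    """Sum of squared distances over all row pairs of x (N, D) and y (M, D).

    Hypothetical example of flag-gated dispatch; not the PR's actual kernel.
    """
    if keops_enabled:
        # PyKeOps builds the N x M matrix symbolically and never
        # materializes it, which is the point of enabling it on GPU.
        from pykeops.torch import LazyTensor
        x_i = LazyTensor(x[:, None, :])   # (N, 1, D)
        y_j = LazyTensor(y[None, :, :])   # (1, M, D)
        d_ij = ((x_i - y_j) ** 2).sum(-1) # symbolic (N, M)
        return d_ij.sum(1).sum()          # reduction yields a real tensor
    # Dense fallback: materializes the full (N, M) distance matrix.
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return d.sum()
```

Propagating the flag through constructors, as the commits describe, keeps this branch decision in one place per kernel instead of scattering `try: import pykeops` checks across modules.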
[ENH] JIT-compiled kernel functions for improved GPU performance
- `torch.jit.script` for better GPU execution
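Scripting a kernel with `torch.jit.script` lets TorchScript fuse chains of elementwise ops into fewer GPU kernel launches. A minimal sketch, assuming a simple polynomial kernel; the actual kernel functions compiled in this commit are not shown on this page.

```python
import torch

@torch.jit.script
def cubic_kernel(r: torch.Tensor, a: float) -> torch.Tensor:
    # Illustrative polynomial kernel. Each line would normally launch its
    # own elementwise CUDA kernel; scripting lets the fuser combine them.
    return r * r * r - 3.0 * a * r * r + 3.0 * a * a * r
```

The `float` annotation matters: TorchScript is statically typed, so scalar parameters must be annotated or they default to `Tensor`.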