[ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing | GEN-14003#42
Closed
Leguark wants to merge 4 commits into octree_improvement from
Conversation
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.
- Updated tensor creation to use pinned memory and non-blocking transfers for improved GPU performance.
- Introduced `_zeros`, `_ones`, and `_eye` wrapper functions for consistent tensor initialization on the specified device.
- Refined the `_wrap_pytorch_functions` method to streamline tensor operations and ensure compatibility with the device settings.
- Enabled stricter CUDA checks by updating the conditions for GPU availability.
[ENH] Add `keops_enabled` parameter to improve kernel constructor modularity and enhance batch processing support

- Introduced the `keops_enabled` parameter across various modules to enable conditional use of PyKeOps for optimized computations.
- Added `_interpolate_stack_batched.py` for GPU-accelerated batched interpolation with CUDA streams, minimizing memory overhead and improving throughput.
- Updated tensor creation logic in `backend_tensor.py` to include `pykeops_eval_enabled` for more flexible method selection.
- Refactored multiple constructor methods to propagate `keops_enabled`, ensuring consistent conditional logic for tensor handling and backend compatibility.
- Improved fault data initialization and dependency handling in interpolation pipelines for better parallel computation.
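The stream-per-stack pattern behind `_interpolate_stack_batched.py` can be sketched as below. This is a minimal illustration, assuming each stack's interpolation is independent; the function and argument names are assumptions, not the module's real API.

```python
import torch

def interpolate_stacks_batched(stacks, kernel_fn):
    """Run one interpolation per stack, each on its own CUDA stream, so
    independent kernel launches can overlap on the GPU. Falls back to a
    plain serial loop when no GPU is available."""
    if not torch.cuda.is_available():
        return [kernel_fn(s) for s in stacks]

    streams = [torch.cuda.Stream() for _ in stacks]
    results = [None] * len(stacks)
    for i, (stack, stream) in enumerate(zip(stacks, streams)):
        # Work issued inside this context is queued on `stream`, not the
        # default stream, so stacks do not serialize against each other.
        with torch.cuda.stream(stream):
            results[i] = kernel_fn(stack)
    # Block until every stream has drained before handing results back.
    torch.cuda.synchronize()
    return results
```

The memory-overhead benefit comes from processing stacks as separate launches instead of materializing one giant concatenated batch; the streams recover the lost parallelism.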
This was referenced Jan 23, 2026

[ENH] Improve tensor handling and device management in backend_tensor.py
- Introduced `_zeros`, `_ones`, and `_eye` wrapper functions for consistent tensor initialization on the specified device.
- Refined the `_wrap_pytorch_functions` method to streamline tensor operations and ensure compatibility with the device settings.

[ENH] Add `keops_enabled` parameter to improve kernel constructor modularity and enhance batch processing support
- Introduced the `keops_enabled` parameter across various modules to enable conditional use of PyKeOps for optimized computations.
- Added `_interpolate_stack_batched.py` for GPU-accelerated batched interpolation with CUDA streams, minimizing memory overhead and improving throughput.
- Updated tensor creation in `backend_tensor.py` to include `pykeops_eval_enabled` for more flexible method selection.
- Refactored constructor methods to propagate `keops_enabled`, ensuring consistent conditional logic for tensor handling and backend compatibility.

[WIP] Towards batching
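A `keops_enabled` flag typically gates between a PyKeOps `LazyTensor` reduction and a dense PyTorch fallback. The sketch below shows that dispatch pattern under stated assumptions: the flag name is from this PR, but the function, its body, and the reduction it computes are illustrative.

```python
import torch

def pairwise_sq_dist_sum(x, y, keops_enabled: bool = False):
    """Sum of squared distances over all row pairs of x (N, D) and y (M, D).

    Hypothetical example of flag-gated dispatch; not the PR's actual kernel.
    """
    if keops_enabled:
        # PyKeOps builds the N x M matrix symbolically and never
        # materializes it, which is the point of enabling it on GPU.
        from pykeops.torch import LazyTensor
        x_i = LazyTensor(x[:, None, :])   # (N, 1, D)
        y_j = LazyTensor(y[None, :, :])   # (1, M, D)
        d_ij = ((x_i - y_j) ** 2).sum(-1) # symbolic (N, M)
        return d_ij.sum(1).sum()          # reduction yields a real tensor
    # Dense fallback: materializes the full (N, M) distance matrix.
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return d.sum()
```

Propagating the flag through constructors, as the commits describe, keeps this branch decision in one place per kernel instead of scattering `try: import pykeops` checks across modules.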
[ENH] JIT-compiled kernel functions for improved GPU performance
- `torch.jit.script` for better GPU execution
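Scripting a kernel with `torch.jit.script` lets TorchScript fuse chains of elementwise ops into fewer GPU kernel launches. A minimal sketch, assuming a simple polynomial kernel; the actual kernel functions compiled in this commit are not shown on this page.

```python
import torch

@torch.jit.script
def cubic_kernel(r: torch.Tensor, a: float) -> torch.Tensor:
    # Illustrative polynomial kernel. Each line would normally launch its
    # own elementwise CUDA kernel; scripting lets the fuser combine them.
    return r * r * r - 3.0 * a * r * r + 3.0 * a * a * r
```

The `float` annotation matters: TorchScript is statically typed, so scalar parameters must be annotated or they default to `Tensor`.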