
[ENH] Optimize kernel data stacking with profiling and GPU-first strategies #57

Draft
Leguark wants to merge 1 commit into wqef_improvements from memory


Leguark commented Mar 26, 2026

  • Introduced `_build_stacked_kernel_data_parallel` and refactored `_build_stacked_kernel_data` to enhance parallelization.
  • Added multiple selective stacking strategies (`_stack_sub_struct_split`, `_stack_sub_struct_pinned`, `_stack_sub_struct_gpu_first`) for improved flexibility.
  • Enhanced GPU memory handling with explicit profiling zones for optimized tensor operations.
  • Improved fail-safe concatenation logic to handle various backend configurations.
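
For readers unfamiliar with the idea behind the last bullet, here is a minimal sketch of what a backend-tolerant concatenation could look like. It is not the code in this PR: `safe_concat` is a hypothetical helper, and the dispatch order (PyTorch tensors, then NumPy arrays, then plain Python sequences) is an assumption made for illustration. Both libraries are treated as optional, so the sketch still runs when neither is installed.

```python
from typing import Any, Sequence

# Optional backends: the sketch degrades gracefully if either is missing.
try:
    import numpy as np
except ImportError:
    np = None

try:
    import torch
except ImportError:
    torch = None


def safe_concat(parts: Sequence[Any], axis: int = 0) -> Any:
    """Hypothetical helper: concatenate `parts` with the backend that matches them.

    `None` entries are skipped so callers can pass optional sub-structures.
    """
    present = [p for p in parts if p is not None]
    if not present:
        raise ValueError("nothing to concatenate")

    # All parts are torch tensors: stay in torch so device placement is preserved.
    if torch is not None and all(isinstance(p, torch.Tensor) for p in present):
        return torch.cat(present, dim=axis)

    # All parts are NumPy arrays: use the NumPy path.
    if np is not None and all(isinstance(p, np.ndarray) for p in present):
        return np.concatenate(present, axis=axis)

    # Last resort: treat every part as a plain sequence and join along axis 0.
    if axis != 0:
        raise ValueError("the plain-Python fallback only supports axis=0")
    out = []
    for p in present:
        out.extend(p)
    return out
```

With plain lists, `safe_concat([[1, 2], None, [3]])` takes the fallback path and returns one flat list. Mixed inputs (for example, one tensor and one array) also land in the fallback rather than raising, which is a deliberate simplification here; a real implementation would likely convert to a single backend first.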


Leguark commented Mar 26, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.
