-
Notifications
You must be signed in to change notification settings - Fork 38
Pr501 pcg #543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
aaadelmann
wants to merge
4
commits into
master
Choose a base branch
from
pr501-pcg
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Pr501 pcg #543
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,365 @@ | ||
| # PR 501 Split Map | ||
|
|
||
| Source PR: https://github.com/IPPL-framework/ippl/pull/501 | ||
|
|
||
| Local mapping branch: `pr-501-map` | ||
|
|
||
| Compared against: `origin/master` at `da43fd66a18848b65dcd82af90001f5b64a798ab` | ||
|
|
||
| Overall size: | ||
|
|
||
| - 41 commits | ||
| - 149 files changed | ||
| - 23,231 insertions | ||
| - 2,205 deletions | ||
|
|
||
| ## Subsystem Size | ||
|
|
||
| | Subsystem | Files | Additions | Deletions | | ||
| | --- | ---: | ---: | ---: | | ||
| | Interpolation / Autotune | 33 | 8,138 | 20 | | ||
| | FFT / NUFFT | 24 | 6,571 | 814 | | ||
| | Particle | 21 | 3,841 | 757 | | ||
| | Alpine / PIF examples | 12 | 1,923 | 37 | | ||
| | Utility | 11 | 1,424 | 45 | | ||
| | Communicate | 14 | 626 | 156 | | ||
| | Solvers | 6 | 394 | 253 | | ||
| | Field / Halo / Layout | 11 | 116 | 87 | | ||
| | Build / CMake | 4 | 117 | 12 | | ||
| | Types | 2 | 35 | 4 | | ||
| | FEM | 3 | 12 | 13 | | ||
| | Other | 8 | 34 | 7 | | ||
|
|
||
| The review load is dominated by three areas: | ||
|
|
||
| - Interpolation / Autotune: new scatter/gather dispatch, binning, tiling, tuning cache. | ||
| - FFT / NUFFT: backend/transform refactor plus native NUFFT and pruned transforms. | ||
| - Particle: new spatial update path, send/recv serialization hooks, sorting/buffer reuse. | ||
|
|
||
| ## Commit Shape | ||
|
|
||
| The branch starts with well-labeled subsystem commits: | ||
|
|
||
| | Commit | Area | Notes | | ||
| | --- | --- | --- | | ||
| | `7cff674a` | Build / dependencies | Adds `IPPL_ENABLE_FINUFFT`, `IPPL_ENABLE_CUFFTMP`, autotune preset plumbing. | | ||
| | `13950c4f` | FFT / NUFFT | Large FFT backend/transform split plus native NUFFT engine. | | ||
| | `dbbe59a4` | Interpolation | Scatter/gather redesign, binning, tiling, autotune runtime cache. | | ||
| | `2878b90b` | Particle / halo | Rewrites `ParticleSpatialLayout`, adds sort buffers, threads `nghost` through field layout/halo. | | ||
| | `9507a796` | Utility | Timing overhaul, `BufferView`, `Tuning`, `ParallelDispatch`, type utilities. | | ||
| | `f2820064` | Communicate | Serialization hooks, shared buffer handler. | | ||
| | `63d6ccf2` | Integration glue | Field/solver/type updates for new FFT and particle layout APIs. | | ||
| | `99d21ba7` | Alpine PIF | Adds ElectrostaticPIF examples. | | ||
| | `f0560f76` | Unit tests | Adds NUFFT, interpolation, binning, particle update tests. | | ||
| | `927cac7b` | Integration tests | Updates particle benchmark and adds scaling benchmark. | | ||
|
|
||
| After that, the branch has many fixup/regression commits. Several are descriptive and likely fold into the earlier subsystem commits; some are separable: | ||
|
|
||
| | Commit | Area | Split relevance | | ||
| | --- | --- | --- | | ||
| | `40438d17` | PCG / preconditioner | Good candidate for an independent PR: hoists per-iteration allocations out of solver loop. | | ||
| | `be5f794c` | PCG / PoissonCG | Follow-up to the PCG allocation PR: pass `Field` by reference through `OperatorF`. | | ||
| | `95762403` | PIF examples | Consolidates pruned variants into parameterized source. Belongs with PIF examples. | | ||
| | `0e7bc58d`, `6c1c48ad` | FFT / NUFFT | FINUFFT guards for CUDA + 2D unit-test build. Belong with NUFFT/FINUFFT PR. | | ||
| | `cec1ca61` | Cleanup | Doxygen, formatting, size-type consistency. Should be folded into relevant split PRs or kept as a final cleanup PR. | | ||
|
|
||
| ## Dependency Observations | ||
|
|
||
| ### NUFFT Depends On Interpolation | ||
|
|
||
| `src/FFT/NUFFT/NativeNUFFT.h` includes and uses: | ||
|
|
||
| - `Interpolation/Scatter/ScatterConfig.h` | ||
| - `Interpolation/Gather/GatherConfig.h` | ||
| - scatter/gather kernels through the new interpolation dispatch | ||
|
|
||
| That means native NUFFT cannot be cleanly reviewed before the higher-order scatter/gather layer. | ||
|
|
||
| ### PIF Depends On NUFFT | ||
|
|
||
| The Alpine PIF examples use: | ||
|
|
||
| - `ippl::FFT<ippl::NUFFTransform, Field_t>` | ||
| - `scatterPIFNUFFT` | ||
| - `gatherPIFNUFFT` | ||
|
|
||
| So PIF examples should come after the NUFFT transform API. | ||
|
|
||
| ### Particle Update And Communicate Are Coupled | ||
|
|
||
| The particle update rewrite uses: | ||
|
|
||
| - `ParticleAttrib::serialize` / `deserialize` | ||
| - shared `BufferHandler` | ||
| - direct archive serialization | ||
| - reusable send/recv buffers | ||
|
|
||
| This suggests communication and particle update may be one PR unless a clean lower-level communication PR is extracted first. | ||
|
|
||
| ### `nghost` / Halo Changes Cross Particle And Field | ||
|
|
||
| `ParticleSpatialLayout` changes also thread `nghost` through: | ||
|
|
||
| - `FieldLayout` | ||
| - `HaloCells` | ||
| - field layout APIs | ||
|
|
||
| This is small in line count but high in integration risk. It should be explicitly called out in whichever PR contains particle update. | ||
|
|
||
| ### PCG Allocation Fix Looks Separately Reviewable | ||
|
|
||
| The late commits: | ||
|
|
||
| - `40438d17` | ||
| - `be5f794c` | ||
|
|
||
| only touch: | ||
|
|
||
| - `src/LinearSolvers/PCG.h` | ||
| - `src/LinearSolvers/Preconditioner.h` | ||
| - `src/PoissonSolvers/PoissonCG.h` | ||
|
|
||
| This can likely become an independent performance/regression PR, separate from PIF. | ||
|
|
||
| ## Candidate Split | ||
|
|
||
| ### PR 1: PCG Allocation / Solver Performance | ||
|
|
||
| Scope: | ||
|
|
||
| - `src/LinearSolvers/PCG.h` | ||
| - `src/LinearSolvers/Preconditioner.h` | ||
| - `src/PoissonSolvers/PoissonCG.h` | ||
|
|
||
| Candidate commits: | ||
|
|
||
| - `40438d17` | ||
| - `be5f794c` | ||
|
|
||
| Reason: | ||
|
|
||
| - Smallest coherent split. | ||
| - Directly addresses the reported PCG slowdown / repeated `cudaMalloc` concern. | ||
| - Does not conceptually depend on PIF, NUFFT, or interpolation. | ||
|
|
||
| Risk: | ||
|
|
||
| - Need to verify solver convergence and exact iteration behavior. | ||
|
|
||
| ### PR 2: Communication And Particle Update Infrastructure | ||
|
|
||
| Scope: | ||
|
|
||
| - `src/Communicate/*` | ||
| - `src/Particle/ParticleSpatialLayout*` | ||
| - `src/Particle/ParticleAttrib*` | ||
| - `src/Particle/ParticleBase*` | ||
| - `src/Particle/ParticleSort.h` | ||
| - `src/Particle/SortBuffer.h` | ||
| - `src/FieldLayout/FieldLayout*` | ||
| - `src/Field/HaloCells*` | ||
| - particle update tests | ||
|
|
||
| Candidate commits: | ||
|
|
||
| - `2878b90b` | ||
| - `f2820064` | ||
| - relevant parts of `63d6ccf2` | ||
| - relevant parts of `f0560f76` | ||
| - relevant fixup commits after `728af370` | ||
|
|
||
| Reason: | ||
|
|
||
| - Captures the PIC scaling improvement that is not PIF-specific. | ||
| - Reviewable independently from FFT/NUFFT algorithms. | ||
|
|
||
| Risk: | ||
|
|
||
| - This is still a substantial behavioral change. | ||
| - Needs multi-rank CPU/GPU particle update and `ParticleSendRecv` coverage. | ||
|
|
||
| ### PR 3: Higher-Order Scatter/Gather And Autotune | ||
|
|
||
| Scope: | ||
|
|
||
| - `src/Interpolation/*` | ||
| - `cmake/AutoTunePresets.cmake` | ||
| - `cmake/IpplAutoTunePresets.h.in` | ||
| - `cmake/auto_tune/*` | ||
| - interpolation tests | ||
|
|
||
| Candidate commits: | ||
|
|
||
| - `7cff674a` only for autotune preset plumbing, not FINUFFT/CUFFTMP if separable. | ||
| - `dbbe59a4` | ||
| - interpolation portions of `f0560f76` | ||
| - `dacdcd9a` if routing legacy CIC through the new framework is included. | ||
|
|
||
| Reason: | ||
|
|
||
| - This is the algorithmic layer native NUFFT requires. | ||
| - Can be validated without PIF examples. | ||
|
|
||
| Risk: | ||
|
|
||
| - `dacdcd9a` changes existing CIC scatter/gather behavior by routing legacy paths through the new framework. That may be better as a second PR after the new interpolation layer lands. | ||
|
|
||
| ### PR 4: FFT Backend / Transform Refactor | ||
|
|
||
| Scope: | ||
|
|
||
| - `src/FFT/Backend/*` | ||
| - `src/FFT/Transform/CC.h` | ||
| - `src/FFT/Transform/RC.h` | ||
| - `src/FFT/Transform/Trig.h` | ||
| - `src/FFT/Transform/PrunedCC.h` | ||
| - `src/FFT/Transform/PrunedRC.h` | ||
| - `src/FFT/Transform/Common.h` | ||
| - `src/FFT/Traits.h` | ||
| - existing FFT test updates | ||
|
|
||
| Candidate commits: | ||
|
|
||
| - FFT refactor parts of `13950c4f` | ||
| - FFT portions of `f0560f76` | ||
| - FFT portions of later build fixes | ||
|
|
||
| Reason: | ||
|
|
||
| - Gives reviewers the backend/transform API without also reviewing NUFFT and PIF. | ||
|
|
||
| Risk: | ||
|
|
||
| - Current commit `13950c4f` mixes backend refactor with native NUFFT files. This split likely requires path-based extraction or commit surgery. | ||
|
|
||
| ### PR 5: Native NUFFT And FINUFFT/cuFINUFFT Integration | ||
|
|
||
| Scope: | ||
|
|
||
| - `src/FFT/NUFFT/*` | ||
| - `src/FFT/Transform/NUFFT.*` | ||
| - FINUFFT/CUFFTMP build switches if not already merged | ||
| - NUFFT tests | ||
|
|
||
| Candidate commits: | ||
|
|
||
| - NUFFT portions of `13950c4f` | ||
| - FINUFFT/CUFFTMP portions of `7cff674a` | ||
| - NUFFT portions of `f0560f76` | ||
| - `0e7bc58d` | ||
| - `6c1c48ad` | ||
|
|
||
| Reason: | ||
|
|
||
| - NUFFT depends on PR 3 scatter/gather and PR 4 transform structure. | ||
|
|
||
| Risk: | ||
|
|
||
| - Native NUFFT depends on new scatter/gather. | ||
| - FINUFFT path has Dim-guard complexity and external dependency complexity. | ||
|
|
||
| ### PR 6: PIF / Alpine Examples | ||
|
|
||
| Scope: | ||
|
|
||
| - `alpine/ElectrostaticPIF/*` | ||
| - Alpine manager wiring | ||
| - PIF-specific particle attrib convenience APIs if not included in NUFFT PR | ||
|
|
||
| Candidate commits: | ||
|
|
||
| - `99d21ba7` | ||
| - `95762403` | ||
| - PIF/example fixups | ||
|
|
||
| Reason: | ||
|
|
||
| - Examples should be reviewed after core APIs are accepted. | ||
|
|
||
| Risk: | ||
|
|
||
| - PIF examples currently depend on the full stack: particle update, scatter/gather, FFT transform refactor, NUFFT. | ||
|
|
||
| ## Minimal Split Alternative | ||
|
|
||
| If six PRs is too much process overhead, a practical minimum is: | ||
|
|
||
| 1. PCG allocation fix. | ||
| 2. Particle communication/update infrastructure. | ||
| 3. Scatter/gather + FFT backend + NUFFT core. | ||
| 4. Alpine PIF examples and benchmarks. | ||
|
|
||
| This is less clean, but still much better than one 23k-line PR. | ||
|
|
||
| ## Next Mapping Step | ||
|
|
||
| Prototype extraction order: | ||
|
|
||
| 1. Create a branch from `origin/master` for PCG. | ||
| 2. Cherry-pick `40438d17` and `be5f794c`. | ||
| 3. Build and run solver tests. | ||
| 4. If clean, this becomes the first small PR. | ||
|
|
||
| Then try the particle/communication split because it is probably the most valuable non-PIF improvement and has direct tests: | ||
|
|
||
| - `unit_tests/Particle/ParticleUpdate.cpp` | ||
| - `unit_tests/Particle/ParticleUpdateNonuniform.cpp` | ||
| - existing `ParticleSendRecv` | ||
| - existing `GatherScatterTest` | ||
|
|
||
| ## Progress | ||
|
|
||
| ### 2026-05-31: PR 1 Prototype - PCG Allocation / Solver Performance | ||
|
|
||
| Status: extracted cleanly. | ||
|
|
||
| Worktree: | ||
|
|
||
| ```text | ||
| /private/tmp/ippl-pr501-pcg | ||
| ``` | ||
|
|
||
| Branch: | ||
|
|
||
| ```text | ||
| pr501-pcg-split | ||
| ``` | ||
|
|
||
| Base: | ||
|
|
||
| ```text | ||
| origin/master @ da43fd66a18848b65dcd82af90001f5b64a798ab | ||
| ``` | ||
|
|
||
| Cherry-picked commits: | ||
|
|
||
| - `40438d17` -> `a76097f0` (`PCG/Preconditioner: hoist per-iteration allocations out of solve loop`) | ||
| - `be5f794c` -> `092a04e5` (`PCG: pass Field by reference through OperatorF`) | ||
|
|
||
| Resulting diff: | ||
|
|
||
| - `src/LinearSolvers/PCG.h` | ||
| - `src/LinearSolvers/Preconditioner.h` | ||
| - `src/PoissonSolvers/PoissonCG.h` | ||
| - 3 files changed, 388 insertions, 250 deletions | ||
|
|
||
| Validation: | ||
|
|
||
| ```text | ||
| cmake -S . -B build-pcg-debug -DCMAKE_BUILD_TYPE=Debug -DIPPL_ENABLE_UNIT_TESTS=ON -DIPPL_ENABLE_FFT=ON -DIPPL_DEFAULT_TEST_PROCS=1 | ||
| cmake --build build-pcg-debug | ||
| ctest --test-dir build-pcg-debug --output-on-failure -O build-pcg-debug/ctest-rank1.log | ||
| ``` | ||
|
|
||
| Result: | ||
|
|
||
| ```text | ||
| 100% tests passed, 0 tests failed out of 34 | ||
| Total Test time (real) = 54.17 sec | ||
| ``` | ||
|
|
||
| Assessment: | ||
|
|
||
| - This is a good standalone first PR. | ||
| - It has a narrow file set and clear motivation: remove repeated allocation work from the PCG/preconditioner loop. | ||
| - Recommended next validation before opening: run the relevant solver integration tests or CSCS GPU tests that originally showed PCG allocation/slowdown symptoms. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.