Extract Particle Communication And Update Infrastructure From PR501 #544
Open
aaadelmann wants to merge 11 commits into
Reuse PCG and preconditioner work fields across iterations instead of allocating temporary fields during every solve step. This also passes the solver field by reference through OperatorF to avoid extra halo-related allocations.
Split the communication and particle-update infrastructure from PR501 on top of the PCG split. This brings in reusable communication buffers, page-granular archive allocation, particle attribute serialization hooks, packed particle send IDs, particle sorting buffers, and the rewritten ParticleSpatialLayout update path. Keep this branch independent from the later interpolation, FFT, NUFFT, and PIF splits by dropping those APIs from the extracted ParticleAttrib changes. Add particle update regression coverage and update existing tests for live-view and page-sized buffer semantics. Validated with a Debug Serial Kokkos 5.0.0 build: full 1-rank ctest passes, plus ParticleSendRecv, ParticleUpdate, and ParticleUpdateNonuniform pass under mpiexec -n 2.
Consume CUDA/HIP runtime return values in Archive so HIP nodiscard annotations do not trigger warnings. Update particle benchmark/test callers to store ParticleAttrib::getView() by value now that it returns a live subview instead of a stable lvalue reference.
Extract Particle Communication And Update Infrastructure From PR501
Summary
This PR extracts the particle communication/update infrastructure from the large
PR501 branch. It builds on the PCG allocation split and keeps FFT, NUFFT,
higher-order scatter/gather, and PIF examples out of scope.
The main goals are:
- thread `nghost` through field layout / halo APIs where particle layout needs consistent ghost-width awareness.
What Changed
Particle migration path
Main files:
- `src/Particle/ParticleSpatialLayout.h`
- `src/Particle/ParticleSpatialLayout.hpp`
- `src/Particle/ParticleBase.h`
- `src/Particle/ParticleBase.hpp`
- `src/Particle/ParticleAttrib.h`
- `src/Particle/ParticleAttrib.hpp`
- `src/Particle/ParticleAttribBase.h`

`ParticleSpatialLayout::update()` now has a more explicit multi-stage migration flow:
The receive tail is timed explicitly:
- `particleWait`
- `particleFreeBuffers`
- `particleDeserialize`
- `particleDeserResize`
- `particleDeserCopy`

This makes the previously hidden tail of `updateParticle` visible in profiles.

Receive-side pre-reserve fix
The key performance fix is receive-side pre-reserving of particle attribute
capacity.
Root cause observed on LUMI:
- `ParticleAttrib::deserialize(offset, nrecvs)` was called once per source rank and attribute.
- `Kokkos::resize` preserves existing entries, so repeated grows copied already-live particle storage many times within a single update step.
- This cost appeared in `particleDeserialize` and caused a large `updateParticle` regression.
Fix:
- Add `ParticleAttrib::reserve(size_type)`.
- Before deferred receive finalizers run, compute final receive capacity once:
- Reserve every particle attribute once to that capacity.
- Keep receive finalizers focused on archive copy/deserialization.
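
The effect of reserving once can be illustrated with a minimal, standalone sketch. `ToyAttrib` and `copiesDuringUpdate` are hypothetical names made up for this illustration; the toy buffer only mimics the property that a preserving grow copies all live entries, and it is not the IPPL `ParticleAttrib` or a Kokkos view.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a particle attribute backed by an exactly-sized buffer.
// Hypothetical type: it only mimics "a preserving grow copies live entries".
struct ToyAttrib {
    std::unique_ptr<double[]> data;
    std::size_t capacity = 0;  // allocated entries
    std::size_t size     = 0;  // live particles
    std::size_t copied   = 0;  // entries moved by preserving reallocations

    // Grow the allocation to at least n entries, preserving live entries.
    void reserve(std::size_t n) {
        if (n <= capacity) return;
        std::unique_ptr<double[]> grown(new double[n]);
        for (std::size_t i = 0; i < size; ++i) grown[i] = data[i];
        copied += size;
        data     = std::move(grown);
        capacity = n;
    }

    // Append the entries received from one source rank, growing only if needed.
    void deserialize(const std::vector<double>& incoming) {
        reserve(size + incoming.size());
        for (double v : incoming) data[size++] = v;
    }
};

// Returns how many live entries were copied by reallocations while receiving.
inline std::size_t copiesDuringUpdate(std::size_t nLocal,
                                      const std::vector<std::size_t>& recvCounts,
                                      bool preReserve) {
    ToyAttrib attrib;
    attrib.deserialize(std::vector<double>(nLocal, 1.0));  // already-live particles
    attrib.copied = 0;                                     // count only the receive phase

    if (preReserve) {
        std::size_t total = attrib.size;
        for (std::size_t n : recvCounts) total += n;
        attrib.reserve(total);  // one preserving grow for the whole update step
    }
    for (std::size_t n : recvCounts) attrib.deserialize(std::vector<double>(n, 2.0));

    assert(attrib.size == attrib.capacity);
    return attrib.copied;
}
```

With 1000 live particles and four incoming messages of 10 particles each, the per-message growth path copies 1000 + 1010 + 1020 + 1030 = 4060 entries, while the pre-reserved path copies the 1000 live entries once.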
Relevant code:
Communication archive and buffer handling
Main files:
- `src/Communicate/Archive.h`
- `src/Communicate/Archive.hpp`
- `src/Communicate/BufferHandler.hpp`
- `src/Communicate/Buffers.*`
- `src/Communicate/Communicator.*`
- `src/Communicate/LogEntry.*`
- `src/Communicate/LoggingBufferHandler.h`

The branch adds/updates:
`Archive`.

Notable HIP detail:
- `Archive` rounds HIP GPU allocations to 64 KiB granularity to satisfy HSA IPC requirements used by Cray MPICH for large GPU transfers.
- `hipFree`/`cudaFree` return values are intentionally cast to `void` to avoid warning noise from `nodiscard` return values in destructors/free paths.

Particle sorting infrastructure
Main files:
- `src/Particle/ParticleSort.h`
- `src/Particle/SortBuffer.h`

The branch introduces reusable sorting buffers and particle sort helpers used by
the new spatial update path. The buffers grow on demand and are reused to avoid
allocation churn.
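
The grow-on-demand reuse pattern can be sketched in plain C++ as follows. `ReusableBuffer` and `allocationsFor` are hypothetical names for this illustration only, and the sketch makes no claim about the actual interface of `SortBuffer.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grow-only scratch buffer; not the interface of SortBuffer.h.
template <typename T>
class ReusableBuffer {
public:
    // Return storage for at least n elements; grow only when n exceeds
    // what earlier requests already made available.
    T* acquire(std::size_t n) {
        if (n > storage_.size()) {
            storage_.resize(n);
            ++growths_;
        }
        return storage_.data();
    }
    std::size_t capacity() const { return storage_.size(); }
    std::size_t growths() const { return growths_; }

private:
    std::vector<T> storage_;
    std::size_t growths_ = 0;  // number of times the buffer had to grow
};

// Simulate several update steps that each need a scratch index array,
// and report how often the shared buffer had to grow.
inline std::size_t allocationsFor(const std::vector<std::size_t>& sizes) {
    ReusableBuffer<int> buf;
    for (std::size_t n : sizes) {
        int* idx = buf.acquire(n);
        for (std::size_t i = 0; i < n; ++i) idx[i] = static_cast<int>(i);
    }
    return buf.growths();
}
```

For the request sequence 100, 80, 120, 120, 50 the buffer grows twice (at 100 and at 120); every other step reuses existing storage.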
Field layout / halo `nghost` plumbing
Main files:
- `src/FieldLayout/FieldLayout.h`
- `src/FieldLayout/FieldLayout.hpp`
- `src/FieldLayout/SubFieldLayout.hpp`
- `src/Field/HaloCells.h`
- `src/Field/HaloCells.hpp`
- `src/Field/BareField.hpp`

The particle layout changes require consistent ghost-width awareness when field
layouts and halo neighbor regions are computed. This PR threads `nghost` through
the relevant field layout / halo APIs.
This is small in line count but important for correctness; reviewers should
look at it together with the particle layout changes.
Utility support
Main files:
- `src/Utility/BufferView.h`
- `src/Utility/ParallelDispatch.h`
- `src/Utility/Tuning.h`
- `src/Utility/TypeUtils.h`
- `src/Utility/IpplTimings.*`
- `src/Utility/Timer.*`

The utility changes provide reusable support for:
ALPINE Kokkos view lifetime fixes
The ALPINE managers no longer take addresses of temporary Kokkos view handles
returned by `getView()`.

Changed files:
- `alpine/LandauDampingManager.h`
- `alpine/BumponTailInstabilityManager.h`
- `alpine/PenningTrapManager.h`

Before:
```cpp
view_type* R = &(this->pcontainer_m->R.getView());
samplingR.generate(*R, rand_pool64);
```

After:
```cpp
view_type R = this->pcontainer_m->R.getView();
samplingR.generate(R, rand_pool64);
```

This avoids dangling pointers/references to temporary view handles and fixes
compilers/backends that reject taking the address of a temporary Kokkos view.
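
Storing the handle by value is cheap because a managed Kokkos view is a reference-counted handle: copying it shares the underlying data. The following standalone sketch mimics that with `std::shared_ptr`. `ToyView`, `ToyAttrib`, `fill`, and `firstAfterFill` are hypothetical names, and the sketch assumes, as the commit message states, that `getView()` returns a live subview by value.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical reference-counted handle, standing in for a Kokkos view.
struct ToyView {
    std::shared_ptr<std::vector<double>> storage;
    std::size_t extent;  // number of live entries exposed by this handle
    double& operator()(std::size_t i) const { return (*storage)[i]; }
};

// Hypothetical attribute whose getView() returns a fresh handle by value,
// restricted to the live particles.
class ToyAttrib {
public:
    ToyAttrib(std::size_t capacity, std::size_t live)
        : storage_(std::make_shared<std::vector<double>>(capacity, 0.0)), live_(live) {}
    ToyView getView() const { return ToyView{storage_, live_}; }

private:
    std::shared_ptr<std::vector<double>> storage_;
    std::size_t live_;
};

// Write value into every live entry reachable through the handle.
inline void fill(const ToyView& v, double value) {
    for (std::size_t i = 0; i < v.extent; ++i) v(i) = value;
}

inline double firstAfterFill() {
    ToyAttrib attrib(8, 4);
    // ToyView* p = &attrib.getView();  // ill-formed: address of a temporary
    ToyView view = attrib.getView();    // copy the handle, share the data
    fill(view, 3.0);
    return attrib.getView()(0);         // a later handle sees the same storage
}
```

Writes made through the copied handle are visible through any handle obtained later, so nothing is lost by dropping the pointer.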
Validation And Performance Evidence
LUMI Results (ALPS will follow)
The original symptom was a large `updateParticle` regression moving from
`pr501-pcg` to the communication/particle-update split. Initial child timers
looked small because deferred receive finalization/deserialization was hidden in
the tail of `updateParticle`.

Diagnostics split that tail into:
- `particleWait`,
- `particleFreeBuffers`,
- `particleDeserialize`.

The regression was traced to `particleDeserialize`, not MPI wait time.

LUMI before/after pre-reserve fix
Recorded in `PR501_SPLIT_MAP.md`:
`updateParticle` wall max | `particleDeserialize` wall max | `updateParticle` wall max | `particleDeserialize` wall max

Post-fix timer split:
`particleDeserialize` wall max | `particleDeserResize` wall max | `particleDeserCopy` wall max

Interpretation:
- The repeated preserving resize was the dominant regression.
- After pre-reserving, deserialize time is small and almost entirely actual archive copy.
- At 128 ranks, `updateParticle` became balanced:

Local OpenMP check
The split map records a Mac OpenMP comparison between
`pr501-pcg` and `pr501-communication-particle-update`.

Command shape:

Observed locally:
by the `particleDeserialize` diagnosis.

Test coverage added/updated
New or updated tests include:
- `unit_tests/Particle/ParticleUpdate.cpp`
- `unit_tests/Particle/ParticleUpdateNonuniform.cpp`
- `unit_tests/Particle/ParticleSendRecv.cpp`
- `unit_tests/Particle/ParticleBase.cpp`
- `unit_tests/Communicate/BufferHandler.cpp`
- `test/particle`.

`ParticleUpdateNonuniform.cpp` covers ORB/nonuniform layout scenarios including:

Reviewer Notes
Review the particle/communication changes as the new layer on top of PCG.
higher-order scatter/gather, and PIF changes.
The `nghost` field layout / halo changes should be reviewed with the particle
layout changes; they are part of the same correctness surface.
If further GPU performance issues appear, profile `particleDeserCopy` and
`Archive::deserialize(offset)`.