Skip to content

PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop (backport of #6482)#6483

Open
hjmjohnson wants to merge 2 commits into
InsightSoftwareConsortium:release-5.4from
hjmjohnson:perf-dti-reconstruction-hoist-svd-r54
Open

PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop (backport of #6482)#6483
hjmjohnson wants to merge 2 commits into
InsightSoftwareConsortium:release-5.4from
hjmjohnson:perf-dti-reconstruction-hoist-svd-r54

Conversation

@hjmjohnson

Copy link
Copy Markdown
Member

Backport of #6482 to release-5.4.

DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant m_TensorBasis once per voxel (~16.7M times for a 256³ DWI). The pseudo-inverse (the dual tensor basis) is identical for every voxel, so this precomputes it once in BeforeThreadedGenerateData and applies it per voxel as a matrix-vector product. vnl_svd::solve(X) is pinverse()·X, so the reconstructed tensors are unchanged; removing the per-voxel SVD also frees the loop of the thread-unsafe netlib dsvdc call.

Verification

The change is identical to #6482 (main), which is validated by itkDiffusionTensor3DReconstructionImageFilterTest plus a mutation check (corrupting the precomputed pseudo-inverse makes that test fail). The release-5.4 filter is structurally identical; the edited filter compiles and type-checks clean against ITK 5.x, and release-5.4 CI runs the same test.

DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant
m_TensorBasis once per voxel (~16.7M times for a 256^3 DWI) to solve the
Stejskal-Tanner equations. The pseudo-inverse (the dual tensor basis) is the same
for every voxel, so precompute it once in BeforeThreadedGenerateData and apply it
per voxel as a matrix-vector product. vnl_svd::solve(X) is pinverse()*X, so the
reconstructed tensors are unchanged. Removing the per-voxel SVD also frees the
loop of the thread-unsafe netlib dsvdc call, so the stale single-thread warning
is dropped.
@github-actions github-actions Bot added type:Performance Improvement in terms of compilation or execution time area:Filtering Issues affecting the Filtering module labels Jun 21, 2026
The existing DiffusionTensor3DReconstructionImageFilter test only exercises the
multi-image gradient path (AddGradientImage). Add a test that reconstructs the
same data through the single-VectorImage path (SetGradientImage) and verifies it
produces identical tensors, covering the previously-untested code path.
@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct labels Jun 21, 2026
@hjmjohnson hjmjohnson marked this pull request as ready for review June 21, 2026 03:19
@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Hoists the constant-per-image vnl_svd pseudo-inverse computation out of the per-voxel loop in DiffusionTensor3DReconstructionImageFilter, precomputing it once in BeforeThreadedGenerateData and replacing the per-voxel solve() call with a matrix–vector product. This eliminates redundant SVD work and removes a thread-unsafe dsvdc call from the threaded path.

  • itkDiffusionTensor3DReconstructionImageFilter.{h,hxx}: Adds m_TensorBasisInverse (precomputed vnl_svd::pinverse() of m_TensorBasis), removes the two identical vnl_svd constructions inside both iterator-path voxel loops, and deletes the now-obsolete comment warning about the single-thread requirement.
  • itkDiffusionTensor3DReconstructionSingleImageTest.cxx: New regression test that exercises the SetGradientImage (single VectorImage) code path — which the existing test does not cover — and verifies it produces tensors identical to the AddGradientImage multi-image path within 1e-8.

Confidence Score: 5/5

Safe to merge; the optimization is mathematically equivalent to the original and the thread-safety concern it resolves is validated by the existing and new tests.

Both voxel-loop code paths are updated consistently, the m_BMatrix transpose ordering in BeforeThreadedGenerateData is preserved, and vnl_svd::pinverse()·b is algebraically identical to vnl_svd::solve(b). The new test covers the previously untested SetGradientImage path and would catch numerical regressions.

No files require special attention; all four changed files are straightforward and consistent with each other.

Important Files Changed

Filename Overview
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.hxx Core change: SVD pseudo-inverse moved to BeforeThreadedGenerateData; both voxel-loop paths updated consistently; m_BMatrix transpose ordering preserved.
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h Adds m_TensorBasisInverse as a vnl_matrix private member with a clear comment; no API or public interface changes.
Modules/Filtering/DiffusionTensorImage/test/itkDiffusionTensor3DReconstructionSingleImageTest.cxx New test covering the SetGradientImage path; compares multi-image vs single-image output with a 1e-8 tolerance and an anyNonZero guard to catch degenerate zero output.
Modules/Filtering/DiffusionTensorImage/test/CMakeLists.txt Registers the new test source and adds an itk_add_test entry; no data dependencies needed since the test generates synthetic in-memory data.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant ITK as ITK Pipeline
    participant BTGD as BeforeThreadedGenerateData
    participant TGD as DynamicThreadedGenerateData (N threads)

    ITK->>BTGD: call once before threading
    BTGD->>BTGD: build m_BMatrix from gradient dirs
    BTGD->>BTGD: set m_TensorBasis
    BTGD->>BTGD: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
    BTGD->>BTGD: m_BMatrix.inplace_transpose()
    BTGD-->>ITK: done

    ITK->>TGD: spawn N threads
    par Thread 1
        TGD->>TGD: "D = m_TensorBasisInverse x (m_BMatrix x B)"
    and Thread 2
        TGD->>TGD: "D = m_TensorBasisInverse x (m_BMatrix x B)"
    and Thread N
        TGD->>TGD: "D = m_TensorBasisInverse x (m_BMatrix x B)"
    end
    TGD-->>ITK: output tensor image
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant ITK as ITK Pipeline
    participant BTGD as BeforeThreadedGenerateData
    participant TGD as DynamicThreadedGenerateData (N threads)

    ITK->>BTGD: call once before threading
    BTGD->>BTGD: build m_BMatrix from gradient dirs
    BTGD->>BTGD: set m_TensorBasis
    BTGD->>BTGD: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
    BTGD->>BTGD: m_BMatrix.inplace_transpose()
    BTGD-->>ITK: done

    ITK->>TGD: spawn N threads
    par Thread 1
        TGD->>TGD: "D = m_TensorBasisInverse x (m_BMatrix x B)"
    and Thread 2
        TGD->>TGD: "D = m_TensorBasisInverse x (m_BMatrix x B)"
    and Thread N
        TGD->>TGD: "D = m_TensorBasisInverse x (m_BMatrix x B)"
    end
    TGD-->>ITK: output tensor image
Loading

Reviews (1): Last reviewed commit: "ENH: Add single-image-path test for DTI ..." | Re-trigger Greptile

@hjmjohnson hjmjohnson requested a review from thewtex June 21, 2026 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Filtering Issues affecting the Filtering module type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Performance Improvement in terms of compilation or execution time type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant