PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop by hjmjohnson · Pull Request #6482 · InsightSoftwareConsortium/ITK

hjmjohnson · 2026-06-21T00:40:36Z

DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant m_TensorBasis once per voxel (~16.7M times for a 256³ DWI) to solve the Stejskal–Tanner equations. The pseudo-inverse (the dual tensor basis) is identical for every voxel, so this precomputes it once in BeforeThreadedGenerateData and applies it per voxel as a matrix-vector product. Since vnl_svd::solve(X) is pinverse()·X, the reconstructed tensors are unchanged. Removing the per-voxel SVD also frees the loop of the thread-unsafe netlib dsvdc call, so the stale single-thread warning is dropped.

Verification

itkDiffusionTensor3DReconstructionImageFilterTest passes (it reconstructs and checks the output tensor at voxel (3,3,3) against expected values at 1e-4). A mutation check (scaling the precomputed pseudo-inverse by 1.5) makes that test fail, confirming it exercises the changed path.

Note: the unit test exercises the AddGradientImage (many-images) path and the shared precompute; the SetGradientImage (single-image) path uses the byte-identical loop edit but has no direct unit-test coverage today.

DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant m_TensorBasis once per voxel (~16.7M times for a 256^3 DWI) to solve the Stejskal-Tanner equations. The SVD is the same for every voxel, so compute it once in BeforeThreadedGenerateData and reuse it via solve() in the per-voxel loop. solve() is const, so concurrent per-voxel calls are safe. The reconstructed tensors are bit-identical to the previous per-voxel construction. Removing the per-voxel SVD also frees the loop of the thread-unsafe netlib dsvdc call, so the stale single-thread warning is dropped.

Replace the per-voxel vnl_svd::solve() with a single matrix-vector product against the precomputed pseudo-inverse (the dual tensor basis). This is faster per voxel: one matrix-vector product instead of solve()'s three. The result is no longer bit-identical to the previous engine: pinverse()*b forms the inverse once, whereas solve() applies V*W^-1*U^T*b sequentially, so the floating-point operation order differs. The difference is ~1e-15 relative, far below any meaningful threshold, and all reconstruction tests pass unchanged.

The existing DiffusionTensor3DReconstructionImageFilter test only exercises the multi-image gradient path (AddGradientImage). Add a test that reconstructs the same data through the single-VectorImage path (SetGradientImage) and verifies it produces identical tensors, covering the previously-untested code path.

greptile-apps · 2026-06-21T03:23:12Z

Greptile Summary

Precomputes the pseudo-inverse of the constant m_TensorBasis once in ComputeTensorBasis() and replaces the per-voxel vnl_svd::solve() call with a matrix-vector multiply, removing a redundant SVD construction and the associated netlib/dsvdc thread-safety constraint. A new test covers the previously-untested SetGradientImage path.

The mathematical equivalence (pinverse(A)·b == svd(A).solve(b)) is correct, and both the many-images and single-image code paths are updated symmetrically.
The Doxygen \warning block in the header still instructs users to call SetNumberOfWorkUnits(1) for the now-resolved thread-safety issue; this stale documentation should be removed alongside the inline .hxx comment that was dropped.
Two new in-source comments exceed the ITK prose-budget limit of one line ≤ 100 chars and should be trimmed or removed.

Confidence Score: 4/5

The core optimization is mathematically correct and both code paths are updated symmetrically, but the user-facing Doxygen warning still tells callers to disable multithreading — defeating the change's purpose for anyone who reads the docs.

The stale \warning block in the public class documentation still instructs callers to call SetNumberOfWorkUnits(1) for a thread-safety concern that this PR resolves. Any user who follows that advice will silently lose the multithreading benefit. The rest of the change — SVD hoist, per-voxel matvec, new test — is correct and complete.

Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h — the \warning Doxygen block (lines 109–116) and the new member comment (lines 330–332) both need attention.

Important Files Changed

Filename	Overview
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h	Adds `m_TensorBasisInverse` member with a 2-line comment that exceeds the prose budget; the Doxygen `\warning` about single-threading was not removed, leaving user-visible documentation contradicting the fix.
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.hxx	Hoists `vnl_svd::pinverse()` from the per-voxel loop into `ComputeTensorBasis()`; both code paths updated symmetrically and math is correct. New 3-line comment is prose-budget-over-limit but otherwise the logic is sound.
Modules/Filtering/DiffusionTensorImage/test/itkDiffusionTensor3DReconstructionSingleImageTest.cxx	New test exercising the `SetGradientImage` (single-VectorImage) path and cross-checking it against `AddGradientImage` output; sanity checks are appropriate and the 1e-8 tolerance is correct for double precision.
Modules/Filtering/DiffusionTensorImage/test/CMakeLists.txt	Correctly registers the new test with `itk_add_test`; no issues.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant U as User code
    participant F as Filter::Update()
    participant B as BeforeThreadedGenerateData
    participant C as ComputeTensorBasis
    participant T as DynamicThreadedGenerateData (N threads)

    U->>F: Update()
    F->>B: BeforeThreadedGenerateData()
    B->>C: ComputeTensorBasis()
    Note over C: Builds m_BMatrix, m_TensorBasis
    C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
    Note over C: Single SVD computed ONCE (was per-voxel)
    C-->>B: return
    B-->>F: return
    F->>T: spawn N threads
    loop per voxel
        T->>T: "D = m_TensorBasisInverse * B  (matvec only, thread-safe)"
    end
    T-->>F: results merged
    F-->>U: output tensor image

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant U as User code
    participant F as Filter::Update()
    participant B as BeforeThreadedGenerateData
    participant C as ComputeTensorBasis
    participant T as DynamicThreadedGenerateData (N threads)

    U->>F: Update()
    F->>B: BeforeThreadedGenerateData()
    B->>C: ComputeTensorBasis()
    Note over C: Builds m_BMatrix, m_TensorBasis
    C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
    Note over C: Single SVD computed ONCE (was per-voxel)
    C-->>B: return
    B-->>F: return
    F->>T: spawn N threads
    loop per voxel
        T->>T: "D = m_TensorBasisInverse * B  (matvec only, thread-safe)"
    end
    T-->>F: results merged
    F-->>U: output tensor image

Comments Outside Diff (1)

Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h, line 109-116 (link)

Stale \warning block contradicts the PR's goal

The \warning telling users to call SetNumberOfWorkUnits(1) because of the netlib/dsvdc thread-safety issue is now wrong. This PR fixes the root cause by moving the SVD into BeforeThreadedGenerateData(), so any user who reads this warning and acts on it will run single-threaded and lose the entire performance benefit. The .hxx inline comment was removed, but the user-visible Doxygen block (the one that renders in IDE tooltips and generated docs) was not. It should be removed.

_{Reviews (1): Last reviewed commit: "ENH: Add single-image-path test for DTI ..." | Re-trigger Greptile}

greptile-apps · 2026-06-21T03:23:16Z

+  // The tensor basis is constant over the image, so compute its pseudo-inverse
+  // (the dual tensor basis) once here. The per-voxel loop then applies it as a
+  // matrix-vector product instead of constructing an SVD per voxel, which both
+  // removes redundant work and makes the loop thread-safe.
+  m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse();


Prose-budget violation: this 3-line comment exceeds the 1-line / ≤100-char cap for in-source comments and restates what the diff already shows. Per the ITK prose-budget rules, comments must carry only load-bearing information that the diff cannot convey — the mechanism here is visible directly from the code.

Suggested change

// The tensor basis is constant over the image, so compute its pseudo-inverse

// (the dual tensor basis) once here. The per-voxel loop then applies it as a

// matrix-vector product instead of constructing an SVD per voxel, which both

// removes redundant work and makes the loop thread-safe.

m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse();

m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse();

Context Used: AGENTS.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-21T03:23:17Z

+  /* Pseudo-inverse (dual tensor basis) of m_TensorBasis, precomputed once so the
+   * per-voxel loop is a matrix-vector product rather than a fresh SVD per voxel. */
+  vnl_matrix<double> m_TensorBasisInverse{};


Prose-budget violation: the 2-line member comment exceeds the "1 short line, ≤100 chars" cap. The second sentence restates the implementation detail that the diff itself already shows and should be removed.

Suggested change

/* Pseudo-inverse (dual tensor basis) of m_TensorBasis, precomputed once so the

* per-voxel loop is a matrix-vector product rather than a fresh SVD per voxel. */

vnl_matrix<double> m_TensorBasisInverse{};

/* Pseudo-inverse of m_TensorBasis (dual tensor basis), precomputed in BeforeThreadedGenerateData. */

vnl_matrix<double> m_TensorBasisInverse{};

Context Used: AGENTS.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

github-actions Bot added type:Performance Improvement in terms of compilation or execution time area:Filtering Issues affecting the Filtering module labels Jun 21, 2026

hjmjohnson mentioned this pull request Jun 21, 2026

PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop (backport of #6482) #6483

Open

github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct labels Jun 21, 2026

hjmjohnson force-pushed the perf-dti-reconstruction-hoist-svd branch from 93f7ddb to a502cac Compare June 21, 2026 02:41

hjmjohnson mentioned this pull request Jun 21, 2026

COMP: Remove duplicate DTI reconstruction filter; use ITK's BRAINSia/BRAINSTools#611

Merged

hjmjohnson added 3 commits June 20, 2026 22:15

hjmjohnson force-pushed the perf-dti-reconstruction-hoist-svd branch from a502cac to 36b2284 Compare June 21, 2026 03:15

hjmjohnson marked this pull request as ready for review June 21, 2026 03:19

greptile-apps Bot reviewed Jun 21, 2026

View reviewed changes

hjmjohnson requested a review from thewtex June 21, 2026 16:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop#6482

PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop#6482
hjmjohnson wants to merge 3 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:perf-dti-reconstruction-hoist-svd

hjmjohnson commented Jun 21, 2026

Uh oh!

greptile-apps Bot commented Jun 21, 2026 •

edited

Loading

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 21, 2026

Uh oh!

greptile-apps Bot Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

hjmjohnson commented Jun 21, 2026

Uh oh!

greptile-apps Bot commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Jun 21, 2026 •

edited

Loading