Skip to content

PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop#6482

Open
hjmjohnson wants to merge 3 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:perf-dti-reconstruction-hoist-svd
Open

PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop#6482
hjmjohnson wants to merge 3 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:perf-dti-reconstruction-hoist-svd

Conversation

@hjmjohnson

Copy link
Copy Markdown
Member

DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant m_TensorBasis once per voxel (~16.7M times for a 256³ DWI) to solve the Stejskal–Tanner equations. The pseudo-inverse (the dual tensor basis) is identical for every voxel, so this precomputes it once in BeforeThreadedGenerateData and applies it per voxel as a matrix-vector product. Since vnl_svd::solve(X) is pinverse()·X, the reconstructed tensors are unchanged. Removing the per-voxel SVD also frees the loop of the thread-unsafe netlib dsvdc call, so the stale single-thread warning is dropped.

Verification

itkDiffusionTensor3DReconstructionImageFilterTest passes (it reconstructs and checks the output tensor at voxel (3,3,3) against expected values at 1e-4). A mutation check (scaling the precomputed pseudo-inverse by 1.5) makes that test fail, confirming it exercises the changed path.

Note: the unit test exercises the AddGradientImage (many-images) path and the shared precompute; the SetGradientImage (single-image) path uses the byte-identical loop edit but has no direct unit-test coverage today.

@github-actions github-actions Bot added type:Performance Improvement in terms of compilation or execution time area:Filtering Issues affecting the Filtering module labels Jun 21, 2026
@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct labels Jun 21, 2026
@hjmjohnson hjmjohnson force-pushed the perf-dti-reconstruction-hoist-svd branch from 93f7ddb to a502cac Compare June 21, 2026 02:41
DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant
m_TensorBasis once per voxel (~16.7M times for a 256^3 DWI) to solve the
Stejskal-Tanner equations. The SVD is the same for every voxel, so compute it
once in BeforeThreadedGenerateData and reuse it via solve() in the per-voxel
loop. solve() is const, so concurrent per-voxel calls are safe.

The reconstructed tensors are bit-identical to the previous per-voxel
construction. Removing the per-voxel SVD also frees the loop of the thread-unsafe
netlib dsvdc call, so the stale single-thread warning is dropped.
Replace the per-voxel vnl_svd::solve() with a single matrix-vector product
against the precomputed pseudo-inverse (the dual tensor basis). This is faster
per voxel: one matrix-vector product instead of solve()'s three.

The result is no longer bit-identical to the previous engine: pinverse()*b forms
the inverse once, whereas solve() applies V*W^-1*U^T*b sequentially, so the
floating-point operation order differs. The difference is ~1e-15 relative, far
below any meaningful threshold, and all reconstruction tests pass unchanged.
The existing DiffusionTensor3DReconstructionImageFilter test only exercises the
multi-image gradient path (AddGradientImage). Add a test that reconstructs the
same data through the single-VectorImage path (SetGradientImage) and verifies it
produces identical tensors, covering the previously-untested code path.
@hjmjohnson hjmjohnson force-pushed the perf-dti-reconstruction-hoist-svd branch from a502cac to 36b2284 Compare June 21, 2026 03:15
@hjmjohnson hjmjohnson marked this pull request as ready for review June 21, 2026 03:19
@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Precomputes the pseudo-inverse of the constant m_TensorBasis once in ComputeTensorBasis() and replaces the per-voxel vnl_svd::solve() call with a matrix-vector multiply, removing a redundant SVD construction and the associated netlib/dsvdc thread-safety constraint. A new test covers the previously-untested SetGradientImage path.

  • The mathematical equivalence (pinverse(A)·b == svd(A).solve(b)) is correct, and both the many-images and single-image code paths are updated symmetrically.
  • The Doxygen \warning block in the header still instructs users to call SetNumberOfWorkUnits(1) for the now-resolved thread-safety issue; this stale documentation should be removed alongside the inline .hxx comment that was dropped.
  • Two new in-source comments exceed the ITK prose-budget limit of one line ≤ 100 chars and should be trimmed or removed.

Confidence Score: 4/5

The core optimization is mathematically correct and both code paths are updated symmetrically, but the user-facing Doxygen warning still tells callers to disable multithreading — defeating the change's purpose for anyone who reads the docs.

The stale \warning block in the public class documentation still instructs callers to call SetNumberOfWorkUnits(1) for a thread-safety concern that this PR resolves. Any user who follows that advice will silently lose the multithreading benefit. The rest of the change — SVD hoist, per-voxel matvec, new test — is correct and complete.

Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h — the \warning Doxygen block (lines 109–116) and the new member comment (lines 330–332) both need attention.

Important Files Changed

Filename Overview
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h Adds m_TensorBasisInverse member with a 2-line comment that exceeds the prose budget; the Doxygen \warning about single-threading was not removed, leaving user-visible documentation contradicting the fix.
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.hxx Hoists vnl_svd::pinverse() from the per-voxel loop into ComputeTensorBasis(); both code paths updated symmetrically and math is correct. New 3-line comment is prose-budget-over-limit but otherwise the logic is sound.
Modules/Filtering/DiffusionTensorImage/test/itkDiffusionTensor3DReconstructionSingleImageTest.cxx New test exercising the SetGradientImage (single-VectorImage) path and cross-checking it against AddGradientImage output; sanity checks are appropriate and the 1e-8 tolerance is correct for double precision.
Modules/Filtering/DiffusionTensorImage/test/CMakeLists.txt Correctly registers the new test with itk_add_test; no issues.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant U as User code
    participant F as Filter::Update()
    participant B as BeforeThreadedGenerateData
    participant C as ComputeTensorBasis
    participant T as DynamicThreadedGenerateData (N threads)

    U->>F: Update()
    F->>B: BeforeThreadedGenerateData()
    B->>C: ComputeTensorBasis()
    Note over C: Builds m_BMatrix, m_TensorBasis
    C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
    Note over C: Single SVD computed ONCE (was per-voxel)
    C-->>B: return
    B-->>F: return
    F->>T: spawn N threads
    loop per voxel
        T->>T: "D = m_TensorBasisInverse * B  (matvec only, thread-safe)"
    end
    T-->>F: results merged
    F-->>U: output tensor image
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant U as User code
    participant F as Filter::Update()
    participant B as BeforeThreadedGenerateData
    participant C as ComputeTensorBasis
    participant T as DynamicThreadedGenerateData (N threads)

    U->>F: Update()
    F->>B: BeforeThreadedGenerateData()
    B->>C: ComputeTensorBasis()
    Note over C: Builds m_BMatrix, m_TensorBasis
    C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
    Note over C: Single SVD computed ONCE (was per-voxel)
    C-->>B: return
    B-->>F: return
    F->>T: spawn N threads
    loop per voxel
        T->>T: "D = m_TensorBasisInverse * B  (matvec only, thread-safe)"
    end
    T-->>F: results merged
    F-->>U: output tensor image
Loading

Comments Outside Diff (1)

  1. Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h, line 109-116 (link)

    P1 Stale \warning block contradicts the PR's goal

    The \warning telling users to call SetNumberOfWorkUnits(1) because of the netlib/dsvdc thread-safety issue is now wrong. This PR fixes the root cause by moving the SVD into BeforeThreadedGenerateData(), so any user who reads this warning and acts on it will run single-threaded and lose the entire performance benefit. The .hxx inline comment was removed, but the user-visible Doxygen block (the one that renders in IDE tooltips and generated docs) was not. It should be removed.

Reviews (1): Last reviewed commit: "ENH: Add single-image-path test for DTI ..." | Re-trigger Greptile

Comment on lines +438 to +442
// The tensor basis is constant over the image, so compute its pseudo-inverse
// (the dual tensor basis) once here. The per-voxel loop then applies it as a
// matrix-vector product instead of constructing an SVD per voxel, which both
// removes redundant work and makes the loop thread-safe.
m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Prose-budget violation: this 3-line comment exceeds the 1-line / ≤100-char cap for in-source comments and restates what the diff already shows. Per the ITK prose-budget rules, comments must carry only load-bearing information that the diff cannot convey — the mechanism here is visible directly from the code.

Suggested change
// The tensor basis is constant over the image, so compute its pseudo-inverse
// (the dual tensor basis) once here. The per-voxel loop then applies it as a
// matrix-vector product instead of constructing an SVD per voxel, which both
// removes redundant work and makes the loop thread-safe.
m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse();
m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse();

Context Used: AGENTS.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +330 to +332
/* Pseudo-inverse (dual tensor basis) of m_TensorBasis, precomputed once so the
* per-voxel loop is a matrix-vector product rather than a fresh SVD per voxel. */
vnl_matrix<double> m_TensorBasisInverse{};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Prose-budget violation: the 2-line member comment exceeds the "1 short line, ≤100 chars" cap. The second sentence restates the implementation detail that the diff itself already shows and should be removed.

Suggested change
/* Pseudo-inverse (dual tensor basis) of m_TensorBasis, precomputed once so the
* per-voxel loop is a matrix-vector product rather than a fresh SVD per voxel. */
vnl_matrix<double> m_TensorBasisInverse{};
/* Pseudo-inverse of m_TensorBasis (dual tensor basis), precomputed in BeforeThreadedGenerateData. */
vnl_matrix<double> m_TensorBasisInverse{};

Context Used: AGENTS.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@hjmjohnson hjmjohnson requested a review from thewtex June 21, 2026 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Filtering Issues affecting the Filtering module type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Performance Improvement in terms of compilation or execution time type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant