take fast path if c2c transform does not need padding or trimming by chillenb · Pull Request #283 · IntelPython/mkl_fft

chillenb · 2026-03-03T02:36:53Z

Thanks for creating and maintaining this package!

If you try to get MKL C-API performance out of this package, you will probably discover that the speed of fftn is very sensitive to how its arguments are passed. Here is an example:

In [1]: import numpy as np
   ...: import mkl_fft
   ...: N = 200
   ...: A = np.random.random((1,N,N,N)).astype(np.complex128)
 
In [2]: %timeit mkl_fft.fftn(A, s=A.shape[1:], axes=(1,2,3))
164 ms ± 187 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit mkl_fft.fftn(A, axes=(1,2,3))
6.56 ms ± 304 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit mkl_fft.interfaces.numpy_fft.fftn(A, axes=(1,2,3))
165 ms ± 31.3 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This is because mkl_fft.fftn always takes a slow path (_iter_fftnd) whenever s is not None, even when s matches the input shape along the transformed axes. Furthermore, the NumPy and SciPy interfaces don't pass s=None through unchanged, so they are forced onto this path as well.
This pull request lets fftn detect when the s argument is equivalent to s=None (no padding or trimming is needed along any transformed axis), so that it can use the faster _iter_complementary path instead.

After these code changes, performance aligns better with expectations:

In [1]: import numpy as np
   ...: import mkl_fft
   ...: N = 200
   ...: A = np.random.random((1,N,N,N)).astype(np.complex128)

In [2]: %timeit mkl_fft.interfaces.numpy_fft.fftn(A, axes=(1,2,3))
8.28 ms ± 551 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit mkl_fft.fftn(A, s=A.shape[1:], axes=(1,2,3))
The slowest run took 4.49 times longer than the fastest. This could mean that an intermediate result is being cached.
9.92 ms ± 7.49 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit mkl_fft.fftn(A, axes=(1,2,3))
6.49 ms ± 60.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Test system: dual-socket Xeon Platinum 8268 server.
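To reproduce a comparison like this outside IPython, here is a rough sketch using the standard-library `timeit` module. `best_time` is a hypothetical helper, the smaller `N` is chosen only to keep the run short, and absolute numbers will depend on the machine, thread count, and MKL version. It falls back to `numpy.fft.fftn` if `mkl_fft` is not installed, so the sketch runs either way.

```python
import timeit

import numpy as np


def best_time(fn, *args, repeat=7, number=10, **kwargs):
    """Best per-call time (seconds) over `repeat` runs of `number` calls each.

    Similar in spirit to IPython's %timeit, but reports the minimum run
    rather than mean ± std. dev.
    """
    timer = timeit.Timer(lambda: fn(*args, **kwargs))
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    try:
        import mkl_fft
        fftn, name = mkl_fft.fftn, "mkl_fft.fftn"
    except ImportError:  # fall back to NumPy so the sketch still runs without mkl_fft
        fftn, name = np.fft.fftn, "numpy.fft.fftn"

    N = 64  # smaller than the N = 200 used above, to keep the run short
    A = np.random.random((1, N, N, N)).astype(np.complex128)
    axes = (1, 2, 3)
    for label, extra in [("s=None", {}), ("s=A.shape[1:]", {"s": A.shape[1:]})]:
        t = best_time(fftn, A, axes=axes, **extra)
        print(f"{name}, {label}: {t * 1e3:.2f} ms")
```

Reporting the minimum suppresses the kind of run-to-run spread flagged in In [3] of the second session; keep the full `totals` list if that variance is what you want to investigate.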

intel-python-devops · 2026-03-03T02:40:07Z

Can one of the admins verify this patch?

chillenb · 2026-03-03T03:21:29Z

Oops, I didn't realize that the CI would have to be approved again after fixing the whitespace. Sorry!

The tests previously approved did run and pass, though.

ndgrigorian · 2026-03-03T04:52:39Z

It's no problem, thanks for this contribution to the project. :)

I'm not sure whether any of our tests currently cover this case and compare against, e.g., NumPy, so adding a test would be good too.
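A sketch of what such a test could check, assuming plain `assert`-style testing with NumPy as the reference. `check_s_none_equivalence` is a hypothetical name, and this is not necessarily the test that was added to the PR. It verifies that passing an `s` equal to the input shape along the transformed axes gives the same result as `s=None`, and that both match `numpy.fft.fftn`.

```python
import numpy as np


def check_s_none_equivalence(fftn, shape=(2, 8, 6, 4), axes=(1, 2, 3), seed=0):
    """Check fftn(x, s=<shape along axes>) == fftn(x, s=None) == numpy.fft.fftn.

    `fftn` is the callable under test, e.g. mkl_fft.fftn or
    mkl_fft.interfaces.numpy_fft.fftn.
    """
    rng = np.random.default_rng(seed)
    # complex128 input, so this exercises the c2c transform
    x = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    s = tuple(x.shape[ax] for ax in axes)  # equivalent to s=None: no padding or trimming

    expected = np.fft.fftn(x, axes=axes)  # NumPy reference result
    res_none = fftn(x, axes=axes)
    res_s = fftn(x, s=s, axes=axes)

    np.testing.assert_allclose(res_none, expected, rtol=1e-10, atol=1e-10)
    np.testing.assert_allclose(res_s, expected, rtol=1e-10, atol=1e-10)
    np.testing.assert_allclose(res_s, res_none, rtol=1e-10, atol=1e-10)
```

With `mkl_fft` installed, this would be called as `check_s_none_equivalence(mkl_fft.fftn)` and `check_s_none_equivalence(mkl_fft.interfaces.numpy_fft.fftn)`. Comparing against NumPy, not just against the slow path, catches a fast path that returns a wrong result.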

take fast path if c2c transform does not need padding or trimming

568b208

chillenb requested review from antonwolfy, jharlow-intel, ndgrigorian and xaleryb as code owners March 3, 2026 02:36

Satisfy linter

db1a1c2

add test for s=None vs equivalent s

d761e90

chillenb wants to merge 3 commits into IntelPython:master from chillenb:faster

3 participants