Benchmark: freqresp! optimization strategies by baggepinnen · Pull Request #1063 · JuliaControl/ControlSystems.jl

baggepinnen · 2026-06-09T03:57:36Z

Standalone benchmark (under benchmark_freqresp/) comparing strategies for maximizing freqresp! performance on the target use case: a 48-state SISO continuous state-space system over 150 frequencies. Nothing in the package source is modified — all candidate implementations live in benchmark_freqresp/variants.jl, with ControlSystemsBase.freqresp! as the correctness oracle.

Companion to #1062 (full write-up and discussion there).

Strategies benchmarked

Baseline / serial — existing Hessenberg ldiv2! loop
Threading — Threads.@spawn, Polyester.@batch, OhMyThreads over frequency chunks
SIMD — @turbo and @tturbo ldiv2! (real/imag split, since @turbo lacks Complex support)
Modal / pole-residue — eigendecompose once, then a pure broadcast+reduction (the Reactant/XLA/GPU-friendly form)

Key findings (24-core machine, loop-only µs)

Strategy	µs	speedup
serial Hessenberg (current)	~545	1×
`@turbo` ldiv2!	~370	1.5×
threaded (best, ~8 threads)	~90	6×
threaded + `@turbo`	~78	7×
modal, serial	~42	13×
modal, threaded	~10	54×

The per-frequency loop dominates over the one-time factorization (~565 µs vs ~85 µs).
@turbo is a solid ~1.5× free win (needs a manual real/imag split — bare @turbo won't compile on complex).
Threading scales sub-linearly; ~8 threads beats 24 for this size.
@tturbo is slower than @turbo — wrong axis to thread.
Algorithm beats micro-optimization: the modal reformulation is 13–54×, but trades numerical robustness (depends on cond(V); degrades for non-normal/defective A), which is why the package defaults to the Hessenberg solve.
Reactant.jl is not worth it at this scale — the modal form already runs in ~10 µs on CPU, below XLA dispatch / GPU launch latency.

Run

cd benchmark_freqresp
julia --project=. -e 'using Pkg; Pkg.develop(path="../lib/ControlSystemsBase"); Pkg.add(["BenchmarkTools","LoopVectorization","Polyester","OhMyThreads"]); Pkg.instantiate()'
JULIA_EXCLUSIVE=1 julia -t auto --project=. benchmark.jl

Draft — this is an exploratory benchmark, not a proposed package change. Opening for visibility/discussion of which strategy (likely the real/imag-split @turbo ldiv2!, and an opt-in modal fast path) to upstream.

🤖 Generated with Claude Code

@Spawn

Standalone benchmark comparing strategies for freqresp! on a 48-state SISO state-space system over 150 frequencies: - baseline / serial Hessenberg ldiv2! - threaded outer loop (Threads.@Spawn, Polyester.@Batch, OhMyThreads) - @turbo / @tturbo ldiv2! (real/imag split, since @turbo lacks complex support) - modal / pole-residue reformulation (the Reactant/XLA/GPU-friendly form) Findings: the per-frequency loop dominates over the factorization; @turbo gives ~1.5x, threading ~6-7x, and the modal reformulation 13-54x (at the cost of robustness, depending on cond(V)). @tturbo and Reactant are not worth it at this problem size. See benchmark_freqresp/README.md for full results. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@turbo

Verified empirically: @turbo on the back-substitution loop trips LoopVectorization check_args and silently falls back to a scalar loop. The inner j-loop is a sequential recurrence (scan) and the independent column axis is length 1 for SISO, so no vectorization is possible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add modal_accuracy.jl demonstrating, against a BigFloat reference, that the modal/pole-residue form's accuracy is governed by cond(V): it fails for defective A (repeated poles -> relerr 1.0) and near-defective/non-normal A (clustered eigenvalues -> relerr ~1e3), while the Hessenberg solve stays at machine precision. Evaluating near a pole is not a modal-specific weakness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

JuliaControlBot · 2026-06-09T04:10:58Z

This is an automated message.
Plots were compared to references. 4/11 images have changed, see differences below.
After pulling this PR, please update the reference images by creating a PR to ControlExamplePlots.jl here.

Difference	Reference Image	New Image
✔️ 0.0
✔️ 0.0
✔️ 0.0
✔️ 0.0

codecov · 2026-06-09T04:48:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.64%. Comparing base (856eb2b) to head (76e9832).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #1063       +/-   ##
===========================================
+ Coverage   19.68%   91.64%   +71.95%     
===========================================
  Files          42       42               
  Lines        5598     5660       +62     
===========================================
+ Hits         1102     5187     +4085     
+ Misses       4496      473     -4023

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

baggepinnen and others added 3 commits June 9, 2026 05:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark: freqresp! optimization strategies#1063

Benchmark: freqresp! optimization strategies#1063
baggepinnen wants to merge 3 commits into
masterfrom
freqresp-benchmarks

baggepinnen commented Jun 9, 2026

Uh oh!

JuliaControlBot commented Jun 9, 2026

Uh oh!

codecov Bot commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

baggepinnen commented Jun 9, 2026

Strategies benchmarked

Key findings (24-core machine, loop-only µs)

Run

Uh oh!

JuliaControlBot commented Jun 9, 2026

Uh oh!

codecov Bot commented Jun 9, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants