Benchmark: freqresp! optimization strategies#1063
Draft
baggepinnen wants to merge 3 commits into
Draft
Conversation
Standalone benchmark comparing strategies for freqresp! on a 48-state SISO state-space system over 150 frequencies: - baseline / serial Hessenberg ldiv2! - threaded outer loop (Threads.@Spawn, Polyester.@Batch, OhMyThreads) - @turbo / @tturbo ldiv2! (real/imag split, since @turbo lacks complex support) - modal / pole-residue reformulation (the Reactant/XLA/GPU-friendly form) Findings: the per-frequency loop dominates over the factorization; @turbo gives ~1.5x, threading ~6-7x, and the modal reformulation 13-54x (at the cost of robustness, depending on cond(V)). @tturbo and Reactant are not worth it at this problem size. See benchmark_freqresp/README.md for full results. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified empirically: @turbo on the back-substitution loop trips LoopVectorization check_args and silently falls back to a scalar loop. The inner j-loop is a sequential recurrence (scan) and the independent column axis is length 1 for SISO, so no vectorization is possible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add modal_accuracy.jl demonstrating, against a BigFloat reference, that the modal/pole-residue form's accuracy is governed by cond(V): it fails for defective A (repeated poles -> relerr 1.0) and near-defective/non-normal A (clustered eigenvalues -> relerr ~1e3), while the Hessenberg solve stays at machine precision. Evaluating near a pole is not a modal-specific weakness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
This is an automated message.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1063 +/- ##
===========================================
+ Coverage 19.68% 91.64% +71.95%
===========================================
Files 42 42
Lines 5598 5660 +62
===========================================
+ Hits 1102 5187 +4085
+ Misses 4496 473 -4023 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.








Standalone benchmark (under
benchmark_freqresp/) comparing strategies for maximizingfreqresp!performance on the target use case: a 48-state SISO continuous state-space system over 150 frequencies. Nothing in the package source is modified — all candidate implementations live inbenchmark_freqresp/variants.jl, withControlSystemsBase.freqresp!as the correctness oracle.Companion to #1062 (full write-up and discussion there).
Strategies benchmarked
ldiv2!loopThreads.@spawn,Polyester.@batch,OhMyThreadsover frequency chunks@turboand@tturboldiv2!(real/imag split, since@turbolacksComplexsupport)Key findings (24-core machine, loop-only µs)
@turboldiv2!@turbo@turbois a solid ~1.5× free win (needs a manual real/imag split — bare@turbowon't compile on complex).@tturbois slower than@turbo— wrong axis to thread.cond(V); degrades for non-normal/defectiveA), which is why the package defaults to the Hessenberg solve.Run
Draft — this is an exploratory benchmark, not a proposed package change. Opening for visibility/discussion of which strategy (likely the real/imag-split
@turboldiv2!, and an opt-in modal fast path) to upstream.🤖 Generated with Claude Code