fix(ci): raise AVM check-circuit per-test timeout to 120s#23969
Draft
AztecBot wants to merge 1 commit into
Flakey Tests 🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.
Problem
The `avm-check-circuit` job failed on `next` (run 27215957791, exit code 124 = timeout).
The GitHub job failure is just the symptom; the actual failure was a single AVM check-circuit test hitting its per-test timeout:
This is not a constraint failure — the circuit is valid. The individual test log shows it was simply killed mid-check after exceeding the 30s budget:
`Running check (with skippable) circuit over 700560 rows` started, then `timeout: sending signal TERM` at ~30s
Root cause
`avm_check_circuit_cmds` runs every dumped tx with `TIMEOUT=30s` and the default 2 CPUs / 8g. Most e2e txs produce small AVM traces that clear check-circuit in 4–8s, but a few (notably `e2e_multiple_blobs`, ~700k circuit rows) need ~30–40s for trace generation plus check-circuit at 2 CPUs. The 30s budget was right on the edge, so this tx flakes. The triggering commit (#23963, a CI spot-instance change) is unrelated to the AVM: this is a pre-existing borderline-timeout test, not a regression.
Change
Raise the per-test timeout `30s → 120s` in `avm_check_circuit_cmds`.
A longer timeout, rather than a higher `CPUS` value, is the targeted fix: the parallel scheduler acquires `CPUS` cores per test from a global semaphore, so bumping `CPUS` from 2 to 4 would halve concurrency (96 → 48 on a 192-core runner) for every tx, penalizing the hundreds of small txs to help a handful of large outliers. A longer timeout only lets the rare large tx hold its slot a little longer, with zero impact on the common case. 120s gives ~3x headroom over the observed ~36s worst case.
Validation
The failing run shows the test executing correctly (valid trace, valid in-progress check) and being killed purely by the wall-clock timeout, so additional time resolves it. The change is a single comment + timeout-value edit in the CI test-command generator.
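The concurrency and headroom figures quoted under Change can be reproduced with a small sketch. This is only arithmetic under the stated assumptions (a 192-core runner and a scheduler that hands out a fixed number of cores per test from one shared pool); the function names `concurrent_slots` and `headroom` are hypothetical and do not come from the CI scripts.

```python
# Illustrative arithmetic only. Both helper names are hypothetical;
# the real scheduling logic lives in the CI scripts, not here.


def concurrent_slots(total_cores: int, cpus_per_test: int) -> int:
    """Number of tests that can run at once when each test takes
    cpus_per_test cores from a shared pool of total_cores."""
    if cpus_per_test <= 0:
        raise ValueError("cpus_per_test must be positive")
    return total_cores // cpus_per_test


def headroom(timeout_s: float, worst_case_s: float) -> float:
    """Ratio of the timeout budget to the slowest observed run.
    A value below 1.0 means the slowest run does not fit in the budget."""
    if worst_case_s <= 0:
        raise ValueError("worst_case_s must be positive")
    return timeout_s / worst_case_s


if __name__ == "__main__":
    runner_cores = 192  # runner size quoted in the description

    # Raising CPUS from 2 to 4 halves the number of concurrent tests.
    print(concurrent_slots(runner_cores, 2))  # 96
    print(concurrent_slots(runner_cores, 4))  # 48

    # Old 30s budget vs. new 120s budget against the ~36s worst case.
    print(round(headroom(30, 36), 2))   # 0.83
    print(round(headroom(120, 36), 1))  # 3.3
```

The old budget gives a ratio below 1 against the ~36s worst case, which matches the observed flake, while the new budget leaves the ~3x margin without changing the slot count.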
Created by claudebox · group: slackbot