Reduce canary schedule frequency (AMD 4→2/day, ARM 8→2/day)#68821

Open
potiuk wants to merge 1 commit into apache:main from potiuk:reduce-canary-frequency

potiuk (Member) commented Jun 22, 2026

The scheduled canary runs the full matrix on all versions and is by far the
most expensive run type. AMD ran 4×/day and ARM 8×/day.

This takes both to 2×/day, interleaved so that a full-matrix canary still
runs roughly every 6 h, alternating between architectures:

  • AMD: 58 1,13 * * * → 01:58, 13:58 UTC
  • ARM: 28 7,19 * * * → 07:28, 19:28 UTC
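As a quick sanity check of the ~6 h and ~12 h figures, the Python sketch below computes the longest gap between runs for the two crons above, wrapping around midnight. It assumes the simple "single minute, comma-separated hours, every day" cron shape used here and rejects anything else; `daily_run_minutes` and `max_gap_hours` are hypothetical helpers for illustration, not part of the Airflow CI tooling.

```python
# Minimal sketch: longest gap between scheduled canary runs for simple daily crons.
# daily_run_minutes / max_gap_hours are hypothetical helpers, not Airflow CI code.

MINUTES_PER_DAY = 24 * 60


def daily_run_minutes(cron: str) -> list[int]:
    """Run times (minutes after midnight) for a 'M H1,H2,... * * *' cron.

    Only a single minute value and a comma-separated hour list are supported;
    ranges, steps and non-wildcard day/month/weekday fields are rejected.
    """
    minute, hours, *rest = cron.split()
    if rest != ["*", "*", "*"] or not minute.isdigit():
        raise ValueError(f"unsupported cron expression: {cron!r}")
    return sorted(int(h) * 60 + int(minute) for h in hours.split(","))


def max_gap_hours(run_minutes: list[int]) -> float:
    """Longest interval between consecutive runs, wrapping past midnight."""
    times = sorted(run_minutes)
    wrapped = times + [times[0] + MINUTES_PER_DAY]  # first run of the next day
    return max(b - a for a, b in zip(wrapped, wrapped[1:])) / 60


if __name__ == "__main__":
    amd = daily_run_minutes("58 1,13 * * *")
    arm = daily_run_minutes("28 7,19 * * *")
    print("AMD max gap:", max_gap_hours(amd), "h")             # 12.0
    print("ARM max gap:", max_gap_hours(arm), "h")             # 12.0
    print("any-arch max gap:", max_gap_hours(amd + arm), "h")  # 6.5
```

Merged, the two schedules alternate 5.5 h and 6.5 h gaps, which is where "roughly every 6 h" comes from; each architecture on its own repeats every 12 h.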

Savings: halving AMD alone saves ~1,900 compute-h per 2 weeks (~6% of AMD CI);
cutting ARM 8→2/day removes 75% of the ARM canary schedule. It also relieves
runner-pool queuing, since each canary is the heaviest job flood (~269 jobs).
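The savings figures can be reproduced with a few lines of arithmetic. In this back-of-envelope sketch, the 1,900 compute-h and 6% inputs are taken from the text above; the per-run and total values it derives are estimates that assume all of the saving comes from the dropped AMD runs, not measurements. The helper names are hypothetical.

```python
# Back-of-envelope check of the savings figures quoted in the PR text.
# Inputs (1,900 compute-h, 6%) come from the PR; derived values are estimates.

DAYS = 14  # the "2 weeks" window used for the compute-hour figure


def runs_removed(before_per_day: int, after_per_day: int, days: int = DAYS) -> int:
    """Scheduled canary runs dropped over the window."""
    return (before_per_day - after_per_day) * days


def fraction_removed(before_per_day: int, after_per_day: int) -> float:
    """Share of the original schedule that no longer runs."""
    return 1 - after_per_day / before_per_day


amd_saved = runs_removed(4, 2)  # 28 AMD canaries per 2 weeks
arm_saved = runs_removed(8, 2)  # 84 ARM canaries per 2 weeks

# Assumes the whole ~1,900 compute-h saving comes from the 28 dropped AMD runs.
amd_cost_per_run = 1900 / amd_saved  # implied compute-h per AMD canary
amd_ci_total = 1900 / 0.06           # implied total AMD CI compute-h per 2 weeks

print(f"AMD: {fraction_removed(4, 2):.0%} of schedule removed, {amd_saved} runs")
print(f"ARM: {fraction_removed(8, 2):.0%} of schedule removed, {arm_saved} runs")
print(f"implied cost per AMD canary: ~{amd_cost_per_run:.1f} compute-h")
print(f"implied total AMD CI: ~{amd_ci_total:,.0f} compute-h / 2 weeks")
print(f"scheduled canaries per day: {4 + 8} -> {2 + 2}")
```

Dropping 28 AMD runs for ~1,900 compute-h implies roughly 68 compute-h per AMD canary, and a ~6% share implies total AMD CI of about 31,700 compute-h per two weeks; both are only as precise as the rounded inputs.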

Trade-off (latency, not correctness): the canary is the "trust earned on
main" backstop. Interleaving keeps a ~6 h effective cadence for arch-agnostic
regressions; an architecture-specific regression now has a detection gap of up
to ~12 h (each arch runs 2×/day) instead of ~6 h (AMD) / ~3 h (ARM). Cron-only
change; no test logic touched.


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.8)

Generated-by: Claude Code (Opus 4.8) following the guidelines

The scheduled canary runs the full matrix on all versions and is by far the most
expensive run type. AMD ran 4x/day and ARM 8x/day; halving AMD alone saves
~1,900 compute-h / 2 weeks (~6% of AMD CI), and cutting ARM 8->2/day removes 75%
of the ARM canary schedule.

The two crons are interleaved (AMD 01:58 and 13:58, ARM 07:28 and 19:28), so a
full-matrix canary still runs roughly every 6h, alternating architecture. The
'trust earned on main' backstop therefore keeps a ~6h effective cadence for
arch-agnostic regressions (~12h for arch-specific ones) while the scheduled
compute drops substantially. It also relieves runner-pool queuing, since each
canary is the heaviest job flood.

The expected blocks in check_ci_workflows_in_sync.py are updated to match.