fix: add PR quick benchmark mode and timeout to resolve CI timeout issue#7
Merged
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #6
Root cause: Criterion's default 100-sample measurement combined with SpacetimeDB's synchronous round-trip per operation (~8000 RTTs × ~1ms = ~8s/iter) caused each of the 7 SpacetimeDB benchmarks to run ~13 minutes, totalling >6h and hitting GitHub Actions' hard limit.

Changes:
- Add `benchmark-pr` job for pull_request events: runs with reduced scale (BENCHMARK_LINK_COUNT=10, BACKGROUND_LINK_COUNT=30) and tight Criterion settings (--sample-size 10, --warm-up-time 1, --measurement-time 2) for ~3-5 min total. Results uploaded as artifacts but not committed.
- Update `benchmark` job (main branch only): use --sample-size 20 --nresamples 10000 for ~30-45 min runtime (was potentially 6h+ with defaults). Add timeout-minutes: 180.
- Fix `test` job timeout from 360 min to 30 min (tests complete in ~3 min).
- Add case study analysis at docs/case-studies/issue-6/README.md with root cause, timeline, and detailed performance calculations.
- Add changelog entry.

Fixes #6.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
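The per-benchmark estimate in this commit message can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the ~1 ms round-trip latency is the figure quoted above, and it assumes one iteration per Criterion sample, which is how the estimate in this PR is framed.

```rust
/// Operations per link, per the PR description ("× 2 ops").
const OPS_PER_LINK: u64 = 2;

/// Round trips in one benchmark iteration: every background link and every
/// benchmark link costs OPS_PER_LINK synchronous round trips.
fn round_trips_per_iteration(background_links: u64, benchmark_links: u64) -> u64 {
    (background_links + benchmark_links) * OPS_PER_LINK
}

/// Estimated measurement time for one benchmark, in seconds.
/// Assumption: one iteration per sample and a fixed round-trip latency.
fn seconds_per_benchmark(round_trips: u64, rtt_ms: u64, samples: u64) -> f64 {
    (round_trips * rtt_ms * samples) as f64 / 1000.0
}

fn main() {
    // Full scale (main branch) versus the reduced PR scale from this change.
    let full = round_trips_per_iteration(3_000, 1_000);
    let pr = round_trips_per_iteration(30, 10);

    println!(
        "full scale: {} RTTs/iter, {:.0} s per benchmark at 100 samples",
        full,
        seconds_per_benchmark(full, 1, 100)
    );
    println!(
        "PR scale: {} RTTs/iter, {:.1} s per benchmark at 10 samples",
        pr,
        seconds_per_benchmark(pr, 1, 10)
    );
}
```

At full scale this gives 8,000 round trips and 800 s per benchmark, matching the ~13 minutes quoted above; at PR scale it gives 80 round trips per iteration. Warm-up, analysis, and cleanup time are not modelled.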
This reverts commit 25f463e.
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost estimation:
The working session has now ended; feel free to review and add any feedback on the solution draft.
✅ Ready to merge
This pull request is now ready to be merged:
Monitored by hive-mind with the --auto-restart-until-mergeable flag
Summary
Fixes #6 — Benchmark CI job exceeded GitHub Actions' 6-hour time limit.
Root Cause
The `Benchmark` job had no timeout and ran with Criterion's default settings (100 samples, 5s measurement per benchmark). Each SpacetimeDB benchmark requires ~8,000 WebSocket round trips per iteration (3,000 background links × 2 ops + 1,000 benchmark links × 2 ops), taking ~8 seconds of wall-clock time per iteration. With 100 samples this is ~800s (~13 min) per SpacetimeDB benchmark × 7 benchmarks = ~94 minutes for SpacetimeDB alone; cleanup overhead then pushed the total past 6 hours.

Full root cause analysis with timeline, performance calculations, and CI logs: docs/case-studies/issue-6/README.md

Changes
.github/workflows/rust-benchmark.yml

New `benchmark-pr` job (pull_request events only, `timeout-minutes: 20`):
- `BENCHMARK_LINK_COUNT=10`, `BACKGROUND_LINK_COUNT=30`: reduces RTTs per iteration from ~8,000 to ~80
- `--sample-size 10 --warm-up-time 1 --measurement-time 2 --nresamples 1000`: fast Criterion settings

Updated `benchmark` job (push to main only, `timeout-minutes: 180`):
- `BENCHMARK_LINK_COUNT=1000`, `BACKGROUND_LINK_COUNT=3000`
- `--sample-size 20 --nresamples 10000`: reduces runtime from ~2h+ to ~30–45 minutes while still producing statistically meaningful results
- `timeout-minutes: 180` as a hard safety limit

Fixed `test` job timeout from 360 min to 30 min (tests complete in ~3 min)

docs/case-studies/issue-6/README.md: deep case study analysis
changelog.d/20260311_benchmark_timing_fix.md: changelog entry

This PR was created automatically by the AI issue solver
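For readers wondering how `BENCHMARK_LINK_COUNT` and `BACKGROUND_LINK_COUNT` might reach the benchmark code, here is a minimal sketch using only the Rust standard library. It is not taken from the repository: the helper names are hypothetical, and the fallback values of 1,000 and 3,000 are an assumption based on the main-branch settings listed above.

```rust
use std::env;

/// Parse a link count from an optional raw string, falling back to `default`
/// when the value is missing, not a valid number, or zero.
fn parse_link_count(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Read a link count from the environment variable `name`.
fn link_count_from_env(name: &str, default: usize) -> usize {
    parse_link_count(env::var(name).ok().as_deref(), default)
}

fn main() {
    // Assumed defaults mirror the full-scale values used on the main branch.
    let benchmark_links = link_count_from_env("BENCHMARK_LINK_COUNT", 1_000);
    let background_links = link_count_from_env("BACKGROUND_LINK_COUNT", 3_000);
    println!("benchmark links: {benchmark_links}, background links: {background_links}");
}
```

Keeping the parsing in a pure function separate from the environment lookup makes the fallback behaviour easy to unit test without mutating process state.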