benchmark: replace simulation with walltime macro benches + e2e correctness by natemoo-re · Pull Request #91 · bombshell-dev/clayterm

natemoo-re · 2026-06-07T02:35:09Z

Our CodSpeed benchmarks were frequently inaccurate: they flagged big "regressions"/"improvements" on PRs that don't touch the measured code (a test-file rename alone showed −54% on input; "dashboard layout" swung 20×). And the simulation job's runtime was wildly unpredictable.

Both problems are baked into CodSpeed's Valgrind simulation of tiny (µs) benchmarks, confirmed by their own docs, and chasing them one bench at a time is whack-a-mole.
This PR drops the simulation job entirely and moves all perf gating to WallTime macro benchmarks: bigger, realistic workloads where the real work dominates and noise averages out (stable to <1%, predictable runtime).
Replacements:
- input parsing → an event-loop correctness test + a throughput benchmark;
- render/ops → WallTime macro benches (render dashboard frames, pack complex layouts).
Also pinned the CI build (fixed compiler, no stale wasm cache) so the baseline can't silently drift.
Net: green CI actually means green — no phantom red/green on unrelated PRs.

Drives createInput as a consumer would (read chunk, scan, flush a pending ESC, dispatch) and asserts the same event stream whether input is fed whole, byte-by-byte, or split mid-sequence. Covers the lone-ESC flush path.

In-process bench over a large mixed corpus, measured in WallTime where real work dominates placement/JIT/alloc noise (the Simulation micro-benches sit on codegen cliffs). Feeds the corpus in small reads: a single scan() drains at most 256 events (the wasm event-buffer cap), so small reads keep every call under the cap and process the whole corpus.

The Simulation input micro-benches (long input burst, printable ASCII single char) move by 50-90% on unrelated changes, even a test-only rename, because their simulated cost snaps to a different value when the combined wasm shifts. Input perf is now gated by the throughput WallTime bench; correctness by the event-loop integration test.

Pin ubuntu-24.04 so the wasm toolchain is stable and drop the wasm cache so main and PRs always rebuild identically. The cache froze main's baseline on a stale build, so every PR compared a fresh build against a stale baseline and produced phantom regressions.

pkg-pr-new · 2026-06-07T02:35:40Z

Open in StackBlitz

npm i https://pkg.pr.new/clayterm@91

commit: 154a293

codspeed-hq · 2026-06-07T02:55:57Z

Merging this PR will not alter performance

✅ 5 untouched benchmarks
🆕 5 new benchmarks
⏩ 18 skipped benchmarks¹

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
🆕	WallTime	`pack complex layout`	N/A	45 ms	N/A
🆕	WallTime	`pack large list`	N/A	59.1 ms	N/A
🆕	WallTime	`input throughput (mixed corpus, chunked read loop)`	N/A	9.6 ms	N/A
🆕	WallTime	`render mixed frames`	N/A	36.5 ms	N/A
🆕	WallTime	`render steady diff`	N/A	35.7 ms	N/A

_{Comparing ci-bench (154a293) with main (48ff4bb)}

18 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

CodSpeed's walltime tinybench plugin only populates result.latency on its async path; a sync task fn leaves it undefined and crashes (Cannot destructure 'min' of result.latency). startup.bench works because its tasks return a promise (spawnFixture). Return Promise.resolve() from the throughput task so the plugin takes the async path — a bare async fn with no await would trip deno's require-await lint. The walltime job runs startup and throughput as separate node processes.

… job CodSpeed Simulation (Valgrind) is unviable for CI here: flaky measurements (dashboard layout swung 20x, diff render 17% on changes that touch no render code) and unpredictable runtime — the same commit's simulation job finished in ~2 min one run and hung past 30 min the next. Convert render/ops to ms-scale WallTime macro benches (looped, promise-returning, ~7-11ms at <1% variance) run as separate node processes in the walltime job, and drop the simulation job entirely. mod.ts is now a local aggregator for deno task bench.

jbolda

The change to walltime makes sense. We were just discussing this aspect of codspeed on Thursday for some effection code. Might be worth just generally writing down the whys and whats for some of the decisions we made though!

jbolda · 2026-06-08T03:49:25Z

-    pack(complexOps, buf, 0);
+  .add("pack complex layout", () => {
+    for (let i = 0; i < 1500; i++) pack(complexOps, buf, 0);
+    return Promise.resolve();


Could we add a comment or perhaps a README.md in this folder explaining why this is required? I am assuming it is to force a last microtask or something? I don't see anything within the tinybench API that would point to the value of doing this.

Yes, I'll add that! It's a limitation of the @codspeed/plugin-tinybench package, which only expects the task to be async. With synchronous tasks, it throws an error like "cannot destructure 'latency' from undefined"

cowboyd · 2026-06-08T19:19:58Z

+    new Uint8Array([0x1b, 0x5b, 0x41]), // ArrowUp
+    str("\x1b[<0;35;12M"), // SGR mouse press
+    new Uint8Array([0xe4, 0xb8, 0xad]), // 中
+    str("\x1b[97;3u"), // Kitty a+Alt
+    new Uint8Array([0xf0, 0x9f, 0x8e, 0x89]), // 🎉


rather than using comments, might as well have functions:

arrowUp(),
mousePress(),

morre re-usable, descriptive and repeatable

natemoo-re added 4 commits June 6, 2026 18:34

test: add input event-loop integration test

159d617

Drives createInput as a consumer would (read chunk, scan, flush a pending ESC, dispatch) and asserts the same event stream whether input is fed whole, byte-by-byte, or split mid-sequence. Covers the lone-ESC flush path.

natemoo-re marked this pull request as draft June 7, 2026 02:35

natemoo-re force-pushed the ci-bench branch from 73e264a to 50d5d5f Compare June 7, 2026 02:50

natemoo-re force-pushed the ci-bench branch from 50d5d5f to d654f17 Compare June 7, 2026 03:02

natemoo-re force-pushed the ci-bench branch from d654f17 to 2848084 Compare June 7, 2026 03:04

natemoo-re changed the title ~~Replace cliff-prone input micro-benches with an E2E test + macro throughput bench~~ Replace flaky input benches with an E2E test + macro throughput bench Jun 7, 2026

natemoo-re force-pushed the ci-bench branch from a348022 to 2848084 Compare June 7, 2026 04:36

natemoo-re changed the title ~~Replace flaky input benches with an E2E test + macro throughput bench~~ Make benchmarks reliable: WallTime macro benches + E2E correctness (retire Simulation) Jun 7, 2026

natemoo-re marked this pull request as ready for review June 7, 2026 05:05

natemoo-re changed the title ~~Make benchmarks reliable: WallTime macro benches + E2E correctness (retire Simulation)~~ benchmark: replace simulation with walltime macro benches + e2e correctness Jun 7, 2026

natemoo-re requested review from cowboyd and rauhryan June 7, 2026 22:37

natemoo-re mentioned this pull request Jun 7, 2026

✨ Add glyph-cell text backgrounds #93

Open

12 tasks

jbolda requested changes Jun 8, 2026

View reviewed changes

cowboyd reviewed Jun 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

benchmark: replace simulation with walltime macro benches + e2e correctness#91

benchmark: replace simulation with walltime macro benches + e2e correctness#91
natemoo-re wants to merge 6 commits into
mainfrom
ci-bench

natemoo-re commented Jun 7, 2026 •

edited

Loading

Uh oh!

pkg-pr-new Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

jbolda left a comment

Uh oh!

jbolda Jun 8, 2026

Uh oh!

natemoo-re Jun 8, 2026

Uh oh!

cowboyd Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

natemoo-re commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkg-pr-new Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Performance Changes

Footnotes

Uh oh!

jbolda left a comment

Choose a reason for hiding this comment

Uh oh!

jbolda Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

natemoo-re Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

cowboyd Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

natemoo-re commented Jun 7, 2026 •

edited

Loading

pkg-pr-new Bot commented Jun 7, 2026 •

edited

Loading

codspeed-hq Bot commented Jun 7, 2026 •

edited

Loading