
Add benchmark framework and benchmarks #149

Open
jonbodner-buf wants to merge 15 commits into main from jbodner/add-benchmarks


jonbodner-buf commented May 14, 2026

Mirrors protovalidate-go's `validator_bench_test.go` in a new private `packages/protovalidate-bench` workspace so that runtime cost can be tracked across changes and compared across languages.

Uses tinybench and hand-built deterministic fixtures, and writes JSON results to `.tmp/bench/`.

Adds a `checkbench` script that diffs two runs with a noise-aware regression threshold and exits non-zero on regression, making it suitable for gating PRs.
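A minimal sketch of the kind of comparison a noise-aware check could perform. It assumes each results file is a JSON array of `{ name, meanNs, rmePct }` objects (a hypothetical shape; the JSON actually written to `.tmp/bench/` may differ) and reads "noise-aware" as: a task only counts as a regression when its slowdown exceeds both the fixed threshold and the two runs' combined relative margin of error. `findRegressions` and `exitCodeFor` are illustrative names, not the PR's code.

```typescript
// Hypothetical shape of one benchmark task in a results file.
interface TaskResult {
  name: string;
  meanNs: number; // mean time per iteration, in nanoseconds
  rmePct: number; // relative margin of error, in percent
}

interface Regression {
  name: string;
  deltaPct: number; // observed slowdown, in percent
  allowedPct: number; // slowdown tolerated before flagging, in percent
}

// Compare two runs task by task. A task regresses only if its slowdown exceeds
// the larger of the fixed threshold and the combined noise of both runs.
function findRegressions(
  baseline: readonly TaskResult[],
  current: readonly TaskResult[],
  thresholdPct = 5,
): Regression[] {
  const base = new Map<string, TaskResult>(
    baseline.map((t): [string, TaskResult] => [t.name, t]),
  );
  const regressions: Regression[] = [];
  for (const cur of current) {
    const prev = base.get(cur.name);
    if (prev === undefined) continue; // new task: nothing to compare against
    const deltaPct = ((cur.meanNs - prev.meanNs) / prev.meanNs) * 100;
    const allowedPct = Math.max(thresholdPct, prev.rmePct + cur.rmePct);
    if (deltaPct > allowedPct) {
      regressions.push({ name: cur.name, deltaPct, allowedPct });
    }
  }
  return regressions;
}

// Gate a PR: a non-zero exit code signals at least one regression.
function exitCodeFor(regressions: readonly Regression[]): number {
  return regressions.length > 0 ? 1 : 0;
}
```

In a script, `process.exitCode = exitCodeFor(findRegressions(baseline, current))` would give the non-zero exit used for gating. Scaling the tolerance with the measured error means a noisy task needs a larger slowdown before it fails the check, while a stable task is still held to the fixed threshold.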

Eleven near-identical `.bench.ts` files have been collapsed into four: `cases.ts` lists every (name, schema, fixture) triple in one place, `validate.bench.ts` iterates over it for the per-case validate-time benchmarks, and `compile.bench.ts` and `standard-schema.bench.ts` look up curated subsets by name.

Adding a benchmark is now a one-row append to `cases.ts` plus a fixture in `fixtures.ts`, instead of a new file, an import, and a register call in `bench.ts`.
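A sketch of the single-table pattern described above, with hypothetical stand-ins: `schema` is a plain string and `fixture` a factory returning a plain object, where the real package would use generated message schemas and fixtures from `fixtures.ts`. The names `BenchCase`, `pick`, and `taskNames` are illustrative, not the PR's actual code.

```typescript
// Hypothetical row type: the real table would hold a generated message schema
// and a fixture from fixtures.ts instead of these placeholders.
interface BenchCase {
  name: string;
  schema: string;
  fixture: () => unknown;
}

// One row per benchmark; adding a benchmark means appending a row here.
const cases: readonly BenchCase[] = [
  { name: "scalar", schema: "ScalarMsg", fixture: () => ({ x: 1 }) },
  { name: "repeated", schema: "RepeatedMsg", fixture: () => ({ xs: [1, 2, 3] }) },
  { name: "nested", schema: "NestedMsg", fixture: () => ({ inner: { x: 1 } }) },
];

// Look up a curated subset by name (as compile.bench.ts would), failing loudly
// on unknown names so a renamed case cannot silently drop out of a bench file.
function pick(names: readonly string[]): BenchCase[] {
  return names.map((name) => {
    const found = cases.find((c) => c.name === name);
    if (found === undefined) throw new Error(`unknown bench case: ${name}`);
    return found;
  });
}

// Per-case iteration (as validate.bench.ts would do); here each case is only
// turned into a task name instead of being registered with a bench runner.
function taskNames(prefix: string, selected: readonly BenchCase[]): string[] {
  return selected.map((c) => `${prefix}/${c.name}`);
}
```

Because every bench file derives its tasks from the same table, task names and ordering stay stable as long as the table order does.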

Bench output is byte-identical: same 17 tasks, same names, same ordering, deltas within the noise floor.
jonbodner-buf changed the title from "Jbodner/add benchmarks" to "Add benchmark framework and benchmarks" on May 14, 2026
jonbodner-buf requested a review from timostamm on May 14, 2026 14:39
jonbodner-buf requested a review from ajeetdsouza and removed the request for timostamm on May 14, 2026 21:03
jonbodner-buf (Author):

After this PR is approved, a series of additional PRs will implement protovalidate native rule support for ES. Each PR builds on the previous ones.

jonbodner-buf requested a review from ejowers on May 22, 2026 15:28
Comment thread on `packages/protovalidate-bench/README.md` (outdated)

The shortcuts `latest` and `previous` resolve to the newest and second-newest
JSON files in `.tmp/bench/` (by mtime). Calling with only one argument
defaults the baseline to `previous`.
Contributor:

> Calling with only one argument defaults the baseline to `previous`.

`latest` and `previous` would collide with filenames. Is there any use case for keyword processing here? I think we should just define the behavior for 0 arguments and 1 argument and avoid the keywords altogether; that would also make this documentation clearer.

Author:

I've updated the logic to the following:

  • no arguments: use the two most recent files (older is baseline, newer is current)
  • 1 argument: use the named file as the baseline and the latest file as current
  • 2 arguments: use the named files, first is baseline, second is current
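The three argument cases above can be expressed as a pure function, so the resolution rules are testable without touching the filesystem. `RunFile`, `resolveRuns`, and the error messages are hypothetical; the caller is assumed to list `.tmp/bench/` and supply each file's mtime (for example from `fs.statSync(path).mtimeMs`).

```typescript
// A results file as seen by the resolver: its path and modification time.
interface RunFile {
  path: string;
  mtimeMs: number;
}

interface Pair {
  baseline: string;
  current: string;
}

// Resolve which two runs to compare from 0, 1, or 2 positional arguments.
function resolveRuns(positionals: readonly string[], files: readonly RunFile[]): Pair {
  // Newest first, so byAge[0] is the latest run and byAge[1] the one before it.
  const byAge = [...files].sort((a, b) => b.mtimeMs - a.mtimeMs);
  switch (positionals.length) {
    case 0: // two most recent files: older is baseline, newer is current
      if (byAge.length < 2) throw new Error("need at least two runs to compare");
      return { baseline: byAge[1].path, current: byAge[0].path };
    case 1: // named file is baseline, latest file is current
      if (byAge.length < 1) throw new Error("no runs found");
      return { baseline: positionals[0], current: byAge[0].path };
    case 2: // both named: first is baseline, second is current
      return { baseline: positionals[0], current: positionals[1] };
    default:
      throw new Error("expected at most two arguments");
  }
}
```

With runs at mtimes 1, 2, and 3 and no arguments, this returns the mtime-2 file as baseline and the mtime-3 file as current.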

```js
const BENCH_DIR = ".tmp/bench";
const DEFAULT_THRESHOLD = 5;

function parseArgs(argv) {
```
Contributor:

We could probably use `parseArgs` from `node:util` here. `yargs` and `commander` are also in the dependency tree.

Author:

Using `parseArgs` from `node:util` now. There's also validation that the directory and files exist.
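A minimal sketch of argument parsing with `parseArgs` from `node:util` plus the existence checks. The `--threshold`/`-t` flag is hypothetical (defaulting to the `DEFAULT_THRESHOLD = 5` from the script excerpt); the real script's option names may differ.

```typescript
import { existsSync } from "node:fs";
import { parseArgs } from "node:util";

const DEFAULT_THRESHOLD = 5;

interface CliOptions {
  threshold: number; // regression threshold, in percent
  positionals: string[]; // zero, one, or two result files
}

// Parse argv with node:util parseArgs. The --threshold/-t flag is hypothetical.
function parseCli(argv: string[]): CliOptions {
  const { values, positionals } = parseArgs({
    args: argv,
    options: { threshold: { type: "string", short: "t" } },
    allowPositionals: true, // result files are passed as bare arguments
  });
  const threshold =
    values.threshold === undefined ? DEFAULT_THRESHOLD : Number(values.threshold);
  if (!Number.isFinite(threshold) || threshold < 0) {
    throw new Error(`invalid --threshold: ${values.threshold}`);
  }
  if (positionals.length > 2) throw new Error("expected at most two result files");
  return { threshold, positionals };
}

// Fail early with a readable message instead of an ENOENT stack trace.
function assertExists(paths: readonly string[]): void {
  for (const p of paths) {
    if (!existsSync(p)) throw new Error(`not found: ${p}`);
  }
}
```

A caller would run `parseCli(process.argv.slice(2))` and pass the bench directory and `positionals` to `assertExists` before reading any JSON. Since `parseArgs` is strict by default, unknown flags are rejected with an error rather than ignored.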

timostamm self-requested a review on May 26, 2026 09:40
Comment thread on `packages/protovalidate-bench/biome.json` (outdated)
Comment thread on `packages/protovalidate-bench/scripts/checkbench.js` (outdated)

```json
"generate": "buf generate",
"postgenerate": "license-header src/gen",
"bench": "tsx src/bench.ts",
"checkbench": "node scripts/checkbench.js",
```
Member:

What do you think about promoting `checkbench.js` to be a sibling of `bench.ts`?

We do have some vanilla JS scripts in various repositories, but not by choice: limitations of the repository setup make it difficult to use TS. In those cases, we add typedef annotations (example), which give some IDE support.

In this case however, the package is purely internal, doesn't have build artifacts, and we can easily use TS.

Author:

I can convert it to TypeScript and move it.

Author:

Converted.

```json
"dependencies": {
"@bufbuild/protobuf": "^2.11.0",
"@bufbuild/protovalidate": "^1.2.0",
"tinybench": "^3.1.1"
```

Member:

tinybench is a solid choice 👍

But I recently stumbled upon mitata. It seems to have some really nice features, such as GC control, minimal overhead, and hardware counters. This could be very useful for incremental performance improvements in cel-es and protobuf-es. Do you think it would be worth looking into here? The features might not be immediately useful for the native rules implementation, but it seems smart to use the same benchmarking tooling across the board.

Author:

Re-implemented with mitata. I have seen wild swings in the GC/heap numbers, even with the `--expose-gc` flag set and the `.gc('inner')` method call added to the mitata benchmarks. On the plus side, the native rules will probably make this better.

Member:

I can reproduce the swings.

They go away when `.gc("inner")` is removed. This is a sharp edge of mitata with our allocation-heavy benchmarks: `gc("inner")` forces a GC before every sample but charges it against the per-benchmark CPU time budget, so the allocation-heavy benchmarks end up with too few samples. The simple API offers no knobs to adjust this.

There is also a separate issue: mitata doesn't support process isolation. The first benchmarks warm up the JIT, and later benchmarks hitting the same code paths benefit from it, so changing the order of the benchmarks changes the results.
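To illustrate what process isolation would mean here, a sketch that runs each benchmark in a fresh Node process with `spawnSync`, so JIT state from one benchmark cannot carry over to the next. It assumes a hypothetical compiled JavaScript entry script that runs the single benchmark named by its last argument and prints its result as JSON on stdout; nothing like it exists in the PR.

```typescript
import { spawnSync } from "node:child_process";

// Run each named benchmark in its own Node process so JIT warm-up from earlier
// benchmarks cannot leak into later ones. `entry` is a hypothetical compiled
// JavaScript script that runs the benchmark named by its last argument and
// prints the result as JSON on stdout.
function runIsolated(entry: string, names: readonly string[]): Map<string, unknown> {
  const results = new Map<string, unknown>();
  for (const name of names) {
    const child = spawnSync(process.execPath, ["--expose-gc", entry, name], {
      encoding: "utf8",
    });
    if (child.status !== 0) {
      // A missing entry script, a crash, or a spawn failure all end up here.
      throw new Error(`benchmark ${name} failed: ${child.stderr}`);
    }
    results.set(name, JSON.parse(child.stdout));
  }
  return results;
}
```

The trade-off is process startup cost per benchmark and a cold JIT for every task, so each child still needs its own warm-up phase before sampling.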

I'm not sure where to go from here, but permanently porting the Go benchmark suite over with similar ergonomics doesn't seem feasible to me without investing significant time. Maybe a very simple script measuring wall-clock time is good enough for confirming the effectiveness of the native rules?
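A very simple wall-clock harness along the lines suggested here might look like this. It uses only `performance.now()`. The sample, iteration, and warm-up counts are arbitrary defaults for illustration, and reporting the median rather than the mean is a design choice to damp outliers such as a single GC pause.

```typescript
// Median of a non-empty list of numbers.
function median(xs: readonly number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time `fn` with performance.now(): `warmup` untimed calls, then `samples`
// batches of `iterations` calls each. Returns the median nanoseconds per call.
function timeMedianNs(
  fn: () => void,
  samples = 21,
  iterations = 1000,
  warmup = 1000,
): number {
  for (let i = 0; i < warmup; i++) fn();
  const perCall: number[] = [];
  for (let s = 0; s < samples; s++) {
    const start = performance.now(); // milliseconds, sub-millisecond resolution
    for (let i = 0; i < iterations; i++) fn();
    perCall.push(((performance.now() - start) * 1e6) / iterations);
  }
  return median(perCall);
}
```

Batching many calls per sample keeps timer resolution from dominating very fast functions; it does not address the JIT ordering effect, which still requires separate processes.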

Author:

I think that's effectively what tinybench was doing. Should I revert back to it?

Author:

The benchmark has been updated to run multiple times and to customize GC behavior by test type (compile tests use `gc('inner')`; the others don't). The numbers should be more stable now.
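A sketch of how repeated runs could be combined, assuming each run yields a map from task name to nanoseconds per operation (a hypothetical shape). Taking the per-task median across runs and reporting the spread gives a rough noise estimate that a regression check could use as its floor; `summarizeRuns` is an illustrative name.

```typescript
interface Summary {
  name: string;
  medianNs: number; // median across runs
  spreadPct: number; // (max - min) relative to the median, in percent
}

// Combine repeated runs: per task, keep the median and report the spread
// as a simple noise estimate.
function summarizeRuns(runs: ReadonlyArray<ReadonlyMap<string, number>>): Summary[] {
  const byTask = new Map<string, number[]>();
  for (const run of runs) {
    for (const [name, ns] of run) {
      const list = byTask.get(name) ?? [];
      list.push(ns);
      byTask.set(name, list);
    }
  }
  return [...byTask].map(([name, values]) => {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    const medianNs =
      sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    const spreadPct = ((sorted[sorted.length - 1] - sorted[0]) / medianNs) * 100;
    return { name, medianNs, spreadPct };
  });
}
```

With five runs, the median is the third-smallest value, so a single outlier run cannot move it.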

Participant:

This looks like it fixes most of the issues. If we want to cut the noise completely, could we do separate runs for the timing and heap measurements? Since the suite already runs 5x, that might make it too time-intensive, though the CPU time budget could be lower for the heap runs.
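A rough sketch of a separate heap-only pass, assuming Node is started with `--expose-gc` (which makes `globalThis.gc` available). It forces one GC up front, keeps every result alive, and divides the growth in `heapUsed` by the call count, so it approximates retained bytes per call; temporaries reclaimed by minor GCs during the loop are not counted. Because no timing is taken, the forced GC does not eat into any CPU time budget. `heapBytesPerCall` is a hypothetical helper.

```typescript
// Approximate retained heap bytes per call of `fn`. Returns undefined when
// Node was not started with --expose-gc, since the numbers would be unreliable.
function heapBytesPerCall(fn: () => unknown, iterations = 10_000): number | undefined {
  const gc = (globalThis as unknown as { gc?: () => void }).gc;
  if (gc === undefined) return undefined;
  const keep: unknown[] = new Array(iterations); // pre-sized before measuring
  gc(); // start from a freshly collected heap
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iterations; i++) {
    keep[i] = fn(); // retain results so they stay on the heap until measured
  }
  const after = process.memoryUsage().heapUsed;
  return (after - before) / iterations;
}
```

Running this in its own short pass (or its own process) keeps the allocation numbers from being mixed with timing samples.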

…. Update documentation to reflect its current arguments.

Signed-off-by: Jon Bodner <jbodner@buf.build>
… gc settings for different tests.


4 participants