From 070f88434f0c5e06d20062c13d68a81958ff743f Mon Sep 17 00:00:00 2001
From: pdrobnjak
Date: Fri, 13 Feb 2026 14:16:56 +0100
Subject: [PATCH 1/9] feat(benchmark): add fgprof, block/mutex profiling and improve profile docs

Add wall-clock profiling (fgprof) alongside standard CPU profiling to capture off-CPU time (I/O, blocking, GC pauses). Register the fgprof handler behind the benchmark build tag so production binaries are unaffected.

Enable block and mutex contention profiling via runtime calls, also gated behind the benchmark build tag. Use conservative sampling rates (1us block threshold, 1/5 mutex fraction) to minimize overhead on TPS.

Update benchmark-compare.sh to capture all 6 profile types (CPU, fgprof, heap, goroutine, block, mutex) and report sizes for each.

Expand benchmark/CLAUDE.md with:
- Profile type reference table with when-to-use guidance
- CPU vs fgprof explanation
- Heap metric selection guide (inuse_space vs alloc_objects etc)
- Interactive flamegraph and drill-down commands
- Single-scenario manual capture examples
- Source-mapping tip for pprof

Co-Authored-By: Claude Opus 4.6
---
 app/benchmark_profiling.go             | 17 +++++
 benchmark/CLAUDE.md                    | 89 +++++++++++++++++++++++++-
 benchmark/benchmark-compare.sh         | 50 +++++++++++----
 go.mod                                 |  3 +-
 go.sum                                 | 17 ++++-
 go.work.sum                            | 74 +++++++++++++++++++++
 sei-tendermint/node/setup_benchmark.go | 16 +++++
 7 files changed, 249 insertions(+), 17 deletions(-)
 create mode 100644 app/benchmark_profiling.go
 create mode 100644 sei-tendermint/node/setup_benchmark.go

diff --git a/app/benchmark_profiling.go b/app/benchmark_profiling.go
new file mode 100644
index 0000000000..e5fa9f3191
--- /dev/null
+++ b/app/benchmark_profiling.go
@@ -0,0 +1,17 @@
+//go:build benchmark
+
+package app
+
+import "runtime"
+
+func init() {
+	// Enable block profiling: record blocking events lasting 1us or longer.
+	// Lower values capture more events but add overhead that can skew TPS.
+	// This lets /debug/pprof/block show time spent waiting on channels and mutexes.
+	runtime.SetBlockProfileRate(1000)
+
+	// Enable mutex contention profiling: sample 1 in 5 contention events.
+	// Full capture (fraction=1) adds measurable overhead; 1/5 is a good balance.
+	// This lets /debug/pprof/mutex show where goroutines contend on locks.
+	runtime.SetMutexProfileFraction(5)
+}
diff --git a/benchmark/CLAUDE.md b/benchmark/CLAUDE.md
index 6ed1b4d992..1a06b8e371 100644
--- a/benchmark/CLAUDE.md
+++ b/benchmark/CLAUDE.md
@@ -72,9 +72,52 @@ Each scenario gets its own binary, home dir, and port set (offset by 100). Resul
 **Important:** Cross-session benchmark numbers (TPS, total allocs) are not directly comparable. Only comparisons within the same `benchmark-compare.sh` run are valid, since all scenarios share identical conditions.
-## Comparing pprof profiles
-`benchmark-compare.sh` automatically captures pprof profiles (CPU and heap) midway through the run. Single-scenario runs only enable the pprof HTTP endpoint — capture profiles manually with `curl` or `go tool pprof`.
+## Profiling
+### Available profile types
+`benchmark-compare.sh` automatically captures all profile types midway through the run. Profiles are saved to `/tmp/sei-bench/