diff --git a/.claude/commands/optimize.md b/.claude/commands/optimize.md
new file mode 100644
index 0000000000..c6b893c323
--- /dev/null
+++ b/.claude/commands/optimize.md
@@ -0,0 +1,94 @@
+---
+name: optimize
+description: Run a profiling-driven optimization loop for a specific function
+argument-hint: "<function-name> e.g. executeEVMTxWithGigaExecutor"
+allowed-tools:
+  - Read
+  - Write
+  - Edit
+  - Glob
+  - Grep
+  - Bash
+  - Task
+  - AskUserQuestion
+---
+
+# Optimization Loop for: $ARGUMENTS
+
+You are running a profiling-driven optimization loop focused on the function `$ARGUMENTS`.
+
+## References
+
+Read `benchmark/CLAUDE.md` for benchmark commands, environment variables, profiling, and the full optimization loop steps.
+
+## Workflow
+
+Execute the optimization loop from the benchmark/CLAUDE.md section "Optimization loop", but focused on `$ARGUMENTS`:
+
+### Phase 1: Understand the target function
+
+1. Find the function `$ARGUMENTS` in the codebase using Grep
+2. Read the function and its callers/callees to understand the hot path
+3. Identify what packages, types, and helpers it uses
+
+### Phase 2: Profile
+
+4. Run the benchmark: `GIGA_EXECUTOR=true GIGA_OCC=true benchmark/benchmark.sh`
+5. Wait for it to complete (default DURATION=120s)
+
+### Phase 3: Analyze (focused on the target function)
+
+6. Run pprof analysis focused on `$ARGUMENTS` and its call tree. Run these in parallel:
+   - CPU: `go tool pprof -top -cum -nodecount=40 /tmp/sei-bench/pprof/cpu.pb.gz 2>&1 | head -60`
+   - fgprof: `go tool pprof -top -cum -nodecount=40 /tmp/sei-bench/pprof/fgprof.pb.gz 2>&1 | head -60`
+   - Heap (alloc_space): `go tool pprof -alloc_space -top -cum -nodecount=40 /tmp/sei-bench/pprof/heap.pb.gz 2>&1 | head -60`
+   - Heap (alloc_objects): `go tool pprof -alloc_objects -top -cum -nodecount=40 /tmp/sei-bench/pprof/heap.pb.gz 2>&1 | head -60`
+   - Block: `go tool pprof -top -cum -nodecount=40 /tmp/sei-bench/pprof/block.pb.gz 2>&1 | head -60`
+   - Mutex: `go tool pprof -top -cum -nodecount=40 /tmp/sei-bench/pprof/mutex.pb.gz 2>&1 | head -60`
+7. Use `go tool pprof -text -focus='$ARGUMENTS' /tmp/sei-bench/pprof/cpu.pb.gz` to get a function-focused breakdown
+8. Open flamegraphs on separate ports for the user to inspect:
+   - `go tool pprof -http=:8080 /tmp/sei-bench/pprof/cpu.pb.gz &`
+   - `go tool pprof -http=:8081 /tmp/sei-bench/pprof/fgprof.pb.gz &`
+   - `go tool pprof -http=:8082 -alloc_space /tmp/sei-bench/pprof/heap.pb.gz &`
+
+### Phase 4: Summarize and discuss
+
+9. Present findings to the user:
+   - TPS from the benchmark run (extract from `/tmp/sei-bench/tps.txt`; see the sketch at the end of this phase)
+   - Where `$ARGUMENTS` and its callees spend the most time (CPU, wall-clock)
+   - Biggest allocation hotspots within the function's call tree
+   - Any contention (block/mutex) in the function's path
+   - Top 2-3 candidate optimizations with expected impact and trade-offs
+10. Ask the user which optimization direction to pursue. Do NOT write any code until the user picks.
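+
+A minimal sketch for the TPS summary in step 9, assuming `/tmp/sei-bench/tps.txt` holds one numeric TPS sample per line (the exact file layout is an assumption; adjust if it differs):
+
+```bash
+# Sketch: summarize TPS samples, assuming one value per line in tps.txt.
+sort -n /tmp/sei-bench/tps.txt | awk '
+  { v[NR] = $1; sum += $1 }
+  END {
+    if (NR == 0) { print "no samples"; exit 1 }
+    median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
+    printf "samples=%d avg=%.1f median=%.1f min=%.1f max=%.1f\n", NR, sum / NR, median, v[1], v[NR]
+  }'
+```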
+
+### Phase 5: Implement
+
+11. Implement the chosen optimization
+12. Run `gofmt -s -w` on all modified `.go` files
+13. Commit the change
+
+### Phase 6: Compare
+
+14. Record the commit hash before and after the optimization
+15. Run comparison: `benchmark/benchmark-compare.sh baseline=<commit> candidate=<commit>`
+16. Open diff flamegraphs for the user:
+   - `go tool pprof -http=:8083 -diff_base /tmp/sei-bench/baseline/pprof/cpu.pb.gz /tmp/sei-bench/candidate/pprof/cpu.pb.gz &`
+   - `go tool pprof -http=:8084 -diff_base /tmp/sei-bench/baseline/pprof/fgprof.pb.gz /tmp/sei-bench/candidate/pprof/fgprof.pb.gz &`
+   - `go tool pprof -http=:8085 -diff_base /tmp/sei-bench/baseline/pprof/heap.pb.gz /tmp/sei-bench/candidate/pprof/heap.pb.gz &`
+
+### Phase 7: Validate
+
+17. Present results:
+   - TPS delta (baseline vs candidate)
+   - CPU diff: `go tool pprof -top -diff_base /tmp/sei-bench/baseline/pprof/cpu.pb.gz /tmp/sei-bench/candidate/pprof/cpu.pb.gz`
+   - Heap diff: `go tool pprof -alloc_space -top -diff_base /tmp/sei-bench/baseline/pprof/heap.pb.gz /tmp/sei-bench/candidate/pprof/heap.pb.gz`
+18. Ask the user: keep, iterate, or revert?
+19. If the user chooses to keep the change, ask whether to open a PR
+
+## Important rules
+
+- ALWAYS ask the user before writing any optimization code (step 10)
+- ALWAYS ask the user before opening a PR (step 19)
+- Cross-session benchmark numbers are NOT comparable. Only compare within the same `benchmark-compare.sh` run.
+- Run `gofmt -s -w` on all modified Go files before committing
+- If `$ARGUMENTS` is empty or not found, ask the user to provide the function name
diff --git a/app/benchmark_profiling.go b/app/benchmark_profiling.go
new file mode 100644
index 0000000000..e5fa9f3191
--- /dev/null
+++ b/app/benchmark_profiling.go
@@ -0,0 +1,17 @@
+//go:build benchmark
+
+package app
+
+import "runtime"
+
+func init() {
+	// Enable block profiling: record blocking events lasting 1us or longer.
+	// Lower values capture more events but add overhead that can skew TPS.
+	// This lets /debug/pprof/block show time spent waiting on channels and mutexes.
+	runtime.SetBlockProfileRate(1000)
+
+	// Enable mutex contention profiling: sample 1 in 5 contention events.
+	// Full capture (fraction=1) adds measurable overhead; 1/5 is a good balance.
+	// This lets /debug/pprof/mutex show where goroutines contend on locks.
+	runtime.SetMutexProfileFraction(5)
+}
diff --git a/benchmark/CLAUDE.md b/benchmark/CLAUDE.md
index 6ed1b4d992..a4236bcf58 100644
--- a/benchmark/CLAUDE.md
+++ b/benchmark/CLAUDE.md
@@ -3,16 +3,17 @@
 ## Single scenario
 
 ```bash
-GIGA_EXECUTOR=true GIGA_OCC=true DEBUG=true benchmark/benchmark.sh
+GIGA_EXECUTOR=true GIGA_OCC=true benchmark/benchmark.sh
 ```
 
-TPS is logged every 5s as `tps=<value>` (with ANSI color codes). To capture output and extract TPS:
+By default, the benchmark runs for `DURATION=120` seconds, auto-captures all 6 profile types, extracts TPS stats, and exits. Profiles are saved to `/tmp/sei-bench/pprof/`, TPS data to `/tmp/sei-bench/tps.txt`, and the full log to `/tmp/sei-bench/output.log`.
 
-```bash
-LOG_FILE=/tmp/bench.log DEBUG=true benchmark/benchmark.sh
+Use `DURATION=0` to run forever (manual capture, original behavior; see the capture sketch below).
+
+TPS is logged every 5s as `tps=<value>` (with ANSI color codes). For manual extraction:
 
-# Extract TPS values
-sed 's/\x1b\[[0-9;]*m//g' /tmp/bench.log | sed -n 's/.*tps=\([0-9.]*\).*/\1/p'
+```bash
+sed 's/\x1b\[[0-9;]*m//g' /tmp/sei-bench/output.log | sed -n 's/.*tps=\([0-9.]*\).*/\1/p'
+```
 ```
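+
+For `DURATION=0` runs, a hedged sketch for pulling profiles manually from the node's pprof server (the `localhost:6060` listen address is an assumption; adjust for any claimed `PORT_OFFSET`):
+
+```bash
+# Assumption: pprof serves on localhost:6060; add PORT_OFFSET if one was claimed.
+mkdir -p /tmp/sei-bench/pprof
+curl -s -o /tmp/sei-bench/pprof/cpu.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'
+curl -s -o /tmp/sei-bench/pprof/heap.pb.gz 'http://localhost:6060/debug/pprof/heap'
+curl -s -o /tmp/sei-bench/pprof/block.pb.gz 'http://localhost:6060/debug/pprof/block'
+curl -s -o /tmp/sei-bench/pprof/mutex.pb.gz 'http://localhost:6060/debug/pprof/mutex'
+```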
 
 Available scenarios in `benchmark/scenarios/`: `evm.json` (default), `erc20.json`, `mixed.json`, `default.json`.
@@ -29,9 +30,9 @@ BENCHMARK_CONFIG=benchmark/scenarios/erc20.json benchmark/benchmark.sh
 | Var | Default | Purpose |
 |-----|---------|---------|
 | `BENCHMARK_PHASE` | `all` | `init` (build+init+configure), `start` (run node), `all` (both) |
-| `SEI_HOME` | `$HOME/.sei` | Final chain data dir. If != ~/.sei, init in ~/.sei then `mv` |
-| `PORT_OFFSET` | `0` | Added to all ports (RPC, P2P, pprof, gRPC, etc.) |
-| `SEID_BIN` | `""` | Pre-built binary path. If set, skip build + copy to ~/go/bin/seid |
+| `SEI_HOME` | `$HOME/.sei` (or `$HOME/.sei-bench-<n>` when auto-claimed) | Final chain data dir. Init uses a temp staging dir, then moves here |
+| `PORT_OFFSET` | auto-claimed | Added to all ports (RPC, P2P, pprof, gRPC, etc.). Auto-claimed via atomic `mkdir` slots when not set, same mechanism as benchmark-compare.sh |
+| `SEID_BIN` | `""` | Pre-built binary path. If set, skip the build step |
 | `LOG_FILE` | `""` | Redirect seid output to file |
 | `BENCHMARK_CONFIG` | `$SCRIPT_DIR/scenarios/evm.json` | Scenario config file (absolute path resolved from script location) |
 | `BENCHMARK_TXS_PER_BATCH` | `1000` | Transactions per batch |
@@ -41,6 +42,7 @@
 | `MOCK_BALANCES` | `true` | Use mock balances during benchmark |
 | `DISABLE_INDEXER` | `true` | Disable indexer for benchmark (reduces I/O overhead) |
 | `DEBUG` | `false` | Print all log output without filtering |
+| `DURATION` | `120` | Auto-stop after N seconds (0 = run forever) |
 
 ### benchmark-compare.sh
 
@@ -48,10 +50,12 @@
 Inherits all benchmark.sh vars via delegation. Additionally:
 
 | Var | Default | Purpose |
 |-----|---------|---------|
-| `DURATION` | `600` | How long (seconds) to run each node before stopping |
+| `DURATION` | `120` | How long (seconds) to run each node before stopping |
 | `GIGA_EXECUTOR` | **`true`** | Overrides the benchmark.sh default (false) |
 | `GIGA_OCC` | **`true`** | Overrides the benchmark.sh default (false) |
 | `DB_BACKEND` | `goleveldb` | Forwarded to build and init phases |
+| `RUN_ID` | `$$` (PID) | Namespaces `BASE_DIR` as `/tmp/sei-bench-${RUN_ID}/` |
+| `RUN_PORT_OFFSET` | auto-claimed | Added to all per-scenario port offsets (auto-claimed via atomic `mkdir` slots) |
 
 **Note:** `GIGA_EXECUTOR` and `GIGA_OCC` default to `true` in the compare script but `false` in benchmark.sh. The compare script is designed for performance comparisons where the Giga Executor is typically enabled.
 
@@ -66,15 +70,79 @@ benchmark/benchmark-compare.sh \
   lazy-cms-fix=37a17fd02
 ```
-Each scenario gets its own binary, home dir, and port set (offset by 100). Results are printed at the end with median/avg/min/max TPS. Raw data in `/tmp/sei-bench/
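+
+The auto-claimed offsets above rely on `mkdir` being atomic: each candidate offset maps to a lock directory, and `mkdir` succeeds for exactly one claimant per slot. A hypothetical sketch of that mechanism (the slot paths and per-slot offset width are illustrative, not the scripts' actual values):
+
+```bash
+# Hypothetical sketch: atomic mkdir slot claiming. Real paths/widths may differ.
+mkdir -p /tmp/sei-bench-ports
+for slot in 0 1 2 3 4 5 6 7; do
+  if mkdir "/tmp/sei-bench-ports/slot-${slot}" 2>/dev/null; then
+    export PORT_OFFSET=$((slot * 100))   # each claimed slot spaces ports by 100
+    break
+  fi
+done
+```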