docs: update Spider 2.0-DBT score to 65.63% (current run) by heemzers · Pull Request #162 · SignalPilot-Labs/SignalPilot

heemzers · 2026-06-08T22:44:03Z

What

Updates the benchmark number in the README from the retired 51.56 figure to the current canonical result, 65.63% (42 of 64 tasks, claude-sonnet-4-6, run dbt-run9), matching the live /benchmark page.
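For reference, the headline percentage is solved tasks divided by total tasks, rounded to two decimals. The sketch below assumes the /benchmark page rounds half-up. The helper name `score_pct` is hypothetical and does not come from the SignalPilot codebase:

```python
from decimal import Decimal, ROUND_HALF_UP


def score_pct(solved: int, total: int) -> Decimal:
    """Pass rate as a percentage, rounded half-up to two decimals (hypothetical helper)."""
    if total <= 0 or not 0 <= solved <= total:
        raise ValueError("expected 0 <= solved <= total and total > 0")
    # Decimal keeps 42/64 exact (65.625), so the rounding step sees a true tie.
    raw = Decimal(100 * solved) / Decimal(total)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(score_pct(42, 64))  # 65.63
```

Python's built-in round(65.625, 2) rounds exact ties to even and returns 65.62. The sketch therefore uses Decimal with ROUND_HALF_UP to reproduce the 65.63 quoted above.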

Two lines changed:

Headline badge (line 5)
Benchmark-suite tree comment (SOTA: …)

Why

The README, project docs, and memory all still cited 51.56 (+7.45 over JetBrains DataBao) from an earlier run, even though the site had moved to 65.63%. This change settles on one canonical number.

Note for reviewer

I dropped the '+7.45 over JetBrains DataBao' delta rather than recompute it. That delta was tied to 51.56 vs DataBao's 44.11; at 65.63% it's no longer +7.45, and the current gap should be reverified against the live leaderboard before re-quoting. Add it back once verified.
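The old delta is a plain difference of two published percentages, so it only holds for the pair it was computed from. The minimal sketch below reproduces +7.45 from that old pair. It takes the baseline as an explicit argument, so any re-quoted delta has to come from a DataBao figure re-checked on the live leaderboard rather than the stale 44.11. The helper `format_delta` is hypothetical:

```python
from decimal import Decimal


def format_delta(ours: str, baseline: str) -> str:
    """Signed difference of two percentage strings, e.g. '+7.45' (hypothetical helper)."""
    # Decimal subtracts the published two-decimal figures exactly, with no float noise.
    diff = Decimal(ours) - Decimal(baseline)
    return f"{diff:+.2f}"


# Old pairing quoted in this PR: 51.56 vs JetBrains DataBao's 44.11.
print(format_delta("51.56", "44.11"))  # +7.45
```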

🤖 Generated with Claude Code

The README cited 51.56 (+7.45 over JetBrains DataBao) from an earlier run. The current canonical result on the live /benchmark page is 65.63% (42 of 64 tasks, claude-sonnet-4-6, run dbt-run9). Updated the headline and the benchmark-suite SOTA reference to match. Dropped the '+7.45 over DataBao' delta, since it was tied to the old 51.56 number and the gap should be reverified against the current leaderboard before re-quoting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-08T22:44:07Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
signal-pilot	Ready	Preview, Comment	Jun 8, 2026 10:46pm
signal-pilot-docs	Ready	Preview, Comment	Jun 8, 2026 10:46pm

vercel Bot deployed to Preview – signal-pilot June 8, 2026 22:45

vercel Bot deployed to Preview – signal-pilot-docs June 8, 2026 22:46

heemzers wants to merge 1 commit into main from fix/benchmark-number-65-63