docs: update Spider 2.0-DBT score to 65.63% (current run) #162

Open
heemzers wants to merge 1 commit into main from fix/benchmark-number-65-63

heemzers (Contributor) commented Jun 8, 2026

What

Updates the benchmark number in the README from the retired 51.56% figure to the current canonical result, 65.63% (42 of 64 tasks, claude-sonnet-4-6, run dbt-run9), which matches the live /benchmark page.
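For reviewers who want to check the headline figure, here is a minimal sketch of the arithmetic behind 65.63%. The function name `pass_rate_percent` is hypothetical and not part of this repository. It assumes the score is the pass rate rounded half up to two decimals, which is a guess at how the /benchmark page rounds.

```python
from decimal import Decimal, ROUND_HALF_UP


def pass_rate_percent(passed: int, total: int) -> Decimal:
    """Return passed/total as a percentage rounded half up to two decimals.

    Hypothetical helper for illustration; the rounding mode is an assumption.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    rate = Decimal(passed) * 100 / Decimal(total)
    return rate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 42 of 64 tasks is exactly 65.625%, a tie at the third decimal.
score = pass_rate_percent(42, 64)
print(f"{score}%")
```

Because 42/64 is an exact tie at the third decimal, the rounding mode decides the last digit, so `Decimal` with an explicit mode is used here rather than floating-point rounding.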

Two lines changed:

  • Headline badge (line 5)
  • Benchmark-suite tree comment (SOTA: …)

Why

The README, project docs, and memory all still cited 51.56% (+7.45 over JetBrains DataBao) from an earlier run, while the site had moved to 65.63%. This PR settles on one canonical number.

Note for reviewer

I dropped the '+7.45 over JetBrains DataBao' delta rather than recompute it. That delta was tied to 51.56 vs DataBao's 44.11; at 65.63% it's no longer +7.45, and the current gap should be reverified against the live leaderboard before re-quoting. Add it back once verified.
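The reasoning above can be sketched as a small check. The helper `format_delta` is hypothetical, and the numbers are the ones quoted in this PR. It only reproduces the old delta; the new one is deliberately left uncomputed until a verified DataBao score is supplied.

```python
from decimal import Decimal
from typing import Optional

OLD_SCORE = Decimal("51.56")      # retired README figure
OLD_DATABAO = Decimal("44.11")    # DataBao score the old delta was based on
NEW_SCORE = Decimal("65.63")      # current canonical result


def format_delta(score: Decimal, baseline: Optional[Decimal]) -> str:
    """Return a signed delta string, or a placeholder if the baseline is unverified."""
    if baseline is None:
        return "delta pending: verify baseline against the live leaderboard"
    return f"{score - baseline:+}"


# The old delta reproduces the figure that was removed from the README.
old_delta = format_delta(OLD_SCORE, OLD_DATABAO)
print(old_delta)

# No verified baseline yet, so nothing is quoted for the new score.
new_delta = format_delta(NEW_SCORE, None)
print(new_delta)
```

Once the current DataBao score is confirmed, passing it as `baseline` gives the figure to re-quote.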

🤖 Generated with Claude Code

The README cited 51.56 (+7.45 over JetBrains DataBao) from an earlier
run. The current canonical result on the live /benchmark page is 65.63%
(42 of 64 tasks, claude-sonnet-4-6, run dbt-run9). Updated the headline
and the benchmark-suite SOTA reference to match.

Dropped the '+7.45 over DataBao' delta — it was tied to the old 51.56
number and the gap should be reverified against the current leaderboard
before re-quoting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vercel Bot commented Jun 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| signal-pilot | Ready | Preview, Comment | Jun 8, 2026 10:46pm |
| signal-pilot-docs | Ready | Preview, Comment | Jun 8, 2026 10:46pm |
