docs: update Spider 2.0-DBT score to 65.63% (current run) #162
Open
heemzers wants to merge 1 commit into
The README cited 51.56 (+7.45 over JetBrains DataBao) from an earlier run. The current canonical result on the live /benchmark page is 65.63% (42 of 64 tasks, claude-sonnet-4-6, run dbt-run9). Updated the headline and the benchmark-suite SOTA reference to match. Dropped the '+7.45 over DataBao' delta: it was tied to the old 51.56 number, and the gap should be reverified against the current leaderboard before re-quoting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Updates the benchmark number in the README from the retired 51.56 figure to the current canonical result: 65.63% (42 of 64 tasks, claude-sonnet-4-6, run dbt-run9), matching the live /benchmark page.
Two lines changed:
SOTA: …)
Why
The README, project docs, and memory all still cited 51.56 (+7.45 over JetBrains DataBao) from an earlier run, while the site had moved to 65.63%. This PR picks one canonical number.
Note for reviewer
I dropped the '+7.45 over JetBrains DataBao' delta rather than recompute it. That delta was tied to 51.56 vs DataBao's 44.11; at 65.63% it's no longer +7.45, and the current gap should be reverified against the live leaderboard before re-quoting. Add it back once verified.
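For anyone rechecking the figures, here is a minimal sketch of the arithmetic behind the two numbers quoted above. The helper names `pass_rate` and `gap` are hypothetical and not part of this repo, and the half-up rounding mode is an assumption about how the /benchmark page rounds. Note that 42/64 is exactly 65.625, so Python's built-in `round` (round-half-to-even) would give 65.62 rather than 65.63.

```python
from decimal import Decimal, ROUND_HALF_UP


def pass_rate(passed: int, total: int) -> Decimal:
    """Percentage of tasks passed, rounded half-up to two decimals (assumed rounding mode)."""
    exact = Decimal(passed) * 100 / Decimal(total)
    return exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def gap(ours: Decimal, baseline: Decimal) -> Decimal:
    """Difference in percentage points between our score and a baseline score."""
    return ours - baseline


# Current run: 42 of 64 tasks.
current = pass_rate(42, 64)

# The retired delta, using the two figures quoted in this PR.
old_delta = gap(Decimal("51.56"), Decimal("44.11"))

print(current)    # 65.63
print(old_delta)  # 7.45
```

Once the baseline is reverified against the live leaderboard, the same `gap` call with the verified figure gives the delta to re-quote. No current baseline is hard-coded here on purpose.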
🤖 Generated with Claude Code