feat: agent discoverability for World SDK docs by Ralph-20 · Pull Request #1457 · vercel/workflow

Ralph-20 · 2026-03-19T23:09:17Z

Summary

Adds `keywords` frontmatter and file path comments to all 8 World SDK doc pages, and World SDK method signatures to the workflow skill, to improve agent discoverability.

Linear: DSE-2337 (child of DSE-2334) | Depends on: PR #1456 (World SDK docs breakout)

What Changed

File	Change
`source.config.ts`	Added `keywords` to frontmatter schema
8 `world/*.mdx` files	Added `keywords` frontmatter with method names + search terms
8 `world/*.mdx` files	Added file path comments to complete code examples
`skills/workflow/SKILL.md`	Rewrote Observability section with World SDK method signatures, resolveData guidance, name parsing return types, event types table, devalue format clarification

Skill Improvements (post-eval gap analysis)

Added exact World SDK method signatures with pagination nesting warning
Added resolveData parameter guidance ('none' for polling, 'all' for inspection)
Added runs.cancel() method
Added name parsing return type warning (the parsers can return null, so always use optional chaining)
Added event types taxonomy table
Added devalue format clarification (without hydration, step I/O is opaque serialized bytes)
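A minimal sketch of the pagination-nesting and `resolveData` guidance above, written against a hypothetical `RunsApi` interface rather than the real World SDK. The response shape (`data`, `cursor`), the `RunSummary` fields, and the in-memory `fakeRuns` helper are assumptions for illustration only. The point is the pattern the skill warns about: the cursor goes back in under `pagination`, not at the top level, and `resolveData: 'none'` is used when only run metadata is needed.

```typescript
// Hypothetical shapes for illustration; the real World SDK types may differ.
type ResolveData = "none" | "all";

interface ListRunsParams {
  resolveData?: ResolveData;
  // The cursor is nested under `pagination`, not passed as a top-level `cursor`.
  pagination?: { cursor?: string };
}

interface Page<T> {
  data: T[];
  cursor: string | null; // assumed: null when there are no more pages
}

interface RunSummary {
  runId: string;
  status: string;
}

interface RunsApi {
  list(params: ListRunsParams): Promise<Page<RunSummary>>;
}

// Walk every page, feeding each returned cursor back in via `pagination`.
async function listAllRuns(runs: RunsApi): Promise<RunSummary[]> {
  const all: RunSummary[] = [];
  let cursor: string | undefined;
  do {
    const page = await runs.list({
      resolveData: "none", // metadata only; enough for status polling
      pagination: { cursor },
    });
    all.push(...page.data);
    cursor = page.cursor ?? undefined;
  } while (cursor !== undefined);
  return all;
}

// In-memory fake: serves `items` in pages of `pageSize`, using the offset as cursor.
function fakeRuns(items: RunSummary[], pageSize: number): RunsApi {
  return {
    async list({ pagination }) {
      const start = pagination?.cursor ? Number(pagination.cursor) : 0;
      const end = start + pageSize;
      return {
        data: items.slice(start, end),
        cursor: end < items.length ? String(end) : null,
      };
    },
  };
}
```

With three runs and a page size of 2, `listAllRuns` makes two `list` calls, and the second one receives `{ resolveData: "none", pagination: { cursor: "2" } }`.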

Eval Results

Two evals confirmed the skill improvements work. Full writeup: eval/skill-e2e-findings.md

Task breadth (with/without skill, 8 binary checks)

Check	Without Skill	With Skill
Valid workflow	✅	✅
Correct output	✅	✅
Hydration	❌	✅
Correct imports	❌	✅
Pagination shape	❌	✅
Display names	❌	✅
Step timing	✅	✅
Step I/O values	❌	✅

3/8 → 8/8 (+5). Without the skill, the agent misses hydration, imports, pagination nesting, name parsers, and I/O validation entirely.

Code quality (3-condition deep eval)

Condition	Score	%	Notes
New skill (this PR)	25/28	89%	Correct `getWorld` from `workflow/runtime`, `hydrateResourceIO`, `parseStepName`, proper pagination
Old skill (main)	9/28	32%	No World SDK docs — invented raw HTTP workarounds
No skill (baseline)	9/28	32%	Same as old skill — old skill had zero observability coverage

Key finding: for World SDK tasks, the old skill scores the same as no skill (9/28). The new skill's Observability section is what makes the difference.

Deep eval methodology

Each agent got the same task: "Build a 3-step workflow + an API route using the World SDK to list runs with pagination, get steps with hydrated I/O, calculate step duration, and parse display names."

Scored on 6 criteria (0-2 each):

Import correctness — workflow/runtime for getWorld, workflow/observability for hydration
API completeness — world.runs.list(), world.steps.get(), hydrateResourceIO()
Pattern quality — cursor pagination nesting, resolveData optimization, name parsing
Code correctness — would typecheck against real types
Observability coverage — step duration, hydrated I/O, display name parsing
File structure — correct file placement guided by doc file path comments
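The step-timing and display-name criteria come down to two small helpers. This sketch assumes a hypothetical `StepRecord` with optional `startedAt`/`completedAt` dates, and takes the name parser as a parameter typed to return `null` on failure, so no real World SDK field name or import is implied. When using the SDK, adapt it to the actual step type and the real `parseStepName` return type.

```typescript
// Hypothetical step shape; the real World SDK field names may differ.
interface StepRecord {
  stepName: string;
  startedAt?: Date;
  completedAt?: Date;
}

// Hypothetical parser signature: returns null when the name is not recognized.
type NameParser = (name: string) => { shortName: string } | null;

// Duration in milliseconds, or null if the step has not started or finished.
function stepDurationMs(step: StepRecord): number | null {
  if (!step.startedAt || !step.completedAt) return null;
  return step.completedAt.getTime() - step.startedAt.getTime();
}

// Optional chaining plus a fallback, instead of assuming the parser succeeds.
function displayName(step: StepRecord, parse: NameParser): string {
  return parse(step.stepName)?.shortName ?? step.stepName;
}
```

Falling back to the raw `stepName` keeps the UI usable when a name does not parse, rather than throwing on `null.shortName`.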

Where the new skill won decisively:

Only the new-skill agent used getWorld() from workflow/runtime (the others used raw HTTP or local helpers)
Only the new-skill agent used hydrateResourceIO() + observabilityRevivers (the others wrote naive JSON.parse fallbacks)
Only the new-skill agent used FatalError + RetryableError with retryAfter
The new-skill agent placed files under app/api/ (matching the doc file path comments); the others used src/app/

Test Results

Docs typecheck: 302 passed, 23 failed (all pre-existing — zero new failures)
Grep dry run: All 5 skill grep hints return hits across world/ pages

Manual TODOs

Submit to Context7
Submit to llms.txt Hub
Remove @expect-error comments from observability.mdx after PR #1453 (feat: re-export parseName + hydrators for observability DX) merges

…rability
- Add `keywords` field to Fumadocs schema (source.config.ts)
- Add keywords to all 8 world/ MDX page frontmatter (method names, search terms)
- Add realistic file path comments to complete code examples (e.g., `// app/api/workflow-runs/route.ts`)
- Both changes improve agent greppability and code context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

changeset-bot · 2026-03-19T23:09:22Z

⚠️ No Changeset found

Latest commit: 14f0897

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


vercel · 2026-03-19T23:09:24Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
example-nextjs-workflow-turbopack	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
example-nextjs-workflow-webpack	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
example-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-astro-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-express-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-fastify-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-hono-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-nitro-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-nuxt-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-sveltekit-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workbench-vite-workflow	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workflow-docs	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workflow-nest	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm
workflow-swc-playground	Ready	Preview, Comment, Open in v0	Mar 20, 2026 3:58pm

github-actions · 2026-03-19T23:09:28Z

🧪 E2E Test Results

❌ Some tests failed

Summary

	Passed	Failed	Skipped	Total
❌ ▲ Vercel Production	757	1	67	825
✅ 💻 Local Development	782	0	118	900
✅ 📦 Local Production	782	0	118	900
✅ 🐘 Local Postgres	782	0	118	900
✅ 🪟 Windows	72	0	3	75
❌ 🌍 Community Worlds	118	56	15	189
✅ 📋 Other	198	0	27	225
Total	3491	57	466	4014

❌ Failed Tests

▲ Vercel Production (1 failed)

nitro (1 failed):

outputStreamWorkflow | wrun_01KM5ZHRK33F7JDG1C4AK1MH8S | 🔍 observability

🌍 Community Worlds (56 failed)

mongodb (3 failed):

hookWorkflow is not resumable via public webhook endpoint | wrun_01KM5ZGQ0F64XZ2FKRVV79MNSY
webhookWorkflow | wrun_01KM5ZGZ5NY5BHH9FGVMENZAXV
concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM5ZNWCJ0AEZK9P2J3VXBDVE

redis (2 failed):

hookWorkflow is not resumable via public webhook endpoint | wrun_01KM5ZGQ0F64XZ2FKRVV79MNSY
concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM5ZNWCJ0AEZK9P2J3VXBDVE

turso (51 failed):

addTenWorkflow | wrun_01KM5ZFKXKEF82XJD2SH3RRHTJ
addTenWorkflow | wrun_01KM5ZFKXKEF82XJD2SH3RRHTJ
wellKnownAgentWorkflow (.well-known/agent) | wrun_01KM5ZGXKWRF81FSSZEJH78QRZ
should work with react rendering in step
promiseAllWorkflow | wrun_01KM5ZFSNVEMQ2DAHHG4VSJ3DW
promiseRaceWorkflow | wrun_01KM5ZG05GYDGPRBY2DDT7JDE2
promiseAnyWorkflow | wrun_01KM5ZG1Y8V0M4GBZMSBFPBNYR
importedStepOnlyWorkflow | wrun_01KM5ZHAW1WENXZAQGMQY2WSZR
hookWorkflow | wrun_01KM5ZGDNH15BJGN61JXVPAV5Q
hookWorkflow is not resumable via public webhook endpoint | wrun_01KM5ZGQ0F64XZ2FKRVV79MNSY
webhookWorkflow | wrun_01KM5ZGZ5NY5BHH9FGVMENZAXV
sleepingWorkflow | wrun_01KM5ZH4GGQKHSZZK88Z8KPTA1
parallelSleepWorkflow | wrun_01KM5ZHGYKWZKTDT9A8A8W85RC
nullByteWorkflow | wrun_01KM5ZHMWWA2PRE8N14BG95WSR
workflowAndStepMetadataWorkflow | wrun_01KM5ZHPS33M0NNHWWP0CRJ2SB
fetchWorkflow | wrun_01KM5ZJJ9SYFA65M82HPVHTK0W
promiseRaceStressTestWorkflow | wrun_01KM5ZJN6J5SAA18X1DPEK75A4
error handling error propagation workflow errors nested function calls preserve message and stack trace
error handling error propagation workflow errors cross-file imports preserve message and stack trace
error handling error propagation step errors basic step error preserves message and stack trace
error handling error propagation step errors cross-file step error preserves message and function names in stack
error handling retry behavior regular Error retries until success
error handling retry behavior FatalError fails immediately without retries
error handling retry behavior RetryableError respects custom retryAfter delay
error handling retry behavior maxRetries=0 disables retries
error handling catchability FatalError can be caught and detected with FatalError.is()
hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KM5ZN9V0PV15V9V2MMXNBTM5
concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM5ZNWCJ0AEZK9P2J3VXBDVE
hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KM5ZPFCRS2V6Y5BFG68AESF0
stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars) | wrun_01KM5ZQ27FC8J24RFF3WDSG4JQ
stepFunctionWithClosureWorkflow - step function with closure variables passed as argument | wrun_01KM5ZQ9TB3J2H0TPWEKJD5A5T
closureVariableWorkflow - nested step functions with closure variables | wrun_01KM5ZQEHHR68Z85160H1T73GD
spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step | wrun_01KM5ZQGA4N3W9FX3QRWVMSBHK
health check (queue-based) - workflow and step endpoints respond to health check messages
pathsAliasWorkflow - TypeScript path aliases resolve correctly | wrun_01KM5ZQY0M008AV1SSMWBF117G
Calculator.calculate - static workflow method using static step methods from another class | wrun_01KM5ZR2JYXVW0RSGE09QEEZPZ
AllInOneService.processNumber - static workflow method using sibling static step methods | wrun_01KM5ZR7NPGAB2SRC32RZESNVG
ChainableService.processWithThis - static step methods using this to reference the class | wrun_01KM5ZRDB6YFCXHFRNXT65FBZ3
thisSerializationWorkflow - step function invoked with .call() and .apply() | wrun_01KM5ZRK58TKXJV3H4DQY177JR
customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE | wrun_01KM5ZRRTHJPCAYPX2M7CYSDFR
instanceMethodStepWorkflow - instance methods with "use step" directive | wrun_01KM5ZRZR7H2W09NJ2ZDRNTGXY
crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context | wrun_01KM5ZS95QK9GY8CWB5XC2TR1E
stepFunctionAsStartArgWorkflow - step function reference passed as start() argument | wrun_01KM5ZSG1M8ZQ915B35FKBJC09
cancelRun - cancelling a running workflow | wrun_01KM5ZSNRP44D87EQ0G5AQBH4S
cancelRun via CLI - cancelling a running workflow | wrun_01KM5ZSY7CDM34DWMH0DD3CKPS
pages router addTenWorkflow via pages router
pages router promiseAllWorkflow via pages router
pages router sleepingWorkflow via pages router
hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep | wrun_01KM5ZT98HMAZ8P7TYSN21CMA6
sleepInLoopWorkflow - sleep inside loop with steps actually delays each iteration | wrun_01KM5ZTVB3G1BJAKNACY3SC4R0
sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control) | wrun_01KM5ZV5XFQV9DSB9KFACACCZ7

Details by Category

❌ ▲ Vercel Production

App	Passed	Failed	Skipped
✅ astro	68	0	7
✅ example	68	0	7
✅ express	68	0	7
✅ fastify	68	0	7
✅ hono	68	0	7
✅ nextjs-turbopack	73	0	2
✅ nextjs-webpack	73	0	2
❌ nitro	67	1	7
✅ nuxt	68	0	7
✅ sveltekit	68	0	7
✅ vite	68	0	7

✅ 💻 Local Development

App	Passed	Skipped
✅ astro-stable	66	9
✅ express-stable	66	9
✅ fastify-stable	66	9
✅ hono-stable	66	9
✅ nextjs-turbopack-canary	55	20
✅ nextjs-turbopack-stable	72	3
✅ nextjs-webpack-canary	55	20
✅ nextjs-webpack-stable	72	3
✅ nitro-stable	66	9
✅ nuxt-stable	66	9
✅ sveltekit-stable	66	9
✅ vite-stable	66	9

✅ 📦 Local Production

App	Passed	Skipped
✅ astro-stable	66	9
✅ express-stable	66	9
✅ fastify-stable	66	9
✅ hono-stable	66	9
✅ nextjs-turbopack-canary	55	20
✅ nextjs-turbopack-stable	72	3
✅ nextjs-webpack-canary	55	20
✅ nextjs-webpack-stable	72	3
✅ nitro-stable	66	9
✅ nuxt-stable	66	9
✅ sveltekit-stable	66	9
✅ vite-stable	66	9

✅ 🐘 Local Postgres

App	Passed	Skipped
✅ astro-stable	66	9
✅ express-stable	66	9
✅ fastify-stable	66	9
✅ hono-stable	66	9
✅ nextjs-turbopack-canary	55	20
✅ nextjs-turbopack-stable	72	3
✅ nextjs-webpack-canary	55	20
✅ nextjs-webpack-stable	72	3
✅ nitro-stable	66	9
✅ nuxt-stable	66	9
✅ sveltekit-stable	66	9
✅ vite-stable	66	9

✅ 🪟 Windows

App	Passed	Failed	Skipped
✅ nextjs-turbopack	72	0	3

❌ 🌍 Community Worlds

App	Passed	Failed	Skipped
✅ mongodb-dev	3	0	2
❌ mongodb	52	3	3
✅ redis-dev	3	0	2
❌ redis	53	2	3
✅ turso-dev	3	0	2
❌ turso	4	51	3

✅ 📋 Other

App	Passed	Skipped
✅ e2e-local-dev-nest-stable	66	9
✅ e2e-local-postgres-nest-stable	66	9
✅ e2e-local-prod-nest-stable	66	9


❌ Some E2E test jobs failed:

Vercel Prod: failure
Local Dev: success
Local Prod: success
Local Postgres: success
Windows: success

Check the workflow run for details.

post-eval improvement. Gap analysis found:
- pagination nesting ({ pagination: { cursor } } not { cursor })
- resolveData param guidance ('none' for polling, 'all' for inspection)
- runs.cancel() method missing
- name parsing returns null (needs optional chaining)
- event types taxonomy
- devalue format clarification (opaque arrays without hydration)

eval: new skill 89% vs old skill 32% vs no skill 32%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ralph-20 · 2026-03-20T15:08:36Z

E2E Eval Update: Devalue Hydration Finding

Finding

world.steps.list({ resolveData: 'all' }) returns devalue-serialized byte arrays for input/output, not deserialized JS values. You must call hydrateResourceIO(step, revivers) from workflow/observability to get human-readable I/O.

Raw (from resolveData: 'all' without hydration):

input: {"0":100,"1":101,"2":118,"3":108,...}  ← Uint8Array as JSON (devalue wire format)

Hydrated (after hydrateResourceIO()):

input: {"args":[123, 2]}  ← actual function arguments
output: 125               ← actual return value
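Why the raw form looks like `{"0":100,"1":101,...}`: JSON has no byte-array type, so a `Uint8Array` passed through `JSON.stringify` serializes as an object keyed by index. The sketch below reproduces that with the first four bytes of the raw sample, which decode to the ASCII text "devl" rather than to the workflow's arguments. `looksLikeSerializedBytes` is a hypothetical guard, not part of `workflow/observability`; it only illustrates why a `!== undefined` check cannot tell raw bytes from real values. Calling `hydrateResourceIO()` remains the actual fix.

```typescript
// First four bytes of the raw step input sample.
const rawBytes = new Uint8Array([100, 101, 118, 108]);

// A Uint8Array stringifies as an index-keyed object.
const rawJson = JSON.stringify(rawBytes); // '{"0":100,"1":101,"2":118,"3":108}'

// The bytes are encoded text, not the step's arguments.
const rawText = new TextDecoder().decode(rawBytes); // "devl"

// Hypothetical guard: true for a Uint8Array or its index-keyed JSON form.
// `value !== undefined` is also true for both, so it proves nothing.
function looksLikeSerializedBytes(value: unknown): boolean {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return false;
  }
  const keys = Object.keys(value);
  return (
    keys.length > 0 &&
    keys.every((key, i) => key === String(i)) &&
    Object.values(value).every(
      (v) => Number.isInteger(v) && v >= 0 && v <= 255,
    )
  );
}
```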

Changes

Commit	What
`18c1f5e`	SKILL.md: clarified `resolveData` controls data inclusion not hydration, fixed raw format description from `[["Object",...]]` to actual Uint8Array-like objects, added "common mistake" callout
`5167201`	skill-eval.test.ts: calls `hydrateResourceIO()` on all World SDK step data, captures both raw + hydrated I/O in findings

Eval Results

3/3 tests pass, all 18 checks ✅ — hydration works correctly for:

addTenWorkflow: add(123, 2) → 125 → 128 → 133
promiseAllWorkflow: parallel randomDelay("a","b","c") → "A","B","C"
errorRetryFatal: FatalError with full stack in hydrated error field

Full findings: eval/skill-e2e-findings.md

TODO: Verify skill wording is clear enough that an agent using it produces correct hydration code on first try (manual check against next eval iteration)


Ralph-20 · 2026-03-20T15:49:48Z

With/Without Skill Comparison Eval

Result: 3/8 → 8/8 (+5 delta)

Ran addTenWorkflow(123) against the dev server, then scored each condition's inspection code (static analysis of imports, API patterns, hydration) + runtime verification.

Check	Without Skill	With Skill
Valid workflow	✅	✅
Correct output	✅	✅
Hydration	❌	✅
Correct imports	❌	✅
Pagination shape	❌	✅
Display names	❌	✅
Step timing	✅	✅
Step I/O values	❌	✅

Key failures without skill

No hydration: reads step.input directly (devalue bytes), assuming resolveData: 'all' yields usable data
Wrong imports: only knows workflow/api + workflow/runtime, not that workflow/observability exists
Flat pagination: uses { cursor } instead of { pagination: { cursor } }
Manual name parsing: splits on "//" instead of using parseWorkflowName()/parseStepName()
No I/O validation: checks !== undefined (which also passes for raw byte objects) instead of verifying actual values
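A sketch of value-level I/O validation for the `addTenWorkflow(123)` run, in place of an `!== undefined` check. The `HydratedStep` shape is modeled on the hydrated sample in the earlier comment (input `{"args":[123, 2]}`, output `125`) and is an assumption, as is the increment list `[2, 3, 5]`, which is inferred from the reported outputs 125 → 128 → 133. `validateRunningSum` is a hypothetical helper, not part of the eval code.

```typescript
// Hypothetical hydrated step shape, modeled on the hydrated sample;
// the real type may differ.
interface HydratedStep {
  input: { args: unknown[] };
  output: unknown;
}

// Compare each step's output with the expected running sum.
// Returns one message per mismatch; an empty array means every step checked out.
function validateRunningSum(
  steps: HydratedStep[],
  start: number,
  increments: number[],
): string[] {
  const errors: string[] = [];
  let expected = start;
  increments.forEach((inc, i) => {
    expected += inc;
    const step = steps[i];
    if (step === undefined) {
      errors.push(`step ${i}: missing`);
    } else if (step.output !== expected) {
      errors.push(`step ${i}: expected ${expected}, got ${JSON.stringify(step.output)}`);
    }
  });
  return errors;
}

// Assumed increments for addTenWorkflow(123): 123 → 125 → 128 → 133.
const addTenIncrements = [2, 3, 5];
```

An un-hydrated output such as `{"0":100,"1":101}` fails the strict equality and is reported, whereas `output !== undefined` would have accepted it.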

Files

eval/run-comparison.ts — runner script
eval/without_skill/ — agent-generated code (no skill)
eval/with_skill/ — agent-generated code (with skill)
eval/skill-comparison-findings.md — full findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ralph-20 requested a review from a team as a code owner March 19, 2026 23:09


Ralph-20 and others added 2 commits March 20, 2026 10:08

skill: clarify devalue wire format, make hydration mandatory pattern

18c1f5e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eval: add hydrateResourceIO, show raw vs hydrated I/O in findings

5167201

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Ralph-20 and others added 2 commits March 20, 2026 10:16

eval: rewrite findings as human-readable doc w/ context + hydration a…

272dc4b

…nalysis Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eval: with/without skill comparison — 3/8 vs 8/8 (+5 delta)

1de11d4

Static analysis + runtime eval proving SKILL.md materially helps agents with observability patterns: hydration, imports, pagination, name parsing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eval: merge comparison findings into single eval doc

14f0897

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ralph-20 wants to merge 9 commits into lucas/dse-2334-world-docs from lucas/dse-2337-agent-discoverability
