feat: agent discoverability for World SDK docs #1457

Open
Ralph-20 wants to merge 9 commits into lucas/dse-2334-world-docs from lucas/dse-2337-agent-discoverability

Ralph-20 commented Mar 19, 2026

Summary

Adds keywords frontmatter, file path comments, and World SDK method signatures to all 8 World SDK doc pages + the workflow skill to improve agent discoverability.

Linear: DSE-2337 (child of DSE-2334) | Depends on: PR #1456 (World SDK docs breakout)

What Changed

| File | Change |
|---|---|
| source.config.ts | Added keywords to frontmatter schema |
| 8 world/*.mdx files | Added keywords frontmatter with method names + search terms |
| 8 world/*.mdx files | Added file path comments to complete code examples |
| skills/workflow/SKILL.md | Rewrote Observability section with World SDK method signatures, resolveData guidance, name parsing return types, event types table, devalue format clarification |

Skill Improvements (post-eval gap analysis)

  • Added exact World SDK method signatures with pagination nesting warning
  • Added resolveData parameter guidance ('none' for polling, 'all' for inspection)
  • Added runs.cancel() method
  • Added name parsing return type warning (the parsers can return null; always use optional chaining)
  • Added event types taxonomy table
  • Added devalue format clarification (without hydration, I/O is opaque arrays)
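
The pagination nesting warning can be illustrated with a small self-contained sketch. Only the nested `{ pagination: { cursor } }` request shape and the `resolveData` values `'none'` and `'all'` come from this PR. The `RunsApi` interface, the `{ data, cursor, hasMore }` response shape, the `Run` fields, and the in-memory `fakeRuns` stub are assumptions made for illustration and may not match the World SDK's actual types.

```typescript
// Hypothetical shapes for illustration only; the real World SDK types may differ.
type ResolveData = 'none' | 'all';

interface ListParams {
  pagination?: { cursor?: string; limit?: number };
  resolveData?: ResolveData;
}

interface Page<T> {
  data: T[];
  cursor: string | null; // assumed: null when there are no more pages
  hasMore: boolean;
}

interface Run {
  runId: string;
  status: string;
}

interface RunsApi {
  list(params?: ListParams): Promise<Page<Run>>;
}

// In-memory stand-in so the sketch runs without a workflow backend.
function fakeRuns(all: Run[], pageSize: number): RunsApi {
  return {
    async list(params = {}) {
      const start = params.pagination?.cursor ? Number(params.pagination.cursor) : 0;
      const end = Math.min(start + pageSize, all.length);
      const hasMore = end < all.length;
      return { data: all.slice(start, end), cursor: hasMore ? String(end) : null, hasMore };
    },
  };
}

// Walk every page. The cursor is nested under `pagination`, not passed at the top level.
async function listAllRuns(runs: RunsApi): Promise<Run[]> {
  const out: Run[] = [];
  let cursor: string | undefined;
  do {
    const page = await runs.list({
      pagination: { cursor },
      resolveData: 'none', // polling-style listing: skip I/O payloads
    });
    out.push(...page.data);
    cursor = page.hasMore && page.cursor ? page.cursor : undefined;
  } while (cursor !== undefined);
  return out;
}
```

A flat `{ cursor }` argument would typecheck against a loosely typed client but would be ignored by a stub like this one, so the loop would keep re-reading the first page; nesting the cursor is what lets it advance.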

Eval Results

Two evals confirmed the skill improvements work. Full writeup: eval/skill-e2e-findings.md

Task breadth (with/without skill, 8 binary checks)

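| Check | Without Skill | With Skill |
|---|---|---|
| Valid workflow | ✅ | ✅ |
| Correct output | ✅ | ✅ |
| Hydration | ❌ | ✅ |
| Correct imports | ❌ | ✅ |
| Pagination shape | ❌ | ✅ |
| Display names | ❌ | ✅ |
| Step timing | ✅ | ✅ |
| Step I/O values | ❌ | ✅ |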

3/8 → 8/8 (+5 delta) — without-skill agent misses hydration, imports, pagination nesting, name parsers, and I/O validation entirely.

Code quality (3-condition deep eval)

| Condition | Score | % | Notes |
|---|---|---|---|
| New skill (this PR) | 25/28 | 89% | Correct getWorld from workflow/runtime, hydrateResourceIO, parseStepName, proper pagination |
| Old skill (main) | 9/28 | 32% | No World SDK docs; invented raw HTTP workarounds |
| No skill (baseline) | 9/28 | 32% | Same as old skill; old skill had zero observability coverage |

Key finding: Old skill = no skill for World SDK tasks. The new skill's Observability section is what makes the difference.

Deep eval methodology

Each agent got the same task: "Build a 3-step workflow + an API route using the World SDK to list runs with pagination, get steps with hydrated I/O, calculate step duration, and parse display names."

Scored on 6 criteria (0-2 each):

  1. Import correctness: workflow/runtime for getWorld, workflow/observability for hydration
  2. API completeness: world.runs.list(), world.steps.get(), hydrateResourceIO()
  3. Pattern quality: cursor pagination nesting, resolveData optimization, name parsing
  4. Code correctness: would typecheck against real types
  5. Observability coverage: step duration, hydrated I/O, display name parsing
  6. File structure: correct file placement guided by doc file path comments
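
Two of the scored patterns, step duration and null-safe display names, can be sketched without the SDK. This is a minimal illustration under stated assumptions: the `StepLike` field names (`stepName`, `startedAt`, `completedAt`) and the `{ shortName }` parse result are hypothetical, and the `parse` argument stands in for a parser such as parseStepName, whose real return type is not shown in this PR.

```typescript
// Hypothetical step shape; field names are assumptions, not the SDK's actual types.
interface StepLike {
  stepName: string;
  startedAt?: Date | string;
  completedAt?: Date | string;
}

// Duration in milliseconds, or null when the step has not started or finished.
function stepDurationMs(step: StepLike): number | null {
  if (!step.startedAt || !step.completedAt) return null;
  const start = new Date(step.startedAt).getTime();
  const end = new Date(step.completedAt).getTime();
  if (Number.isNaN(start) || Number.isNaN(end)) return null;
  return end - start;
}

// Assumed parse result: an object with a short name, or null when parsing fails.
type ParsedName = { shortName: string } | null;

function displayName(step: StepLike, parse: (name: string) => ParsedName): string {
  // Optional chaining plus a fallback keeps a null parse result from throwing.
  return parse(step.stepName)?.shortName ?? step.stepName;
}
```

Falling back to the raw name keeps a listing usable even when a name cannot be parsed, which is the situation the optional-chaining warning is meant to cover.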

Where new skill won decisively:

  • Only new-skill agent used getWorld() from workflow/runtime (others used raw HTTP or local helpers)
  • Only new-skill agent used hydrateResourceIO() + observabilityRevivers (others wrote naive JSON.parse fallbacks)
  • Only new-skill agent used FatalError + RetryableError with retryAfter
  • New-skill agent placed files at app/api/ (matching doc file path comments) vs others using src/app/

Test Results

  • Docs typecheck: 302 passed, 23 failed (all pre-existing — zero new failures)
  • Grep dry run: All 5 skill grep hints return hits across world/ pages

Manual TODOs

…rability

- Add `keywords` field to Fumadocs schema (source.config.ts)
- Add keywords to all 8 world/ MDX page frontmatter (method names, search terms)
- Add realistic file path comments to complete code examples (e.g., `// app/api/workflow-runs/route.ts`)
- Both changes improve agent greppability and code context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ralph-20 requested a review from a team as a code owner March 19, 2026 23:09

changeset-bot bot commented Mar 19, 2026

⚠️ No Changeset found

Latest commit: 14f0897

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

vercel bot commented Mar 19, 2026

github-actions bot commented Mar 19, 2026

🧪 E2E Test Results

Some tests failed

Summary

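| Category | Passed | Failed | Skipped | Total |
|---|---|---|---|---|
| ❌ ▲ Vercel Production | 757 | 1 | 67 | 825 |
| ✅ 💻 Local Development | 782 | 0 | 118 | 900 |
| ✅ 📦 Local Production | 782 | 0 | 118 | 900 |
| ✅ 🐘 Local Postgres | 782 | 0 | 118 | 900 |
| ✅ 🪟 Windows | 72 | 0 | 3 | 75 |
| ❌ 🌍 Community Worlds | 118 | 56 | 15 | 189 |
| ✅ 📋 Other | 198 | 0 | 27 | 225 |
| Total | 3491 | 57 | 466 | 4014 |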

❌ Failed Tests

▲ Vercel Production (1 failed)

nitro (1 failed):

🌍 Community Worlds (56 failed)

mongodb (3 failed):

  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KM5ZGQ0F64XZ2FKRVV79MNSY
  • webhookWorkflow | wrun_01KM5ZGZ5NY5BHH9FGVMENZAXV
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM5ZNWCJ0AEZK9P2J3VXBDVE

redis (2 failed):

  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KM5ZGQ0F64XZ2FKRVV79MNSY
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM5ZNWCJ0AEZK9P2J3VXBDVE

turso (51 failed):

  • addTenWorkflow | wrun_01KM5ZFKXKEF82XJD2SH3RRHTJ
  • addTenWorkflow | wrun_01KM5ZFKXKEF82XJD2SH3RRHTJ
  • wellKnownAgentWorkflow (.well-known/agent) | wrun_01KM5ZGXKWRF81FSSZEJH78QRZ
  • should work with react rendering in step
  • promiseAllWorkflow | wrun_01KM5ZFSNVEMQ2DAHHG4VSJ3DW
  • promiseRaceWorkflow | wrun_01KM5ZG05GYDGPRBY2DDT7JDE2
  • promiseAnyWorkflow | wrun_01KM5ZG1Y8V0M4GBZMSBFPBNYR
  • importedStepOnlyWorkflow | wrun_01KM5ZHAW1WENXZAQGMQY2WSZR
  • hookWorkflow | wrun_01KM5ZGDNH15BJGN61JXVPAV5Q
  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KM5ZGQ0F64XZ2FKRVV79MNSY
  • webhookWorkflow | wrun_01KM5ZGZ5NY5BHH9FGVMENZAXV
  • sleepingWorkflow | wrun_01KM5ZH4GGQKHSZZK88Z8KPTA1
  • parallelSleepWorkflow | wrun_01KM5ZHGYKWZKTDT9A8A8W85RC
  • nullByteWorkflow | wrun_01KM5ZHMWWA2PRE8N14BG95WSR
  • workflowAndStepMetadataWorkflow | wrun_01KM5ZHPS33M0NNHWWP0CRJ2SB
  • fetchWorkflow | wrun_01KM5ZJJ9SYFA65M82HPVHTK0W
  • promiseRaceStressTestWorkflow | wrun_01KM5ZJN6J5SAA18X1DPEK75A4
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KM5ZN9V0PV15V9V2MMXNBTM5
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM5ZNWCJ0AEZK9P2J3VXBDVE
  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KM5ZPFCRS2V6Y5BFG68AESF0
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars) | wrun_01KM5ZQ27FC8J24RFF3WDSG4JQ
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument | wrun_01KM5ZQ9TB3J2H0TPWEKJD5A5T
  • closureVariableWorkflow - nested step functions with closure variables | wrun_01KM5ZQEHHR68Z85160H1T73GD
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step | wrun_01KM5ZQGA4N3W9FX3QRWVMSBHK
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly | wrun_01KM5ZQY0M008AV1SSMWBF117G
  • Calculator.calculate - static workflow method using static step methods from another class | wrun_01KM5ZR2JYXVW0RSGE09QEEZPZ
  • AllInOneService.processNumber - static workflow method using sibling static step methods | wrun_01KM5ZR7NPGAB2SRC32RZESNVG
  • ChainableService.processWithThis - static step methods using this to reference the class | wrun_01KM5ZRDB6YFCXHFRNXT65FBZ3
  • thisSerializationWorkflow - step function invoked with .call() and .apply() | wrun_01KM5ZRK58TKXJV3H4DQY177JR
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE | wrun_01KM5ZRRTHJPCAYPX2M7CYSDFR
  • instanceMethodStepWorkflow - instance methods with "use step" directive | wrun_01KM5ZRZR7H2W09NJ2ZDRNTGXY
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context | wrun_01KM5ZS95QK9GY8CWB5XC2TR1E
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument | wrun_01KM5ZSG1M8ZQ915B35FKBJC09
  • cancelRun - cancelling a running workflow | wrun_01KM5ZSNRP44D87EQ0G5AQBH4S
  • cancelRun via CLI - cancelling a running workflow | wrun_01KM5ZSY7CDM34DWMH0DD3CKPS
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router
  • hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep | wrun_01KM5ZT98HMAZ8P7TYSN21CMA6
  • sleepInLoopWorkflow - sleep inside loop with steps actually delays each iteration | wrun_01KM5ZTVB3G1BJAKNACY3SC4R0
  • sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control) | wrun_01KM5ZV5XFQV9DSB9KFACACCZ7

Details by Category

❌ ▲ Vercel Production

| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ astro | 68 | 0 | 7 |
| ✅ example | 68 | 0 | 7 |
| ✅ express | 68 | 0 | 7 |
| ✅ fastify | 68 | 0 | 7 |
| ✅ hono | 68 | 0 | 7 |
| ✅ nextjs-turbopack | 73 | 0 | 2 |
| ✅ nextjs-webpack | 73 | 0 | 2 |
| ❌ nitro | 67 | 1 | 7 |
| ✅ nuxt | 68 | 0 | 7 |
| ✅ sveltekit | 68 | 0 | 7 |
| ✅ vite | 68 | 0 | 7 |

✅ 💻 Local Development

| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ astro-stable | 66 | 0 | 9 |
| ✅ express-stable | 66 | 0 | 9 |
| ✅ fastify-stable | 66 | 0 | 9 |
| ✅ hono-stable | 66 | 0 | 9 |
| ✅ nextjs-turbopack-canary | 55 | 0 | 20 |
| ✅ nextjs-turbopack-stable | 72 | 0 | 3 |
| ✅ nextjs-webpack-canary | 55 | 0 | 20 |
| ✅ nextjs-webpack-stable | 72 | 0 | 3 |
| ✅ nitro-stable | 66 | 0 | 9 |
| ✅ nuxt-stable | 66 | 0 | 9 |
| ✅ sveltekit-stable | 66 | 0 | 9 |
| ✅ vite-stable | 66 | 0 | 9 |

✅ 📦 Local Production

| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ astro-stable | 66 | 0 | 9 |
| ✅ express-stable | 66 | 0 | 9 |
| ✅ fastify-stable | 66 | 0 | 9 |
| ✅ hono-stable | 66 | 0 | 9 |
| ✅ nextjs-turbopack-canary | 55 | 0 | 20 |
| ✅ nextjs-turbopack-stable | 72 | 0 | 3 |
| ✅ nextjs-webpack-canary | 55 | 0 | 20 |
| ✅ nextjs-webpack-stable | 72 | 0 | 3 |
| ✅ nitro-stable | 66 | 0 | 9 |
| ✅ nuxt-stable | 66 | 0 | 9 |
| ✅ sveltekit-stable | 66 | 0 | 9 |
| ✅ vite-stable | 66 | 0 | 9 |

✅ 🐘 Local Postgres

| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ astro-stable | 66 | 0 | 9 |
| ✅ express-stable | 66 | 0 | 9 |
| ✅ fastify-stable | 66 | 0 | 9 |
| ✅ hono-stable | 66 | 0 | 9 |
| ✅ nextjs-turbopack-canary | 55 | 0 | 20 |
| ✅ nextjs-turbopack-stable | 72 | 0 | 3 |
| ✅ nextjs-webpack-canary | 55 | 0 | 20 |
| ✅ nextjs-webpack-stable | 72 | 0 | 3 |
| ✅ nitro-stable | 66 | 0 | 9 |
| ✅ nuxt-stable | 66 | 0 | 9 |
| ✅ sveltekit-stable | 66 | 0 | 9 |
| ✅ vite-stable | 66 | 0 | 9 |

✅ 🪟 Windows

| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ nextjs-turbopack | 72 | 0 | 3 |

❌ 🌍 Community Worlds

| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ mongodb-dev | 3 | 0 | 2 |
| ❌ mongodb | 52 | 3 | 3 |
| ✅ redis-dev | 3 | 0 | 2 |
| ❌ redis | 53 | 2 | 3 |
| ✅ turso-dev | 3 | 0 | 2 |
| ❌ turso | 4 | 51 | 3 |

✅ 📋 Other

| App | Passed | Failed | Skipped |
|---|---|---|---|
| ✅ e2e-local-dev-nest-stable | 66 | 0 | 9 |
| ✅ e2e-local-postgres-nest-stable | 66 | 0 | 9 |
| ✅ e2e-local-prod-nest-stable | 66 | 0 | 9 |

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: failure
  • Local Dev: success
  • Local Prod: success
  • Local Postgres: success
  • Windows: success

Check the workflow run for details.

post-eval improvement — gap analysis found:
- pagination nesting ({ pagination: { cursor } } not { cursor })
- resolveData param guidance ('none' for polling, 'all' for inspection)
- runs.cancel() method missing
- name parsing returns null (needs optional chaining)
- event types taxonomy
- devalue format clarification (opaque arrays without hydration)

eval: new skill 89% vs old skill 32% vs no skill 32%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ralph-20 and others added 2 commits March 20, 2026 10:08
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ralph-20 commented Mar 20, 2026

E2E Eval Update: Devalue Hydration Finding

Finding

world.steps.list({ resolveData: 'all' }) returns devalue-serialized byte arrays for input/output, not deserialized JS values. Must call hydrateResourceIO(step, revivers) from workflow/observability to get human-readable I/O.

Raw (from resolveData: 'all' without hydration):

input: {"0":100,"1":101,"2":118,"3":108,...}  ← Uint8Array as JSON (devalue wire format)

Hydrated (after hydrateResourceIO()):

input: {"args":[123, 2]}  ← actual function arguments
output: 125               ← actual return value
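
To see why the raw form is a serialized byte array rather than a usable value, the index-keyed object can be rebuilt into a Uint8Array and decoded as text. The sketch below uses only the four byte values visible in the raw sample above; the helper name `bytesFromIndexedObject` is hypothetical. It is a diagnostic aid only and not a replacement for hydrateResourceIO(), which is still needed to obtain the actual arguments and return values.

```typescript
// Rebuild a Uint8Array from its JSON form, where each key is an index and each value a byte.
function bytesFromIndexedObject(obj: Record<string, number>): Uint8Array {
  const keys = Object.keys(obj).map(Number).sort((a, b) => a - b);
  const bytes = new Uint8Array(keys.length);
  keys.forEach((k, i) => {
    bytes[i] = obj[String(k)];
  });
  return bytes;
}

// Only the first four bytes shown in the raw sample; the rest is elided in the source.
const head = bytesFromIndexedObject({ '0': 100, '1': 101, '2': 118, '3': 108 });
const prefix = new TextDecoder().decode(head);
console.log(prefix); // "devl"
```

The payload starts with readable ASCII, which confirms that JSON.stringify turned a Uint8Array into an object of numbered keys; a naive JSON.parse on this data therefore cannot recover the step's arguments.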

Changes

| Commit | What |
|---|---|
| 18c1f5e | SKILL.md: clarified that resolveData controls data inclusion, not hydration; fixed raw format description from [["Object",...]] to actual Uint8Array-like objects; added "common mistake" callout |
| 5167201 | skill-eval.test.ts: calls hydrateResourceIO() on all World SDK step data, captures both raw + hydrated I/O in findings |

Eval Results

3/3 tests pass, all 18 checks ✅ — hydration works correctly for:

  • addTenWorkflow: add(123, 2) → 125 → 128 → 133
  • promiseAllWorkflow: parallel randomDelay("a","b","c") → "A","B","C"
  • errorRetryFatal: FatalError with full stack in hydrated error field

Full findings: eval/skill-e2e-findings.md

  • TODO: Verify skill wording is clear enough that an agent using it produces correct hydration code on first try (manual check against next eval iteration)

Ralph-20 and others added 2 commits March 20, 2026 10:16
…nalysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Static analysis + runtime eval proving SKILL.md materially helps agents
with observability patterns: hydration, imports, pagination, name parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Ralph-20

With/Without Skill Comparison Eval

Result: 3/8 → 8/8 (+5 delta)

Ran addTenWorkflow(123) against the dev server, then scored each condition's inspection code (static analysis of imports, API patterns, hydration) + runtime verification.

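| Check | Without Skill | With Skill |
|---|---|---|
| Valid workflow | ✅ | ✅ |
| Correct output | ✅ | ✅ |
| Hydration | ❌ | ✅ |
| Correct imports | ❌ | ✅ |
| Pagination shape | ❌ | ✅ |
| Display names | ❌ | ✅ |
| Step timing | ✅ | ✅ |
| Step I/O values | ❌ | ✅ |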

Key failures without skill

  1. No hydration — reads step.input directly (devalue bytes), assumes resolveData:'all' = usable data
  2. Wrong imports — only knows workflow/api + workflow/runtime, doesn't know workflow/observability exists
  3. Flat pagination — uses { cursor } not { pagination: { cursor } }
  4. Manual name parsing — splits on "//" instead of using parseWorkflowName()/parseStepName()
  5. No I/O validation — checks !== undefined (truthy on byte objects) instead of verifying actual values
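
Failures 1 and 5 share a root cause: a `!== undefined` check cannot tell hydrated values from raw bytes. A rough sketch of a stricter check follows. All three function names are hypothetical, the byte-object heuristic is an assumption based on the raw sample in the earlier comment, and the `{ args: [...] }` input shape is taken from the hydrated sample shown there.

```typescript
// Heuristic: a JSON-serialized Uint8Array is a plain object whose keys are 0..n-1
// and whose values are all integers in the byte range.
function looksLikeSerializedBytes(value: unknown): boolean {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) return false;
  const entries = Object.entries(value as Record<string, unknown>);
  if (entries.length === 0) return false;
  return entries.every(
    ([key, v], i) =>
      key === String(i) && typeof v === 'number' && Number.isInteger(v) && v >= 0 && v <= 255,
  );
}

// The weak check from failure 5: passes on raw byte objects as well as real values.
function weakCheck(input: unknown): boolean {
  return input !== undefined;
}

// A stricter check: reject unhydrated data, then compare against the expected arguments.
function strictCheck(input: unknown, expectedArgs: unknown[]): boolean {
  if (looksLikeSerializedBytes(input)) return false; // still unhydrated
  const args = (input as { args?: unknown[] } | null)?.args;
  return JSON.stringify(args) === JSON.stringify(expectedArgs);
}
```

Run against the raw sample, `weakCheck` passes while `strictCheck` fails, which is the gap the "Step I/O values" check is designed to catch.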

Files

  • eval/run-comparison.ts — runner script
  • eval/without_skill/ — agent-generated code (no skill)
  • eval/with_skill/ — agent-generated code (with skill)
  • eval/skill-comparison-findings.md — full findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>