[codex] Reproduce reused-sleep replay divergence in core runtime #2169

Draft
pranaygp wants to merge 3 commits into stable from codex/runtime-only-reused-sleep-repro

@pranaygp pranaygp commented May 29, 2026

What this proves

This adds core-runtime regression and discriminator tests for the observed Promise.race([iterator.next(), reused sleep]) divergence. The tests drive setupWorkflowContext() with an explicitly ordered in-memory event history, so they do not involve DynamoDB, Postgres, world-local, a Vercel deployment, or network timing.

The ordered durable history is:

hook_created
wait_created
hook_received
step_created setupStep
step_started setupStep
step_completed setupStep
wait_completed
step_created drainStep

That history records the hook branch having won: the durable next operation is drainStep. Current stable replay instead follows the sleep branch and attempts to consume syncNextStep, reporting the same path-divergence corruption observed in hosted runs.
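To make the reading of that history concrete, here is a minimal sketch that encodes the eight events and derives which step the recorded run created once the wait completed. The `HistoryEvent` shape and the helpers `recordedNextStep` and `checkReplay` are hypothetical, written only for this illustration; they are not the `@workflow/core` event types or its actual divergence check.

```typescript
// Hypothetical event shape for illustration only; the real runtime types may differ.
type HistoryEvent =
  | { type: "hook_created" | "wait_created" | "hook_received" | "wait_completed" }
  | { type: "step_created" | "step_started" | "step_completed"; stepName: string };

// The ordered durable history listed in the PR description.
const history: HistoryEvent[] = [
  { type: "hook_created" },
  { type: "wait_created" },
  { type: "hook_received" },
  { type: "step_created", stepName: "setupStep" },
  { type: "step_started", stepName: "setupStep" },
  { type: "step_completed", stepName: "setupStep" },
  { type: "wait_completed" },
  { type: "step_created", stepName: "drainStep" },
];

// Name of the first step created after wait_completed, i.e. the operation
// the recorded run performed once the race had settled.
function recordedNextStep(events: HistoryEvent[]): string | undefined {
  const waitIndex = events.findIndex((e) => e.type === "wait_completed");
  if (waitIndex === -1) return undefined;
  for (const e of events.slice(waitIndex + 1)) {
    if (e.type === "step_created") return e.stepName;
  }
  return undefined;
}

// Compares the step that a replay tries to consume with the recorded one.
function checkReplay(events: HistoryEvent[], consumer: string): string {
  const recorded = recordedNextStep(events);
  return recorded === consumer
    ? "ok"
    : `divergence: history has "${recorded}", replay consumer is "${consumer}"`;
}

console.log(checkReplay(history, "drainStep"));
console.log(checkReplay(history, "syncNextStep"));
```

In this toy model a replay that follows the hook branch asks for `drainStep` and matches the history, while a replay that follows the sleep branch asks for `syncNextStep` and is reported as a divergence.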

Expected failing validation

fnm exec --using 24 pnpm --filter '@workflow/core...' build
fnm exec --using 24 pnpm --filter @workflow/core exec vitest run src/hook-sleep-interaction.test.ts --reporter=verbose

The targeted suite deterministically fails for both late-waiter race expressions in both deserialization modes:

Promise.race([
  iterator.next().then(value => ({ kind: "hook", value })),
  pendingSleep.then(() => ({ kind: "sleep" })),
])

Promise.race([iterator.next(), pendingSleep])

Corrupted event log: step event step_created ... belongs to "drainStep", but the current step consumer is "syncNextStep"

As a positive control, the same history passes in both deserialization modes when the hook waiter exists before setupStep completes:

const pendingHook = iterator.next();
await setupStep();
await Promise.race([pendingHook, pendingSleep]);

This PR is intentionally test-only and expected to be red. Its purpose is to demonstrate that the failure is in core replay behavior even when storage has already supplied a valid ordered event history, while identifying the subscription timing boundary.

Promise-shape discriminator

The raw Promise.race([iterator.next(), pendingSleep]) comparison fails with the same drainStep versus syncNextStep divergence as the mapped then(...) comparison. I also ran a local exploratory direct-hook control, Promise.race([hook, pendingSleep]), and it fails the same way. Conversely, installing iterator.next() before awaiting setupStep passes.

That rules out the additional result-mapping .then(...) and the async iterator layer as the deciding cause for this event history. The sharper explanation is that hook_received is buffered before the post-setupStep hook waiter exists; while the step_completed promise is queued for delivery, EventsConsumer advances via process.nextTick() and consumes wait_completed, queuing the reused sleep resolution before resumed workflow code has installed its next race. Pre-installing the waiter gives the buffered hook delivery an opportunity to be ordered correctly and makes replay deterministic for the recorded hook-winning path.
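The ordering effect described above can be reproduced with plain promises, independent of the workflow runtime. The sketch below is a simplified model under stated assumptions: the reused sleep is represented by an already-resolved promise, and the buffered hook payload is assumed to need one extra microtask hop after a waiter is installed. `resolvedSleep`, `hookWaiter`, `lateWaiter`, and `earlyWaiter` are hypothetical names, and the model does not use `EventsConsumer` or `process.nextTick()`.

```typescript
type Winner = { kind: "hook" } | { kind: "sleep" };

// Stand-in for the reused sleep: its resolution is already queued
// by the time workflow code resumes (assumption of this model).
function resolvedSleep(): Promise<Winner> {
  return Promise.resolve<Winner>({ kind: "sleep" });
}

// Stand-in for a buffered hook payload: delivery needs one extra
// microtask hop after the waiter is installed (assumption of this model).
function hookWaiter(): Promise<Winner> {
  return Promise.resolve().then((): Winner => ({ kind: "hook" }));
}

// Late waiter: the hook waiter is installed only after the step await.
async function lateWaiter(): Promise<Winner> {
  const pendingSleep = resolvedSleep();
  await Promise.resolve(); // stands in for `await setupStep()`
  return Promise.race([hookWaiter(), pendingSleep]);
}

// Early waiter: the hook waiter exists before the step is awaited.
async function earlyWaiter(): Promise<Winner> {
  const pendingSleep = resolvedSleep();
  const pendingHook = hookWaiter();
  await Promise.resolve(); // stands in for `await setupStep()`
  return Promise.race([pendingHook, pendingSleep]);
}

lateWaiter().then((w) => console.log("late waiter winner:", w.kind));
earlyWaiter().then((w) => console.log("early waiter winner:", w.kind));
```

In this model the late waiter loses because the sleep's reaction is queued before the hook promise has settled, while the early waiter wins because the hook promise is already settled and is listed first in the race. That matches the direction of the failing and passing cases reported here, although the real runtime involves more scheduling layers than this sketch.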

Relationship to the candidate fix

#2048 contains the candidate runtime repair. The equivalent replay case passes on that branch, together with its handler-level stale-snapshot/delta replay tests. Keeping this PR separate provides a small failing control that can be cherry-picked or used while evaluating alternate fixes.

@changeset-bot

changeset-bot Bot commented May 29, 2026

⚠️ No Changeset found

Latest commit: 268d155

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@vercel

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| example-nextjs-workflow-turbopack | Ready | Preview, Comment | May 29, 2026 10:38pm |
| example-nextjs-workflow-webpack | Ready | Preview, Comment | May 29, 2026 10:38pm |
| example-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-astro-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-express-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-fastify-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-hono-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-nitro-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-nuxt-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-sveltekit-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-tanstack-start-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workbench-vite-workflow | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workflow-swc-playground | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workflow-tarballs | Ready | Preview, Comment | May 29, 2026 10:38pm |
| workflow-web | Ready | Preview, Comment | May 29, 2026 10:38pm |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| workflow-docs | Skipped | | May 29, 2026 10:38pm |

@github-actions

github-actions Bot commented May 29, 2026

🧪 E2E Test Results

Some tests failed

Summary

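| Category | Passed | Failed | Skipped | Total |
| --- | --- | --- | --- | --- |
| ✅ ▲ Vercel Production | 901 | 0 | 67 | 968 |
| ✅ 💻 Local Development | 970 | 0 | 86 | 1056 |
| ✅ 📦 Local Production | 970 | 0 | 86 | 1056 |
| ✅ 🐘 Local Postgres | 970 | 0 | 86 | 1056 |
| ❌ 🌍 Community Worlds | 15 | 69 | 0 | 84 |
| ✅ 📋 Other | 492 | 0 | 36 | 528 |
| Total | 4318 | 69 | 361 | 4748 |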

❌ Failed Tests

🌍 Community Worlds (69 failed)

mongodb-dev (1 failed):

  • dev e2e should rebuild on imported step dependency change

redis-dev (1 failed):

  • dev e2e should rebuild on imported step dependency change

turso-dev (1 failed):

  • dev e2e should rebuild on imported step dependency change

turso (66 failed):

  • addTenWorkflow | wrun_01KSTY2VRPNVJJAGHSK5DNB5R0
  • addTenWorkflow | wrun_01KSTY2VRPNVJJAGHSK5DNB5R0
  • wellKnownAgentWorkflow (.well-known/agent) | wrun_01KSTY4194191HBNM13ABH2N01
  • should work with react rendering in step
  • promiseAllWorkflow | wrun_01KSTY32PYS1YN95PNCM1FWJBY
  • promiseRaceWorkflow | wrun_01KSTY3973SCGQ9YFRH28ZN4PR
  • promiseAnyWorkflow | wrun_01KSTY3B2XJQV1HSQF4AFJXDB1
  • importedStepOnlyWorkflow | wrun_01KSTY4C8XTA9JDTFABFPT3HDG
  • readableStreamWorkflow | wrun_01KSTY3DZGZAZKMSE7XH94RDFE
  • hookWorkflow | wrun_01KSTY3T0ZSN2MWVRMK8WR0Z68
  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KSTY40ST41VQY3GXSC9W226R
  • webhookWorkflow | wrun_01KSTY4624ZZJA4YR77Y0F0WFB
  • sleepingWorkflow | wrun_01KSTY4BB22CJFSNGH4JE9YN7V
  • parallelSleepWorkflow | wrun_01KSTY4T93BXF4G388C0BXHVYM
  • nullByteWorkflow | wrun_01KSTY4YBMKBFZ4VRQ80WW2M9H
  • workflowAndStepMetadataWorkflow | wrun_01KSTY507KHM38A6HG99HVCP6S
  • outputStreamWorkflow no startIndex (reads all chunks)
  • outputStreamWorkflow positive startIndex (skips first chunk)
  • outputStreamWorkflow negative startIndex (reads from end)
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
  • outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions | wrun_01KSTY7D9QB8W531RPJC305MGG
  • fetchWorkflow | wrun_01KSTY7TT3Z5A3WF912M44NME1
  • promiseRaceStressTestWorkflow | wrun_01KSTY7XTRR3J2GNXNMR29DQJW
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • error handling not registered WorkflowNotRegisteredError fails the run when workflow does not exist
  • error handling not registered StepNotRegisteredError fails the step but workflow can catch it
  • error handling not registered StepNotRegisteredError fails the run when not caught in workflow
  • hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KSTYAZN5ZEHZQJV0A92RWR5W
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KSTYBA9PRNX5E3DG7HESH2J3
  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KSTYBR8BEQDP44R02D26ZGCX
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars) | wrun_01KSTYC6YXG7P13BH65AD6NZ94
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument | wrun_01KSTYCF0Q22XTRKPEDW6JCQR1
  • closureVariableWorkflow - nested step functions with closure variables | wrun_01KSTYCKNXXS63SRQCYCA3BB2M
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step | wrun_01KSTYCNHF1QQPGF9N1AT51RWA
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • health check (CLI) - workflow health command reports healthy endpoints
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly | wrun_01KSTYD3998VFXCTW0NGWH9T5N
  • Calculator.calculate - static workflow method using static step methods from another class | wrun_01KSTYD82JE7SS524EZWB1QPZ3
  • AllInOneService.processNumber - static workflow method using sibling static step methods | wrun_01KSTYDF33HCQD4BKP1T8MV6CK
  • ChainableService.processWithThis - static step methods using this to reference the class | wrun_01KSTYDMV6D530RBYPNTEFVT2S
  • thisSerializationWorkflow - step function invoked with .call() and .apply() | wrun_01KSTYDTT81KQ8RKFGXT71K8JA
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE | wrun_01KSTYE1VNA2163Y19HMANDGAH
  • instanceMethodStepWorkflow - instance methods with "use step" directive | wrun_01KSTYE8TJ83RBCWJ7YDBP7G55
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context | wrun_01KSTYEM1W0CS0P8H8XQD4HJ03
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument | wrun_01KSTYEW3WQJAF672Y9DZQSCGE
  • cancelRun - cancelling a running workflow | wrun_01KSTYF30T6NGJS1EPFDMAYP5D
  • cancelRun via CLI - cancelling a running workflow | wrun_01KSTYFBYS3YSYAH4XC51X5XY8
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router
  • hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep | wrun_01KSTYFQE0SN55218AQZFSEF33
  • sleepInLoopWorkflow - sleep inside loop with steps actually delays each iteration | wrun_01KSTYG6274S1FAQ2KW5V5910Y
  • sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control) | wrun_01KSTYGH3RXAM0TCGV2AF9GZ0M
  • importMetaUrlWorkflow - import.meta.url is available in step bundles | wrun_01KSTYGR937NESEBZ2F0F5NEHA
  • metadataFromHelperWorkflow - getWorkflowMetadata/getStepMetadata work from module-level helper (#1577) | wrun_01KSTYGT3VXDMMR5KKQJREEMSZ
  • resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KSTYGW1BYMN5Z9GJ7XHMMDC5

Details by Category

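✅ ▲ Vercel Production

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro | 81 | 0 | 7 |
| ✅ example | 81 | 0 | 7 |
| ✅ express | 81 | 0 | 7 |
| ✅ fastify | 81 | 0 | 7 |
| ✅ hono | 81 | 0 | 7 |
| ✅ nextjs-turbopack | 86 | 0 | 2 |
| ✅ nextjs-webpack | 86 | 0 | 2 |
| ✅ nitro | 81 | 0 | 7 |
| ✅ nuxt | 81 | 0 | 7 |
| ✅ sveltekit | 81 | 0 | 7 |
| ✅ vite | 81 | 0 | 7 |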

✅ 💻 Local Development

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 82 | 0 | 6 |
| ✅ express-stable | 82 | 0 | 6 |
| ✅ fastify-stable | 82 | 0 | 6 |
| ✅ hono-stable | 82 | 0 | 6 |
| ✅ nextjs-turbopack-canary | 69 | 0 | 19 |
| ✅ nextjs-turbopack-stable | 88 | 0 | 0 |
| ✅ nextjs-webpack-canary | 69 | 0 | 19 |
| ✅ nextjs-webpack-stable | 88 | 0 | 0 |
| ✅ nitro-stable | 82 | 0 | 6 |
| ✅ nuxt-stable | 82 | 0 | 6 |
| ✅ sveltekit-stable | 82 | 0 | 6 |
| ✅ vite-stable | 82 | 0 | 6 |

✅ 📦 Local Production

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 82 | 0 | 6 |
| ✅ express-stable | 82 | 0 | 6 |
| ✅ fastify-stable | 82 | 0 | 6 |
| ✅ hono-stable | 82 | 0 | 6 |
| ✅ nextjs-turbopack-canary | 69 | 0 | 19 |
| ✅ nextjs-turbopack-stable | 88 | 0 | 0 |
| ✅ nextjs-webpack-canary | 69 | 0 | 19 |
| ✅ nextjs-webpack-stable | 88 | 0 | 0 |
| ✅ nitro-stable | 82 | 0 | 6 |
| ✅ nuxt-stable | 82 | 0 | 6 |
| ✅ sveltekit-stable | 82 | 0 | 6 |
| ✅ vite-stable | 82 | 0 | 6 |

✅ 🐘 Local Postgres

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 82 | 0 | 6 |
| ✅ express-stable | 82 | 0 | 6 |
| ✅ fastify-stable | 82 | 0 | 6 |
| ✅ hono-stable | 82 | 0 | 6 |
| ✅ nextjs-turbopack-canary | 69 | 0 | 19 |
| ✅ nextjs-turbopack-stable | 88 | 0 | 0 |
| ✅ nextjs-webpack-canary | 69 | 0 | 19 |
| ✅ nextjs-webpack-stable | 88 | 0 | 0 |
| ✅ nitro-stable | 82 | 0 | 6 |
| ✅ nuxt-stable | 82 | 0 | 6 |
| ✅ sveltekit-stable | 82 | 0 | 6 |
| ✅ vite-stable | 82 | 0 | 6 |

❌ 🌍 Community Worlds

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ❌ mongodb-dev | 4 | 1 | 0 |
| ❌ redis-dev | 4 | 1 | 0 |
| ❌ turso-dev | 4 | 1 | 0 |
| ❌ turso | 3 | 66 | 0 |

✅ 📋 Other

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ e2e-local-dev-nest-stable | 82 | 0 | 6 |
| ✅ e2e-local-dev-tanstack-start-stable | 82 | 0 | 6 |
| ✅ e2e-local-postgres-nest-stable | 82 | 0 | 6 |
| ✅ e2e-local-postgres-tanstack-start-stable | 82 | 0 | 6 |
| ✅ e2e-local-prod-nest-stable | 82 | 0 | 6 |
| ✅ e2e-local-prod-tanstack-start-stable | 82 | 0 | 6 |

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: success
  • Local Dev: success
  • Local Prod: success
  • Local Postgres: success
  • Windows: failure

Check the workflow run for details.

expect(result).toEqual(['first', 'second']);
});

it('should let a queued hook payload win when a reused wait completes after the step that installs the race', async () => {

@vercel vercel Bot May 29, 2026

Two newly added, intentionally-red reproduction tests use plain it(...) instead of it.fails(...), breaking the core package's CI test suite (vitest run src) on this and every subsequent PR until the fix lands.

@vercel vercel Bot temporarily deployed to Preview – workflow-docs May 29, 2026 22:34 Inactive