[core] Optimistic concurrency control for event writes against stale logs by VaguelySerious · Pull Request #2113 · vercel/workflow

VaguelySerious · 2026-05-26T16:08:50Z

Summary

Adds optimistic-concurrency fencing to the event writes that go through workflow-server, closing the hook/sleep race that produces CORRUPTED_EVENT_LOG on production runs.

The elapsed-wait scan snapshots the loaded events' tail eventId and passes it as lastKnownEventId on each wait_completed write. If a concurrent resumeHook has already advanced the canonical log, the server's CAS rejects the write.
On a fence-conflict EntityConflictError, the runtime now retries in place rather than throwing the whole tick away: it reloads events from the cursor, refreshes the fence, and tries again (up to 5 times, with backoff). Falling back to queue redelivery turned out to cause a thundering herd: every redelivery spawns another concurrent tick, which hits another fence conflict, and workflows stall in running. If a concurrent writer completed the wait between attempts, we observe it in the reloaded log and skip the write entirely.
resumeHook appends hook_received unconditionally. ULID ordering already places this write after anything committed before us, and applying CAS would only ever reject the hook in favor of an unrelated concurrent write (which would lose the user's signal). Stale-snapshot protection lives on the tick writes that consume hooks, not on the write that delivers them.
CreateEventParams on @workflow/world grows lastKnownEventId and asOfTimestamp (both optional). Worlds that don't implement OCC can pass them through or ignore them.
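
The retry-in-place behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: `EventStore`, `completeWaitWithFence`, the `correlationId` field, and the backoff schedule are assumptions made for the example; only `lastKnownEventId`, `wait_completed`, `EntityConflictError`, and the 5-attempt limit come from the description.

```typescript
// Hypothetical types; the real event and world interfaces are not shown in this PR text.
type WorkflowEvent = { eventId: string; eventType: string; correlationId?: string };

class EntityConflictError extends Error {}

interface EventStore {
  // Returns the run's events (a real runtime would reload from a cursor).
  loadEvents(): Promise<WorkflowEvent[]>;
  // Assumed to reject with EntityConflictError when lastKnownEventId is stale.
  createEvent(
    event: { eventType: string; correlationId: string },
    opts: { lastKnownEventId?: string },
  ): Promise<void>;
}

const MAX_ATTEMPTS = 5;

async function completeWaitWithFence(
  store: EventStore,
  waitId: string,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<"written" | "already-completed"> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const events = await store.loadEvents();

    // A concurrent writer may have completed the wait between attempts: skip the write.
    const done = events.some(
      (e) => e.eventType === "wait_completed" && e.correlationId === waitId,
    );
    if (done) return "already-completed";

    // Refresh the fence from the tail of the freshly loaded log.
    const lastKnownEventId = events[events.length - 1]?.eventId;
    try {
      await store.createEvent(
        { eventType: "wait_completed", correlationId: waitId },
        { lastKnownEventId },
      );
      return "written";
    } catch (err) {
      // Only fence conflicts are retried; the last attempt rethrows.
      if (!(err instanceof EntityConflictError) || attempt === MAX_ATTEMPTS) throw err;
      await sleep(50 * 2 ** (attempt - 1)); // assumed exponential backoff
    }
  }
  throw new Error("unreachable");
}
```

Retrying inside the tick keeps a single writer looping against the log, instead of spawning an additional concurrent tick per redelivery.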

Pairs with the workflow-server PR which materializes run.lastKnownEventId and gates event writes on it. The server's CAS is explicit opt-in — unfenced writers (most paths) still atomically advance the materialized value so fenced writers can chain off it, but they don't reject on contention.
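
As a rough in-memory model of the opt-in CAS just described (the server code lives in the paired PR and is not shown here), the sketch below rejects only writers that pass a fence, while every successful write advances the materialized value. `RunLog`, `FenceConflict`, and the `e1`, `e2`, … IDs are hypothetical stand-ins; the real log uses ULIDs.

```typescript
class FenceConflict extends Error {}

class RunLog {
  private events: { eventId: string; eventType: string }[] = [];
  private seq = 0;
  // Materialized tail, advanced by every successful write (fenced or not).
  lastKnownEventId: string | undefined = undefined;

  append(eventType: string, fence?: { lastKnownEventId: string | undefined }): string {
    // Only writers that opt in by passing a fence are checked.
    if (fence !== undefined && fence.lastKnownEventId !== this.lastKnownEventId) {
      throw new FenceConflict(
        `fence ${fence.lastKnownEventId} does not match tail ${this.lastKnownEventId}`,
      );
    }
    const eventId = `e${++this.seq}`;
    this.events.push({ eventId, eventType });
    this.lastKnownEventId = eventId;
    return eventId;
  }
}

const demoRun = new RunLog();
const demoSnapshot = demoRun.append("wait_created"); // tick loads the log; tail is e1
demoRun.append("hook_received"); // unfenced writer: never rejected, tail becomes e2
try {
  demoRun.append("wait_completed", { lastKnownEventId: demoSnapshot });
} catch (err) {
  console.log(err instanceof FenceConflict); // true: the tick's snapshot was stale
}
```

Because the unfenced `hook_received` write still advanced the tail, the fenced writer has a current value to chain off on its next attempt.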

Test plan

All 1013 core unit tests passing
Typecheck clean
Changeset included
End-to-end repro against a Vercel preview deployment of this branch + the matching workflow-server preview:

Stress reproduction details

The original CORRUPTED_EVENT_LOG bug reproduced on stable in ~0.1–0.4% of runs with the following shape: Promise.race([hook, sleep]), with sleepBranchWaitCount parallel sleeps when sleep wins, and 10 hook payloads fired per token at fireAfterMs=3000.

Re-ran the same shape against the fix on 2026-05-27 — two back-to-back cycles, 200 workflows each, identical params to the original repro:

{
  "count": 200, "iterations": 8, "sleepMs": 500,
  "sleepBranchWaitCount": 2, "sleepBranchWaitMs": 100,
  "drainDelayMs": 50, "fireAfterMs": 3000,
  "fireCount": 10, "fireBurstSpacingMs": 0
}

Results across 400 workflows:

Outcome	Count
`completed`	241
Still `running` at final check (low-priority queue tail)	132
`failed` with `errorCode: CORRUPTED_EVENT_LOG`	0
`failed` with `errorCode: USER_ERROR`	23
`failed` with `errorCode: WORLD_CONTRACT_ERROR`	4

The target bug — CORRUPTED_EVENT_LOG from the hook/sleep race — does not reproduce.

The remaining failures are a different pattern: in workflows whose Promise.all([sleep, sleep, …]) (sleep-branch waits) commits two wait_created events microseconds apart, sometimes only one of them gets a wait_completed, and the workflow hangs on the unresolved promise until it eventually surfaces as USER_ERROR / WORLD_CONTRACT_ERROR. The root cause looks like the runtime's broadened "treat any non-fence 409 as already-completed" branch, which eats a genuine conflict that should be retried. Tracking as a follow-up; the fix here closes the original CORRUPTED_EVENT_LOG bug, which is the production-visible defect.
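
To picture the suspected problem, the sketch below contrasts a broadened 409 handler with a narrower one. It is purely illustrative: the `reason` values and both function names are invented for this example, the real server payload is not shown in this PR, and the narrower variant is one possible direction rather than the planned follow-up.

```typescript
// Hypothetical 409 shape; the actual response body is an assumption.
type Conflict409 = {
  status: 409;
  reason: "fence_conflict" | "duplicate_wait_completed" | "other";
};

type Action = "retry-with-fresh-fence" | "treat-as-already-completed" | "retry";

// Broadened branch as described above: every non-fence 409 is swallowed.
function classifyBroad(c: Conflict409): Action {
  return c.reason === "fence_conflict"
    ? "retry-with-fresh-fence"
    : "treat-as-already-completed";
}

// Narrower alternative: only a confirmed duplicate counts as already completed.
function classifyNarrow(c: Conflict409): Action {
  switch (c.reason) {
    case "fence_conflict":
      return "retry-with-fresh-fence";
    case "duplicate_wait_completed":
      return "treat-as-already-completed";
    default:
      return "retry";
  }
}
```

Under the broad handler, a genuine conflict on the second of two near-simultaneous waits would be reported as already completed, which matches the symptom of a `wait_created` that never gets its `wait_completed`.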

🤖 Generated with Claude Code

The elapsed-wait scan now snapshots the loaded events' tail eventId and passes it as `lastKnownEventId` on each `wait_completed` write, so a concurrent `resumeHook` that has already advanced the canonical log is detected: the server's CAS rejects the write, we surface it as the existing `EntityConflictError`, and the next iteration re-replays against the fresh event list (mirroring the duplicate-wait fall-through that was already there).

`resumeHook` sends `asOfTimestamp` (Date.now() at call time) so the server resolves the fence to the highest eventId strictly before resume time, with no client-side event pre-read needed.

Plumbed through `CreateEventParams` on `@workflow/world` so future worlds can forward as-is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

changeset-bot · 2026-05-26T16:08:55Z

🦋 Changeset detected

Latest commit: 1e69c82

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 20 packages

Name	Type
@workflow/core	Patch
@workflow/world	Patch
@workflow/world-vercel	Patch
@workflow/builders	Patch
@workflow/cli	Patch
@workflow/next	Patch
@workflow/nitro	Patch
@workflow/vitest	Patch
@workflow/web-shared	Patch
@workflow/web	Patch
workflow	Patch
@workflow/world-testing	Patch
@workflow/world-local	Patch
@workflow/world-postgres	Patch
@workflow/astro	Patch
@workflow/nest	Patch
@workflow/rollup	Patch
@workflow/sveltekit	Patch
@workflow/vite	Patch
@workflow/nuxt	Patch


vercel · 2026-05-26T16:08:55Z

Project	Deployment	Actions	Updated (UTC)
example-nextjs-workflow-turbopack	Ready	Preview, Comment	May 27, 2026 9:58am
example-nextjs-workflow-webpack	Ready	Preview, Comment	May 27, 2026 9:58am
example-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-astro-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-express-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-fastify-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-hono-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-nitro-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-nuxt-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-sveltekit-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-tanstack-start-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workbench-vite-workflow	Ready	Preview, Comment	May 27, 2026 9:58am
workflow-docs	Ready	Preview, Comment, Open in v0	May 27, 2026 9:58am
workflow-swc-playground	Ready	Preview, Comment	May 27, 2026 9:58am
workflow-tarballs	Ready	Preview, Comment	May 27, 2026 9:58am
workflow-web	Ready	Preview, Comment	May 27, 2026 9:58am

github-actions · 2026-05-26T16:09:03Z

🧪 E2E Test Results

❌ Some tests failed

Summary

	Passed	Failed	Skipped	Total
❌ ▲ Vercel Production	1221	1	219	1441
✅ 💻 Local Development	1615	0	219	1834
✅ 📦 Local Production	1615	0	219	1834
✅ 🐘 Local Postgres	1615	0	219	1834
✅ 🪟 Windows	131	0	0	131
✅ 📋 Other	741	0	176	917
Total	6938	1	1052	7991

❌ Failed Tests

▲ Vercel Production (1 failed)

fastify (1 failed):

hookWorkflow | wrun_01KSMDV4HMXV2QCAPPMREZQFEH | 🔍 observability

Details by Category

❌ ▲ Vercel Production

App	Passed	Failed	Skipped
✅ astro	105	0	26
✅ example	105	0	26
✅ express	105	0	26
❌ fastify	104	1	26
✅ hono	105	0	26
✅ nextjs-turbopack	129	0	2
✅ nextjs-webpack	129	0	2
✅ nitro	105	0	26
✅ nuxt	105	0	26
✅ sveltekit	124	0	7
✅ vite	105	0	26

✅ 💻 Local Development

App	Passed	Skipped
✅ astro-stable	106	25
✅ express-stable	106	25
✅ fastify-stable	106	25
✅ hono-stable	106	25
✅ nextjs-turbopack-canary	112	19
✅ nextjs-turbopack-stable-lazy-discovery-disabled	131	0
✅ nextjs-turbopack-stable-lazy-discovery-enabled	131	0
✅ nextjs-webpack-canary	112	19
✅ nextjs-webpack-stable-lazy-discovery-disabled	131	0
✅ nextjs-webpack-stable-lazy-discovery-enabled	131	0
✅ nitro-stable	106	25
✅ nuxt-stable	106	25
✅ sveltekit-stable	125	6
✅ vite-stable	106	25

✅ 📦 Local Production

App	Passed	Skipped
✅ astro-stable	106	25
✅ express-stable	106	25
✅ fastify-stable	106	25
✅ hono-stable	106	25
✅ nextjs-turbopack-canary	112	19
✅ nextjs-turbopack-stable-lazy-discovery-disabled	131	0
✅ nextjs-turbopack-stable-lazy-discovery-enabled	131	0
✅ nextjs-webpack-canary	112	19
✅ nextjs-webpack-stable-lazy-discovery-disabled	131	0
✅ nextjs-webpack-stable-lazy-discovery-enabled	131	0
✅ nitro-stable	106	25
✅ nuxt-stable	106	25
✅ sveltekit-stable	125	6
✅ vite-stable	106	25

✅ 🐘 Local Postgres

App	Passed	Skipped
✅ astro-stable	106	25
✅ express-stable	106	25
✅ fastify-stable	106	25
✅ hono-stable	106	25
✅ nextjs-turbopack-canary	112	19
✅ nextjs-turbopack-stable-lazy-discovery-disabled	131	0
✅ nextjs-turbopack-stable-lazy-discovery-enabled	131	0
✅ nextjs-webpack-canary	112	19
✅ nextjs-webpack-stable-lazy-discovery-disabled	131	0
✅ nextjs-webpack-stable-lazy-discovery-enabled	131	0
✅ nitro-stable	106	25
✅ nuxt-stable	106	25
✅ sveltekit-stable	125	6
✅ vite-stable	106	25

✅ 🪟 Windows

App	Passed	Failed	Skipped
✅ nextjs-turbopack	131	0	0

✅ 📋 Other

App	Passed	Skipped
✅ e2e-local-dev-nest-stable	106	25
✅ e2e-local-dev-tanstack-start-	106	25
✅ e2e-local-postgres-nest-stable	106	25
✅ e2e-local-postgres-tanstack-start-	106	25
✅ e2e-local-prod-nest-stable	106	25
✅ e2e-local-prod-tanstack-start-	106	25
✅ e2e-vercel-prod-tanstack-start	105	26

❌ Some E2E test jobs failed:

Vercel Prod: failure
Local Dev: success
Local Prod: success
Local Postgres: success
Windows: success

Check the workflow run for details.

github-actions · 2026-05-26T16:09:03Z

📊 Benchmark Results

📈 Comparing against baseline from main branch. Green 🟢 = faster, Red 🔺 = slower.

workflow with no steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
💻 Local	🥇 Nitro	0.030s (-29.5% 🟢)	1.006s (~)	0.975s	10	1.00x
💻 Local	Express	0.031s (-30.5% 🟢)	1.006s (~)	0.975s	10	1.01x
🐘 Postgres	Nitro	0.046s (-51.4% 🟢)	1.011s (-3.1%)	0.965s	10	1.52x
💻 Local	Next.js (Turbopack)	0.047s	1.005s	0.958s	10	1.56x
🐘 Postgres	Express	0.049s (-14.8% 🟢)	1.012s (~)	0.962s	10	1.63x
🐘 Postgres	Next.js (Turbopack)	0.057s	1.013s	0.955s	10	1.88x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	0.270s (+7.3% 🔺)	2.339s (~)	2.069s	10	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 1 step

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
💻 Local	🥇 Nitro	1.069s (-5.5% 🟢)	2.007s (~)	0.938s	10	1.00x
💻 Local	Express	1.075s (-4.5%)	2.006s (~)	0.931s	10	1.01x
🐘 Postgres	Nitro	1.081s (-5.2% 🟢)	2.010s (~)	0.929s	10	1.01x
🐘 Postgres	Express	1.083s (-5.6% 🟢)	2.010s (~)	0.928s	10	1.01x
💻 Local	Next.js (Turbopack)	1.116s	2.005s	0.889s	10	1.04x
🐘 Postgres	Next.js (Turbopack)	1.124s	2.009s	0.885s	10	1.05x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	1.510s (-25.8% 🟢)	3.632s (-5.2% 🟢)	2.122s	10	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 10 sequential steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	10.395s (-5.2% 🟢)	11.015s (~)	0.620s	3	1.00x
🐘 Postgres	Nitro	10.412s (-4.2%)	11.016s (~)	0.604s	3	1.00x
💻 Local	Nitro	10.414s (-4.9%)	11.021s (~)	0.607s	3	1.00x
💻 Local	Express	10.419s (-4.6%)	11.021s (~)	0.602s	3	1.00x
💻 Local	Next.js (Turbopack)	10.648s	11.020s	0.372s	3	1.02x
🐘 Postgres	Next.js (Turbopack)	10.721s	11.020s	0.299s	3	1.03x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	13.431s (-22.5% 🟢)	15.343s (-20.9% 🟢)	1.912s	2	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 25 sequential steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
💻 Local	🥇 Nitro	13.459s (-10.6% 🟢)	14.026s (-12.5% 🟢)	0.568s	5	1.00x
🐘 Postgres	Express	13.474s (-7.6% 🟢)	14.019s (-6.7% 🟢)	0.545s	5	1.00x
🐘 Postgres	Nitro	13.477s (-7.7% 🟢)	14.021s (-6.7% 🟢)	0.544s	5	1.00x
💻 Local	Express	13.490s (-9.9% 🟢)	14.027s (-6.7% 🟢)	0.537s	5	1.00x
💻 Local	Next.js (Turbopack)	14.019s	14.628s	0.608s	5	1.04x
🐘 Postgres	Next.js (Turbopack)	14.123s	15.019s	0.896s	4	1.05x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	21.775s (-58.6% 🟢)	23.776s (-56.5% 🟢)	2.002s	3	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 50 sequential steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
💻 Local	🥇 Nitro	11.882s (-29.2% 🟢)	12.022s (-29.4% 🟢)	0.140s	8	1.00x
🐘 Postgres	Express	11.892s (-15.1% 🟢)	12.141s (-16.8% 🟢)	0.249s	8	1.00x
💻 Local	Express	11.905s (-28.3% 🟢)	12.147s (-28.7% 🟢)	0.242s	8	1.00x
🐘 Postgres	Nitro	12.053s (-13.7% 🟢)	12.644s (-11.6% 🟢)	0.590s	8	1.01x
💻 Local	Next.js (Turbopack)	12.941s	13.166s	0.225s	7	1.09x
🐘 Postgres	Next.js (Turbopack)	13.139s	14.016s	0.876s	7	1.11x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	30.903s (-92.1% 🟢)	33.322s (-91.6% 🟢)	2.420s	3	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Promise.all with 10 concurrent steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.145s (-9.2% 🟢)	2.006s (~)	0.862s	15	1.00x
🐘 Postgres	Nitro	1.149s (-9.8% 🟢)	2.008s (~)	0.858s	15	1.00x
💻 Local	Express	1.175s (-21.1% 🟢)	2.007s (~)	0.832s	15	1.03x
💻 Local	Nitro	1.178s (-27.8% 🟢)	2.006s (-3.3%)	0.828s	15	1.03x
🐘 Postgres	Next.js (Turbopack)	1.210s	2.007s	0.797s	15	1.06x
💻 Local	Next.js (Turbopack)	1.284s	2.005s	0.722s	15	1.12x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	2.458s (-27.7% 🟢)	3.840s (-22.1% 🟢)	1.383s	8	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Promise.all with 25 concurrent steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.206s (-48.9% 🟢)	2.007s (-33.3% 🟢)	0.801s	15	1.00x
🐘 Postgres	Nitro	1.221s (-48.1% 🟢)	2.008s (-33.2% 🟢)	0.787s	15	1.01x
🐘 Postgres	Next.js (Turbopack)	1.353s	2.007s	0.653s	15	1.12x
💻 Local	Express	1.725s (-41.6% 🟢)	2.006s (-41.9% 🟢)	0.281s	15	1.43x
💻 Local	Next.js (Turbopack)	1.749s	2.073s	0.324s	15	1.45x
💻 Local	Nitro	1.750s (-44.3% 🟢)	2.006s (-48.4% 🟢)	0.255s	15	1.45x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	3.493s (-50.8% 🟢)	4.977s (-44.1% 🟢)	1.484s	7	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Promise.all with 50 concurrent steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.305s (-62.6% 🟢)	2.008s (-49.9% 🟢)	0.703s	15	1.00x
🐘 Postgres	Nitro	1.360s (-60.9% 🟢)	2.008s (-49.9% 🟢)	0.648s	15	1.04x
🐘 Postgres	Next.js (Turbopack)	1.612s	2.007s	0.396s	15	1.24x
💻 Local	Next.js (Turbopack)	4.832s	5.345s	0.513s	6	3.70x
💻 Local	Nitro	5.161s (-38.2% 🟢)	5.679s (-37.0% 🟢)	0.519s	6	3.96x
💻 Local	Express	5.508s (-33.9% 🟢)	5.846s (-35.2% 🟢)	0.338s	6	4.22x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	6.890s (-22.7% 🟢)	8.580s (-21.7% 🟢)	1.690s	4	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Promise.race with 10 concurrent steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.145s (-9.0% 🟢)	2.010s (~)	0.866s	15	1.00x
🐘 Postgres	Nitro	1.148s (-8.7% 🟢)	2.008s (~)	0.860s	15	1.00x
🐘 Postgres	Next.js (Turbopack)	1.202s	2.008s	0.806s	15	1.05x
💻 Local	Next.js (Turbopack)	1.305s	2.006s	0.701s	15	1.14x
💻 Local	Express	1.410s (-25.5% 🟢)	2.006s (-15.1% 🟢)	0.596s	15	1.23x
💻 Local	Nitro	1.415s (-24.2% 🟢)	2.007s (-14.3% 🟢)	0.592s	15	1.24x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	2.454s (-16.3% 🟢)	3.868s (-16.7% 🟢)	1.414s	8	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Promise.race with 25 concurrent steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.208s (-48.4% 🟢)	2.008s (-33.3% 🟢)	0.800s	15	1.00x
🐘 Postgres	Nitro	1.228s (-47.5% 🟢)	2.009s (-33.3% 🟢)	0.782s	15	1.02x
🐘 Postgres	Next.js (Turbopack)	1.339s	2.007s	0.668s	15	1.11x
💻 Local	Express	2.030s (-35.2% 🟢)	2.469s (-34.4% 🟢)	0.438s	13	1.68x
💻 Local	Next.js (Turbopack)	2.048s	2.736s	0.688s	11	1.70x
💻 Local	Nitro	2.101s (-31.5% 🟢)	2.509s (-35.4% 🟢)	0.408s	12	1.74x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	3.652s (+16.2% 🔺)	5.229s (+15.6% 🔺)	1.577s	6	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Promise.race with 50 concurrent steps

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.317s (-62.4% 🟢)	2.008s (-49.9% 🟢)	0.691s	15	1.00x
🐘 Postgres	Nitro	1.375s (-60.5% 🟢)	2.008s (-49.9% 🟢)	0.633s	15	1.04x
🐘 Postgres	Next.js (Turbopack)	1.614s	2.008s	0.394s	15	1.23x
💻 Local	Next.js (Turbopack)	5.531s	6.211s	0.680s	5	4.20x
💻 Local	Express	5.862s (-33.4% 🟢)	6.215s (-33.0% 🟢)	0.354s	5	4.45x
💻 Local	Nitro	6.248s (-31.7% 🟢)	6.613s (-34.0% 🟢)	0.365s	5	4.74x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	6.242s (-7.6% 🟢)	7.983s (-6.6% 🟢)	1.742s	4	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 10 sequential data payload steps (10KB)

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Nitro	0.444s (-45.9% 🟢)	1.007s (~)	0.563s	60	1.00x
🐘 Postgres	Express	0.449s (-46.5% 🟢)	1.007s (-1.6%)	0.558s	60	1.01x
💻 Local	Express	0.464s (-52.9% 🟢)	1.004s (-6.7% 🟢)	0.540s	60	1.04x
💻 Local	Nitro	0.474s (-51.6% 🟢)	1.004s (-8.2% 🟢)	0.530s	60	1.07x
🐘 Postgres	Next.js (Turbopack)	0.667s	1.006s	0.339s	60	1.50x
💻 Local	Next.js (Turbopack)	0.704s	1.004s	0.300s	60	1.59x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	5.314s (-63.4% 🟢)	6.832s (-57.5% 🟢)	1.518s	9	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 25 sequential data payload steps (10KB)

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.041s (-47.3% 🟢)	1.614s (-28.5% 🟢)	0.573s	56	1.00x
🐘 Postgres	Nitro	1.076s (-44.2% 🟢)	1.693s (-19.4% 🟢)	0.616s	54	1.03x
💻 Local	Express	1.186s (-60.7% 🟢)	2.006s (-44.1% 🟢)	0.820s	45	1.14x
💻 Local	Nitro	1.186s (-60.9% 🟢)	2.006s (-46.6% 🟢)	0.820s	45	1.14x
🐘 Postgres	Next.js (Turbopack)	1.629s	2.008s	0.378s	45	1.57x
💻 Local	Next.js (Turbopack)	1.806s	2.028s	0.222s	45	1.73x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	13.860s (-72.2% 🟢)	15.887s (-69.3% 🟢)	2.027s	6	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 50 sequential data payload steps (10KB)

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Nitro	2.078s (-49.4% 🟢)	2.508s (-45.5% 🟢)	0.430s	48	1.00x
🐘 Postgres	Express	2.110s (-47.1% 🟢)	2.675s (-38.8% 🟢)	0.565s	45	1.02x
💻 Local	Express	2.669s (-71.0% 🟢)	3.008s (-70.0% 🟢)	0.339s	40	1.28x
💻 Local	Nitro	2.684s (-71.1% 🟢)	3.008s (-70.0% 🟢)	0.324s	40	1.29x
🐘 Postgres	Next.js (Turbopack)	3.262s	4.043s	0.780s	30	1.57x
💻 Local	Next.js (Turbopack)	3.719s	4.008s	0.289s	30	1.79x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	26.999s (-74.8% 🟢)	29.226s (-73.2% 🟢)	2.227s	5	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 10 concurrent data payload steps (10KB)

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Nitro	0.175s (-38.4% 🟢)	1.006s (~)	0.831s	60	1.00x
🐘 Postgres	Express	0.176s (-37.8% 🟢)	1.006s (~)	0.831s	60	1.01x
🐘 Postgres	Next.js (Turbopack)	0.231s	1.006s	0.775s	60	1.33x
💻 Local	Express	0.387s (-30.9% 🟢)	1.004s (~)	0.617s	60	2.22x
💻 Local	Nitro	0.411s (-32.0% 🟢)	1.004s (-1.7%)	0.593s	60	2.35x
💻 Local	Next.js (Turbopack)	0.483s	1.004s	0.521s	60	2.77x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	2.656s (+31.3% 🔺)	4.319s (+13.8% 🔺)	1.663s	14	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 25 concurrent data payload steps (10KB)

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Nitro	0.301s (-39.3% 🟢)	1.006s (~)	0.705s	90	1.00x
🐘 Postgres	Express	0.330s (-35.3% 🟢)	1.028s (+2.2%)	0.698s	88	1.09x
🐘 Postgres	Next.js (Turbopack)	0.436s	1.006s	0.570s	90	1.45x
💻 Local	Next.js (Turbopack)	2.146s	2.884s	0.739s	32	7.12x
💻 Local	Nitro	2.190s (-13.7% 🟢)	2.853s (-5.2% 🟢)	0.663s	32	7.26x
💻 Local	Express	2.199s (-12.5% 🟢)	2.766s (-8.1% 🟢)	0.567s	33	7.29x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	5.974s (+69.0% 🔺)	7.826s (+50.7% 🔺)	1.852s	12	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

workflow with 50 concurrent data payload steps (10KB)

💻 Local Development

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	0.605s (-26.1% 🟢)	1.006s (-1.1%)	0.401s	120	1.00x
🐘 Postgres	Nitro	0.628s (-20.6% 🟢)	1.006s (~)	0.378s	120	1.04x
🐘 Postgres	Next.js (Turbopack)	0.901s	1.118s	0.217s	108	1.49x
💻 Local	Express	10.018s (-10.5% 🟢)	10.778s (-9.7% 🟢)	0.760s	12	16.55x
💻 Local	Nitro	10.268s (-8.2% 🟢)	10.861s (-6.9% 🟢)	0.593s	12	16.96x
💻 Local	Next.js (Turbopack)	10.462s	11.117s	0.655s	11	17.28x

▲ Production (Vercel)

World	Framework	Workflow Time	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	16.620s (+60.9% 🔺)	18.683s (+52.1% 🔺)	2.063s	7	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Stream Benchmarks (includes TTFB metrics)

workflow with stream

💻 Local Development

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
💻 Local	🥇 Nitro	1.131s (+429.2% 🔺)	2.005s (+99.6% 🔺)	0.012s (-2.4%)	2.020s (+98.3% 🔺)	0.889s	10	1.00x
🐘 Postgres	Express	1.137s (+454.3% 🔺)	1.998s (+100.1% 🔺)	0.001s (-25.0% 🟢)	2.011s (+98.8% 🔺)	0.874s	10	1.01x
🐘 Postgres	Nitro	1.141s (+456.8% 🔺)	1.999s (+100.0% 🔺)	0.001s (-26.7% 🟢)	2.011s (+98.8% 🔺)	0.869s	10	1.01x
💻 Local	Express	1.143s (+474.1% 🔺)	2.006s (+99.7% 🔺)	0.012s (+2.5%)	2.020s (+98.4% 🔺)	0.877s	10	1.01x
💻 Local	Next.js (Turbopack)	1.167s	2.003s	0.012s	2.019s	0.851s	10	1.03x
🐘 Postgres	Next.js (Turbopack)	1.210s	2.001s	0.001s	2.010s	0.800s	10	1.07x

▲ Production (Vercel)

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	2.147s (-68.7% 🟢)	3.299s (-61.9% 🟢)	2.045s (+223.6% 🔺)	5.833s (-40.4% 🟢)	3.685s	10	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-	-

🔍 Observability: Next.js (Turbopack)

stream pipeline with 5 transform steps (1MB)

💻 Local Development

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.496s (+137.4% 🔺)	2.004s (+99.1% 🔺)	0.004s (+4.4%)	2.024s (+97.9% 🔺)	0.529s	30	1.00x
💻 Local	Nitro	1.524s (+81.7% 🔺)	2.011s (+98.7% 🔺)	0.010s (+5.3% 🔺)	2.022s (+81.2% 🔺)	0.499s	30	1.02x
🐘 Postgres	Nitro	1.524s (+144.2% 🔺)	2.002s (+98.8% 🔺)	0.004s (-4.1%)	2.026s (+98.1% 🔺)	0.502s	30	1.02x
💻 Local	Express	1.530s (+102.1% 🔺)	2.012s (+95.6% 🔺)	0.011s (+15.1% 🔺)	2.025s (+94.7% 🔺)	0.495s	30	1.02x
🐘 Postgres	Next.js (Turbopack)	1.674s	2.010s	0.004s	2.025s	0.352s	30	1.12x
💻 Local	Next.js (Turbopack)	1.845s	2.012s	0.009s	2.202s	0.357s	28	1.23x

▲ Production (Vercel)

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	6.069s (-64.1% 🟢)	7.539s (-58.7% 🟢)	0.289s (+36.9% 🔺)	8.337s (-56.0% 🟢)	2.267s	8	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-	-

🔍 Observability: Next.js (Turbopack)

10 parallel streams (1MB each)

💻 Local Development

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	0.641s (-33.3% 🟢)	1.031s (-19.3% 🟢)	0.000s (+19.0% 🔺)	1.051s (-19.6% 🟢)	0.410s	58	1.00x
🐘 Postgres	Nitro	0.666s (-31.2% 🟢)	1.015s (-18.7% 🟢)	0.000s (-17.2% 🟢)	1.037s (-17.6% 🟢)	0.371s	58	1.04x
🐘 Postgres	Next.js (Turbopack)	0.761s	1.054s	0.000s	1.060s	0.300s	57	1.19x
💻 Local	Nitro	1.336s (+9.3% 🔺)	2.015s (~)	0.000s (+200.0% 🔺)	2.017s (~)	0.681s	30	2.08x
💻 Local	Express	1.349s (+10.2% 🔺)	2.015s (~)	0.000s (-30.0% 🟢)	2.017s (~)	0.667s	30	2.10x
💻 Local	Next.js (Turbopack)	1.374s	2.014s	0.000s	2.017s	0.643s	30	2.14x

▲ Production (Vercel)

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	3.459s (-66.0% 🟢)	4.699s (-59.2% 🟢)	0.000s (+Infinity% 🔺)	5.149s (-57.3% 🟢)	1.689s	12	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-	-

🔍 Observability: Next.js (Turbopack)

fan-out fan-in 10 streams (1MB each)

💻 Local Development

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
🐘 Postgres	🥇 Express	1.335s (-24.7% 🟢)	1.995s (-8.4% 🟢)	0.000s (+Infinity% 🔺)	2.010s (-8.6% 🟢)	0.675s	30	1.00x
🐘 Postgres	Nitro	1.375s (-23.3% 🟢)	2.067s (-3.5%)	0.000s (-3.4%)	2.092s (-3.8%)	0.717s	29	1.03x
🐘 Postgres	Next.js (Turbopack)	1.775s	2.262s	0.000s	2.269s	0.495s	27	1.33x
💻 Local	Next.js (Turbopack)	2.654s	3.291s	0.000s	3.294s	0.641s	19	1.99x
💻 Local	Nitro	3.093s (-8.7% 🟢)	3.776s (-6.3% 🟢)	0.000s (-18.0% 🟢)	3.782s (-6.3% 🟢)	0.689s	16	2.32x
💻 Local	Express	3.192s (-8.0% 🟢)	4.029s (~)	0.000s (-41.7% 🟢)	4.031s (~)	0.839s	15	2.39x

▲ Production (Vercel)

World	Framework	Workflow Time	TTFB	Slurp	Wall Time	Overhead	Samples	vs Fastest
▲ Vercel	🥇 Next.js (Turbopack)	5.088s (-9.4% 🟢)	6.519s (-6.6% 🟢)	0.000s (-11.1% 🟢)	6.961s (-7.7% 🟢)	1.873s	9	1.00x
▲ Vercel	Express	⚠️ missing	-	-	-	-	-
▲ Vercel	Nitro	⚠️ missing	-	-	-	-	-

🔍 Observability: Next.js (Turbopack)

Summary

Fastest Framework by World

Winner determined by most benchmark wins

World	🥇 Fastest Framework	Wins
💻 Local	Nitro	8/21
🐘 Postgres	Express	15/21
▲ Vercel	Next.js (Turbopack)	21/21

Fastest World by Framework

Winner determined by most benchmark wins

Framework	🥇 Fastest World	Wins
Express	🐘 Postgres	19/21
Next.js (Turbopack)	🐘 Postgres	15/21
Nitro	🐘 Postgres	15/21

Column Definitions

Workflow Time: Runtime reported by workflow (completedAt - createdAt) - primary metric
TTFB: Time to First Byte - time from workflow start until first stream byte received (stream benchmarks only)
Slurp: Time from first byte to complete stream consumption (stream benchmarks only)
Wall Time: Total testbench time (trigger workflow + poll for result)
Overhead: Testbench overhead (Wall Time - Workflow Time)
Samples: Number of benchmark iterations run
vs Fastest: How much slower compared to the fastest configuration for this benchmark

Worlds:

💻 Local: In-memory filesystem world (local development)
🐘 Postgres: PostgreSQL database world (local development)
▲ Vercel: Vercel production/preview deployment
🌐 Turso: Community world (local development)
🌐 MongoDB: Community world (local development)
🌐 Redis: Community world (local development)
🌐 Jazz: Community world (local development)
🌐 Redis + BullMQ: Community world (local development)
🌐 Cloudflare: Community world (local development)
🌐 MySQL: Community world (local development)
🌐 Azure: Community world (local development)
🌐 NATS JetStream: Community world (local development)
🌐 Upstash: Community world (local development)

❌ Some benchmark jobs failed:

Local: success
Postgres: success
Vercel: failure

Check the workflow run for details.

vercel

Additional Suggestion:

OCC fence parameters (lastKnownEventId, asOfTimestamp) are silently dropped for wait_completed and hook_received events because the lazy branch of createWorkflowRunEventInner doesn't forward them.

The lazy-refs branch of createWorkflowRunEventInner forgot to thread `lastKnownEventId` and `asOfTimestamp` into the request body, so the fence was silently dropped for any event whose type went through the lazy path (i.e., not in `eventsNeedingResolve`). The resolve branch already had the forwarding. Caught by Vercel Agent Review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

VaguelySerious · 2026-05-27T08:03:36Z

Vercel Agent review acknowledged + addressed in 1e69c82 — the lazy branch of createWorkflowRunEventInner now forwards lastKnownEventId and asOfTimestamp alongside the resolve branch. Good catch — without this the fence was silently dropped for any event whose type didn't appear in eventsNeedingResolve (including wait_completed and hook_received).
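
The class of bug is easy to reintroduce when two branches build the request body separately. A minimal sketch of one way to guard against it, assuming a shared helper: `fenceFields`, `buildRequestBody`, and the `mode` field are hypothetical and do not reflect the actual shape of createWorkflowRunEventInner or its request body.

```typescript
interface FenceParams {
  lastKnownEventId?: string;
  asOfTimestamp?: number;
}

interface CreateEventParamsSketch extends FenceParams {
  eventType: string;
  data?: unknown;
}

// Copies only the fence fields that were actually provided.
function fenceFields(p: FenceParams): FenceParams {
  const out: FenceParams = {};
  if (p.lastKnownEventId !== undefined) out.lastKnownEventId = p.lastKnownEventId;
  if (p.asOfTimestamp !== undefined) out.asOfTimestamp = p.asOfTimestamp;
  return out;
}

// Both branches go through fenceFields, so neither can silently drop the fence.
function buildRequestBody(
  p: CreateEventParamsSketch,
  mode: "resolve" | "lazy", // hypothetical discriminator for the two branches
): Record<string, unknown> {
  const base = { eventType: p.eventType, data: p.data, mode };
  return { ...base, ...fenceFields(p) };
}
```

Writers that pass no fence produce a body without either key, which keeps the parameters optional for worlds that ignore them.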

VaguelySerious · 2026-05-27T08:25:01Z

Status after 1e69c82b:

✅ All Local E2E (Dev / Prod / Postgres / Windows) green.
✅ Vercel Prod E2E: 11/12 apps green (astro, example, express, fastify build, hono, nextjs-turbopack, nextjs-webpack, nitro, nuxt, sveltekit, tanstack-start, vite), including hono and vite, which were red on the previous push.
❌ Vercel Prod fastify — single failure: abortAnyInStepWorkflow: AbortSignal.any inside a step composes deserialized signals (130/131 passed). The assertion is expect(returnValue.c2Aborted).toBe(true) — the workflow-side c2 controller didn't observe the abort by the time the workflow returned.

I don't think this is from anything in this PR:

The failing workflow has no sleep()/wait calls, so the elapsed-wait-scan fence path I touched never executes for it.
The abort signal here propagates from step → workflow via the controller's backing stream, not via hook_received. resumeHook is not in the path either.
Both stepResult.saw === true and stepResult.via === 'listener' passed on the same run, so the step-side composition worked correctly. Only the workflow-side controller hadn't caught up by the time the workflow returned.
The same test passes on Local fastify (and on every other Vercel Prod app on this PR).

Reads like a long-standing abort-stream-propagation timing flake that just happened to fire on fastify Vercel Prod this run. Will rerun the job once the workflow run is no longer in-progress; flagging here in case a reviewer hits it before I get back to it.

Co-authored-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Peter Wielander <mittgfu@gmail.com>

VaguelySerious · 2026-05-27T11:02:45Z

Good catch — the answer is "yes, conceptually they can race the same way." Here's the breakdown of where we currently fence vs. don't, and what's at risk:

Where the SDK currently sends a fence

Only one site: the elapsed-wait scan in runtime.ts, when it writes wait_completed. The fence is the loaded events' tail eventId; on a fence conflict the tick retries in-place with a fresh fence (the loop I added).

Where it doesn't, but could race in the same shape

Any write that the workflow runtime makes based on a branch decision driven by the loaded events array can race in the same way the production hook/sleep bug did. Concretely:

suspension-handler.ts — these are exactly the writes the workflow VM emits when its replay decides to allocate a new entity. If that decision was made on a stale snapshot, the write is "stale-branch":

| Write | Currently fenced? | Can race? |
| --- | --- | --- |
| `hook_created` | no | yes: workflow decided to allocate a hook based on its branch |
| `hook_disposed` | no | yes: workflow decided to dispose |
| `step_created` | no | yes: workflow decided to invoke a step |
| `wait_created` | no | yes: workflow decided to `sleep()` |

runtime.ts — terminal-state writes after a successful replay:

| Write | Currently fenced? | Can race? |
| --- | --- | --- |
| `run_completed` (line 974) | no | yes: workflow returned based on its branch |
| `run_failed` (catch path) | no | yes: workflow threw based on its branch |

What doesn't need a fence (and why)

step_started, step_completed, step_failed, step_retrying (step-executor.ts, step-handler.ts): these record facts about a step that's already in the log via step_created. They're not making a new branch decision; if the step was allocated, finishing it is just bookkeeping.
run_created (start.ts): no prior events for the run; nothing to be stale against.
run_started (initial tick): same — first thing the runtime writes.
run_failed from MAX_DELIVERIES_EXCEEDED / replay-budget exhaustion (replay-budget.ts): terminal escape hatch; needs to land regardless.
hook_received from resumeHook: deliberately unfenced — fencing would lose the user's signal.

Why the wait_completed fence alone catches the production bug today

The hook/sleep race needs: (1) tick takes sleep branch with stale events, (2) tick writes wait_completed, and (3) tick writes any sleep-branch follow-up events (more sleeps, the next iteration's allocations). My current fence trips at step (2): the wait_completed CAS fails, the retry loop reloads events, sees hook_received, and the next replay picks the hook branch — so step (3)'s sleep-branch writes never happen. That's why the repro went from corrupted to clean.

But that's specific to the hook-vs-sleep shape because wait_completed is on the critical path of the sleep branch. A race that doesn't go through wait_completed (e.g., a branch decision around Promise.race([hook, someStep]) where the workflow allocates a different step depending on which side wins) wouldn't be caught — the stale step_created/hook_created would land without a fence check.

Recommendation

For full coverage of the general "stale-snapshot branch decision" race, fence all 6 sites in the two tables above. Implementation cost is modest: the suspension-handler already has the events array available (the runtime invokes it with the loaded events); we'd thread the tail eventId in the same way I did for the elapsed-wait scan, and add the same retry-on-fence-conflict pattern to each catch block. run_completed/run_failed need the same plus distinguishing fence-conflict from "run already finished" (mirror of the workflow-server-side fix).

Want me to do it in this PR or keep it as a follow-up? My slight lean is follow-up: the current PR cleanly resolves the production bug and it's already a non-trivial behavioral change; the broader fencing is mostly defensive against shapes we haven't seen in the wild. But the risk of leaving them unfenced is real if we're betting that no other workflow patterns trigger the same race.

VaguelySerious · 2026-05-27T11:11:56Z

You're right, I was sloppy. Let me walk through it.

Your example: `Promise.race([someStep, someHook])`

Timeline that breaks:

T+0: workflow starts, runtime writes step_created for someStep. Suspends.
T+1: step finishes elsewhere. Step executor writes step_completed (eventId timestamp ≈ T+1).
T+3: resumeHook writes hook_received (eventId timestamp ≈ T+3).
T+3.5: tick A is re-invoked, loads events.

If at T+3.5 the eventually-consistent read returns [..., step_created, hook_received] but misses step_completed, then tick A's Promise.race resolves with the hook (because the step's resolution event isn't in its events array). Tick A writes hook-branch follow-up events (say step_created for the post-hook step).

But canonically: step_completed (T+1) < hook_received (T+3). A future replay walks events in eventId order:

consumes step_completed → step subscriber resolves first via promiseQueue
then hook_received → hook subscriber resolves second

→ step wins the race, not hook. Tick A's hook-branch writes are orphaned. A future replay tries to take the step branch, hits unconsumed events, and fires CORRUPTED_EVENT_LOG.

So yes, the race exists and step_completed is on the critical path.
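To make the ordering argument concrete, here is a small sketch, assuming numeric eventIds stand in for ULID timestamps and that the first resolution event in eventId order decides the race. `LogEvent` and `raceWinner` are illustrative names, not runtime internals.

```typescript
// Numeric eventIds stand in for ULID timestamps (an assumption for illustration).
type LogEvent = {
  eventId: number;
  type: "step_created" | "step_completed" | "hook_received";
};

// Replay walks events in eventId order; the first resolution event decides the winner.
function raceWinner(events: LogEvent[]): "step" | "hook" | "pending" {
  const ordered = [...events].sort((a, b) => a.eventId - b.eventId);
  for (const e of ordered) {
    if (e.type === "step_completed") return "step";
    if (e.type === "hook_received") return "hook";
  }
  return "pending";
}

// Canonical log from the timeline: T+0, T+1, T+3.
const canonical: LogEvent[] = [
  { eventId: 0, type: "step_created" },
  { eventId: 1, type: "step_completed" },
  { eventId: 3, type: "hook_received" },
];

// Tick A's eventually-consistent read misses step_completed.
const staleSnapshot = canonical.filter((e) => e.type !== "step_completed");

const tickAView = raceWinner(staleSnapshot);
const futureReplayView = raceWinner(canonical);
```

Tick A resolves the race with the hook, while any replay over the full log resolves it with the step. The two views disagree on the branch, which is the divergence that later surfaces as the corrupted log.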

And my fence doesn't catch this one

This is the embarrassing part. The fence compares run.lastKnownEventId (latest written) against the tick's claim (its loaded tail). At T+3.5:

server run.lastKnownEventId = E_hook (because hook_received was the most-recently-written event, and my server .set() is unconditional, not max())
tick A's fence = E_hook (its loaded tail)
CAS: E_hook == E_hook → passes

The tick wrote based on a stale view of "everything ≤ E_hook" but the server's fence only tracks "latest write," which the tick did see. The fence catches "missed events that came after my tail" (the hook/sleep production shape) but not "missed events that came before my tail."

Why the production fix still held

The hook/sleep production race has the opposite shape: the tick's tail is from T+0 (a wait_created written at workflow start), and the missed event (hook_received) lands at T+5. That missed event has a higher eventId than the tick's tail, so the server's lastKnownEventId advances past the tick's fence and the CAS fails. That's the case the current fence covers.

The shape you're describing — missed event has a lower eventId than what the tick saw — slips past a single-attribute fence.
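The two shapes can be compared side by side in a small model. This is a sketch under stated assumptions: eventIds are plain numbers, the server sets `lastKnownEventId` unconditionally to the most recent write, and the tick fences on the highest eventId it loaded. `Scenario` and `fenceOutcome` are hypothetical names.

```typescript
// eventIds are modeled as plain numbers standing in for ULID timestamps (an assumption).
type Scenario = { written: number[]; loaded: number[] };

function fenceOutcome(s: Scenario): { casPasses: boolean; missed: number[] } {
  // Server side: lastKnownEventId is set unconditionally to the most recent write.
  const serverLastKnown = s.written[s.written.length - 1];
  // Tick side: the fence is the tail (highest eventId) of the loaded snapshot.
  const tickFence = Math.max(...s.loaded);
  // Events that are committed but absent from the tick's snapshot.
  const missed = s.written.filter((id) => !s.loaded.includes(id));
  return { casPasses: serverLastKnown === tickFence, missed };
}

// Step/hook shape: step_completed (1) is missing but sits below the loaded tail (3).
const stepHook = fenceOutcome({ written: [0, 1, 3], loaded: [0, 3] });

// Hook/sleep shape: hook_received (5) is missing and sits above the loaded tail (0).
const hookSleep = fenceOutcome({ written: [0, 5], loaded: [0] });
```

In the step/hook shape the CAS passes even though the snapshot is missing an event, because the comparison only looks at the tail. In the hook/sleep shape the tail itself is stale, so the CAS fails as intended.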

What would actually catch it

Three options I can think of, none free:

(1) Strongly-consistent reads on every event load instead of the split-read with the 1.5 s freshness window. Catches it definitively. Reads get more expensive; existing perf decisions go out the window.
(2) Server-side gap detection: server checks that the events table has no eventId in the range (tick.fence_value, tick.now()) other than the events the tick has acknowledged. Requires either sending the full set of seen eventIds or a per-run counter the tick increments.
(3) Reject any event whose eventId is < run.lastKnownEventId: makes eventId monotonic per run (no "stragglers"). Eliminates the failure mode but rejects legitimate concurrent writers whose clocks lag, which turns into spurious 409s under any clock skew between SDK processes.

(2) is the cleanest but requires server-side work that isn't in this PR's scope. (3) is cheap but gives up skew tolerance to get correctness. (1) is correct but a perf regression.
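For option 2, one possible reading of the "send the full set of seen eventIds" variant is sketched below. Everything here is hypothetical (`gapCheckedAppend`, `AppendResult`, the string ids); it is not a design for workflow-server, only an illustration of what the check would have to decide.

```typescript
type AppendResult = { ok: true } | { ok: false; unseen: string[] };

// Hypothetical server-side check: reject the write if the committed log holds
// any eventId that the tick did not report as seen.
function gapCheckedAppend(
  committed: string[],
  seenByTick: ReadonlySet<string>,
  newEventId: string
): AppendResult {
  const unseen = committed.filter((id) => !seenByTick.has(id));
  if (unseen.length > 0) return { ok: false, unseen };
  committed.push(newEventId);
  return { ok: true };
}

// The step/hook shape: the tick saw step_created and hook_received, not step_completed.
const committed = ["E_step_created", "E_step_completed", "E_hook"];
const staleView = new Set(["E_step_created", "E_hook"]);

const rejected = gapCheckedAppend(committed, staleView, "E_next_step");
const accepted = gapCheckedAppend(committed, new Set(committed), "E_next_step");
```

Unlike the tail-only comparison, this check rejects the stale tick regardless of where the missed event sits in the log. The cost is a request payload that grows with the log, which is presumably why the per-run counter is listed as the alternative.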

So how should we think about this PR

Honestly: the current fence is a partial fix. It closes the hook/sleep production shape but the general "stale-snapshot branch decision" hazard still exists for races where the missed event is canonically earlier than what the tick saw. The Promise.race([someStep, someHook]) shape you've described is a real instance of that.

I think we should:

Land this PR for the production bug as-is, but be honest in the description that its coverage is asymmetric, as described above.
File a follow-up to design proper gap-detection (probably option 2) — that requires server-side schema/logic that we shouldn't tack onto this PR.

Want me to update the PR description with this caveat, or take a swing at extending the fence in this PR? If the latter, I'd want to whiteboard the design with you first — none of the three options above are obviously right.

karthikscale3 · 2026-05-27T13:27:41Z

+                            if (!EntityConflictError.is(err)) {
+                              throw err;
+                            }
+                            // Fence conflicts surface a specific error


nit: would be good to add a log message here so we can capture these in datadog

karthikscale3 · 2026-05-27T13:30:16Z

+                            // too, but to guarantee correctness, will be re-tried here directly.
+                            // TODO: We can remove the retry here after extensive validation.
+                            // The cost is low in the meantime.
+                            const isFenceConflict = /fence conflict/i.test(


AI Review: brittle coupling to server error wording. /fence conflict/i.test(err.message) against the free-form 409 message means any rewording on the workflow-server side silently routes fence conflicts into the Wait already completed, skipping branch below — workflows keep moving but stale-snapshot protection is gone, and you'd only notice via the CORRUPTED_EVENT_LOG you're trying to eliminate. Prefer surfacing a typed code from the server (e.g. errorData.code === 'FENCE_CONFLICT' carried on EntityConflictError) and matching on that. Worth doing in the paired server PR so this regex never has to ship.
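A sketch of the difference between the two matching strategies, assuming the server attaches a machine-readable `code` to the conflict. The `ConflictLike` shape, `makeConflict`, and the reworded message are made up for illustration; the real `EntityConflictError` may carry its data differently.

```typescript
// Hypothetical error shape; the real EntityConflictError may differ.
type ConflictLike = Error & { code?: string };

function makeConflict(message: string, code?: string): ConflictLike {
  const err = new Error(message) as ConflictLike;
  if (code !== undefined) err.code = code;
  return err;
}

// Matching on free-form wording breaks as soon as the server rewords the message.
const byMessage = (err: Error): boolean => /fence conflict/i.test(err.message);

// Matching on a typed code survives rewording.
const byCode = (err: ConflictLike): boolean => err.code === "FENCE_CONFLICT";

// The server rewords its 409 message but keeps the code stable.
const reworded = makeConflict("stale lastKnownEventId for run", "FENCE_CONFLICT");
const alreadyDone = makeConflict("Wait already completed");

const regexStillMatches = byMessage(reworded);
const codeStillMatches = byCode(reworded);
```

With the reworded message, the regex silently stops matching and the error would fall through to the "already completed" branch, while the code-based check keeps routing it to the retry path.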

karthikscale3 · 2026-05-27T13:30:16Z

-                            continue;
+                            if (result.event) {
+                              fenceEventId = result.event.eventId;
+                            }


AI Review: EventResultResolveWireSchema.event is .optional() (packages/world-vercel/src/events.ts:63), so on any response without event you don't refresh fenceEventId and the next iteration's write uses the now-stale tail — that triggers a spurious fence conflict and forces a full reload/backoff cycle for every subsequent wait in waitsToComplete. In practice the server probably always returns the created event, but the type allows the foot-gun. Either tighten the response schema to require event on create, or fall back to a deterministic value when missing.

karthikscale3 · 2026-05-27T13:30:16Z

+                                  !events.some((x) => x.eventId === e.eventId)
+                                ) {
+                                  events.push(e);
+                                }


AI Review: events.some(x => x.eventId === e.eventId) inside the for makes this O(n²) over the existing log on every fence-retry reload. Event logs aren't huge today, but the retry path is exactly where they'll be longest. A Set of existing ids built once before the loop avoids it for free.
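The suggested change could look roughly like this. `mergeReloaded` is a hypothetical helper name, and the event type is reduced to the one field the dedupe needs.

```typescript
type HasEventId = { eventId: string };

// Append reloaded events that are not already present, using a Set for O(1) lookups
// instead of scanning the existing array for every reloaded event.
function mergeReloaded<T extends HasEventId>(events: T[], reloaded: T[]): T[] {
  const seen = new Set(events.map((e) => e.eventId));
  for (const e of reloaded) {
    if (!seen.has(e.eventId)) {
      seen.add(e.eventId); // also guards against duplicates inside `reloaded`
      events.push(e);
    }
  }
  return events;
}

const existing = [{ eventId: "a" }, { eventId: "b" }];
const merged = mergeReloaded(existing, [
  { eventId: "b" },
  { eventId: "c" },
  { eventId: "c" },
]);
```

The Set is built once per reload, so the merge is linear in the combined length of both arrays and keeps the original append order.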

karthikscale3 · 2026-05-27T13:30:16Z

+   * `hook_received` after anything the caller could have observed without paying
+   * for a separate read. Ignored when `lastKnownEventId` is also set.
+   */
+  asOfTimestamp?: number;


AI Review: asOfTimestamp is added to the public CreateEventParams and threaded through world-vercel, but no caller in this PR uses it — the docstring points at resumeHook, which the PR explicitly keeps unfenced. Public API surface with no exerciser tends to rot (or drift from the server's interpretation) before its first real caller arrives. Consider dropping it from this PR until resumeHook (or another caller) actually needs it, or wiring up that single caller now so the contract is tested end-to-end.

resume-hook: drop asOfTimestamp fence (let hook_received always append)

e65e9b0

Apply suggestions from code review

ec7cad1

Co-authored-by: Peter Wielander <mittgfu@gmail.com> Signed-off-by: Peter Wielander <mittgfu@gmail.com>
