feat: lazy per-route cache seeding for Workers #653
NathanDrake2406 wants to merge 4 commits into cloudflare:main
Conversation
Branch force-pushed from 3e588f6 to 1b23050
- Only seed routes with router: "app" — Pages Router uses different keys (pages:...) and value shape (kind: "PAGES"), so seeding them as APP_PAGE entries would be unreachable at runtime.
- Routes without a router field are skipped defensively (pre-cloudflare#653 manifests lack this field, but also lack buildId, so populateKV would already skip them).
- Remove the --app-prefix CLI flag — it writes prefixed keys without wiring the same prefix into the generated runtime, creating silently unreadable cache entries. Keep appPrefix on the programmatic PopulateKVOptions for advanced users who configure both sides manually.
On Cloudflare Workers with MemoryCacheHandler (the default when KV is not configured), pre-rendered routes now seed the memory cache on first request via the assets binding instead of triggering a full re-render.
- Add seed-cache-workers.ts with lazy per-route seeding, manifest caching per isolate, and concurrent request dedup
- Copy prerender files to dist/client/__prerender/ during deploy so they're available via env.ASSETS.fetch()
- Add the seeding call to the generated App Router worker entry
- 9 unit tests covering seeding, dedup, manifest caching, and graceful degradation
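As a rough illustration of the mechanism the commit describes (the manifest path and route shape follow the PR; everything else is simplified scaffolding, not the actual module):

```typescript
// Sketch of lazy per-route seeding: one manifest fetch per isolate,
// one in-flight promise per route so concurrent cold hits dedup.
type FetchAsset = (path: string) => Promise<Response>;

interface ManifestRoute { route: string; status: string; router?: string }
interface Manifest { buildId?: string; routes: ManifestRoute[] }

let manifestPromise: Promise<Manifest | null> | null = null;
const seedInFlight = new Map<string, Promise<void>>();

function loadManifest(fetchAsset: FetchAsset): Promise<Manifest | null> {
  manifestPromise ??= (async () => {
    try {
      const res = await fetchAsset("/__prerender/vinext-prerender.json");
      return res.ok ? ((await res.json()) as Manifest) : null;
    } catch {
      manifestPromise = null; // transient failure: allow a retry next request
      return null;
    }
  })();
  return manifestPromise;
}

export async function seedRouteFromAssets(
  pathname: string,
  fetchAsset: FetchAsset,
): Promise<void> {
  const manifest = await loadManifest(fetchAsset);
  const route = manifest?.routes.find(
    (r) => r.route === pathname && r.router === "app",
  );
  if (!route) return; // not a pre-rendered App Router route

  const existing = seedInFlight.get(pathname);
  if (existing) return existing; // join the in-flight seeding

  const promise = (async () => {
    const html = await fetchAsset(`/__prerender${pathname}.html`);
    if (!html.ok) return;
    // ...insert HTML (and RSC) into the cache handler here...
  })()
    .catch(() => {/* best-effort: never fail the request */})
    .finally(() => seedInFlight.delete(pathname));
  seedInFlight.set(pathname, promise);
  return promise;
}
```

The map entry is removed in `finally` so a later cold request can re-seed if the cache evicts the entry.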
The Node.js and Workers seed-cache modules duplicated manifest types, revalidateCtx, and cache value construction. Extract to seed-cache-shared.ts to prevent drift between the two strategies. Also fixes stale JSDoc referencing "manifestPromise" (renamed to "loadedPromise").
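The extracted module's shape might look roughly like this (an illustrative sketch: `makeRscCacheValue` and its signature appear in the PR diff, but the `CachedAppPageValue` fields and manifest types are assumed here):

```typescript
// seed-cache-shared.ts (sketch): types and helpers used by both the
// Node.js and Workers seeding strategies, so neither copy can drift.
export interface PrerenderManifestRoute {
  route: string;
  status: string;
  router?: "app" | "pages"; // added by the cloudflare#653 manifest changes
}

export interface PrerenderManifest {
  buildId: string;
  routes: PrerenderManifestRoute[];
}

// Hypothetical cache-value shape for an App Router page entry.
export interface CachedAppPageValue {
  kind: "APP_PAGE";
  rscData: ArrayBuffer;
}

/** Build an APP_PAGE cache value for an RSC entry. */
export function makeRscCacheValue(rscData: ArrayBuffer): CachedAppPageValue {
  return { kind: "APP_PAGE", rscData };
}
```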
Branch force-pushed from 1b23050 to 8011d47
/bigbonk review
Good PR — the lazy per-route seeding approach is well-suited for Workers' ephemeral isolate model, and the shared extraction from seed-cache.ts is a clean refactor. The test coverage is solid. A few issues to address, one of which could cause request failures in production.
Summary of findings:
- P0 (error resilience): doSeedRoute has no try/catch, and the generated worker entry awaits without a safety net — a transient env.ASSETS.fetch() failure or handler.set() error would crash the request instead of gracefully degrading.
- P1 (permanent null caching): loadManifest permanently caches null on transient fetch failure — the isolate can never seed any route after a single hiccup.
- Minor: copyPrerenderToAssets flattens the manifest into the same directory as the route files, which works but could cause filename collisions on a route literally named /vinext-prerender (extremely unlikely, but worth noting).
```ts
    baseKey,
    revalidateSeconds,
    fetchAsset,
  ).finally(() => seedInFlight.delete(htmlKey));
```
Error resilience: doSeedRoute can throw on fetchAsset() network errors, handler.set() failures, etc. Since the generated worker entry (line 503 of deploy.ts) does await seedRouteFromAssets(...) without try/catch, any throw here becomes an unhandled error that crashes the request with a 500.
Seeding is a best-effort optimization — a failure should log and degrade gracefully, not break the request. Wrap this in try/catch:
```ts
  const promise = doSeedRoute(
    loaded.manifest,
    pathname,
    baseKey,
    revalidateSeconds,
    fetchAsset,
  ).catch(() => {/* seeding is best-effort */}).finally(() => seedInFlight.delete(htmlKey));
```
Alternatively (and perhaps better), wrap the entire seedRouteFromAssets body in try/catch so that any unexpected error — including from getCacheHandler().get() — is swallowed. The loadManifest function already follows this pattern.
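That alternative could be sketched as follows (the body is elided; only the wrapping pattern matters, and the signature is an assumption based on the call site quoted above):

```typescript
// Wrap the whole function body so any unexpected throw, including from
// getCacheHandler().get(), degrades to a no-op instead of a 500.
export async function seedRouteFromAssets(
  pathname: string,
  fetchAsset: (p: string) => Promise<Response>,
): Promise<void> {
  try {
    // ...manifest lookup, cache-hit check, doSeedRoute(...) as before...
  } catch (err) {
    // Seeding is a best-effort optimization; log and fall through.
    console.warn("cache seeding failed for", pathname, err);
  }
}
```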
```ts
    return { manifest, lookup };
  } catch {
    return null;
```
Permanent null caching on transient failure: If fetchAsset("/__prerender/vinext-prerender.json") fails transiently (e.g., a brief ASSETS binding hiccup, timeout), loadedPromise is permanently set to a promise that resolves to null. Every subsequent call to seedRouteFromAssets in this isolate will get null and no-op — the isolate can never recover.
Since the manifest is immutable per deployment, a permanent cache of the successful result is correct. But caching the failure permanently is not — a retry on the next request would likely succeed.
Consider only caching successful results:
```ts
  loadedPromise = (async () => {
    try {
      const res = await fetchAsset("/__prerender/vinext-prerender.json");
      if (!res.ok) return null;
      const manifest: PrerenderManifest = await res.json();
      if (!manifest.buildId || !Array.isArray(manifest.routes)) return null;
      const lookup = new Map<string, PrerenderManifestRoute>();
      for (const route of manifest.routes) {
        if (route.status !== "rendered") continue;
        if (route.router !== "app") continue;
        lookup.set(route.path ?? route.route, route);
      }
      return { manifest, lookup };
    } catch {
      loadedPromise = null; // allow retry on next request
      return null;
    }
  })();
```
Note: the !res.ok and validation-failure cases are arguably permanent (malformed deployment), so you could choose to only reset loadedPromise in the catch branch. Either way, the catch case (network/timeout) should not be permanently cached.
```ts
  const sourceDir = path.join(serverDir, "prerendered-routes");
  if (!fs.existsSync(sourceDir)) return;

  fs.cpSync(sourceDir, targetDir, { recursive: true });
```
Nit: fs.cpSync(sourceDir, targetDir, { recursive: true }) copies the contents of sourceDir into targetDir where targetDir already exists. This means vinext-prerender.json (copied on line 1219) and route files (e.g., about.html) all live at the same level under __prerender/. This works; even a pre-rendered route literally named /vinext-prerender would produce vinext-prerender.html, which coexists with the .json manifest. Just calling it out.
A more robust layout would be __prerender/manifest.json + __prerender/routes/..., but that's a future cleanup, not a blocker.
```ts
}

/** Build an APP_PAGE cache value for an RSC entry. */
export function makeRscCacheValue(rscData: ArrayBuffer): CachedAppPageValue {
```
The rscData parameter is typed as ArrayBuffer, which matches the CachedAppPageValue field type. But in seed-cache.ts:142-145, the Node.js codepath does rscBuffer.buffer.slice(byteOffset, byteOffset + byteLength) to extract a clean ArrayBuffer from the Node Buffer. Worth verifying that the ArrayBuffer from Response.arrayBuffer() in the Workers path (line 179) doesn't need similar treatment — in practice Workers' Response.arrayBuffer() returns a standalone ArrayBuffer with byteOffset === 0, so this should be fine. Just flagging the asymmetry.
```ts
  // On first request for a pre-rendered page, fetches HTML/RSC from the
  // assets binding and inserts into the cache. Subsequent requests in the
  // same isolate get cache HITs. No-op for non-prerendered routes.
  await seedRouteFromAssets(url.pathname, (p) => env.ASSETS.fetch(new Request(new URL(p, request.url))));
```
This await is on the critical path of every request — it blocks the response until seeding completes for pre-rendered routes. For the first cold request this adds the latency of two env.ASSETS.fetch() calls (HTML + RSC) plus two handler.set() calls.
This is probably acceptable since env.ASSETS.fetch() is local to the colo and fast, but worth a comment noting that this is intentionally blocking (to guarantee the cache is populated before the RSC handler runs). An alternative would be to let the RSC handler serve the request normally while seeding in the background via ctx.waitUntil(), but that would mean the first request always re-renders — so blocking is the right call here.
Also: should this be wrapped in try/catch as defense-in-depth? If seedRouteFromAssets throws (see other comment), the request dies.
```ts
  // Seed the memory cache from pre-rendered assets (lazy, per-route).
  // On first request for a pre-rendered page, fetches HTML/RSC from the
  // assets binding and inserts into the cache. Subsequent requests in the
  // same isolate get cache HITs. No-op for non-prerendered routes.
  // Blocking (not waitUntil) so the RSC handler sees the seeded cache.
  try {
    await seedRouteFromAssets(url.pathname, (p) => env.ASSETS.fetch(new Request(new URL(p, request.url))));
  } catch { /* seeding is best-effort — fall through to normal render */ }
```
```ts
// ── Concurrent dedup ──────────────────────────────────────────────────────

it("deduplicates concurrent seed requests for the same route", async () => {
```
Good test, but the dedup assertion depends on the mock fetcher resolving synchronously (via microtask). In a real Workers environment, env.ASSETS.fetch() would yield to the event loop, and the timing of when the second caller hits the seedInFlight.get() check vs the getCacheHandler().get() check could differ.
Consider adding a test variant where the mock fetcher introduces a small delay (e.g., a Promise that resolves after a setTimeout(0)) to verify the dedup still works when fetches genuinely yield. This would give more confidence in the concurrency model.
Review submitted successfully. Here's a summary of the feedback I posted: Key findings:
… seeding
- seedRouteFromAssets never throws — errors degrade gracefully instead of crashing the request with a 500
- doSeedRoute failures are caught via .catch() on the shared promise, protecting both the originator and joiners via seedInFlight
- loadManifest resets loadedPromise on transient failure (network error, timeout) so the isolate can retry on the next request; permanent failures (404, invalid JSON) remain cached
- Added tests: error resilience (fetch failure, cache write failure), transient retry, permanent 404 caching, delayed async dedup
@james-elicx been a good run bro! https://x.com/nextjs/status/2037186404116291603?s=46
/bigbonk review
Not over yet - the Adapter API has been in the works for a long time and we've discussed it on many occasions (I work on OpenNext as well) |
Well-structured PR. The lazy per-route seeding approach is a good fit for Workers' ephemeral isolate model, and the previous review's P0/P1 findings have been addressed — seedRouteFromAssets now has a top-level try/catch, doSeedRoute has .catch(() => {}), and loadManifest resets loadedPromise = null on transient failures while caching permanent failures (404, malformed JSON). The shared extraction into seed-cache-shared.ts is clean. Test coverage is thorough, including the error resilience and transient retry scenarios that were missing from the first revision.
A few observations, none blocking:
- Concurrency comment accuracy (line 96-97): The comment says "between the await above and this sync check, no other microtask can interleave" — this is correct for the JS execution model, but worth noting that two requests can both pass the getCacheHandler().get() check if they interleave on that await. The second caller correctly falls into the seedInFlight.get() dedup path, so the behavior is correct. The comment could be slightly clearer that the dedup window is between the .get() await resumption and the seedInFlight.set() call.
- doSeedRoute early return on RSC 404: When the RSC fetch returns 404, doSeedRoute returns early without logging. This is fine for the "RSC file missing" case (tested), but it means a transient RSC fetch failure (caught by .catch()) is indistinguishable from a permanent RSC absence. Not a problem in practice since env.ASSETS.fetch() is local, but worth knowing.
- Pages Router: Seeding is App Router only (route.router !== "app" is skipped), and the Pages Router worker entry doesn't include the seedRouteFromAssets call. This is correct for the current scope.
```ts
  if (existing) return;

  // Dedup concurrent cold hits — between the await above and this sync check,
  // no other microtask can interleave, so at most one caller creates the promise.
```
Nit: The comment says "no other microtask can interleave" between the await getCacheHandler().get() (line 93) and seedInFlight.get() (line 98). This is true — after the .get() await resumes, execution runs synchronously until the next await. But the scenario the dedup protects against is two requests that both pass the .get() check (because both interleave on that await and both see no cache entry). The comment could make this clearer:
```ts
  // Dedup concurrent cold hits — two callers can both pass the cache-miss
  // check above (they interleave on the await), but after resuming from
  // await, execution is synchronous until the next yield. So at most one
  // caller creates the seeding promise; the second joins it here.
```
```ts
  // Fetch and seed RSC (optional)
  const rscRelPath = getRscOutputPath(pathname);
  const rscRes = await fetchAsset(`/__prerender/${rscRelPath}`);
  if (!rscRes.ok) return;
```
Minor: When RSC fetch returns a non-OK response, doSeedRoute returns early. This means the HTML is seeded but RSC is not — which is the correct behavior (tested in "seeds HTML even when RSC file is missing"). However, the return here silently swallows the case where RSC should exist but the fetch had a transient error. Since the HTML is already cached at this point, a subsequent call to seedRouteFromAssets will see the HTML cache hit and no-op, so the RSC will never be retried for this isolate.
This is acceptable for the env.ASSETS.fetch() use case (local, reliable), but if the seeding mechanism is ever used with a remote asset store, you'd want to either:
- Seed HTML and RSC independently (so an RSC failure doesn't prevent RSC retry)
- Or track per-key seeding rather than per-route
Not a blocker — just something to keep in mind for #562.
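The second option (per-key tracking) could be sketched like this (`seedKey` and the key names are hypothetical, not part of the PR):

```typescript
// Track seeding per cache key instead of per route: a failed RSC seed
// does not block a later retry, because the HTML key and the RSC key
// succeed or fail independently.
const seededKeys = new Set<string>();
const keyInFlight = new Map<string, Promise<void>>();

export async function seedKey(
  key: string,
  seed: () => Promise<void>,
): Promise<void> {
  if (seededKeys.has(key)) return; // already seeded in this isolate
  const existing = keyInFlight.get(key);
  if (existing) return existing; // join the in-flight seeding
  const p = seed()
    .then(() => { seededKeys.add(key); }) // only mark success as done
    .catch(() => {/* best-effort: leave unseeded so a retry can happen */})
    .finally(() => keyInFlight.delete(key));
  keyInFlight.set(key, p);
  return p;
}
```

Seeding the HTML and RSC entries through separate `seedKey` calls would let a transient RSC failure leave only the RSC key unseeded, so the next request retries just that fetch.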
```ts
  const sourceDir = path.join(serverDir, "prerendered-routes");
  if (!fs.existsSync(sourceDir)) return;

  fs.cpSync(sourceDir, targetDir, { recursive: true });
```
Observation: fs.cpSync(sourceDir, targetDir) merges the contents of prerendered-routes/ into __prerender/, which already contains vinext-prerender.json. Since route files are .html/.rsc and the manifest is .json, there's no collision risk. But if a future route produces a .json output file, it could theoretically overwrite the manifest. Very unlikely, but a __prerender/manifest/ vs __prerender/routes/ split would make this impossible. Low priority.
```ts
}

/** Build an APP_PAGE cache value for an RSC entry. */
export function makeRscCacheValue(rscData: ArrayBuffer): CachedAppPageValue {
```
Worth noting for future readers: rscData here accepts an ArrayBuffer directly. In the Workers codepath (seed-cache-workers.ts:187), Response.arrayBuffer() returns a standalone ArrayBuffer with byteOffset === 0, so no slicing is needed. In the Node.js codepath (seed-cache.ts:142-145), fs.readFileSync() returns a Buffer whose underlying ArrayBuffer may have a non-zero byteOffset, so the slice is required there. The asymmetry is correct but easy to miss.
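The Node.js half of that asymmetry can be demonstrated in isolation (a standalone example, not the PR's code):

```typescript
import { Buffer } from "node:buffer";

// Small Buffers from fs.readFileSync / Buffer.from may be views into a
// shared allocation pool, so buf.buffer can be much larger than the data
// and buf.byteOffset can be non-zero.
const buf = Buffer.from("rsc payload");

// Using buf.buffer directly could include unrelated pool bytes; slicing
// to the view's bounds yields a standalone ArrayBuffer copy of the data.
const ab = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

console.log(ab.byteLength === buf.byteLength); // true: exactly the payload
```

A Workers `Response.arrayBuffer()` already returns such a standalone buffer, which is why the Workers path can skip the slice.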
Review submitted successfully. Here's a summary of what I posted: Verdict: Approved. The PR addresses all P0/P1 findings from the previous review; error resilience is now comprehensive with a top-level try/catch. 4 inline comments posted (none blocking):
- Resolve conflict in seed-cache.ts (keep shared module imports over upstream's inline types, which were changed by the interface→type lint rule)
- Apply the interface→type lint rule to seed-cache-shared.ts and seed-cache-workers.ts to match upstream convention
- Improve the dedup comment in seed-cache-workers.ts to clarify why two callers can both pass the cache-miss check (they interleave on the await), per review feedback
Summary

On Cloudflare Workers with MemoryCacheHandler (the default when KV is not configured), pre-rendered routes now lazily seed the memory cache on first request via env.ASSETS.fetch().

Pre-requisite

Depends on #645 for the manifest format changes (buildId, router, trailingSlash fields).

How it works

- Deploy (deploy.ts): After prerendering, copies vinext-prerender.json and HTML/RSC files to dist/client/__prerender/ so they're deployed as static assets
- Runtime (seed-cache-workers.ts): On each request, checks if the route has prerendered data. On first miss, fetches HTML/RSC from the assets binding and populates MemoryCacheHandler. Subsequent requests in the same isolate get cache HITs
- The generated App Router worker entry calls seedRouteFromAssets() before delegating to the RSC handler

Changes

- src/server/seed-cache-workers.ts
- src/deploy.ts
- package.json (./server/seed-cache-workers export)
- tests/seed-cache-workers.test.ts

Scope

This is Workers memory stub support only, not the final durable caching story. For persistent caching, users should use KVCacheHandler (opt-in) with TPR for deploy-time KV population. Issue #562 tracks the generalised remote cache seeding.

Ref #561
Test plan