Add recycle signal to prerenderers on host update #5205
A prerender pool page holds the host bundle it warmed against for its lifetime, so after a host deploy a page can keep rendering against a stale bundle (and, in the incident this addresses, poison the public modules cache). This makes the prerender fleet drop its host when the host changes, coordinated through the manager.

- The realm server reports the host-shell token it serves (an md5 of the fetched index.html) to the prerender manager at boot. A deploy restarts the realm server, so the new bundle is reported here. Best-effort; a missing/unreachable manager never blocks boot.
- The manager stores the latest reported token and echoes it on every heartbeat response (new X-Boxel-Prerender-Host-Shell-Hash header). Storing the token (not a counter) keeps this robust across the manager's own restart in the deploy train: the next realm-server boot re-reports it.
- Each prerender server records the token it warmed against and, when a heartbeat reports a different one, recycles its browser (Prerenderer.recycle() → closeAll + restart Chrome + re-warm, which also clears Chrome's HTTP cache so re-warmed pages load the new shell). The first token seen is adopted as a baseline without recycling.

Ordering is structural: the realm server only reports once IT is serving the new shell (and realm-server restart is the deploy train's last step), so prerender pages that warmed against the old shell are recycled only after the new shell is actually being served.

Tests: manager echoes/updates the reported token on heartbeats and requires a hash; unit tests for the server's recycle decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
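The recycle rule described in the third bullet can be sketched as a pure function. This is a minimal illustration, not the PR's code: the name `decideRecycle`, the `RecycleDecision` type, and the treatment of a heartbeat that carries no token are all assumptions.

```typescript
// Hypothetical result shape; the real decision function may look different.
type RecycleDecision =
  | { action: "none" }
  | { action: "adopt"; token: string }
  | { action: "recycle"; token: string };

// warmed:   token this prerender server warmed against (undefined until known)
// reported: token carried on the latest heartbeat response, if any
function decideRecycle(
  warmed: string | undefined,
  reported: string | undefined,
): RecycleDecision {
  // Assumption: a heartbeat without a token changes nothing.
  if (!reported) {
    return { action: "none" };
  }
  // The first token seen is adopted as the baseline without recycling.
  if (warmed === undefined) {
    return { action: "adopt", token: reported };
  }
  // Same shell as the one warmed against: nothing to do.
  if (warmed === reported) {
    return { action: "none" };
  }
  // The manager reports a different shell: recycle and record the new token.
  return { action: "recycle", token: reported };
}
```

Keeping the decision free of I/O is what makes it unit-testable; the caller would act on `recycle` by restarting the browser and then storing the returned token.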
Host Test Results: 1 files ±0, 1 suites ±0, 1h 37m 29s ⏱️ (-1m 37s). Results for commit be64529. ± Comparison against earlier commit 67a1cff.

Realm Server Test Results: 1 files ±0, 1 suites ±0, 11m 26s ⏱️ (+7s). Results for commit be64529. ± Comparison against earlier commit 67a1cff.
The prerender service deploys before the realm server (the manager, worker, and realm server all depend on the prerender fleet being up for boot indexing), so its tabs warm against the host shell the realm server was serving at that earlier point, which is the old bundle.

Add a final recycle-prerender job, gated on post-deploy-realm-server, that re-deploys the prerender service once the realm server is up serving the new shell. The reusable ecs-deploy workflow always passes force-new-deployment, so this rolls fresh tasks (which re-warm against the new shell) even though the prerender image is unchanged.

A deploy-side safety net that complements the in-process recycle (the prerender server recycles when the manager reports a new host-shell token); this covers the common full-deploy case without depending on the heartbeat round-trip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eploy hook

recycle-prerender was gated on post-deploy-realm-server, which only POSTs the realm server's /_post-deployment endpoint and can fail on its own (observed: an edge 403). That skipped the recycle. Gate on deploy-realm-server instead: it waits for service stability, so the new realm server is up and serving the new shell by then, and the recycle no longer depends on the flaky post-deploy endpoint call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces a host-shell “version token” that propagates from the realm server → prerender manager → prerender servers, so prerender servers can recycle their browser when the host shell changes after a deploy (addressing stale-bundle prerendering and related cache poisoning).
Changes:
- Add a `/host-shell` reporting endpoint to the prerender manager and echo the latest host-shell token on heartbeat responses via `X-Boxel-Prerender-Host-Shell-Hash`.
- Teach prerender servers to reconcile the manager-reported host-shell token and recycle the browser when the token changes.
- Update manual deploy workflow to redeploy (recycle) prerender servers after the realm server is stable.
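The manager side of the first change can be illustrated with a small in-memory store. This is a sketch under assumptions: the class `HostShellRegistry` and its method names are hypothetical, and only the header name is taken from the PR.

```typescript
// Header name as given in the PR description.
const HOST_SHELL_HASH_HEADER = "X-Boxel-Prerender-Host-Shell-Hash";

// Hypothetical holder for the latest reported host-shell token.
class HostShellRegistry {
  private latest: string | undefined;

  // Called when a realm server reports the token it is serving.
  report(hash: string): void {
    this.latest = hash;
  }

  // Extra headers to attach to a heartbeat response.
  // Nothing is added until at least one report has arrived.
  heartbeatHeaders(): Record<string, string> {
    return this.latest === undefined
      ? {}
      : { [HOST_SHELL_HASH_HEADER]: this.latest };
  }
}
```

Because the stored value is the token itself rather than a counter, a freshly restarted manager starts empty and returns to the correct state as soon as the next report arrives.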
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/realm-server/tests/prerender-manager-test.ts | Adds test coverage for manager host-shell reporting + heartbeat echo behavior. |
| packages/realm-server/tests/prerender-host-shell-recycle-test.ts | Adds unit tests for the prerender server’s recycle decision logic. |
| packages/realm-server/tests/index.ts | Registers the new recycle-decision unit test file. |
| packages/realm-server/prerender/prerenderer.ts | Adds Prerenderer.recycle() to reuse the browser restart path for host updates. |
| packages/realm-server/prerender/prerender-constants.ts | Introduces the new host-shell hash header constant. |
| packages/realm-server/prerender/prerender-app.ts | Implements host-shell reconciliation on heartbeats and exports pure decision logic for testing. |
| packages/realm-server/prerender/manager-app.ts | Stores latest host-shell hash and echoes it on heartbeats; adds /host-shell route. |
| packages/realm-server/main.ts | Reports the current host-shell token to the prerender manager at realm-server boot. |
| .github/workflows/manual-deploy.yml | Adds a post-realm-server “recycle prerender” ECS deploy step. |
Trim the reported host-shell token and reject values over 64 chars before storing it. The token is echoed into a response header on every heartbeat, so a whitespace variant would spuriously read as a change and an oversized value would bloat every heartbeat response.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
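A minimal sketch of that hardening follows. The function name `normalizeHostShellToken` is hypothetical, and two details are assumptions rather than facts from the commit: that the length limit is applied after trimming, and that an empty value is rejected as well.

```typescript
// Limit taken from the commit message.
const MAX_HOST_SHELL_TOKEN_LENGTH = 64;

// Returns the token to store, or undefined if the report should be rejected.
function normalizeHostShellToken(raw: unknown): string | undefined {
  // Only strings are acceptable tokens.
  if (typeof raw !== "string") {
    return undefined;
  }
  // Trim so that "abc" and " abc\n" do not read as two different shells.
  const token = raw.trim();
  // Assumption: an empty token is treated as a missing hash.
  if (token.length === 0) {
    return undefined;
  }
  // Assumption: the limit applies to the trimmed value.
  if (token.length > MAX_HOST_SHELL_TOKEN_LENGTH) {
    return undefined;
  }
  return token;
}
```

An md5 hex digest is 32 characters, so a 64-character ceiling leaves room for a longer hash without letting arbitrary payloads ride along on every heartbeat.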
A prerender server learned its host-shell baseline from the first heartbeat that carried a token, treating `warmed === undefined` as "just warmed against the current shell." When a server warmed against an old shell but its first token-carrying heartbeat reported a newer one, it adopted the new token without recycling and stayed pinned to the stale shell.

Seed the baseline at startup from the realm server the prerender loads its shell from: the realm server now serves its current host-shell token at GET /_host-shell-hash (the same value it reports to the manager, computed via a shared helper), and the prerender fetches it before the first heartbeat. The baseline is then the token actually warmed against, so a later change correctly recycles. Best-effort: if the seed fails the server falls back to adopting the first reported token.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
reportHostShellToManager resolved the manager URL via PRERENDER_MANAGER_URL, which is only set on the prerender-server tasks. On the realm server that env var is unset, so it fell back to localhost and the report never reached the manager (fetch failed at boot). The manager therefore never learned the new host-shell token and prerender servers never recycled via heartbeat.

Use prerendererUrl, the manager address the realm server already sends prerender requests to, so there is a single source of truth.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
you need some help getting these tests to pass? I think it might be specific to this branch; these have been more stable lately.
it definitely is. It was working when I submitted for review, but after an update related to Copilot feedback, it’s broken. I’m going to open a different branch for diagnostics.
The seed's startup fetch to `${boxelHostURL}/_host-shell-hash` wedged the
software-factory Playwright harness, where the prerender's boxelHostURL points
at the compat proxy whose port is reserved/rebound during fixture setup; the
extra startup connection stalled the compat-proxy bring-up past its 300s budget
(every SF test failed). Confirmed by bisect: reverting only the seed makes SF
pass in normal time.
The seed addressed a narrow `warmed === undefined` edge in decideHostShellRecycle;
reverting restores the safe adopt-first-token fallback. Base CS-11468 (heartbeat
recycle, /host-shell hardening) and the manager-URL report fix are kept.
This reverts the seed portion of e6b3881; main.ts retains the prerendererUrl
report target.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 67a1cffdde
…oy hook

The boot-time host-shell report fired right after fetching the host dist, before the realm server was listening. In a rolling deploy the manager could echo the new token while the load balancer still routed to the old task, so a prerender would recycle against the old shell, record the new token, and stop retrying, leaving stale tabs on any path relying on the heartbeat signal.

Move the boot report to after server.start() (the listener is then serving the new shell), and also report from the post-deployment hook, which runs once the deploy reports the service stable and load-balancer-routable. The boot report keeps manager-restart resilience; the post-deploy report closes the rolling-deploy window.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
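The ordering this commit establishes (listen first, then report, never let a failed report break boot) can be sketched as below. The `BootDeps` interface and the `startThenReport` function are hypothetical stand-ins; the real server start and report calls are injected so the sketch stays self-contained.

```typescript
// Hypothetical dependencies, injected so the ordering can be exercised in isolation.
interface BootDeps {
  start: () => Promise<void>; // resolves once the listener is serving the new shell
  report: () => Promise<void>; // sends the host-shell token to the prerender manager
  log: (message: string) => void;
}

async function startThenReport(deps: BootDeps): Promise<void> {
  // Report only after the listener is up, so the manager never advertises
  // a shell that this server is not yet serving.
  await deps.start();
  try {
    await deps.report();
  } catch (err) {
    // Best-effort: a missing or unreachable manager must not fail boot.
    deps.log(`host-shell report failed: ${String(err)}`);
  }
}
```

The same `report` function could also be called from the post-deployment hook; that second call is the one that covers the window in which the load balancer still routes to the old task.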
This is a different approach to solve the problem #5094 was about, where the Checkly rehydration check has failed periodically in staging, and recently in production for 3.5h. The SSR version looks fine, but when Ember `host` takes over, a runtime error shows:

Lengthy Claude explanation of the problem
A prerender pool page holds the host bundle it warmed against for its lifetime, so after a host deploy a page can keep rendering against a stale bundle (and, in the incident this addresses, poison the public modules cache). This makes the prerender fleet drop its host when the host changes, coordinated through the manager.

Ordering is structural: the realm server only reports once IT is serving the new shell (and realm-server restart is the deploy train's last step), so prerender pages that warmed against the old shell are recycled only after the new shell is actually being served.
Tests: manager echoes/updates the reported token on heartbeats and requires a hash; unit tests for the server's recycle decision.
This adds a step in the deployment pipeline that tells the prerenderers to refetch `host` when it’s been updated, so this problem should no longer happen.