Add recycle signal to prerenderers on host update #5205

Open
backspace wants to merge 21 commits into main from prerender-refetch-host-cs-11468


@backspace (Contributor) commented Jun 11, 2026

This is a different approach to the problem #5094 was about: the Checkly rehydration check has failed periodically in staging, and recently failed in production for 3.5h. The SSR version looks fine, but when the Ember host takes over, a runtime error shows:

[Screenshot: CleanShot 2026-06-09 at 08 38 14@2x]
Lengthy Claude explanation of the problem

A prerender pool page holds the host bundle it warmed against for its lifetime, so after a host deploy a page can keep rendering against a stale bundle (and, in the incident this addresses, poison the public modules cache). This makes the prerender fleet drop its host when the host changes, coordinated through the manager.
  • The realm server reports the host-shell token it serves (an md5 of the fetched index.html) to the prerender manager at boot. A deploy restarts the realm server, so the new bundle is reported here. Best-effort; a missing/unreachable manager never blocks boot.
  • The manager stores the latest reported token and echoes it on every heartbeat response (new X-Boxel-Prerender-Host-Shell-Hash header). Storing the token (not a counter) keeps this robust across the manager's own restart in the deploy train — the next realm-server boot re-reports it.
  • Each prerender server records the token it warmed against and, when a heartbeat reports a different one, recycles its browser (Prerenderer.recycle() → closeAll + restart Chrome + re-warm, which also clears Chrome's HTTP cache so re-warmed pages load the new shell). The first token seen is adopted as a baseline without recycling.

Ordering is structural: the realm server only reports once IT is serving the new shell (and realm-server restart is the deploy train's last step), so prerender pages that warmed against the old shell are recycled only after the new shell is actually being served.

Tests: manager echoes/updates the reported token on heartbeats and requires a hash; unit tests for the server's recycle decision.
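
The recycle decision described above can be sketched as a pure function. This is only an illustration: the PR's real function is `decideHostShellRecycle`, whose signature is not shown here, so the name `sketchRecycleDecision`, the `RecycleDecision` shape, and the handling of a heartbeat that carries no token are all assumptions.

```typescript
// Hypothetical result shape; not taken from the PR.
type RecycleDecision = {
  recycle: boolean; // should the browser be restarted and re-warmed?
  warmedToken: string | undefined; // baseline to remember afterwards
};

// Sketch of the heartbeat reconciliation: compare the token this server
// warmed against with the token the manager echoed on the heartbeat.
function sketchRecycleDecision(
  warmedToken: string | undefined,
  reportedToken: string | undefined,
): RecycleDecision {
  // Assumption: a heartbeat without a token changes nothing.
  if (!reportedToken) {
    return { recycle: false, warmedToken };
  }
  // The first token seen is adopted as the baseline without recycling.
  if (warmedToken === undefined) {
    return { recycle: false, warmedToken: reportedToken };
  }
  // A different token means the host shell changed: recycle and record it.
  if (reportedToken !== warmedToken) {
    return { recycle: true, warmedToken: reportedToken };
  }
  // Same token as before: keep the warm pages.
  return { recycle: false, warmedToken };
}
```

Keeping the decision free of I/O is what makes it unit-testable without a browser or a manager; the caller would invoke `Prerenderer.recycle()` only when `recycle` is true.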

This adds a step in the deployment pipeline that tells the prerenderers to refetch host when it’s been updated, so this problem should no longer happen.

A prerender pool page holds the host bundle it warmed against for its
lifetime, so after a host deploy a page can keep rendering against a
stale bundle (and, in the incident this addresses, poison the public
modules cache). This makes the prerender fleet drop its host when the
host changes, coordinated through the manager.

- The realm server reports the host-shell token it serves (an md5 of the
  fetched index.html) to the prerender manager at boot. A deploy restarts
  the realm server, so the new bundle is reported here. Best-effort; a
  missing/unreachable manager never blocks boot.
- The manager stores the latest reported token and echoes it on every
  heartbeat response (new X-Boxel-Prerender-Host-Shell-Hash header).
  Storing the token (not a counter) keeps this robust across the
  manager's own restart in the deploy train — the next realm-server boot
  re-reports it.
- Each prerender server records the token it warmed against and, when a
  heartbeat reports a different one, recycles its browser
  (Prerenderer.recycle() → closeAll + restart Chrome + re-warm, which also
  clears Chrome's HTTP cache so re-warmed pages load the new shell). The
  first token seen is adopted as a baseline without recycling.

Ordering is structural: the realm server only reports once IT is serving
the new shell (and realm-server restart is the deploy train's last step),
so prerender pages that warmed against the old shell are recycled only
after the new shell is actually being served.

Tests: manager echoes/updates the reported token on heartbeats and
requires a hash; unit tests for the server's recycle decision.
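
A minimal sketch of the manager side described in this commit message: store the latest reported token and echo it on every heartbeat response. Only the header name comes from the PR; the `HostShellRegistry` class and its method names are hypothetical stand-ins for whatever manager-app.ts actually does.

```typescript
// Header name as given in the PR description.
const HOST_SHELL_HASH_HEADER = "X-Boxel-Prerender-Host-Shell-Hash";

// Hypothetical in-memory holder for the latest reported host-shell token.
class HostShellRegistry {
  private latest: string | undefined;

  // Called when a realm server reports the token it is serving.
  report(token: string): void {
    this.latest = token;
  }

  // Headers to attach to a heartbeat response. Nothing is echoed until a
  // realm server has reported at least once.
  heartbeatHeaders(): Record<string, string> {
    if (this.latest === undefined) {
      return {};
    }
    return { [HOST_SHELL_HASH_HEADER]: this.latest };
  }
}
```

Because the stored value is the token itself rather than a counter, a freshly restarted manager (an empty registry in this sketch) returns to the correct state as soon as the next realm-server boot re-reports.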

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@github-actions Bot (Contributor) commented Jun 11, 2026

Preview deployments

Host Test Results

    1 files  ±0      1 suites  ±0   1h 37m 29s ⏱️ - 1m 37s
3 078 tests ±0  3 063 ✅ ±0  15 💤 ±0  0 ❌ ±0 
3 097 runs  ±0  3 082 ✅ ±0  15 💤 ±0  0 ❌ ±0 

Results for commit be64529. ± Comparison against earlier commit 67a1cff.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   11m 26s ⏱️ +7s
1 721 tests ±0  1 721 ✅ ±0  0 💤 ±0  0 ❌ ±0 
1 814 runs  ±0  1 814 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit be64529. ± Comparison against earlier commit 67a1cff.

backspace and others added 3 commits June 11, 2026 12:01
The prerender service deploys before the realm server (the manager,
worker, and realm server all depend on the prerender fleet being up for
boot indexing), so its tabs warm against the host shell the realm server
was serving at that earlier point — the old bundle. Add a final
recycle-prerender job, gated on post-deploy-realm-server, that re-deploys
the prerender service once the realm server is up serving the new shell.
The reusable ecs-deploy workflow always passes force-new-deployment, so
this rolls fresh tasks (which re-warm against the new shell) even though
the prerender image is unchanged.

A deploy-side safety net that complements the in-process recycle (the
prerender server recycles when the manager reports a new host-shell
token); this covers the common full-deploy case without depending on the
heartbeat round-trip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
backspace and others added 2 commits June 11, 2026 15:34
…eploy hook

recycle-prerender was gated on post-deploy-realm-server, which only POSTs
the realm server's /_post-deployment endpoint and can fail on its own
(observed: an edge 403). That skipped the recycle. Gate on
deploy-realm-server instead — it waits for service stability, so the new
realm server is up and serving the new shell by then, and the recycle no
longer depends on the flaky post-deploy endpoint call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@backspace changed the title from "Recycle prerender browsers when the host is redeployed" to "Add recycle signal to prerenders on host update" Jun 12, 2026
@backspace marked this pull request as ready for review June 12, 2026 15:58
@backspace changed the title from "Add recycle signal to prerenders on host update" to "Add recycle signal to prerenderers on host update" Jun 12, 2026
@backspace requested a review from a team June 12, 2026 17:43
@habdelra requested a review from Copilot June 12, 2026 21:43

Copilot AI left a comment

Pull request overview

This PR introduces a host-shell “version token” that propagates from the realm server → prerender manager → prerender servers, so prerender servers can recycle their browser when the host shell changes after a deploy (addressing stale-bundle prerendering and related cache poisoning).

Changes:

  • Add a /host-shell reporting endpoint to the prerender manager and echo the latest host-shell token on heartbeat responses via X-Boxel-Prerender-Host-Shell-Hash.
  • Teach prerender servers to reconcile the manager-reported host-shell token and recycle the browser when the token changes.
  • Update manual deploy workflow to redeploy (recycle) prerender servers after the realm server is stable.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/realm-server/tests/prerender-manager-test.ts | Adds test coverage for manager host-shell reporting + heartbeat echo behavior. |
| packages/realm-server/tests/prerender-host-shell-recycle-test.ts | Adds unit tests for the prerender server’s recycle decision logic. |
| packages/realm-server/tests/index.ts | Registers the new recycle-decision unit test file. |
| packages/realm-server/prerender/prerenderer.ts | Adds Prerenderer.recycle() to reuse the browser restart path for host updates. |
| packages/realm-server/prerender/prerender-constants.ts | Introduces the new host-shell hash header constant. |
| packages/realm-server/prerender/prerender-app.ts | Implements host-shell reconciliation on heartbeats and exports pure decision logic for testing. |
| packages/realm-server/prerender/manager-app.ts | Stores latest host-shell hash and echoes it on heartbeats; adds /host-shell route. |
| packages/realm-server/main.ts | Reports the current host-shell token to the prerender manager at realm-server boot. |
| .github/workflows/manual-deploy.yml | Adds a post-realm-server “recycle prerender” ECS deploy step. |

Comment thread packages/realm-server/prerender/manager-app.ts
Comment thread packages/realm-server/prerender/prerender-app.ts
Comment thread packages/realm-server/tests/prerender-host-shell-recycle-test.ts
backspace and others added 4 commits June 15, 2026 07:39
Trim the reported host-shell token and reject values over 64 chars
before storing it. The token is echoed into a response header on every
heartbeat, so a whitespace variant would spuriously read as a change and
an oversized value would bloat every heartbeat response.
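
The hardening in this commit can be sketched as a small normalizer. The 64-character limit and the trimming come from the message above; the function name, the `unknown` input type, and returning `undefined` for rejected values are assumptions made for illustration.

```typescript
// Limit stated in the commit message. An md5 hex digest is 32 characters,
// so real tokens fit comfortably.
const MAX_HOST_SHELL_TOKEN_LENGTH = 64;

// Hypothetical helper: returns the cleaned token, or undefined when the
// reported value should be rejected instead of stored.
function normalizeHostShellToken(raw: unknown): string | undefined {
  if (typeof raw !== "string") {
    return undefined;
  }
  // Trim so a whitespace variant does not read as a changed token.
  const trimmed = raw.trim();
  // Reject empty values (a hash is required) and oversized ones, which
  // would otherwise be echoed on every heartbeat response.
  if (trimmed.length === 0 || trimmed.length > MAX_HOST_SHELL_TOKEN_LENGTH) {
    return undefined;
  }
  return trimmed;
}
```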

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A prerender server learned its host-shell baseline from the first
heartbeat that carried a token, treating `warmed === undefined` as
"just warmed against the current shell." When a server warmed against an
old shell but its first token-carrying heartbeat reported a newer one, it
adopted the new token without recycling and stayed pinned to the stale
shell.

Seed the baseline at startup from the realm server the prerender loads
its shell from: the realm server now serves its current host-shell token
at GET /_host-shell-hash (the same value it reports to the manager,
computed via a shared helper), and the prerender fetches it before the
first heartbeat. The baseline is then the token actually warmed against,
so a later change correctly recycles. Best-effort: if the seed fails the
server falls back to adopting the first reported token.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
reportHostShellToManager resolved the manager URL via PRERENDER_MANAGER_URL,
which is only set on the prerender-server tasks. On the realm server that env
var is unset, so it fell back to localhost and the report never reached the
manager (fetch failed at boot). The manager therefore never learned the new
host-shell token and prerender servers never recycled via heartbeat.

Use prerendererUrl — the manager address the realm server already sends
prerender requests to — so there is a single source of truth.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@backspace marked this pull request as draft June 15, 2026 20:01

@habdelra (Contributor) commented Jun 15, 2026

you need some help getting these tests to pass? I think it might be specific to this branch, these have been more stable lately.

@backspace (Contributor, Author) replied:

> you need some help getting these tests to pass? I think it might be specific to this branch, these have been more stable lately.

it definitely is, it was working when I submitted for review, but after an update related to Copilot feedback, it’s broken. I’m going to open a different branch for diagnostics

The seed's startup fetch to `${boxelHostURL}/_host-shell-hash` wedged the
software-factory Playwright harness, where the prerender's boxelHostURL points
at the compat proxy whose port is reserved/rebound during fixture setup; the
extra startup connection stalled the compat-proxy bring-up past its 300s budget
(every SF test failed). Confirmed by bisect: reverting only the seed makes SF
pass in normal time.

The seed addressed a narrow `warmed === undefined` edge in decideHostShellRecycle;
reverting restores the safe adopt-first-token fallback. Base CS-11468 (heartbeat
recycle, /host-shell hardening) and the manager-URL report fix are kept.

This reverts the seed portion of e6b3881; main.ts retains the prerendererUrl
report target.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@backspace marked this pull request as ready for review June 16, 2026 00:01

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67a1cffdde


Comment thread packages/realm-server/main.ts Outdated
…oy hook

The boot-time host-shell report fired right after fetching the host dist,
before the realm server was listening. In a rolling deploy the manager could
echo the new token while the load balancer still routed to the old task, so a
prerender would recycle against the old shell, record the new token, and stop
retrying — leaving stale tabs on any path relying on the heartbeat signal.

Move the boot report to after server.start() (the listener is then serving the
new shell), and also report from the post-deployment hook, which runs once the
deploy reports the service stable and load-balancer-routable. The boot report
keeps manager-restart resilience; the post-deploy report closes the rolling-
deploy window.
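
The ordering and best-effort behaviour described above could look roughly like the sketch below. Everything here is hypothetical: `PostToken` stands in for an HTTP POST to the manager's /host-shell route (whose request shape the PR does not show), and `bootSketch` stands in for the relevant part of main.ts.

```typescript
// Hypothetical transport: sends the token to a URL, rejecting on failure.
type PostToken = (url: string, token: string) => Promise<void>;

// Best-effort report: a missing or unreachable manager must never block
// boot, so failures are swallowed and surfaced only as a boolean.
async function reportHostShellBestEffort(
  managerUrl: string,
  token: string,
  post: PostToken,
): Promise<boolean> {
  try {
    await post(`${managerUrl}/host-shell`, token);
    return true;
  } catch {
    return false;
  }
}

// Report only after the listener is up, so the manager never echoes a
// token for a shell this server is not yet serving.
async function bootSketch(
  startServer: () => Promise<void>,
  managerUrl: string,
  token: string,
  post: PostToken,
): Promise<boolean> {
  await startServer();
  return reportHostShellBestEffort(managerUrl, token, post);
}
```

Under the same assumptions, the post-deployment hook would call `reportHostShellBestEffort` a second time once the deploy reports the service stable, which is the step that closes the rolling-deploy window.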

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>