ci: Change host tests to run in environment mode #5008
Dev realms that are public-readable in standard mode (skills, catalog, experiments) returned 401 to unauthenticated readers in environment mode. Their public-read grants are seeded by static-URL migrations keyed on localhost:4201, but environment mode mounts the same realms at realm-server.<slug>.localhost URLs that no migration row matches. The base realm is unaffected because its public grant is keyed on the canonical https://cardstack.com/base/ URL. Add an environment-mode-only step to the boot-time registry backfill that mirrors the already-public set (matched by URL path) onto each bootstrap realm's actual URL. It is idempotent and only grants read on paths already declared public by policy, so it cannot expose a realm that is meant to stay private. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
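A minimal sketch of the path-keyed mirroring, assuming permission rows shaped like `{ realmURL, username, read }`. That shape is hypothetical; the real `realm_user_permissions` schema and backfill code are not shown in this PR. The sketch copies only `*` read grants whose URL path is already public, and it skips URLs that already carry such a grant, so a rerun is a no-op.

```typescript
// Hypothetical row shape; the real realm_user_permissions table may differ.
interface PermissionRow {
  realmURL: string;
  username: string;
  read: boolean;
}

// Returns the '*' read grants to add so that each env-mode realm URL is
// public exactly when a realm with the same URL path is already public.
function mirrorPublicReads(
  existing: PermissionRow[],
  envRealmURLs: string[],
): PermissionRow[] {
  const publicRows = existing.filter((row) => row.username === '*' && row.read);
  // Paths that policy already declares public, e.g. "/skills/".
  const publicPaths = new Set(publicRows.map((row) => new URL(row.realmURL).pathname));
  // Env URLs that already have a public grant; skipping them keeps this idempotent.
  const alreadyPublic = new Set(publicRows.map((row) => row.realmURL));
  return envRealmURLs
    .filter((url) => publicPaths.has(new URL(url).pathname))
    .filter((url) => !alreadyPublic.has(url))
    .map((realmURL) => ({ realmURL, username: '*', read: true }));
}
```

A private realm has no `*` row at any URL, so its path never enters `publicPaths` and it cannot be exposed by this step.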
Boots the realm-server stack under BOXEL_ENVIRONMENT (Traefik, *.localhost hostnames, per-slug database, dynamic ports) and asserts that the skills realm — public in standard mode — also returns 200 to an unauthenticated reader here, the parity the host integration tests depend on. This is a deliberately small first step: it de-risks the env-mode CI infrastructure (Traefik in CI, *.localhost resolution, env-mode boot) before converting the full sharded host suite, and it guards the public-read parity restored in the previous commit. A fixed "ci" slug is safe because each job runs on an isolated VM. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host Test Results: 1 file ±0, 1 suite ±0, 1h 54m 55s ⏱️ +49s. Results for commit 3562962. ± Comparison against earlier commit 50fa6bb. For more details on these errors, see this check.
Realm Server Test Results: 1 file ±0, 1 suite ±0, 10m 15s ⏱️ +4s. Results for commit 3562962. ± Comparison against earlier commit 50fa6bb.
Extends the environment-mode proof-of-concept to run the realm-touching suite that the ticket reported failing in env mode, against the env-mode dist. Validates the fix beyond the public-read curl check and is the first step toward running the full host suite in environment mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bare 'ember' invocation failed with 'No such file or directory' because node_modules/.bin was not on PATH; route through pnpm exec like the other host test steps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The public-read parity assertion is the job's gate; running the realm-touching suite in env mode still surfaces harness issues unrelated to that fix, so keep the slice visible but non-blocking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probes the prerender's failing cross-origin module fetch (host.ci.localhost -> realm-server.ci.localhost/base/card-api) to determine whether the realm-server behind Traefik emits an Access-Control-Allow-Origin header for a cross-subdomain origin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
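For reference, the browser-side decision this probe is trying to predict can be sketched as the Fetch standard's CORS check. `https://host.ci.localhost` and `https://realm-server.ci.localhost` are distinct origins, so the realm-server must answer with either `*` (non-credentialed requests only) or the exact requesting origin. The function name is illustrative and not part of the codebase.

```typescript
// Sketch of the Fetch standard's CORS check for a single response.
// allowOrigin / allowCredentials are the raw header values (null if absent).
function corsCheck(
  requestOrigin: string,
  allowOrigin: string | null,
  includeCredentials = false,
  allowCredentials: string | null = null,
): boolean {
  // No Access-Control-Allow-Origin header: the browser blocks the response.
  if (allowOrigin === null) return false;
  // The wildcard is only honored for non-credentialed requests.
  if (allowOrigin === '*' && !includeCredentials) return true;
  // Otherwise the header must equal the serialized request origin exactly.
  if (allowOrigin !== requestOrigin) return false;
  if (!includeCredentials) return true;
  // Credentialed requests additionally need Access-Control-Allow-Credentials: true.
  return allowCredentials === 'true';
}
```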
Boots the realm-server test stack under BOXEL_ENVIRONMENT (Traefik, *.localhost hostnames, per-slug paths) and asserts the same public-read parity gate the host PoC validates. Runs one shard of the realm-server suite under the env-mode stack as a non-blocking slice while integration fallout against env-mode services is matured. Reuses the env-mode CI infrastructure proven by the host PoC (no new infra required). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shard 1/6 ran cleanly under env mode (244/0/0), so expand to a 6-shard matrix mirroring the existing standard-mode realm-server-test job. Kept non-blocking (continue-on-error on the test step) so the per-shard public-read parity gate remains the required signal while any integration-level fallout is matured. Promote to required once stable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wait for /base/_readiness-check (the same gate test-services:host uses to start its second stage, which covers full indexing including prerender) before declaring readiness. Then pause 60s to let the prerender's standby pool fully populate before launching the test runner's own chrome instance, since concurrent chrome lifecycle events trigger NetworkChangeNotifier and abort still-loading standby pages with ERR_NETWORK_CHANGED. Visible in the previous run as 167 ERR_NETWORK_CHANGED events and FilterRefersToNonexistentType errors from cards being indexed against a base whose modules table never fully populated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A botched edit left the prior parity step's bash tail merged into the Settle step's run value, so the executed command became 'sleep 60 cat /tmp/skills_info.json ...' and sleep exited with code 2. Replace it with a clean `run: sleep 60` and drop the now-redundant CORS diagnostic step (CORS through Traefik is already confirmed working). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
card-catalog ran 10/10 cleanly under the env-mode stack with the strengthened readiness gate + standby settle, so expand the host PoC into the full 20-shard partitioned suite mirroring the standard host-test job (HOST_TEST_PARTITION/COUNT consumed by ember-test-pre-built). Kept non-blocking (continue-on-error on the test step) so the per-shard public-read parity gate remains the required signal while any env-mode-specific fallout across the full suite is matured; promote to required once stable across all shards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
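How a shard picks its slice is up to ember-test-pre-built, whose scheme isn't shown here. A round-robin split like the hypothetical one below illustrates the contract: every module lands in exactly one of COUNT shards, and sorting first makes the split deterministic across runners.

```typescript
// Hypothetical round-robin split; ember-test-pre-built's real scheme may differ.
function partitionModules(
  modules: string[],
  partition: number, // 1-based, as in "shard 3 of 20"
  count: number,
): string[] {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be a positive integer');
  }
  if (!Number.isInteger(partition) || partition < 1 || partition > count) {
    throw new RangeError(`partition must be in 1..${count}`);
  }
  // Sort a copy so every runner sees the same order regardless of discovery order.
  return [...modules].sort().filter((_, i) => i % count === partition - 1);
}
```

In CI the two numbers would come from the HOST_TEST_PARTITION/COUNT variables; for example, partition 3 of 20 would run every module whose sorted index `i` satisfies `i % 20 === 2`.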
Path B: host test fixtures and helpers hardcode the live test realm URL as https://localhost:4202/test/, which doesn't exist in environment mode (the test realm-server runs at a per-environment Traefik hostname instead). Expose the running URL as config.resolvedTestRealmURL (derived from REALM_TEST_URL, which env-vars.sh sets in both modes) and have the host's NetworkService rewrite the hardcoded URL to it at fetch time when the two differ. Mirrors the existing addURLMapping pattern used for the canonical base realm; no-op in standard mode and production. Closes ~80 env-mode test failures concentrated in shards 4, 10, 11, 12 (realm-querying, realm-indexing, realm tests).

Cluster C: mirror the standard host-test job's chunk-fetch retry block in the env-mode host job so that a transient ChunkLoadError, "Failed to fetch dynamically imported module", or NetworkError that aborts a whole shard before any tests run gets one retry. Previously this cost shard 13 its entire run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
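The Path B rewrite amounts to a prefix substitution. A minimal sketch, assuming a standalone helper; `rewriteTestRealmURL` and `HARDCODED_TEST_REALM` are hypothetical names, and the real NetworkService registers the mapping through `addURLMapping` instead:

```typescript
// The standard-mode URL baked into fixtures and helpers.
const HARDCODED_TEST_REALM = 'https://localhost:4202/test/';

// Rewrites a URL under the hardcoded test realm to the realm actually running.
function rewriteTestRealmURL(url: string, resolvedTestRealmURL: string): string {
  // Standard mode and production: both URLs are equal, so nothing changes.
  if (resolvedTestRealmURL === HARDCODED_TEST_REALM) return url;
  // URLs outside the test realm (base realm, icons, ...) pass through untouched.
  if (!url.startsWith(HARDCODED_TEST_REALM)) return url;
  return resolvedTestRealmURL + url.slice(HARDCODED_TEST_REALM.length);
}
```

Keeping the trailing slash in the prefix prevents a URL such as `https://localhost:4202/testing` from being rewritten by accident.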
The appendFileSync call in env-mode-lock.js (added in #5021, merged into this branch via 4d9c587) was split across multiple lines where prettier expects a single line, breaking the Lint job's prettier check. Format it on a single line so CI lint goes green on this branch; the code originated on main, so the same fix can be cherry-picked there independently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolvedTestRealmURL was added to environment.js in the previous commit but missing from environment.ts's exported config type, so network.ts saw it as unknown and ember-tsc failed. Declare it as string. local-testing.md picked up a trailing-whitespace prettier warning in the merge from main; reformat to keep the Lint job green here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last run showed Path B closing ~66 env-mode failures (shards 4/10/11/12) but introducing 24 new failures concentrated in shard 9 (Integration | operator-mode | links: 'waitFor timed out waiting for [data-test-stack-card=http://test-realm/test/BlogPost/1]'). The same suite passes 103/0 in standard mode on the same SHA, so the regression is env-mode-specific and either Path B or a commit pulled in via the merge from main caused it. Comment out the addURLMapping call (keep the config + type plumbing) so the next CI run isolates which: if shard 9 returns to 0 and shards 10/11/12 regress, Path B was the source; if shard 9 stays at 24, the merge introduced it independently. Not a final state — this commit is meant to be reverted/replaced once the source is identified. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When Path B was first added (commit 746e9bd) it caused a 24-test regression in shard 9's Integration | operator-mode | links: half the host-app code paths went through VirtualNetwork (with the new mapping) while the other half still went through the deprecated global prefixMappings (which didn't), producing asymmetric URL resolution that broke card-instance lookups. CS-10752 has since landed in full — the global prefixMappings table and the deprecated card-reference-resolver module are gone from main, so every resolution site now goes through VirtualNetwork uniformly. The asymmetry that broke shard 9 no longer has anywhere to live, so the mapping is safe to restore. With it back, ~66 hardcoded-URL test failures across shards 4/10/11/12/15 should close again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The env-mode parity gate was waiting on base/_readiness-check, which returns 200 once the base realm finishes its from-scratch index but leaves skills indexing still in flight (the realm-server indexes base and skills sequentially). The AI-assistant tests then fetch Skill/boxel-environment from the skills realm and get 404 because skills hasn't been indexed yet, accounting for the 11+7=18 ai-assistant failures clustered in shards 7 and 17. Add a parallel wait on skills/_readiness-check so the test step does not start until both base and skills have completed indexing. Standard mode does not need this because the host-test job's wait orchestration naturally delays past skills indexing via its longer asset-restore path; env-mode CI builds the dist inline and gets to the gate earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
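Waiting on both realms in parallel means the gate finishes as soon as the slower of the two is indexed, and fails if either never becomes ready. A sketch in TypeScript, assuming an injectable `probe` that returns an HTTP status (in CI it would wrap `fetch`); the function names and the attempt/interval defaults are illustrative, not the workflow's values:

```typescript
// Returns the HTTP status for a URL; injected so the logic is testable offline.
type Probe = (url: string) => Promise<number>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll one readiness URL until it answers 200 or attempts run out.
async function waitForReady(url: string, probe: Probe, attempts: number, intervalMs: number): Promise<void> {
  for (let i = 1; i <= attempts; i++) {
    try {
      if ((await probe(url)) === 200) return;
    } catch {
      // Connection refused and similar: the service is not up yet, keep polling.
    }
    if (i < attempts) await sleep(intervalMs);
  }
  throw new Error(`${url} not ready after ${attempts} attempts`);
}

// Block until every realm's _readiness-check is green; rejects if any one times out.
function waitForAllRealms(urls: string[], probe: Probe, attempts = 30, intervalMs = 2000): Promise<void[]> {
  return Promise.all(urls.map((url) => waitForReady(url, probe, attempts, intervalMs)));
}
```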
Env-mode CI runs `pnpm build` (in packages/host) before launching test-services, which materializes the pnpm workspace and creates an empty packages/skills-realm/contents directory. services/realm-server later invokes `pnpm skills:setup`, whose `[ -d contents ] || git clone` heuristic sees the directory already exists and skips the clone, so the skills realm boots with no content. The realm's `#startup` finds no files to index, completes immediately, and `_readiness-check` returns 200 promptly. Tests then fetch `@cardstack/skills/Skill/boxel-environment` and get a 404, breaking every `ai-assistant-panel | skills` and `ai-assistant-panel | commands` test in env mode (~18 failures total). Standard mode doesn't hit this because it downloads a pre-built dist artifact rather than running `pnpm build`, so contents/ doesn't exist when services/realm-server's skills:setup check runs and the clone fires normally. Add an explicit skills:setup step in both env-mode jobs (host and realm-server) before any pnpm workspace op runs, so contents/ has real boxel-skills content on disk when test-services starts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skills:setup script's SSH-then-HTTPS fallback chain has a silent failure mode in the env-mode CI: SSH clone fails with 'Permission denied (publickey)' as expected (no SSH key in CI) but leaves a non-empty contents/ directory, which the HTTPS clone fallback then refuses to overwrite. The script exits 0 because git clone's failure output is swallowed in the chain, so the step succeeds but packages/skills-realm/contents has no real content. The skills realm boots empty, _readiness-check returns 200 fast (no work), and tests 404 on Skill/boxel-environment. Replace the pnpm skills:setup call in both env-mode CI jobs with a direct, verifiable HTTPS clone: bypass SSH, skip if an actual Skill/ directory is already present (idempotent), and ls the result so any future regression is visible in the log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arts

Two prior fix attempts populated packages/skills-realm/contents and ls confirmed Skill/boxel-environment.json was present, but the skills realm still indexed zero files in env mode. Either something between the populate step and the realm-server's NodeAdapter read is clobbering the content, or the realm-server is reading from a different path. Add an explicit ls at three vantage points (workspace-relative, Skill subdir, realm-server-cwd-relative) right before test-services start to nail down which case applies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scoped-fromUrl bootstrap realms (skills, openrouter) take their realm.url from the env-mode served URL, not the canonical, so the static-URL migration rows in realm_user_permissions never match. getRealmOwnerUserId then throws "Cannot determine realm owner" inside from-scratch-index on boot, which fullIndex catches and swallows -- the realm mounts but indexes zero files. Every ai-assistant-panel | skills test then 404s. Extend the env-mode parity step to mirror realm-owner, write, and named-user permissions by URL pathname (was: only *: read). Keys off existing standard-mode rows so it stays in lockstep with the migrations; per-(username, env-url) check preserves any custom admin permission across reruns.
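A sketch of the widened mirror, assuming a hypothetical row shape with `read`, `write`, and `realmOwner` flags (the real table and backfill code may differ). The per-(username, env URL) check is what keeps reruns idempotent and leaves a hand-edited admin row alone:

```typescript
// Hypothetical row shape; the real realm_user_permissions columns may differ.
interface PermRow {
  realmURL: string;
  username: string;
  read: boolean;
  write: boolean;
  realmOwner: boolean;
}

// Copy every grant ('*' read, realm owner, writers, named users) from rows whose
// URL pathname matches onto each env-mode realm URL. A (username, env URL) pair
// that already has a row is left untouched.
function mirrorPermissionsByPath(existing: PermRow[], envRealmURLs: string[]): PermRow[] {
  const key = (username: string, url: string) => `${username} ${url}`;
  const taken = new Set(existing.map((row) => key(row.username, row.realmURL)));
  const added: PermRow[] = [];
  for (const envURL of envRealmURLs) {
    const path = new URL(envURL).pathname;
    for (const row of existing) {
      // Only source rows that live at another URL but serve the same path.
      if (row.realmURL === envURL || new URL(row.realmURL).pathname !== path) continue;
      if (taken.has(key(row.username, envURL))) continue;
      taken.add(key(row.username, envURL));
      added.push({ ...row, realmURL: envURL });
    }
  }
  return added;
}
```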
The realm-server tests run against a template DB whose migrations already seed standard-mode permission rows for /skills/, /catalog/, etc. Tests that deepEqual the full row set at the env-mode URL pick those up alongside their own inserts and yield false "actual" results: my coalesces + custom-row tests saw three rows including * and @skills_writer instead of the one row they set up. Switch the three deepEqual tests to a synthetic /probe-realm/ pathname that no migration touches. The publicReadGranted tests stay on /skills/ since they assert one property and aren't affected by extra rows.
Host: remove the create-realm-users workflow step. In env mode `mise run test-services:host` calls `start:matrix`, which both boots Synapse and auto-registers users on first run; the workflow backgrounds that and the extra register step raced the Synapse boot, failing with "No such container: boxel-synapse-ci". Realm-server: skip tests/index.ts's TLS env-var deletion when BOXEL_ENVIRONMENT is set. The deletion was added to defeat the first-byte dispatcher's plain-HTTP→https redirect in standard mode, but env-mode createListener uses a pure h2 server (no dispatcher) and treats the cert as a system invariant — wiping it throws "HTTP/2 requires a TLS cert/key but ... are not set" on every fixture spawn.
`env-vars.sh` exports REALM_SERVER_TLS_CERT_FILE / KEY_FILE only when the cert files are already on disk at the moment it's sourced. In CI, mise-action sources env-vars.sh at workflow init — which runs before `actions/init`'s `mise run infra:ensure-dev-cert` provisions the cert — so the conditional silently skips and the env vars never reach the test step. The result was every in-process fixture realm-server in env mode throwing "HTTP/2 requires a TLS cert/key but ... are not set" (92 failures per realm-server shard, previously hidden by continue-on-error: true on the standalone env-mode PoC job). Set the paths explicitly at the job level to match where infra:ensure-dev-cert writes them on ubuntu-latest.
`tests/helpers/index.ts`'s `stripTlsEnvVars()` runs inside `runTestRealmServer` and the other fixture helpers, so even with the tests/index.ts gate in place the helpers still re-stripped the env vars on every fixture spawn — leaving in-process realm-servers throwing "HTTP/2 requires a TLS cert/key but ... are not set" in env mode. Same gate as tests/index.ts: standard-mode rationale (defeat the dispatcher's plain-HTTP→https redirect) doesn't apply in env mode where createListener uses a pure h2 server with the cert as a system invariant.
The realm-server tests use supertest with plain HTTP/1.1 against in-process fixture realm-servers on random `127.0.0.1:444X` ports. Env-mode `createListener` requires HTTPS+HTTP/2 with no plain-HTTP fallback, so fixtures spawned under BOXEL_ENVIRONMENT=ci either throw "HTTP/2 requires a TLS cert/key" or the h2 server rejects supertest's plain bytes as garbage. The cost of running this job in env mode is a test-infrastructure refactor (h2-aware fixture client, or a test-only bypass in createListener) that is out of scope here. This restores the standard-mode job (test-web-assets dependency, register-realm-users, pnpm test:wait-for-servers) and removes the env-mode-aware gates from tests/index.ts and tests/helpers/index.ts that were paired with the now-reverted env-mode job env.
`testModuleRealm` is the URL of the live test realm-server's /test/
realm — used by ~57 integration tests as the base for module
references (e.g. `${testModuleRealm}some-card`). It was hardcoded to
the standard-mode `https://localhost:4202/test/`; in env mode the
test realm-server serves it at `https://realm-test.<slug>.localhost/test/`.
Derive it from ENV.resolvedTestRealmURL, which `environment.js`
already populates from REALM_TEST_URL.
`baseTestMatrix.url` in the host test helpers was hardcoded to `http://localhost:8008` — standard-mode Synapse. In env mode Synapse lives at `https://matrix.<slug>.localhost`, so every test setup that spun up MatrixService against this URL failed `POST .../login` with "Failed to fetch", cascading into "Error loading env default system card" and the 11 `ai-assistant-panel | skills` failures with "Timed out waiting for env skill to be active". ENV.matrixURL is already resolved by environment.js from BOXEL_ENVIRONMENT (or MATRIX_URL) — use that.
Replace literal `https://localhost:4202/test/...` (live test realm) and `http://localhost:4206/...` (icon server) string usages with `testModuleRealm` and an `iconsBase` const sourced from `ENV.iconsURL`. Both already resolve correctly per mode — env mode gets the `realm-test.<slug>.localhost` / `icons.<slug>.localhost` hostnames, standard mode keeps the original ports — so the same test bundle works against either stack. Adds the testModuleRealm import to files that referenced it via the URL but didn't yet import the helper.
The two `indexing identifies an instance's [card|polymorphic] references` tests deepEqual `refs!.sort()` against a hand-written expected list that had `http://localhost:4206/...` (standard-mode icons) at the top — lexically first because `http:` sorts before `https:`. In env mode icons resolve to `https://icons.<slug>.localhost/...`, which sorts between `https://cardstack.com/...` and `https://packages/...`, so the hand-written order no longer matches the sort output. Add `.sort()` to the expected array as well so the assertion is robust to the lexical position of `iconsBase` URLs in either mode.
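JavaScript's default `Array.prototype.sort` compares UTF-16 code units, and `:` (U+003A) sorts before `s` (U+0073), which is why `http:` URLs jump ahead of every `https:` URL. Sorting both sides before comparing makes the assertion independent of that position. `sameRefs` is a hypothetical stand-in for the `deepEqual(refs!.sort(), expected.sort())` pattern, and the example URLs are illustrative:

```typescript
// Order-insensitive comparison of two reference lists: sort copies of both
// sides so a mode-specific URL can land anywhere without breaking the check.
function sameRefs(actual: string[], expected: string[]): boolean {
  const a = [...actual].sort();
  const e = [...expected].sort();
  return a.length === e.length && a.every((ref, i) => ref === e[i]);
}
```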
The 6 `host submode > with a realm that is publishable > publish and unpublish realm` tests asserted against literal strings of the form `https://testuser.localhost:4201/test/` and friends. The publishing UI builds those URLs from `ENV.publishedRealmBoxelSpaceDomain` / `publishedRealmBoxelSiteDomain` (both fall back to `defaults.realmHost`: `localhost:4201` in standard mode, `realm-server.<slug>.localhost` in env mode), so the assertions need to derive the host the same way. Source the host from `ENV.publishedRealmBoxelSpaceDomain` and rebuild every `<username>|<custom-subdomain>.localhost:4201` literal as a template literal interpolating that host. Object literal keys are wrapped in computed-property brackets.
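A sketch of the template-literal and computed-key rewrite. `publishedURL` and `expectedByURL` are hypothetical helpers, `host` stands in for `ENV.publishedRealmBoxelSpaceDomain`, and the `true` value is a placeholder for whatever each assertion actually compares:

```typescript
// Build a published-realm URL from the mode-specific host.
function publishedURL(host: string, subdomain: string, path = ''): string {
  return `https://${subdomain}.${host}/${path}`;
}

// An object key that used to be a string literal becomes a computed property.
function expectedByURL(host: string): Record<string, boolean> {
  return {
    [publishedURL(host, 'testuser', 'test/')]: true,
  };
}
```

With `host = 'localhost:4201'` this reproduces the original literal `https://testuser.localhost:4201/test/`; with `realm-server.ci.localhost` it yields the env-mode equivalent.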
- ci.yaml: drop the comment block above `realm-server-test`. The job is identical to main; the comment described a hypothetical alternative that doesn't exist in the code.
- ci-host.yaml: rewrite the `Verify *.localhost resolves to loopback` preamble to state the invariant the step checks instead of framing it as the "riskiest unknown"; rewrite the `Wait for realms` preamble to describe what the step blocks on rather than which fix it validates.
- network.ts: tighten the URL-mapping comment to name the surviving hardcoded surface (fixture JSON / embedded card ids) instead of claiming all "test fixtures and helpers" hardcode the URL.
- realm-registry-backfill.ts: drop the "Following the same path-keyed design as the prior public-read-only parity" cross-reference to a prior version of this function.
`test-web-assets` takes an optional `boxel_environment` input that gets baked into the cache key, artifact name, and the host build step. Empty input keeps the existing standard-mode output (still consumed by live-test / matrix-client-test / vscode-boxel-tools-package); `boxel_environment: ci` produces a sibling artifact whose host bundle has `https://realm-server.ci.localhost`-shaped URLs baked in. ci-host.yaml adds a second `test-web-assets` call — `test-web-assets-env-mode` — that passes `boxel_environment: ci`. `host-test` depends on it, downloads the artifact, and skips the per-shard `build:ui` + `pnpm build` (~60–120 s × 20 shards reclaimed).
Both consumers of test-web-assets in ci-host.yaml (live-test and host-test) now run Chrome against the env-mode service stack. A single `test-web-assets` call with `boxel_environment: ci` produces one artifact both jobs share. (The standard-mode test-web-assets in ci.yaml is untouched; matrix-client-test and vscode-boxel-tools-package still get their standard-mode artifact.) live-test gains the env-mode setup steps already on host-test: mkcert root install, Traefik start, `*.localhost` loopback verify, skills-realm clone, parity gate, prerender settle. The `pnpm register-realm-users` step is dropped; env-mode `start:matrix` auto-registers users on first boot via the synapse-data marker file.
…eration

Fast-feedback loop on live-test env mode:
- live-test-wait-for-servers.sh: pick the realm-server + matrix hostnames from REALM_BASE_URL / MATRIX_URL_VAL when set (env mode); fall back to the standard-mode localhost ports.
- ci-host.yaml live-test step: set REALM_URL to the env-mode skills URL so testem's default `test_page` points at it.
- host-test (ci-host.yaml), matrix-client-test + realm-server-test (ci.yaml): `if: false` with a TEMPORARY comment. Revert before merging.
Live-test env-mode iteration is settled; restore the original gating on the three jobs that were temporarily off.
The parity gate already blocks on each realm's `_readiness-check`, which by construction means a standby page completed at least one render. The fixed 60s sleep that followed it didn't probe anything — it guessed at how long the rest of the standby pool takes to finish loading host.ci.localhost asset chunks. If the pool-fill race actually bites we'll see a concrete failure signature and design a real probe; meanwhile, dropping it removes ~60s × 20 shards of unconditional wait.
The env-mode host-test job invokes Percy inline as `percy exec --parallel -- pnpm ember-test-pre-built`, bypassing the `test:wait-for-servers` wrapper this script used to fan into. Nothing else in the repo calls it.
The URL-equality guard already short-circuited in prod (resolvedTestRealmURL defaults to the same hardcoded localhost:4202/test/ value the mapping points away from), so this is a documentation + defense-in-depth move. Matches the existing isTesting() pattern at monaco.ts / import.ts / auth-service-worker-registration.ts.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4bd0aff5dc
The env-mode host bundle is built by test-web-assets.yaml with BOXEL_ENVIRONMENT=ci set but no REALM_TEST_URL — environment.js was falling back to the hardcoded standard-mode default `https://localhost:4202/test/`, so the bundle baked the wrong value for testModuleRealm and the NetworkService URL rewrite saw hardcoded==resolved and short-circuited. Card-data references to testModuleRealm that escape the in-process realm-server mock would hit a localhost:4202 listener that doesn't exist in env-mode CI. Add testRealmURL to environmentDefaults() — standard mode keeps the original `https://localhost:4202/test/`, env mode derives `https://realm-test.<slug>.localhost/test/` from the slug the same way the other URLs already do. resolvedTestRealmURL falls back to defaults.testRealmURL; explicit REALM_TEST_URL still wins for callers that want a custom endpoint.
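A sketch of the slug-derived defaults, using only the hostnames that appear in this PR's commit messages. The function shape, the `slug` parameter, and the `resolveTestRealmURL` helper are assumptions; the real `environmentDefaults()` in environment.js may take different inputs and return more fields:

```typescript
interface EnvDefaults {
  realmHost: string;
  testRealmURL: string;
  matrixURL: string;
  iconsURL: string;
}

// Hypothetical shape: no slug means standard mode, a slug means env mode.
function environmentDefaults(slug?: string): EnvDefaults {
  if (!slug) {
    // Standard mode: fixed localhost ports.
    return {
      realmHost: 'localhost:4201',
      testRealmURL: 'https://localhost:4202/test/',
      matrixURL: 'http://localhost:8008',
      iconsURL: 'http://localhost:4206',
    };
  }
  // Environment mode: one Traefik-routed hostname per service, keyed by slug.
  return {
    realmHost: `realm-server.${slug}.localhost`,
    testRealmURL: `https://realm-test.${slug}.localhost/test/`,
    matrixURL: `https://matrix.${slug}.localhost`,
    iconsURL: `https://icons.${slug}.localhost`,
  };
}

// An explicit REALM_TEST_URL still wins over the derived default.
function resolveTestRealmURL(realmTestURL: string | undefined, slug?: string): string {
  return realmTestURL || environmentDefaults(slug).testRealmURL;
}
```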
When the workflow ran `percy` directly via `dbus-run-session -- $TEST_CMD`, the percy binary in `node_modules/.bin` wasn't on PATH and the shard exited with "failed to exec 'percy': No such file or directory". Going through pnpm puts the local bin dir on PATH. The script body differs from what was on main (env-mode tests don't need the `test:wait-for-servers` wrapper; the workflow's parity gate already waits for the live realm-server / matrix to come up).
…-tests-cs-11275

# Conflicts:
# packages/host/tests/integration/realm-indexing-test.gts
The parity gate waits on base + skills _readiness-check before running tests, but `icons.<slug>.localhost` has no readiness probe. A run where the Traefik route for icons hasn't registered yet (or where the http-server hasn't bound its port) lets the test step start anyway, and the first card that imports an icon fails with `TypeError: Failed to fetch` instead of a meaningful 4xx. Add a probe for a stable icon file (`folder-pen.js`) to both parity gates (live-test and host-test). 30 attempts at 2s = 60s headroom, which is comfortably more than the icons server's observed startup time but won't add noticeable wall-clock when it's healthy. This rules out the boot-time race as a cause of the recurring mid-run `Failed to fetch` against icons.ci.localhost. If failures persist after this lands, the cause is mid-run service drop, not readiness.
…real card
Two related fixes for env-mode host CI:
1. Run `pnpm register-realm-users` in both ci-host.yaml jobs
(live-test and host-test), matching what standard-mode CI already
does. Without this, the realm-server's worker fetches `_mtimes`
from the dev realm-server unauthenticated, the response is 404,
and from-scratch indexing finishes with zero files. The base
realm then 404s every card the host bundle loads
(welcome-to-boxel.json, ai-app-generator.json,
join-the-community.json, cards/skill, Skill/catalog-listing,
…), which cascades into the AI Assistant, create-file, and
highlight-cards tests failing on missing UI elements.
2. card-delete's "can delete a card that is a selected item" set
stack 1 to the bare realm URL `${testModuleRealm}`. The live
realm-server doesn't resolve a bare realm URL as a card, so it
404s. Point stack 1 at `${testModuleRealm}index` instead — the
index card exists in test-realm-cards/contents/, so the load
succeeds in both standard and env mode without changing the
test's intent (two distinct stacks for the selection assertion).
Env-mode CI sometimes sees a momentary `Failed to fetch` against `icons.<slug>.localhost` mid-run — a brief Traefik route loss or service-side hiccup. The realm-server hostnames see the same blip, but the realm fetch path has its own `withRetries` and recovers silently; the icons path doesn't, so the failure surfaces in whichever test imports an icon at the wrong instant. Treat this as the same retry-category as the existing chunk-fetch transient: broaden the shard retry regex to match `unable to fetch https://icons\.<host>: fetch failed`, which is specific to the env-mode wire URL and won't accidentally retry on real test failures.
Two integration tests
(`realm: realm can serve GET card requests with linksTo relationships to
other realms` and `realm can serve search requests whose results have
linksTo fields`) fetch `${testModuleRealm}hassan` from the live test
realm. In env-mode CI, the test realm-server occasionally finishes its
from-scratch index with zero files when the dev realm-server is
heavy-indexing on the same runner (shared matrix + prerender pool),
producing a 404 from a realm that should serve hassan.
Inverse-correlated across shards — shard A: dev indexes 0 (matrix
race), test indexes 75 (passes); shard B: dev indexes 200, test
indexes 0 (fails). Re-running the shard normally lands after the
race resolves.
Broaden the existing shard-retry regex (already covers chunk-fetch
and icons transients) to also match
`cross-realm fetch failed for https://realm-test.<host>` so this
class of env-mode boot race gets one automatic retry. The regex is
specific to the env-mode wire URL and the cross-realm-fetch log
prefix, so it won't mask real linksTo bugs in tests using
`http://test-realm/...` in-process URLs.
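Taken together, the retry categories from these commits could be expressed as a single alternation. This is a hypothetical TypeScript rendering, not the workflow's actual pattern (which lives in ci-host.yaml); `host` is the env-mode suffix such as `ci.localhost`, escaped so its dots match literally:

```typescript
// Escape a string for literal use inside a RegExp.
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical combined pattern for transient, retry-worthy shard failures.
function transientFailurePattern(host: string): RegExp {
  const h = escapeRegExp(host);
  return new RegExp(
    [
      'ChunkLoadError',
      'Failed to fetch dynamically imported module',
      'NetworkError',
      `unable to fetch https://icons\\.${h}: fetch failed`,
      `cross-realm fetch failed for https://realm-test\\.${h}`,
    ].join('|'),
  );
}

function shouldRetryShard(log: string, host: string): boolean {
  return transientFailurePattern(host).test(log);
}
```

A log line naming `http://test-realm/...` (the in-process URL) never matches, so real linksTo failures still fail the shard.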
This lets us catch regressions with CI. Summary of changes:
- `live-test` also switches to use environment mode so `test-web-assets` can still be shared with host shards
- `test-web-assets` is configurable for environment mode (`ci` environment name)
- `https://localhost:4202/test` in tests
- `*.localhost:4201` (published), `localhost:4202` (test), and `localhost:4206` (icons) assertions are now dynamic

This will tighten feedback cycles for host tests in parallel environments: