Skip to content

test(systest): upload deps to local cluster cache without redirect server#10487

Open
basvandijk wants to merge 2 commits into
masterfrom
basvandijk/systest-dep-local-cluster-cache
Open

test(systest): upload deps to local cluster cache without redirect server#10487
basvandijk wants to merge 2 commits into
masterfrom
basvandijk/systest-dep-local-cluster-cache

Conversation

@basvandijk

@basvandijk basvandijk commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

What

rs/tests/upload_systest_dep.sh no longer queries the redirect server (https://artifacts.idx.dfinity.network). Instead it checks the local cluster's bazel-remote cache directly and uploads the dependency there if it is missing.

Why

To reduce flakiness and operational complexity 557d727 forced testnet allocation to the DC the github runner was hosted in. We would like to have similar behaviour for manually run system-tests from devenvs. To make this easier to achieve we should always serve images from the cluster we're running the test from. That is what this commit implements.

Users should then manually pass --set-required-host-features=dc=$DC where $DC is their local DC to force testnet allocation to that DC. We could also consider automating this but it will be more tricky for cloud-engines and performance tests.

How

  • Existence checkdep_in_cache() does a HEAD against http://server.bazel-remote.svc.cluster.local:8080/cas/<sha> (200 = present, 404 = absent, anything else is a hard error).
  • Cluster nameresolve_cluster() auto-detects the local cluster (e.g. zh1-idx1) from the in-cluster Kubernetes API server certificate SAN (api.<cluster>.dfinity.network). This works on both CI runners and devenvs and yields the exact cluster name. It can be overridden with the SYSTEST_UPLOAD_CLUSTER environment variable.
  • Upload — unchanged target (local bazel-remote), only when the blob is absent; post-upload it re-checks the local cache until the blob is served.
  • Output — still https://artifacts.<cluster>.dfinity.network/cas/<sha> on stdout, preserving the contract with run_systest.sh.

The NODE_NAME-based derivation was intentionally avoided: it only encodes the DC (e.g. dm1), not the full cluster (dm1-idx1 vs dm1-idx2), and it does not exist in devenvs.

Testing

Validated in a zh1-idx1 devenv:

  • bash -n and shellcheck clean.
  • First run uploads; second run reports "already uploaded"; both emit the same URL (stdout = URL only, diagnostics on stderr).
  • The emitted artifacts.zh1-idx1.dfinity.network/cas/<sha> URL actually serves the uploaded blob.
  • SYSTEST_UPLOAD_CLUSTER=dm1-idx9 correctly overrides auto-detection.
  • bazel test //rs/tests/idx:basic_health_test passes.

Notes

  • Requires openssl and reachability of kubernetes.default.svc:443 from the build container (both available on CI runners and devenvs). SYSTEST_UPLOAD_CLUSTER is the escape hatch otherwise.

…rver

upload_systest_dep.sh no longer queries the redirect server (artifacts.idx.dfinity.network). Instead it checks the local cluster's bazel-remote cache directly and uploads there if the dependency is missing.

The local cluster name (needed to build the artifacts.<cluster>.dfinity.network download URL) is auto-detected from the in-cluster Kubernetes API server certificate SAN (api.<cluster>.dfinity.network), which works on both CI runners and devenvs. It can be overridden via the SYSTEST_UPLOAD_CLUSTER environment variable.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates rs/tests/upload_systest_dep.sh to eliminate reliance on the cross-cluster redirect server by checking/uploading system-test dependencies directly against the local cluster’s bazel-remote cache, while preserving the output contract (stdout emits the artifacts URL).

Changes:

  • Replace redirect-server lookup with a direct HEAD existence check against in-cluster bazel-remote (dep_in_cache).
  • Add local cluster auto-detection via Kubernetes API server certificate SANs, with SYSTEST_UPLOAD_CLUSTER override (resolve_cluster).
  • Keep upload target the same (local bazel-remote) and re-check local cache after upload until the blob is served.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rs/tests/upload_systest_dep.sh
Comment thread rs/tests/upload_systest_dep.sh Outdated
Comment thread rs/tests/upload_systest_dep.sh
Comment thread rs/tests/upload_systest_dep.sh Outdated
- Validate SYSTEST_UPLOAD_CLUSTER (and the auto-detected name) against a strict cluster-name pattern before it is interpolated into the artifacts.<cluster>.dfinity.network URL.
- Add an explicit openssl presence check with a clear error message.
- Bound the API-server probe with 'timeout 15' so DNS/network stalls fail fast.
- Use 'grep -m1' instead of 'grep | head' to avoid SIGPIPE under pipefail.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread rs/tests/upload_systest_dep.sh
@basvandijk basvandijk marked this pull request as ready for review June 17, 2026 08:21
@basvandijk basvandijk requested a review from a team as a code owner June 17, 2026 08:21
@github-actions github-actions Bot added the @idx label Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants