ci: add OCI smoke gate workflow #2156

Draft
dliappis wants to merge 1 commit into develop from feat/oci-smoke-gate

Conversation

@dliappis
Contributor

What kind of change does this PR introduce?

CI: adds a new GitHub Actions workflow. It is not wired as a required check; it only runs and reports its own status.

What is the current behavior?

There's no fast pre-EC2 validation for PRs touching the AMI build. testinfra-ami-build.yml is the only path, and it does the full Packer + EC2 round-trip (~20-40 min) before any service-level behavior is exercised.

What is the new behavior?

A new workflow .github/workflows/oci-smoke-gate.yml that runs on PRs touching ansible/, nix/, migrations/, flake.{nix,lock}, Dockerfile-*, or itself (plus workflow_dispatch and merge_group).
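
For reviewers who want to picture the trigger block, here is a minimal sketch of what the `on:` section could look like given the paths listed above. The exact glob spellings are assumptions; the committed workflow file is the source of truth.

```yaml
# Sketch only: glob patterns are assumed from the path list in this description.
on:
  pull_request:
    paths:
      - "ansible/**"
      - "nix/**"
      - "migrations/**"
      - "flake.nix"
      - "flake.lock"
      - "Dockerfile-*"
      - ".github/workflows/oci-smoke-gate.yml"  # the workflow itself
  workflow_dispatch:
  merge_group:
```

Note that `paths` filters apply only to the `pull_request` trigger here; `merge_group` and `workflow_dispatch` runs are not path-filtered.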

It builds the AMI as an OCI image via supabase/supabox's support/ami/Dockerfile, brings the supabox platform stack up, and runs dctest test/supadev-smoke.yaml. Diagnostics + container logs (on failure) are uploaded as a 14-day artifact.
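
As a rough illustration of the build, bring-up, and test sequence described above, the core steps might be expressed like this. Step names and the `supabox` working directory are assumptions for illustration; the commands are the ones named in this PR, with the dctest results flags left out.

```yaml
# Sketch only: step names and working-directory are hypothetical.
steps:
  - name: Init supabox (env, certs, npm install)
    working-directory: supabox
    run: ./supabox init systemd,pg17

  - name: Build the AMI as an OCI image
    working-directory: supabox
    run: docker compose build supabase-postgres-17

  - name: Bring up the platform stack
    working-directory: supabox
    run: docker compose up -d --wait --wait-timeout 300

  - name: Run the smoke spec
    working-directory: supabox
    run: ./dctest test/supadev-smoke.yaml
```

With `--wait`, `docker compose up` blocks until services report running or healthy, so the dctest step only starts once the stack is up or the 300-second timeout has failed the job.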

This PR does not make any other workflow depend on it. It runs alongside testinfra-ami-build.yml and reports its own status. Promoting it to a required check is a follow-up once we see how reliable it is across real PRs.

Additional context

  • Local trial: validated on macOS Docker against supabox a0fe25c on 2026-05-15 — 59/59 supadev-smoke tests pass, ~5.5 min of dctest after init. CI wall-clock expected ~15-25 min cold cache.
  • SUPABOX_REF is SHA-pinned, not tracking main, so sibling-team changes can't silently break this workflow. Bump deliberately.
  • pause-restore.yaml coverage is intentionally deferred — upstream supabox needs a YAML parse fix first, and the spec is slower.
  • No PG-version matrix yet (pg17 only). Add later.
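
To make the SHA pin mentioned above concrete, a pinned checkout could be sketched roughly as follows. The SHA value is a placeholder, not the real pin, and the use of `actions/checkout@v4` with a `supabox` path is an assumption; a private repository would also need a token, which is omitted here.

```yaml
# Sketch only: placeholder SHA, hypothetical step layout.
env:
  # Full commit SHA of supabase/supabox. Bump deliberately.
  SUPABOX_REF: "<full-commit-sha>"

jobs:
  oci-smoke:
    runs-on: blacksmith-2vcpu-ubuntu-2404
    steps:
      - name: Checkout supabox at pinned SHA
        uses: actions/checkout@v4
        with:
          repository: supabase/supabox
          ref: ${{ env.SUPABOX_REF }}
          path: supabox
```

Pinning to a commit SHA rather than a branch name means a bump shows up as a one-line diff in this repository and gets reviewed like any other change.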

Relates RELENG-31.

Builds the AMI as an OCI image via supabox's support/ami/Dockerfile,
brings up the supabox platform stack, and runs dctest's supadev-smoke
spec as a fast pre-flight check before any EC2/testinfra work.

Triggers on pull_request paths that affect the AMI build
(ansible/, nix/, migrations/, flake.{nix,lock}, Dockerfile-*), plus
workflow_dispatch and merge_group.

Flow:
  1. Checkout postgres at PR commit.
  2. Checkout supabox at pinned SHA (env.SUPABOX_REF).
  3. Substitute PR's postgres into supabox/repos/postgres.
  4. Install Nix + add the postgres binary cache substituter so
     stage 1 of the AMI image is mostly a cache pull.
  5. ./supabox init systemd,pg17 (generates env + certs, npm install).
  6. docker compose build supabase-postgres-17 (AMI-as-OCI).
  7. docker compose up -d --wait --wait-timeout 300.
  8. ./dctest test/supadev-smoke.yaml --results-file ... --results-verbose.
  9. Always capture docker state; on failure dump last 500 lines per
     container log.
 10. Upload supabox/diagnostics/ as a 14-day artifact.
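
Steps 9 and 10 could be sketched along these lines. The file names under diagnostics/, the artifact name, and the use of actions/upload-artifact@v4 are assumptions for illustration, not necessarily what the workflow does.

```yaml
# Sketch only: output file names and artifact name are hypothetical.
steps:
  - name: Capture docker state
    if: always()
    working-directory: supabox
    run: |
      mkdir -p diagnostics
      docker compose ps -a > diagnostics/compose-ps.txt 2>&1 || true

  - name: Dump container logs on failure
    if: failure()
    working-directory: supabox
    run: |
      mkdir -p diagnostics
      # Last 500 lines per container, one file each.
      for c in $(docker ps -aq); do
        docker logs --tail 500 "$c" > "diagnostics/$c.log" 2>&1 || true
      done

  - name: Upload diagnostics
    if: always()
    uses: actions/upload-artifact@v4
    with:
      name: oci-smoke-diagnostics
      path: supabox/diagnostics/
      retention-days: 14
```

The `|| true` guards keep a diagnostics hiccup from masking the real test result.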

Conventions followed:
- Runner blacksmith-2vcpu-ubuntu-2404 (matches testinfra-ami-build.yml).
- supabase/postgres/.github/actions/shared-checkout@HEAD for postgres
  checkout.
- ./postgres/.github/actions/nix-install-ephemeral for Nix.
- Concurrency group includes pull_request.number || github.ref.
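
A concurrency block following that convention might look like the sketch below. The `github.workflow` prefix and `cancel-in-progress: true` are assumptions; only the `pull_request.number || github.ref` part is stated above.

```yaml
# Sketch only: prefix and cancel-in-progress are assumed.
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```

On pull_request events the group keys on the PR number; on merge_group and workflow_dispatch, where there is no PR number, it falls back to the ref.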

Deliberate first-iteration omissions:
- Not gating testinfra-ami-build.yml yet — that wiring is a follow-up
  once this proves stable.
- pause-restore.yaml coverage is a follow-up (blocked on the upstream
  supabox YAML parse fix and on this gate stabilising).
- No matrix over PG 15 / 17 / 17-orioledb — starting with pg17.

SUPABOX_REF is SHA-pinned (not a tracking branch) so a sibling-team
change can't silently break postgres CI. Bump deliberately.

Local-trial evidence: validated end-to-end on macOS Docker against
supabox a0fe25c on 2026-05-15 with 59/59 supadev-smoke tests passing
in ~5.5 min after init. CI-side wall-clock expected ~15-25 min cold
cache, less on warm.

Tracks RELENG-31.