ci: require Gemara CUE schema validation for touched catalog YAMLs #1069

Open
eddie-knight wants to merge 3 commits into finos:main from eddie-knight:ci/require-gemara-schema-on-touched-yaml

Conversation

@eddie-knight

Collaborator

Summary

Adds a new CI job, Gemara Schema Check, that compiles and validates every catalog YAML touched in a PR against the Gemara CUE schema (gemaraproj/gemara@v1.2.0). The job fails the PR on any compile error or schema violation.

Per-file scope: only the build targets whose YAMLs were touched get re-validated, not the whole repo. Changes to shared/core files (catalogs/categories.yaml, catalogs/core/ccc/**) cascade to every build target because every catalog depends on them.
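
The cascade rule can be pictured as a small path classifier. This is a minimal sketch, not the workflow's actual code: the function name `is_shared_path` is hypothetical, and the only patterns it knows are the two shared locations named above.

```shell
# Hypothetical helper: succeeds (exit 0) when a changed path is one of the
# shared/core files that every catalog depends on, and fails (exit 1) otherwise.
is_shared_path() {
  case "$1" in
    catalogs/categories.yaml) return 0 ;;
    # In a shell case pattern, * also matches "/", so nested files match too.
    catalogs/core/ccc/*)      return 0 ;;
    *)                        return 1 ;;
  esac
}
```

If any changed path satisfies `is_shared_path`, the check would fall back to validating every build target instead of only the touched ones.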

How it works

  1. git diff base...head finds catalog YAMLs touched in the PR.
  2. Maps each changed file to its build target (catalogs/<category>/<service>/...).
  3. For each affected (target, asset):
    • Runs delivery-toolkit compile to produce a self-describing Gemara artifact.
    • Runs cue vet -d "#<ArtifactType>" gemara-spec <compiled> against the CUE schema.
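
Steps 1 and 2 can be sketched roughly as follows. The helper names `target_for_path` and `affected_targets` are hypothetical, and the sketch assumes a build target is identified by the first two path segments under `catalogs/`; it does not handle the shared/core cascade, which would need a separate check.

```shell
# Hypothetical helper: print the build target ("<category>/<service>") for a
# catalog YAML path, or print nothing if the path is not a per-target file.
target_for_path() {
  case "$1" in
    catalogs/*/*/*.yaml)
      rest=${1#catalogs/}      # strip the leading "catalogs/"
      category=${rest%%/*}     # first remaining path segment
      rest=${rest#*/}
      service=${rest%%/*}      # second remaining path segment
      printf '%s/%s\n' "$category" "$service"
      ;;
  esac
}

# Read changed paths on stdin and print each affected target exactly once.
affected_targets() {
  while IFS= read -r path; do
    target_for_path "$path"
  done | sort -u
}
```

A possible call site, assuming `BASE` and `HEAD` hold the two revisions being compared, would be `git diff --name-only "$BASE...$HEAD" | affected_targets`.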

CUE is the validator because:

  • grcli ships from a private repo and can't be used from public CI.
  • cuelang.org/go/cmd/cue is the upstream tool the Gemara CUE module was authored for.
  • The validation is exactly what grcli validate does internally — cue vet against #<Type>.
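
For illustration, the validation step might be wrapped like this. It only reproduces the `cue vet -d "#<ArtifactType>" gemara-spec <compiled>` shape described above; `vet_artifact`, `CUE_BIN`, and `SPEC_DIR` are names made up for this sketch, and the artifact type is passed in by the caller rather than guessed here.

```shell
# Sketch only: wraps the cue vet invocation described in the PR summary.
# CUE_BIN and SPEC_DIR are hypothetical overrides introduced for this example.
vet_artifact() {
  artifact_type=$1   # schema definition name, without the leading "#"
  compiled=$2        # path to the compiled Gemara artifact
  "${CUE_BIN:-cue}" vet -d "#${artifact_type}" "${SPEC_DIR:-gemara-spec}" "$compiled"
}
```

Keeping the binary and spec directory overridable makes it easy to dry-run the command line (for example with `CUE_BIN=echo`) before running the real validator.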

Pinned versions:

  • Gemara spec: v1.2.0 (matches gemaraSpecVersion in delivery-toolkit/cmd/compile.go:51).
  • cue: v0.16.0.

Why per-file, not full-repo

A full-repo run on every PR would be wasteful (~25 targets × 3 assets × seconds each). Per-file scoping makes the check fast and incentivises the team to fix what they touch without forcing a global fix-up to merge unrelated PRs.

Verification

Locally dry-ran the workflow's shell logic against:

  • catalogs/storage/object/* — compile + cue vet clean ✅
  • catalogs/ai-ml/multi-agent-refarch/* — fails compile (missing metadata.yaml + undefined groups) ❌ (correctly surfaces as a CI failure)
  • Confirmed that the heredoc target loop processes every line.
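
A heredoc-fed loop of the kind mentioned in the last bullet could look like the sketch below. It is an assumption about the general pattern, not a copy of the workflow: `for_each_target` is a hypothetical name, and the `</dev/null` redirect guards against one common reason such loops stop early, namely a command inside the loop consuming the rest of the loop's input.

```shell
# Hypothetical helper: run a command once per non-empty line of a target list.
# Feeding the list through a heredoc keeps the loop in the current shell (not a
# pipeline subshell), so the failure counter is still set after the loop ends.
for_each_target() {
  targets=$1
  shift
  failures=0
  while IFS= read -r target; do
    [ -n "$target" ] || continue   # skip blank lines
    # Detach stdin so the command cannot swallow the remaining targets.
    "$@" "$target" </dev/null || failures=$((failures + 1))
  done <<EOF
$targets
EOF
  [ "$failures" -eq 0 ]
}
```

Counting failures instead of exiting on the first one lets a single run report every broken target.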

Confirmed cue vet catches real schema breaks: wrong metadata.type value, duplicate control ids, missing gemara-version.

Known surfaces this will now block

PRs that touch any of the 13 imports-only or otherwise broken catalog YAMLs will fail this check until the underlying tech debt is addressed:

  • 7 existing imports-only sources (e.g. catalogs/compute/virtual-machines/controls.yaml, catalogs/crypto/secrets/threats.yaml)
  • catalogs/compute/batchproc/controls.yaml using the wrong imports: key
  • catalogs/ai-ml/multi-agent-refarch/ (missing metadata, undefined groups)
  • The three targets with missing controls.yaml/threats.yaml (management/tracing, orchestration/etl, orchestration/k8s): touching their metadata.yaml or capabilities.yaml would now surface the target's compile failure as well.

That's the intended behaviour: this check is what was missing when those got merged.

Test plan

  • PR CI runs and the gemara-schema-checker job appears and passes (this PR touches no catalog YAMLs, so it should no-op).
  • A follow-up PR touching catalogs/categories.yaml should trigger the broad-cascade path and validate all 24 catalogs.
  • A follow-up PR touching a single catalog YAML should validate only that target.
  • Make this check a required status check in branch protection settings (maintainer action — not in this PR).

🤖 Generated with Claude Code

@eddie-knight eddie-knight requested a review from a team as a code owner June 9, 2026 20:30
@linux-foundation-easycla

linux-foundation-easycla Bot commented Jun 9, 2026


CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: eddie-knight / name: Eddie Knight (385c89a)

@eddie-knight eddie-knight force-pushed the ci/require-gemara-schema-on-touched-yaml branch from c96b8d3 to 385c89a Compare June 9, 2026 20:34
@netlify

netlify Bot commented Jun 9, 2026


Deploy Preview for common-cloud-controls failed.

🔨 Latest commit: c96b8d3
🔍 Latest deploy log: https://app.netlify.com/projects/common-cloud-controls/deploys/6a2877cfda6f280008d59350

@netlify

netlify Bot commented Jun 9, 2026


👷 Deploy request for common-cloud-controls pending review.

Visit the deploys page to approve it

🔨 Latest commit: 1c6b485

@eddie-knight

Collaborator Author

CI checks are failing because every PR attempts to build the website, which is currently under construction.

@jpower432 jpower432 left a comment

Contributor

Mostly nits, but requesting changes on the third-party action references.

Comment thread .github/workflows/gemara_check.yml Outdated
Comment thread .github/workflows/gemara_check.yml Outdated
Comment thread .github/workflows/gemara_check.yml Outdated
ref: ${{ env.GEMARA_SPEC_REF }}
path: gemara-spec

- name: Install cue

Contributor

CUE has a third-party setup-cue action you could use: https://github.com/cue-lang/setup-cue

go-version-file: delivery-toolkit/go.mod
cache-dependency-path: delivery-toolkit/go.sum

- name: Checkout Gemara CUE spec

Contributor

Nit: why not just point directly to the module in the CUE registry when running cue vet?

Collaborator Author

Because Claude and I didn't know how to do that when we were working on this :D

eddie-knight and others added 2 commits June 11, 2026 10:38
Co-authored-by: Jennifer Power <jpower@redhat.com>
Co-authored-by: Jennifer Power <jpower@redhat.com>

2 participants