
OCPBUGS-78832: control-plane-operator/controllers/hostedcontrolplane/v2/cvo: Consume include.release.openshift.io/hypershift-bootstrap annotation#7988

Open
wking wants to merge 2 commits into openshift:main from wking:narrowly-scoped-cvo-bootstrap

Conversation

@wking
Member

@wking wking commented Mar 17, 2026

What this PR does / why we need it:

The cluster-version operator has a complicated system for deciding whether a given release-image manifest should be managed in the current cluster. Implementing that system here, or even using library-go and remembering to vendor-bump here, both seem like an annoying maintenance load.

We could use the CVO's render command like the standalone installer, but that logic is fairly complicated because it needs to generate all the artifacts necessary for bootstrap MachineConfig rendering, or the production machine-config operator will complain about MachineConfigPools requesting rendered-... MachineConfig that don't exist.

All we actually need out of the bootstrap container are the resources that the cluster-version operator needs to launch and run, which are labeled with the grep target since openshift/cluster-version-operator#1352. That avoids installing anything the cluster doesn't actually need here by mistake. Once the production CVO container starts, it will apply the remaining resources that the cluster actually needs.
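The selection step this describes can be sketched in shell. This is a minimal illustration of the grep-based approach, not the actual init-container script: the payload directory here is a temporary stand-in, and the manifest file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Stand-in payload directory with two hypothetical sample manifests: one
# carrying the new bootstrap annotation, one carrying only the broader
# hypershift inclusion annotation.
PAYLOAD_DIR="$(mktemp -d)"
cat > "${PAYLOAD_DIR}/0000_00_cvo-serviceaccount.yaml" <<'EOF'
metadata:
  annotations:
    include.release.openshift.io/bootstrap-cluster-version-operator: "hypershift"
EOF
cat > "${PAYLOAD_DIR}/0000_50_other-operator.yaml" <<'EOF'
metadata:
  annotations:
    include.release.openshift.io/hypershift: "true"
EOF

# Select only the manifests the CVO needs to launch and run; everything
# else is left for the production CVO container to apply later.
matches="$(grep -rl 'include.release.openshift.io/bootstrap-cluster-version-operator: .*hypershift' "${PAYLOAD_DIR}" || true)"
echo "${matches}"
# The real init container would feed these matches to 'oc apply -f'.
```

With the sample files above, only the manifest carrying the bootstrap annotation is selected.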

I'm also dropping the openshift-config and openshift-config-managed Namespace creation. They are from a30db71 (#5125), but that commit doesn't explain why they were added or hint at where they lived before (if anywhere). I would expect the cluster-version operator to be able to create those Namespaces from the release-image manifests when they are needed, as with other cluster resources.

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Contributor

coderabbitai Bot commented Mar 17, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped: an excluded label is present.
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e53f36a9-6ef3-448f-892f-f01b47321e5f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Mar 17, 2026
@openshift-ci openshift-ci Bot requested review from devguyio and muraee March 17, 2026 18:17
@openshift-ci openshift-ci Bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label Mar 17, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
Once this PR has been reviewed and has the lgtm label, please assign jparrill for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci Bot commented Mar 18, 2026

@wking: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/verify
Commit: 1a59094
Required: true
Rerun command: /test verify

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

wking added 2 commits March 18, 2026 15:55
… include.release.openshift.io/bootstrap-cluster-version-operator annotation

The cluster-version operator has a complicated system for deciding
whether a given release-image manifest should be managed in the
current cluster [1,2].  Implementing that system here, or even using
library-go and remembering to vendor-bump here, both seem like an
annoying maintenance load.

We could use the CVO's render command like the standalone installer
[3,4], but that logic is fairly complicated because it needs to
generate all the artifacts necessary for bootstrap MachineConfig
rendering, or the production machine-config operator will complain
about MachineConfigPools requesting rendered-... MachineConfig that
don't exist.

All we actually need out of the bootstrap container are the resources
that the cluster-version operator needs to launch and run, which are
labeled with the grep target since [5].  That avoids installing
anything the cluster doesn't actually need here by mistake.  Once the
production CVO container starts, it will apply the remaining resources
that the cluster actually needs.

The new "is there a .status.history entry?" guard keeps this loop from
running if we already have a functioning cluster-version operator (we
don't want to be wrestling with the CVO over the state of the
ClusterVersion CRD).  The 'oc apply' (instead of 'oc create') gives us
a clear "all of those exist now" exit code we can use to break out of
the loop during the initial setup (because this init-container needs
to complete before the long-running CVO container can start).
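The loop described above might look roughly like this. This is a control-flow sketch only: the 'oc' calls are stubbed (stub_apply and EXISTING_HISTORY are stand-ins invented for illustration, not names from the actual script).

```shell
#!/bin/sh
set -eu

# Stand-in for 'oc apply -f <manifests>': fail a couple of times to mimic
# "ClusterVersion CRD not yet established" and transient API hiccups,
# then succeed.
STUB_FAILS=2
stub_apply() {
  STUB_FAILS=$((STUB_FAILS - 1))
  [ "${STUB_FAILS}" -le 0 ]
}

applied=false
attempts=0
while [ "${applied}" = false ] && [ "${attempts}" -lt 10 ]; do
  attempts=$((attempts + 1))
  # Guard: a populated .status.history means a CVO is already managing the
  # cluster, and this loop must not wrestle with it over ClusterVersion
  # state. The real check would read something like:
  #   oc get clusterversion version -o jsonpath='{.status.history}'
  if [ -n "${EXISTING_HISTORY:-}" ]; then
    echo "functioning CVO detected; skipping bootstrap apply"
    break
  fi
  # 'oc apply' (unlike 'oc create') exits 0 once everything exists, which
  # is the clean signal that lets this init container complete.
  if stub_apply; then
    applied=true
  fi
done
echo "applied=${applied} after ${attempts} attempts"
```

With the stub failing once before succeeding, the loop exits after the second attempt, mirroring how a real run would recover from a transient apply failure.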

I'm also dropping the openshift-config and openshift-config-managed
Namespace creation.  They are from a30db71 (Refactor
cluster-version-operator, 2024-11-18, openshift#5125), but that commit doesn't
explain why they were added or hint at where they lived before (if
anywhere).  I would expect the cluster-version operator to be able to
create those Namespaces from the release-image manifests when they are
needed, as with other cluster resources.

I'm also shifting the ClusterVersion custom resource apply into the
loop, to avoid attempting to apply before the ClusterVersion CRD
exists and to more gracefully recover from temporary API hiccup sorts
of things.

I'm also adding some debugging echos and other output to make it
easier to debug "hey, why is it applying these resources that I didn't
expect it to?" or "... not applying the resources I did expect?".

[1]: https://github.com/openshift/enhancements/blob/2b38513b8661632f08e64f4acc3b856e842f8669/dev-guide/cluster-version-operator/dev/operators.md#manifest-inclusion-annotations
[2]: https://github.com/openshift/library-go/blob/ac826d10cb4081fe3034b027863c08953d95f602/pkg/manifest/manifest.go#L296-L376
[3]: https://github.com/openshift/installer/blob/a300d8c0e9d9d566a85740244a7da74d3d63e23c/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L189-L216
[4]: https://github.com/openshift/cluster-version-operator/blob/eaf28f5165bde27435b0f0c9a69458677034a58d/pkg/payload/render.go
[5]: openshift/cluster-version-operator#1352
…r-version-operator: Regenerate

Regenerate with:

  $ UPDATE=true make test
@wking wking force-pushed the narrowly-scoped-cvo-bootstrap branch from b18cd52 to 87457d8 Compare March 18, 2026 23:10
@wking wking changed the title WIP: control-plane-operator/controllers/hostedcontrolplane/v2/cvo: Consume include.release.openshift.io/hypershift-bootstrap annotation OCPBUGS-78832: control-plane-operator/controllers/hostedcontrolplane/v2/cvo: Consume include.release.openshift.io/hypershift-bootstrap annotation Mar 19, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Mar 19, 2026
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-78832, which is valid.

3 validations were run on this bug:
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


In response to this: (the PR description, quoted above).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Mar 19, 2026
@openshift-bot

Stale PRs are closed after 21d of inactivity.

If this PR is still relevant, comment to refresh it or remove the stale label.
Mark the PR as fresh by commenting /remove-lifecycle stale.

If this PR is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2026
@hypershift-jira-solve-ci



Test Failure Analysis Complete


Error

=== Job 1: ci/prow/verify ===
Commit 87457d8dd6:
  1: CT1 Title does not start with one of fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build: "control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator: Regenerate"

Commit bc26dbe16b:
  1: CT1 Title does not start with one of fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build
  1: T1  Title exceeds max length (144>120)
  48: B1 Line exceeds max length (175>140)
  50: B1 Line exceeds max length (160>140)

make: *** [Makefile:423: run-gitlint] Error 5

=== Job 2: ci/prow/e2e-azure-self-managed ===
--- FAIL: TestCreateCluster/ValidateHostedCluster (2841.86s)
  Failed to wait for 2 nodes to become ready in 45m0s: context deadline exceeded
  observed **v1.Node collection invalid: expected 2 nodes, got 0
  Degraded=True: UnavailableReplicas([catalog-operator, cluster-network-operator,
    cluster-storage-operator, cluster-version-operator, csi-snapshot-controller-operator,
    dns-operator, hosted-cluster-config-operator, ingress-operator, olm-operator, packageserver])

Summary

Both jobs fail due to issues introduced by PR #7988. The verify job fails because commit messages do not follow the required conventional commit format (gitlint CT1/T1/B1 violations). The e2e-azure-self-managed job fails because the PR changes the CVO bootstrap init container to grep for the annotation include.release.openshift.io/bootstrap-cluster-version-operator: .*hypershift in release payload manifests, but this annotation does not yet exist in the CI release payload. The annotation is supposed to be added by cluster-version-operator PR #1352, which has not been merged. Consequently, the CVO bootstrap init container runs indefinitely finding zero matching manifests, the CVO pod never initializes, all hosted cluster operators remain unavailable, zero nodes join, and the test times out after 45 minutes.

Root Cause

Job 1 (verify): The run-gitlint Makefile target (line 423) enforces conventional commit formatting via gitlint 0.19.1 with a custom CT1 rule. Both commits in the PR fail validation:

  • Commit 87457d8dd6 ("control-plane-operator/.../cluster-version-operator: Regenerate") — missing conventional commit prefix (e.g., chore:, feat:)
  • Commit bc26dbe16b ("control-plane-operator/.../v2/cvo: Consume include.release.openshift.io/bootstrap-cluster-version-operator annotation") — missing prefix, title exceeds 120 chars (144), and two body lines exceed 140 chars (URLs in commit message references)

Job 2 (e2e-azure-self-managed): The PR modifies the CVO deployment template to use a new bootstrap init container script that runs:

grep -rl 'include.release.openshift.io/bootstrap-cluster-version-operator: .*hypershift' /var/payload/manifests

This annotation (include.release.openshift.io/bootstrap-cluster-version-operator) is a new annotation that must be added to CVO manifests in the release payload by openshift/cluster-version-operator PR #1352. That PR is still open and has merge conflicts (mergeable_state: dirty). Since the annotation doesn't exist in any manifest in the current CI release payload:

  1. The grep -rl returns nothing → ls -l of empty expansion fails
  2. The oc apply in the loop applies zero manifests (only /tmp/clusterversion.json if it even gets there)
  3. The bootstrap init container loops forever waiting for clusterversions.config.openshift.io version to have a status.history entry
  4. The CVO pod stays in Pending phase with ContainersNotInitialized — the bootstrap init container shows ready: false, started: true, restartCount: 0, state: running since 23:55:40 and never completes
  5. Without the CVO, no cluster operators are deployed → all 10 operators remain unavailable → zero nodes join → test times out after 45 minutes

Recommendations
  1. Do not merge this PR until openshift/cluster-version-operator #1352 is merged and included in a CI release payload. The CVO PR must land first to add the include.release.openshift.io/bootstrap-cluster-version-operator annotation to the relevant manifests.

  2. Alternatively, make the bootstrap script backward-compatible — if no manifests match the new annotation, fall back to the previous annotation (include.release.openshift.io/hypershift: "true") or skip the grep entirely so the init container can complete without matching manifests.
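The backward-compatible fallback suggested in recommendation 2 could look like this. A sketch only: the annotation strings come from the analysis above, but the surrounding script, the temporary directory, and the manifest file name are hypothetical.

```shell
#!/bin/sh
set -eu

# Stand-in payload containing only a legacy-annotated manifest, to
# simulate a release image that predates the new annotation.
PAYLOAD_DIR="$(mktemp -d)"
printf 'annotations:\n  include.release.openshift.io/hypershift: "true"\n' \
  > "${PAYLOAD_DIR}/legacy-manifest.yaml"

# Prefer the new, narrowly-scoped bootstrap annotation.
matches="$(grep -rl 'include.release.openshift.io/bootstrap-cluster-version-operator: .*hypershift' "${PAYLOAD_DIR}" || true)"
if [ -z "${matches}" ]; then
  # New annotation absent from this payload: fall back to the previous
  # annotation so the init container can still complete against older
  # release images instead of looping forever.
  matches="$(grep -rl 'include.release.openshift.io/hypershift: "true"' "${PAYLOAD_DIR}" || true)"
fi
echo "${matches}"
```

Against the simulated old payload, the first grep finds nothing and the fallback selects the legacy manifest, so the init container would have something to apply either way.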

  3. Fix commit messages to satisfy gitlint:

    • Prefix both commits with a conventional type (e.g., chore: for the regeneration commit, feat: for the annotation consumption commit)
    • Shorten the second commit title to ≤120 characters (e.g., feat(cvo): consume bootstrap-cluster-version-operator annotation)
    • Shorten or wrap the body URL references to ≤140 characters per line
  4. Consider coordinating the cross-repo dependency — file a dependency note on the CVO PR or use a ci-operator payload override to test against a payload that includes the CVO changes.

Evidence
  • Verify gitlint error: CT1 Title does not start with one of fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build on both commits
  • Verify title length: commit bc26dbe16b title is 144 chars (max 120)
  • Verify body lines: lines 48 and 50 exceed the 140-char limit (175 and 160 chars respectively; long GitHub URLs)
  • E2E test failure: TestCreateCluster/ValidateHostedCluster failed after 2841.86s (47.4 min)
  • Node count: expected 2 nodes, got 0; zero nodes joined the hosted cluster
  • CVO pod status: pod cluster-version-operator-665b89cb58-2pf4j stuck in Pending phase with ContainersNotInitialized
  • Bootstrap init container: state running since 2026-03-18T23:55:40Z, ready: false, restartCount: 0; never completed
  • Bootstrap script: grep -rl 'include.release.openshift.io/bootstrap-cluster-version-operator: .*hypershift' /var/payload/manifests; annotation not in payload
  • Unavailable operators: all 10 operators unavailable: catalog-operator, cluster-network-operator, cluster-storage-operator, cluster-version-operator, csi-snapshot-controller-operator, dns-operator, hosted-cluster-config-operator, ingress-operator, olm-operator, packageserver
  • Missing dependency: openshift/cluster-version-operator PR #1352, still open with merge conflicts
  • CVO conditions: ClusterVersionSucceeding, ClusterVersionProgressing, ClusterVersionAvailable, and ClusterVersionReleaseAccepted all Unknown: StatusUnknown (Condition not found in the CVO.)


Labels

  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
