OCPBUGS-78832: control-plane-operator/controllers/hostedcontrolplane/v2/cvo: Consume include.release.openshift.io/hypershift-bootstrap annotation #7988
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.

This pull-request has been approved by: wking. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Force-pushed 1a59094 to b18cd52 (Compare)
@wking: The following test failed; say /retest to rerun all failed tests.

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
… include.release.openshift.io/bootstrap-cluster-version-operator annotation

The cluster-version operator has a complicated system for deciding whether a given release-image manifest should be managed in the current cluster [1,2]. Implementing that system here, or even using library-go and remembering to vendor-bump here, both seem like an annoying maintenance load.

We could use the CVO's render command like the standalone installer [3,4], but that logic is fairly complicated because it needs to generate all the artifacts necessary for bootstrap MachineConfig rendering, or the production machine-config operator will complain about MachineConfigPools requesting rendered-... MachineConfig that don't exist.

All we actually need out of the bootstrap container are the resources that the cluster-version operator needs to launch and run, which are labeled with the grep target since [5]. That avoids installing anything the cluster doesn't actually need here by mistake. Once the production CVO container starts, it will apply the remaining resources that the cluster actually needs.

The new "is there a .status.history entry?" guard keeps this loop from running if we already have a functioning cluster-version operator (we don't want to be wrestling with the CVO over the state of the ClusterVersion CRD). The 'oc apply' (instead of 'oc create') gives us a clear "all of those exist now" exit code we can use to break out of the loop during the initial setup (because this init-container needs to complete before the long-running CVO container can start).

I'm also dropping the openshift-config and openshift-config-managed Namespace creation. They are from a30db71 (Refactor cluster-version-operator, 2024-11-18, openshift#5125), but that commit doesn't explain why they were added or hint at where they lived before (if anywhere). I would expect the cluster-version operator to be able to create those Namespaces from the release-image manifests when they are needed, as with other cluster resources.

I'm also shifting the ClusterVersion custom resource apply into the loop, to avoid attempting to apply before the ClusterVersion CRD exists and to more gracefully recover from temporary API hiccup sorts of things. I'm also adding some debugging echos and other output to make it easier to debug "hey, why is it applying these resources that I didn't expect it to?" or "... not applying the resources I did expect?".

[1]: https://github.com/openshift/enhancements/blob/2b38513b8661632f08e64f4acc3b856e842f8669/dev-guide/cluster-version-operator/dev/operators.md#manifest-inclusion-annotations
[2]: https://github.com/openshift/library-go/blob/ac826d10cb4081fe3034b027863c08953d95f602/pkg/manifest/manifest.go#L296-L376
[3]: https://github.com/openshift/installer/blob/a300d8c0e9d9d566a85740244a7da74d3d63e23c/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L189-L216
[4]: https://github.com/openshift/cluster-version-operator/blob/eaf28f5165bde27435b0f0c9a69458677034a58d/pkg/payload/render.go
[5]: openshift/cluster-version-operator#1352
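The manifest-selection step described in the commit message can be sketched against a synthetic payload directory. The file names, directory layout, and the `"true"` annotation value below are invented for illustration; only the annotation key comes from the PR.

```shell
# Build a tiny stand-in for the unpacked release-image manifests and show
# how `grep -rl` selects only files carrying the bootstrap annotation.
# File names and annotation values are illustrative, not from a real payload.
dir="$(mktemp -d)"

cat > "${dir}/0000_00_cvo-deployment.yaml" <<'EOF'
metadata:
  annotations:
    include.release.openshift.io/bootstrap-cluster-version-operator: "true"
EOF

cat > "${dir}/0000_50_unrelated.yaml" <<'EOF'
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
EOF

# -r: recurse into the directory, -l: print only the names of matching files.
grep -rl 'include.release.openshift.io/bootstrap-cluster-version-operator' "${dir}"
```

Only the first file is printed, since `-l` lists files whose contents match; everything else in the payload is left for the production CVO container to apply later.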
…r-version-operator: Regenerate

Regenerate with:

    $ UPDATE=true make test
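The apply-until-everything-exists loop from the first commit message can be exercised with a stub standing in for the real `oc apply`; the retry shape is the point here, and the stub's failure count is invented for illustration.

```shell
# Control-flow sketch of the bootstrap init-container loop. A stub replaces
# `oc apply -f ...` so this runs anywhere; in the real container, a
# successful `oc apply` provides the clean "all of those exist now" exit
# code that breaks the loop so the long-running CVO container can start.
attempts=0
apply_bootstrap_manifests() {
  # Stub: fail twice (as if the ClusterVersion CRD is not established
  # yet, or the API server hiccuped), then succeed. Purely illustrative.
  attempts=$((attempts + 1))
  [ "${attempts}" -ge 3 ]
}

until apply_bootstrap_manifests; do
  echo "apply attempt ${attempts} failed; retrying" >&2
  sleep 0  # the real script would wait between attempts
done
echo "bootstrap manifests applied after ${attempts} attempts"
# prints: bootstrap manifests applied after 3 attempts
```

Shifting the ClusterVersion custom-resource apply inside such a loop means a not-yet-established CRD simply becomes one more failed attempt rather than a fatal error.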
Force-pushed b18cd52 to 87457d8 (Compare)
@wking: This pull request references Jira Issue OCPBUGS-78832, which is valid. 3 validation(s) were run on this bug. The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Test Failure Analysis

Job Information

Summary: Both jobs fail due to issues introduced by PR #7988. The verify job fails because commit messages do not follow the required conventional commit format (gitlint CT1/T1/B1 violations). The e2e-azure-self-managed job fails because the PR changes the CVO bootstrap init container to grep for the annotation.

Root Cause

Job 1 (verify): the commit messages trip the gitlint CT1/T1/B1 rules.

Job 2 (e2e-azure-self-managed): the init container selects manifests with

    grep -rl 'include.release.openshift.io/bootstrap-cluster-version-operator: .*hypershift' /var/payload/manifests

This annotation (…)

Recommendations

Evidence
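The grep behavior behind the e2e failure can be reproduced standalone. The manifest below is a made-up stand-in using the conventional `"true"` value; the point is only that a pattern demanding `hypershift` in the annotation's value cannot match such a manifest, while keying on the annotation name alone does.

```shell
# Illustrate why a value-constrained pattern can come up empty.
# The manifest content is invented; only the annotation key and the
# failing pattern are taken from the job output quoted above.
f="$(mktemp)"
cat > "${f}" <<'EOF'
metadata:
  annotations:
    include.release.openshift.io/bootstrap-cluster-version-operator: "true"
EOF

# Pattern from the failing job: requires "hypershift" in the value.
# grep -c prints 0 and exits non-zero, so the fallback message is shown.
grep -c 'include.release.openshift.io/bootstrap-cluster-version-operator: .*hypershift' "${f}" || echo "no match"

# Matching on the annotation name alone finds the file (prints 1).
grep -c 'include.release.openshift.io/bootstrap-cluster-version-operator' "${f}"
```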
What this PR does / why we need it:
The cluster-version operator has a complicated system for deciding whether a given release-image manifest should be managed in the current cluster. Implementing that system here, or even using `library-go` and remembering to vendor-bump here, both seem like an annoying maintenance load.

We could use the CVO's render command like the standalone installer, but that logic is fairly complicated because it needs to generate all the artifacts necessary for bootstrap MachineConfig rendering, or the production machine-config operator will complain about MachineConfigPools requesting `rendered-...` MachineConfig that don't exist.

All we actually need out of the bootstrap container are the resources that the cluster-version operator needs to launch and run, which are labeled with the `grep` target since openshift/cluster-version-operator#1352. That avoids installing anything the cluster doesn't actually need here by mistake. Once the production CVO container starts, it will apply the remaining resources that the cluster actually needs.

I'm also dropping the `openshift-config` and `openshift-config-managed` Namespace creation. They are from a30db71 (#5125), but that commit doesn't explain why they were added or hint at where they lived before (if anywhere). I would expect the cluster-version operator to be able to create those Namespaces from the release-image manifests when they are needed, as with other cluster resources.

Which issue(s) this PR fixes:
Fixes
Special notes for your reviewer:
Checklist: