Add opt-in workaround for OCM CA bundle race condition in acm-mch step #72976

ccardenosa · 2025-12-28T13:34:22Z

Summary

This PR adds an opt-in workaround to the acm-mch step to handle a race condition in the OCM cluster-manager controller that causes MultiClusterHub deployments to fail intermittently.

⚠️ Important: This workaround is disabled by default and must be explicitly enabled via the ENABLE_WORKAROUND_LIST environment variable.

Opt-in Mechanism

Workarounds are controlled via the ENABLE_WORKAROUND_LIST environment variable:

env:
  ENABLE_WORKAROUND_LIST: "[72976]"  # Enable this workaround

| Setting | Behavior |
| --- | --- |
| `ENABLE_WORKAROUND_LIST: "[]"` | Default - No workarounds enabled |
| `ENABLE_WORKAROUND_LIST: "[72976]"` | Enable OCM CA bundle race condition workaround |
| `ENABLE_WORKAROUND_LIST: "[72976, 12345]"` | Enable multiple workarounds |
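
As a rough illustration of how a step script could interpret these values, here is a minimal bash sketch. The function name `is_workaround_enabled` is hypothetical and is not taken from acm-mch-commands.sh; it only assumes the bracketed, comma-separated format shown in the table.

```bash
#!/usr/bin/env bash

# Hypothetical helper (name is an assumption, not the PR's actual code).
# Returns 0 if the given workaround ID is listed in ENABLE_WORKAROUND_LIST,
# which is expected to look like "[]", "[72976]" or "[72976, 12345]".
is_workaround_enabled() {
  local wanted="$1"
  local list="${ENABLE_WORKAROUND_LIST:-}"
  local -a ids
  local item

  # Strip brackets and spaces, leaving e.g. "72976,12345".
  list="${list//\[/}"
  list="${list//\]/}"
  list="${list// /}"

  # Split on commas and compare whole IDs, so 7297 does not match 72976.
  IFS=',' read -r -a ids <<< "$list"
  for item in "${ids[@]}"; do
    if [ "$item" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}
```

Comparing whole list entries rather than using a substring match keeps one PR number from accidentally enabling another.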

Benefits of Opt-in Design

- Explicit - Workarounds must be deliberately enabled
- Traceable - Each workaround identified by its PR number
- Safe - No unexpected behavior in production jobs
- Easy cleanup - Remove PR number from list once upstream fix is released

Currently Enabled For

| Config File | Branch | Purpose |
| --- | --- | --- |
| `openshift-kni-eco-ci-cd-ztp-left-shifting-kpi__ci-4.21.yaml` | `ztp-left-shifting-kpi` | Virtualised hub deployment |
| `openshift-kni-eco-ci-cd-main__ci-4.21.yaml` | `main` | Metal hub deployment |

Related Issues

| Issue | Repository | Status |
| --- | --- | --- |
| Upstream Fix | open-cluster-management-io/ocm#1309 | 🔄 Open |

Problem

The cluster-manager controller has a race condition where it may create CRDs (ClusterManagementAddOn, ManagedClusterAddOn) before the cert rotation controller creates the CA bundle ConfigMap. When this happens:

- CRDs are created with caBundle: cGxhY2Vob2xkZXI= (base64 of literal string "placeholder")
- Webhook conversion fails with InvalidCABundle error
- CRDs remain in Established: False state
- API endpoints are not registered
- MCH fails with: "no matches for kind 'ClusterManagementAddOn' in version 'addon.open-cluster-management.io/v1alpha1'"
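
The placeholder value is easy to recognise because it is a fixed string. The following bash sketch shows one way to test for it; the names `PLACEHOLDER_CA_B64` and `is_placeholder_ca_bundle` are illustrative assumptions, not identifiers from the step script.

```bash
#!/usr/bin/env bash

# Placeholder value quoted in the PR description: base64 of "placeholder".
PLACEHOLDER_CA_B64="cGxhY2Vob2xkZXI="

# Hypothetical helper: returns 0 if the given caBundle string is the placeholder.
is_placeholder_ca_bundle() {
  [ "$1" = "$PLACEHOLDER_CA_B64" ]
}

# Sanity check that the constant matches the literal string.
if is_placeholder_ca_bundle "$(printf 'placeholder' | base64)"; then
  echo "constant matches base64 of 'placeholder'"
fi
```

On a live cluster the argument would presumably come from the CRD itself, for example from `oc get crd clustermanagementaddons.addon.open-cluster-management.io -o jsonpath="{.spec.conversion.webhook.clientConfig.caBundle}"`.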

Evidence from Failed Prow Jobs

| Job Run | Date | ACM Version | MCE Version |
| --- | --- | --- | --- |
| #2005051399989104640 | Dec 27, 2025 | 2.16.0-113 | 2.11.0-142 |
| #2005219283428184064 | Dec 28, 2025 | 2.16.0-114 | 2.11.0-143 |

Solution

When ENABLE_WORKAROUND_LIST includes 72976, the workaround only triggers if the initial 30-minute wait for MCH fails:

Normal Flow (workaround disabled or upstream fix merged):
  Apply MCH → Wait 30min → Success ✓

Workaround Flow (enabled + race condition hit):
  Apply MCH → Wait 30min → Fail → Check if 72976 enabled → Detect race condition → Apply workaround → Wait 30min → Success ✓
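
The two flows can be sketched as a single bash function. This is only an outline under stated assumptions: the four hook names (`wait_for_mch`, `workaround_enabled`, `race_condition_detected`, `apply_workaround`) are hypothetical and must be provided by the caller; they are not the function names used in acm-mch-commands.sh.

```bash
#!/usr/bin/env bash

# Hypothetical hooks expected to be defined by the caller:
#   wait_for_mch             returns 0 once MCH is Running, non-zero on timeout
#   workaround_enabled ID    returns 0 if ID is listed in ENABLE_WORKAROUND_LIST
#   race_condition_detected  returns 0 if a CRD carries the placeholder CA bundle
#   apply_workaround         performs the remediation steps
deploy_mch_with_workaround() {
  # Normal flow: the first wait succeeds and nothing else runs.
  if wait_for_mch; then
    return 0
  fi

  # The workaround is opt-in, so a disabled workaround means a plain failure.
  if ! workaround_enabled 72976; then
    echo "MCH not Running and workaround 72976 is not enabled" >&2
    return 1
  fi

  # Only remediate the specific, known issue.
  if ! race_condition_detected; then
    echo "MCH not Running, but no placeholder CA bundle was found" >&2
    return 1
  fi

  apply_workaround || return 1

  # The second wait decides the final result.
  wait_for_mch
}
```

Because the remediation sits behind the first failed wait, a healthy deployment pays no extra cost.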

Workaround Steps (when enabled and race condition detected)

1. Detect - Check if CRDs have the placeholder CA bundle (cGxhY2Vob2xkZXI=)
2. Patch Services - Add service.beta.openshift.io/serving-cert-secret-name annotation to webhook services
3. Wait for Secrets - Let service-ca-operator create TLS certificates
4. Create ConfigMap - Create ca-bundle-configmap from serving cert secret
5. Patch CRDs - Extract real CA bundle from secrets and update CRDs
6. Force Reconciliation - Restart cluster-manager and MCE operator
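
To make the "Patch CRDs" step more concrete, here is a small bash sketch that builds a JSON patch for the conversion-webhook caBundle field from a PEM certificate. Whether the step script uses a JSON patch at all is an assumption, and `build_ca_bundle_patch` is a hypothetical name; only the field path comes from the jsonpath query used elsewhere in this thread.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reads a PEM CA certificate on stdin and prints a JSON
# patch that replaces .spec.conversion.webhook.clientConfig.caBundle.
build_ca_bundle_patch() {
  local ca_b64
  # Encode and drop newlines so the value is a single line regardless of
  # how the local base64 implementation wraps its output.
  ca_b64="$(base64 | tr -d '\n')"
  printf '[{"op":"replace","path":"/spec/conversion/webhook/clientConfig/caBundle","value":"%s"}]\n' "$ca_b64"
}
```

The output could then be passed to something like `oc patch crd <crd-name> --type=json -p "$(build_ca_bundle_patch < ca.crt)"`, where `ca.crt` stands for a CA file extracted from the serving cert secret.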

Design Decisions

| Decision | Rationale |
| --- | --- |
| Opt-in via env var | No impact on jobs that don't explicitly enable it |
| Workaround only on failure | Doesn't add latency to normal deployments |
| Specific detection | Only triggers for this exact issue (placeholder CA bundle) |
| PR number as identifier | Easy to track and remove once fixed |

Cleanup Path

Once ocm#1309 is merged and released in ACM/MCE:

1. Remove 72976 from ENABLE_WORKAROUND_LIST in affected configs
2. The workaround functions become dead code (never triggered)
3. They can be removed in a future cleanup PR

Testing

- Bash syntax check passes
- Workaround successfully applied manually on live cluster (sno-vhub-0)
- MCH reached Running status (22/22 components) after workaround
- Rehearsal job verified workaround works in CI

Changes

ci-operator/step-registry/acm/mch/acm-mch-commands.sh
- Added ENABLE_WORKAROUND_LIST opt-in mechanism
- Added workaround functions for OCM CA bundle race condition
- Modified main wait logic to check if workaround is enabled before applying
ci-operator/step-registry/acm/mch/acm-mch-ref.yaml
- Documented ENABLE_WORKAROUND_LIST environment variable
ci-operator/config/openshift-kni/eco-ci-cd/openshift-kni-eco-ci-cd-ztp-left-shifting-kpi__ci-4.21.yaml
- Added ENABLE_WORKAROUND_LIST: "[72976]"
ci-operator/config/openshift-kni/eco-ci-cd/openshift-kni-eco-ci-cd-main__ci-4.21.yaml
- Added ENABLE_WORKAROUND_LIST: "[72976]"

/cc @openshift/openshift-team-edge-ztp

openshift-ci · 2025-12-28T13:34:27Z

@ccardenosa: GitHub didn't allow me to request PR reviews from the following users: openshift/openshift-team-edge-ztp.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

Summary

This PR adds a workaround to the acm-mch step to handle a race condition in the OCM cluster-manager controller that causes MultiClusterHub deployments to fail intermittently.

Related Issues

| Issue | Repository | Status |
| --- | --- | --- |
| Upstream Fix | open-cluster-management-io/ocm#1309 | 🔄 Open |

Problem

The cluster-manager controller has a race condition where it may create CRDs (ClusterManagementAddOn, ManagedClusterAddOn) before the cert rotation controller creates the CA bundle ConfigMap. When this happens:

CRDs are created with caBundle: cGxhY2Vob2xkZXI= (base64 of literal string "placeholder")

Webhook conversion fails with InvalidCABundle error

CRDs remain in Established: False state

API endpoints are not registered

MCH fails with: "no matches for kind 'ClusterManagementAddOn' in version 'addon.open-cluster-management.io/v1alpha1'"

Evidence from Failed Prow Jobs

| Job Run | Date | ACM Version | MCE Version |
| --- | --- | --- | --- |
| #2005051399989104640 | Dec 27, 2025 | 2.16.0-113 | 2.11.0-142 |
| #2005219283428184064 | Dec 28, 2025 | 2.16.0-114 | 2.11.0-143 |

Solution

This PR adds a workaround that only triggers if the initial 30-minute wait for MCH fails:
Normal Flow (upstream fix merged):
 Apply MCH → Wait 30min → Success ✓

Workaround Flow (race condition hit):
 Apply MCH → Wait 30min → Fail → Detect race condition → Apply workaround → Wait 30min → Success ✓
Workaround Steps

Detect - Check if CRDs have the placeholder CA bundle (cGxhY2Vob2xkZXI=)

Patch Services - Add service.beta.openshift.io/serving-cert-secret-name annotation to webhook services

Wait for Secrets - Let service-ca-operator create TLS certificates

Patch CRDs - Extract real CA bundle from secrets and update CRDs

Force Reconciliation - Restart MCE operator to pick up changes

Retry Wait - Wait again for MCH to reach Running status

Design Decisions

| Decision | Rationale |
| --- | --- |
| Workaround only on failure | Doesn't add latency to normal deployments |
| Specific detection | Only triggers for this exact issue (placeholder CA bundle) |
| Dead code after fix | Once upstream PR #1309 is merged, detection returns false and workaround never runs |
| Clear documentation | Functions are well-commented with links to upstream PR |

Cleanup Path

Once ocm#1309 is merged and released in ACM/MCE:

The first 30min wait will always succeed

The workaround functions become dead code

They can be removed in a future cleanup PR

Testing

Bash syntax check passes

Workaround successfully applied manually on live cluster (sno-vhub-0)

MCH reached Running status after workaround

Changes

ci-operator/step-registry/acm/mch/acm-mch-commands.sh

Added workaround functions for OCM CA bundle race condition

Modified main wait logic to detect and remediate the issue on failure

/cc @openshift/openshift-team-edge-ztp


ccardenosa · 2025-12-28T13:44:48Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp

openshift-ci-robot · 2025-12-28T13:44:51Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-28T13:45:19Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp

openshift-ci-robot · 2025-12-28T13:45:22Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-28T16:28:09Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp

openshift-ci-robot · 2025-12-28T16:28:12Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-28T16:31:15Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp

openshift-ci-robot · 2025-12-28T16:31:18Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-28T18:35:07Z

✅ Workaround Verified Working

The rehearsal job confirms the workaround successfully resolves the OCM CA bundle race condition:

Successful run: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/72976/rehearse-72976-periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp/2005316456039845888

Execution Summary

MCH did not reach Running status in the first attempt.
Checking for known issues and applying workarounds if needed...

============================================================
Applying OCM CA Bundle Race Condition Workaround
Upstream fix: https://github.com/open-cluster-management-io/ocm/pull/1309
============================================================

Checking for OCM CA bundle race condition (PR #1309)...
DETECTED: clustermanagementaddons CRD has placeholder CA bundle

Step 1/6: Patching webhook services with serving-cert-secret-name annotation...
  ✓ All 3 webhook services patched

Step 2/6: Waiting for service-ca-operator to create secrets...
  ✓ All 3 secrets created by service-ca-operator

Step 3/6: Creating ca-bundle-configmap from serving cert...
  ✓ ConfigMap created with real CA bundle

Step 4/6: Patching CRDs with real CA bundles...
  (CRDs auto-updated after configmap creation)

Step 5/6: Verifying CRDs are now Established...
  ✓ clustermanagementaddons.addon.open-cluster-management.io: Established=True
  ✓ managedclusteraddons.addon.open-cluster-management.io: Established=True

Step 6/6: Restarting cluster-manager and forcing reconciliation...
  ✓ cluster-manager deployment restarted
  ✓ multicluster-engine-operator restarted
  ✓ multiclusterengine annotated for reconciliation

============================================================
Workaround applied successfully!
============================================================

multiclusterhub.operator.open-cluster-management.io/multiclusterhub condition met
MCH reached Running status after applying workaround!
Success! ACM 2.16.0-114 is Running

This workaround will be needed until the upstream fix (open-cluster-management-io/ocm#1309) is merged and released in a future ACM/MCE version.

ccardenosa · 2025-12-28T18:44:25Z

/assign @sg-rh

Could you please review this workaround?

ccardenosa · 2025-12-28T18:50:59Z

/pj-rehearse ack

openshift-ci-robot · 2025-12-28T18:51:01Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-28T18:53:46Z

/assign @vboulos

Could you please review this workaround?

ccardenosa · 2025-12-29T08:36:04Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp

openshift-ci-robot · 2025-12-29T08:36:07Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-29T09:59:11Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp

openshift-ci-robot · 2025-12-29T09:59:14Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-29T10:20:15Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp

openshift-ci-robot · 2025-12-29T10:20:18Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2025-12-29T13:41:03Z

/pj-rehearse ack

openshift-ci-robot · 2025-12-29T13:41:05Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2026-01-05T18:38:18Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp

openshift-ci-robot · 2026-01-05T18:38:20Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2026-01-05T18:41:08Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp

openshift-ci-robot · 2026-01-05T18:41:11Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2026-01-05T20:56:05Z

/pj-rehearse ack

openshift-ci-robot · 2026-01-05T20:56:08Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci · 2026-01-06T14:30:59Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ccardenosa
Once this PR has been reviewed and has the lgtm label, please ask for approval from sg-rh. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

~~ci-operator/config/openshift-kni/eco-ci-cd/OWNERS~~ [ccardenosa]
ci-operator/step-registry/acm/mch/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ccardenosa · 2026-01-06T14:39:23Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp

openshift-ci-robot · 2026-01-06T14:39:26Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa · 2026-01-06T14:40:06Z

/pj-rehearse periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp

openshift-ci-robot · 2026-01-06T14:40:08Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

This adds an opt-in workaround for the cluster-manager controller race condition that causes CRDs to be created with an invalid "placeholder" CA bundle.

Upstream fix: open-cluster-management-io/ocm#1309

Problem:

The cluster-manager controller may create ClusterManagementAddOn and ManagedClusterAddOn CRDs before the cert rotation controller creates the CA bundle ConfigMap. When this happens, the CRDs are created with caBundle: cGxhY2Vob2xkZXI= (base64 of "placeholder"), causing:

1. Webhook conversion fails with "InvalidCABundle"
2. CRDs not becoming Established
3. API endpoints not registered
4. MCH fails: "no matches for kind ClusterManagementAddOn"

Additionally, the cluster-manager controller reads CA from ca-bundle-configmap. If this ConfigMap doesn't exist or is empty, it keeps re-applying CRDs with the placeholder CA, overwriting any manual patches.

Opt-in Mechanism:

Workarounds are now controlled via ENABLE_WORKAROUND_LIST env var:

- Default: "[]" (no workarounds enabled)
- To enable: ENABLE_WORKAROUND_LIST="[72976]"
- Each workaround is identified by its CI PR number

This ensures workarounds are:

- Explicitly enabled and traceable
- Easy to remove once upstream fix is released
- No unexpected behavior in production jobs

Workaround (6 steps):

When MCH fails to reach Running status and workaround 72976 is enabled, detect the race condition by checking for placeholder CA bundles, then:

1. Patch webhook services with serving-cert-secret-name annotation
2. Wait for service-ca-operator to create TLS secrets
3. Create ca-bundle-configmap from the serving cert secret
4. Extract real CA bundle from secrets and patch CRDs
5. Verify CRDs become Established
6. Restart cluster-manager and force MCE operator reconciliation

Design:

- Workaround only runs if enabled via ENABLE_WORKAROUND_LIST
- Detection is specific: checks for the exact placeholder value
- Once upstream fix is released, remove 72976 from the list
- Eventually remove workaround code entirely after fix is stable

Enabled for:

- openshift-kni-eco-ci-cd-ztp-left-shifting-kpi__ci-4.21.yaml (hub deployment)

Tested on sno-vhub-0: MCE reached Available status and MCH reached Running with 22/22 components after workaround applied.

Discovered in Prow jobs:

- periodic-ci-...-telcov10n-virtualised-single-node-hub-ztp/2005051399989104640
- periodic-ci-...-telcov10n-virtualised-single-node-hub-ztp/2005219283428184064

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>

ccardenosa · 2026-01-06T16:43:58Z

🔬 Bug Still Exists - Proved via Binary Analysis (Jan 6, 2026)

Background

Rehearsal job #2008550209683984384 succeeded without triggering the workaround. To verify whether the bug still exists or was fixed upstream, I performed binary analysis on the deployed cluster-manager.

Cluster Status

| Item | Value |
| --- | --- |
| Cluster | `sno-vhub-0` (OCP 4.21.0-rc.0) |
| ACM Version | 2.16.0-125 |
| MCE Version | 2.11.0-154 |
| cluster-manager image | `registry.redhat.io/multicluster-engine/registration-operator-rhel9@sha256:02a4c491fc3d36022f7ab3a8847b7dca8a9d4471312ca7b72bf6fe629b365602` |

Binary Analysis

I searched the running cluster-manager binary for the buggy "placeholder" string:

$ oc exec -n multicluster-engine $POD -- grep -ao "placeholder" /registration-operator | wc -l
172

Result: The buggy code is still compiled into the binary (172 occurrences of "placeholder").

CRD State Check

Despite the bug existing in code, the CRDs have real certificates:

$ oc get crd clustermanagementaddons.addon.open-cluster-management.io \
    -o jsonpath="{.spec.conversion.webhook.clientConfig.caBundle}" | base64 -d | head -c 50

-----BEGIN CERTIFICATE-----
MIIDPzCCAiegAwIBAgIIBhmpSdaTem8wDQYJKoZIhvcNAQE

✅ Real certificate (not "placeholder")
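
The manual check above could be wrapped in a small classifier. The bash sketch below is an illustration only; `ca_bundle_kind` is a hypothetical name, and it treats anything that is neither the placeholder nor PEM-encoded as unknown.

```bash
#!/usr/bin/env bash

# Hypothetical helper: classifies a base64-encoded caBundle value as
# "placeholder", "certificate" (decodes to a PEM certificate) or "unknown".
ca_bundle_kind() {
  local decoded
  # Invalid base64 makes `base64 -d` fail; report that as unknown.
  decoded="$(printf '%s' "$1" | base64 -d 2>/dev/null)" || { echo "unknown"; return 0; }
  case "$decoded" in
    placeholder) echo "placeholder" ;;
    "-----BEGIN CERTIFICATE-----"*) echo "certificate" ;;
    *) echo "unknown" ;;
  esac
}
```

Fed with the jsonpath output from the command above, this would print `certificate` for the healthy state observed here and `placeholder` when the race condition has been hit.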

Conclusion

| Evidence | Result |
| --- | --- |
| `"placeholder"` string in binary | 172 occurrences → Bug code path exists |
| `"ca-bundle-configmap"` in binary | 2 references → ConfigMap lookup exists |
| CRD's actual CA bundle | `-----BEGIN CERTIFICATE-----` → Real cert |
| Was bug triggered? | ❌ No |

The race condition didn't trigger because certRotationController happened to create ca-bundle-configmap BEFORE clustermanagerController tried to read it.

┌────────────────────────────────────────────────────────────────────┐
│  BINARY PROOF: Bug exists (172x "placeholder" in code)             │
├────────────────────────────────────────────────────────────────────┤
│  RUNTIME STATE: Bug did NOT trigger (real cert in CRD)             │
├────────────────────────────────────────────────────────────────────┤
│  REASON: Race won by timing luck, not by code fix                  │
│          certRotationController was faster this time               │
└────────────────────────────────────────────────────────────────────┘

Why This Workaround is Still Needed

✅ Upstream fix (ocm#1309) NOT merged - still under review
✅ MCE 2.11.0-154 has NOT been patched - confirmed via binary analysis
✅ Race condition is non-deterministic - can still trigger under:
- Slower infrastructure (storage, network, CPU)
- Higher cluster load during deployment
- Unlucky Kubernetes pod scheduling

The workaround remains necessary until the upstream fix is merged and released in ACM/MCE.

openshift-ci-robot · 2026-01-06T16:48:58Z

[REHEARSALNOTIFIER]
@ccardenosa: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| periodic-ci-stolostron-policy-collection-main-ocp4.20-interop-opp-aws | N/A | periodic | Registry content changed |
| periodic-ci-stolostron-acmqe-autotest-main-acm-ocp4.17-lp-interop-acm-interop-aws | N/A | periodic | Registry content changed |
| periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp | N/A | periodic | Ci-operator config changed |
| periodic-ci-RedHatQE-interop-testing-master-acm-cnv-ocp4.19-p2p-interop-acm-cnv-p2p-aws-419 | N/A | periodic | Registry content changed |
| periodic-ci-stolostron-acmqe-autotest-main-acm-ocp4.16-lp-interop-acm-interop-aws | N/A | periodic | Registry content changed |
| periodic-ci-openshift-kni-eco-ci-cd-main-ci-4.21-telcov10n-metal-single-node-hub-ztp | N/A | periodic | Ci-operator config changed |
| periodic-ci-stolostron-acmqe-autotest-main-acm-ocp4.15-lp-interop-acm-interop-aws | N/A | periodic | Registry content changed |
| periodic-ci-stolostron-acmqe-autotest-main-acm-ocp4.14-lp-interop-acm-interop-aws | N/A | periodic | Registry content changed |
| periodic-ci-stolostron-policy-collection-main-ocp4.21-interop-opp-aws | N/A | periodic | Registry content changed |
| periodic-ci-stolostron-policy-collection-main-ocp4.21-interop-opp-vsphere | N/A | periodic | Registry content changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

ccardenosa · 2026-01-06T16:56:21Z

/pj-rehearse ack

openshift-ci-robot · 2026-01-06T16:56:24Z

@ccardenosa: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from 7cf2ff7 to 915f710 · December 28, 2025 16:26

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from 915f710 to 79ab2e2 · December 28, 2025 18:39

openshift-ci bot assigned sg-rh · Dec 28, 2025

openshift-ci-robot added the rehearsals-ack label (signifies that rehearsal jobs have been acknowledged) · Dec 28, 2025

openshift-ci bot assigned vboulos · Dec 28, 2025

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from 79ab2e2 to d5b6d56 · December 29, 2025 09:57

openshift-ci-robot removed the rehearsals-ack label · Dec 29, 2025

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from d5b6d56 to 5b57089 · December 29, 2025 13:11

openshift-ci-robot added the rehearsals-ack label · Dec 29, 2025

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from 5b57089 to e52e1e6 · January 5, 2026 20:43

openshift-ci-robot removed the rehearsals-ack label · Jan 5, 2026

openshift-ci-robot added the rehearsals-ack label · Jan 5, 2026

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from e52e1e6 to 3c79102 · January 6, 2026 14:28

openshift-ci-robot removed the rehearsals-ack label · Jan 6, 2026

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from 3c79102 to 2c58fda · January 6, 2026 14:34

ccardenosa changed the title ~~Add workaround for OCM CA bundle race condition in acm-mch step~~ Add opt-in workaround for OCM CA bundle race condition in acm-mch step · Jan 6, 2026

ccardenosa force-pushed the fix/clustermanager-cabundle-race-condition-workaround branch from 2c58fda to e1ec1f9 · January 6, 2026 16:46

openshift-ci-robot added the rehearsals-ack label · Jan 6, 2026
