@camilamacedo86 camilamacedo86 commented Jan 11, 2026

Problem

When a catalog becomes unavailable (deleted, registry offline, network issues), installed extensions break or stop being maintained. This PR ensures extensions continue working with their installed version until the catalog becomes available again.

What This Fixes

Issues on main when catalog is unavailable/deleted:

  1. Accidentally deleted resources are NOT restored (both runtimes)
  2. Configuration changes are blocked (both runtimes)
  3. Auto-updates hang during catalog image transitions (both runtimes)
  4. Status shows "Failed" instead of "Retrying" (both runtimes)
  5. Version upgrade attempts show unclear status (both runtimes)
  6. Catalog deletion/maintenance breaks extensions (both runtimes)
  7. Resources drift if manually modified (Helm only)
  8. Registry outages break extension health (Helm only)

Note: Boxcutter already maintains resources via the ClusterExtensionRevision (CER) controller; Helm did not.

Solution

Added smart fallback logic:

  • Catalogs exist but resolution fails → Retry immediately (transient issue)
  • Catalogs deleted → Fall back to installed bundle, continue maintaining resources
  • Version change requested → Always retry (cannot upgrade without catalog)

Key Changes

  1. Resolution step - Added catalog existence check before fallback
  2. Helm applier - Added reconcileExistingRelease() to maintain resources when contentFS == nil
  3. Boxcutter applier - Return success when contentFS == nil (CER controller maintains)
  4. Status - Clear "Retrying" status instead of "Failed"

What "Extension Continues Working" Means

An extension continues working when:

  • Operator Deployment is running and healthy
  • All managed resources exist in cluster
  • Deleted resources are automatically restored
  • Resource specs match desired state (no drift)
  • Status shows Installed=True
  • Operator's business logic continues (e.g., Prometheus keeps scraping)
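
The checklist above can be sketched as a small predicate. This is an illustration only: `missingResources`, `continuesWorking`, and the resource name strings are hypothetical, and the sketch covers just the "resources exist" and "Installed=True" items, not drift detection or workload health.

```go
package main

import (
	"fmt"
	"sort"
)

// missingResources returns the desired resource names that are absent from
// the cluster snapshot, sorted for stable output. These are the resources
// a reconciler would need to restore.
func missingResources(desired []string, present map[string]bool) []string {
	var missing []string
	for _, name := range desired {
		if !present[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

// continuesWorking reports whether the Installed condition is true and
// every desired resource is present.
func continuesWorking(installed bool, desired []string, present map[string]bool) bool {
	return installed && len(missingResources(desired, present)) == 0
}

func main() {
	desired := []string{"Deployment/operator", "ConfigMap/operator-config"}
	present := map[string]bool{"Deployment/operator": true}

	fmt.Println(missingResources(desired, present))
	fmt.Println(continuesWorking(true, desired, present))
}
```

With the sample data, the ConfigMap is reported as missing and the predicate is false until it is restored.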

Testing

Added comprehensive e2e test suite in test/e2e/features/catalog-deletion-resilience.feature:

  • Extension continues running after catalog deletion
  • Resources auto-restored after catalog deletion
  • Config changes work without catalog
  • Version upgrades correctly blocked without catalog
  • Multiple revisions remain stable (Boxcutter)
  • Workload availability properly tracked

All scenarios tested for both Helm and Boxcutter runtimes where applicable.

What Still Requires Catalog (Correct Behavior)

  • Fresh installs
  • Version upgrades
  • Package changes

Resolution Fails?

├─ Version change requested (1.0.0 → 1.0.1)?
│  └─ YES → RETRY (need catalog to upgrade)
│
├─ Catalogs exist in cluster?
│  ├─ YES → RETRY (transient issue, catalog updating)
│  └─ NO → Check for installed bundle...
│     ├─ Have installed bundle?
│     │  └─ YES → FALLBACK (maintain current workload)
│     └─ NO → RETRY (fresh install needs catalog)
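
The decision tree above can be sketched as a pure function. The type and field names (`ResolutionFailure`, `Decide`, and so on) are hypothetical and chosen for illustration; the branch order mirrors the tree, not necessarily the code in this PR.

```go
package main

import "fmt"

// Action is the outcome chosen after a failed resolution.
type Action string

const (
	Retry    Action = "Retry"
	Fallback Action = "Fallback"
)

// ResolutionFailure captures the three questions the tree asks.
// All field names are hypothetical.
type ResolutionFailure struct {
	VersionChangeRequested bool
	CatalogsExist          bool
	HasInstalledBundle     bool
}

// Decide walks the tree top to bottom.
func Decide(f ResolutionFailure) Action {
	switch {
	case f.VersionChangeRequested:
		return Retry // an upgrade needs the catalog
	case f.CatalogsExist:
		return Retry // transient issue, catalog may be updating
	case f.HasInstalledBundle:
		return Fallback // maintain the current workload
	default:
		return Retry // a fresh install needs the catalog
	}
}

func main() {
	fmt.Println(Decide(ResolutionFailure{HasInstalledBundle: true}))
	fmt.Println(Decide(ResolutionFailure{CatalogsExist: true, HasInstalledBundle: true}))
	fmt.Println(Decide(ResolutionFailure{VersionChangeRequested: true, HasInstalledBundle: true}))
}
```

Only one of the eight input combinations yields a fallback: no version change requested, no catalogs in the cluster, and an installed bundle present.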

Copilot AI review requested due to automatic review settings January 11, 2026 05:18
netlify bot commented Jan 11, 2026

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit e59f517
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/6965386bb3c2ff0008083ecd
😎 Deploy Preview https://deploy-preview-2439--olmv1.netlify.app

openshift-ci bot commented Jan 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign pedjak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Pull request overview

This PR adds comprehensive end-to-end tests to verify that installed OLM extensions continue functioning correctly when their source catalog is deleted. The tests cover both standard runtime and experimental Boxcutter runtime scenarios.

Changes:

  • Added new feature file with 8 scenarios testing catalog deletion resilience
  • Implemented CatalogIsDeleted function to support catalog deletion in tests
  • Added step registrations for ClusterExtension update operations

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
test/e2e/steps/steps.go | Adds CatalogIsDeleted function and step registrations for testing catalog deletion and ClusterExtension updates
test/e2e/features/catalog-deletion-resilience.feature | Defines 8 test scenarios covering extension resilience, resource restoration, config changes, version upgrades, and revision behavior when catalog is deleted

Copilot AI review requested due to automatic review settings January 11, 2026 05:43
@camilamacedo86 camilamacedo86 changed the title 🌱 test: add e2e tests for workload resilience when catalog is deleted WIP 🌱 test: add e2e tests for workload resilience when catalog is deleted Jan 11, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 11, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


Copilot AI review requested due to automatic review settings January 11, 2026 07:09
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


Copilot AI review requested due to automatic review settings January 11, 2026 07:30
@camilamacedo86 camilamacedo86 changed the title WIP 🌱 test: add e2e tests for workload resilience when catalog is deleted WIP 🐛 Workload should still resilient when catalog is deleted Jan 11, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


Copilot AI review requested due to automatic review settings January 11, 2026 09:00
Copilot AI review requested due to automatic review settings January 12, 2026 09:46
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Copilot AI review requested due to automatic review settings January 12, 2026 17:11
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.


codecov bot commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 74.22680% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.31%. Comparing base (1fa4169) to head (e59f517).

Files with missing lines | Patch % | Lines
internal/operator-controller/applier/helm.go | 39.13% | 7 Missing and 7 partials ⚠️
...er/controllers/clusterextension_reconcile_steps.go | 89.06% | 6 Missing and 1 partial ⚠️
internal/operator-controller/applier/boxcutter.go | 50.00% | 2 Missing and 2 partials ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2439      +/-   ##
==========================================
+ Coverage   73.00%   73.31%   +0.31%     
==========================================
  Files         100      100              
  Lines        7641     7727      +86     
==========================================
+ Hits         5578     5665      +87     
+ Misses       1625     1620       -5     
- Partials      438      442       +4     
Flag | Coverage Δ
e2e | 47.93% <59.79%> (+1.07%) ⬆️
experimental-e2e | 49.70% <54.63%> (+1.01%) ⬆️
unit | 57.04% <52.57%> (-0.07%) ⬇️

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


@camilamacedo86 camilamacedo86 changed the title WIP 🐛 Workload should still resilient when catalog is deleted 🐛 Workload should still resilient when catalog is deleted Jan 12, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 12, 2026
Enables installed extensions to continue working when their source
catalog becomes unavailable or is deleted. When resolution fails due
to catalog unavailability, the operator now continues reconciling with
the currently installed bundle instead of failing.

Changes:
- Resolution falls back to installed bundle when catalog unavailable
- Unpacking skipped when maintaining current installed state
- Helm and Boxcutter appliers handle nil contentFS gracefully
- Version upgrades properly blocked without catalog access

This ensures workloads remain stable and operational even when the
catalog they were installed from is temporarily unavailable or deleted,
while appropriately preventing version changes that require catalog access.
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 12, 2026
@camilamacedo86 camilamacedo86 changed the title 🐛 Workload should still resilient when catalog is deleted WIP 🐛 Workload should still resilient when catalog is deleted Jan 12, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 12, 2026
@camilamacedo86 camilamacedo86 changed the title WIP 🐛 Workload should still resilient when catalog is deleted 🐛 Workload should still resilient when catalog is deleted Jan 12, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 12, 2026