CONSOLE-5118: Add OLS integration for cluster update workflows #16131
fao89 wants to merge 1 commit into openshift:main
Conversation
@fao89: This pull request references CONSOLE-5118 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fao89. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
📝 Walkthrough

This pull request integrates LightSpeed/OpenShift Console (OLS) support into the cluster settings UI.

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@frontend/public/components/cluster-settings/cluster-settings.tsx`:
- Around line 1494-1511: The precheck prompt in the case 'precheck' block uses
availableUpdates[0] (targetVersion) which may be undefined or misleading before
a user selects a release; change the logic in the cluster-settings.tsx precheck
handler to stop hard-coding availableUpdates[0]: if there is a user-selected
target (e.g., selectedRelease or selectedUpdate state), use that version
(targetVersion) in the prompt; otherwise construct a version-agnostic prompt
that either lists all candidate versions (map availableUpdates to their .version
strings) or explicitly states that no single target is selected and asks for
guidance across the candidate set; update the same pattern referenced around
lines 1646-1649 as well.
- Around line 979-982: The current render condition treats
ClusterUpdateStatus.ErrorRetrieving like a failure and shows the "Add Failure
OLS Button"; remove ClusterUpdateStatus.ErrorRetrieving from the failure branch
so the button only appears for ClusterUpdateStatus.Failing and
ClusterUpdateStatus.UpdatingAndFailing, and add a separate branch for
ClusterUpdateStatus.ErrorRetrieving that either hides the button or renders a
distinct retrieval-error UI/prompt (create a small conditional branch near the
existing button rendering using the status variable and
ClusterUpdateStatus.ErrorRetrieving to present a dedicated message/component
instead of the failure workflow).
- Around line 1921-1933: The health check for operators wrongly uses
op.status?.conditions?.every(...) which treats unrelated conditions (e.g.,
Upgradeable) as failures; update the logic in the operatorHealth calculation
(and the similar block around allOperatorsHealthy) to explicitly pick conditions
by type ('Available','Progressing','Degraded') from op.status.conditions (use
find or a small lookup) and then evaluate only those three for the healthy
determination, leaving allOperatorsUpdated logic (getClusterOperatorVersion and
getCurrentVersionSafe) unchanged; modify the code paths that compute
operatorHealth.healthy and allOperatorsHealthy to reference these explicit
condition checks for each clusterOperators element.
- Around line 900-909: Gate the ClusterOperator and Node watches behind the
OLS-availability flag so they only mount when the OLS action is actually shown:
keep the hook calls (useK8sWatchResource) in place but pass null (or an explicit
disabled value) instead of the resource descriptor when the existing OLS
availability flag (e.g., canShowOlsAction / isOlsAvailable / whatever boolean
you already compute to decide showing the OLS action) is false; update the two
calls that create clusterOperators/clusterOperatorsLoaded and nodes/nodesLoaded
(currently using ClusterOperatorModel and NodeModel) to use a conditional
argument (shouldWatch ? { isList: true, kind:
referenceForModel(ClusterOperatorModel) } : null) and similarly for NodeModel so
the watches only start when OLS is available.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bcb67d8d-d4a9-420b-b922-e44673df8830
📒 Files selected for processing (2)
frontend/public/components/cluster-settings/cluster-settings.tsx
frontend/public/locales/en/public.json
```tsx
// Enhanced data fetching for OLS integration
const [clusterOperators, clusterOperatorsLoaded] = useK8sWatchResource<ClusterOperator[]>({
  isList: true,
  kind: referenceForModel(ClusterOperatorModel),
});

const [nodes, nodesLoaded] = useK8sWatchResource<K8sResourceKind[]>({
  isList: true,
  kind: referenceForModel(NodeModel),
});
```
Gate the new cluster-scoped watches behind OLS availability.
These ClusterOperator and Node list watches now start on every visit to Cluster Settings, even when Lightspeed is disabled or the extension is absent. That adds avoidable load on large clusters and widens the RBAC surface for a page that previously did not need these resources just to render. Please only mount the watches when the OLS action can actually be shown.
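One way to apply this suggestion is to compute the resource descriptor conditionally, passing `null` to disable the watch when OLS is unavailable. This is only a sketch: `isOlsAvailable` and `watchIfEnabled` are hypothetical names, and the null-to-disable convention for `useK8sWatchResource` is an assumption based on this review comment, not verified against the SDK.

```typescript
// Sketch only: gate the cluster-scoped watches behind an OLS-availability
// flag. `watchIfEnabled` is a hypothetical helper, kept as a pure function
// so the conditional is testable outside a React component.
type WatchResource = { isList: boolean; kind: string } | null;

const watchIfEnabled = (enabled: boolean, kind: string): WatchResource =>
  enabled ? { isList: true, kind } : null;

// In the component (sketch, assuming an `isOlsAvailable` boolean exists):
//   const [clusterOperators, clusterOperatorsLoaded] = useK8sWatchResource<ClusterOperator[]>(
//     watchIfEnabled(isOlsAvailable, referenceForModel(ClusterOperatorModel)),
//   );
//   const [nodes, nodesLoaded] = useK8sWatchResource<K8sResourceKind[]>(
//     watchIfEnabled(isOlsAvailable, referenceForModel(NodeModel)),
//   );
```

With this shape, neither list watch is registered until the OLS action can actually render, so the page adds no extra load or RBAC requirements when Lightspeed is absent.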
```tsx
{/* Add Failure OLS Button */}
{(status === ClusterUpdateStatus.Failing ||
  status === ClusterUpdateStatus.UpdatingAndFailing ||
  status === ClusterUpdateStatus.ErrorRetrieving) &&
```
Treat ErrorRetrieving as a separate workflow.
ClusterUpdateStatus.ErrorRetrieving means the console could not fetch update recommendations, not that a cluster upgrade failed. Reusing the failure workflow here will generate misleading prompts and attachments, including cases where the target version is missing. Consider hiding the button for this state or using a dedicated retrieval-error prompt.
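A minimal sketch of the suggested split, keeping `ErrorRetrieving` out of the failure branch. The enum string values and the `olsWorkflowForStatus` helper are illustrative assumptions, not the PR's actual code.

```typescript
// Sketch: map the update status to an OLS workflow instead of lumping
// ErrorRetrieving in with real upgrade failures. Enum values are assumed.
enum ClusterUpdateStatus {
  Failing = 'Failing',
  UpdatingAndFailing = 'UpdatingAndFailing',
  ErrorRetrieving = 'ErrorRetrieving',
  UpToDate = 'UpToDate',
}

type OlsWorkflow = 'failure' | 'retrieval-error' | null;

const olsWorkflowForStatus = (status: ClusterUpdateStatus): OlsWorkflow => {
  if (
    status === ClusterUpdateStatus.Failing ||
    status === ClusterUpdateStatus.UpdatingAndFailing
  ) {
    return 'failure'; // genuine upgrade failure: use the failure prompt
  }
  if (status === ClusterUpdateStatus.ErrorRetrieving) {
    // Recommendations could not be fetched: dedicated prompt (or no button)
    return 'retrieval-error';
  }
  return null; // no OLS failure button for other states
};
```

The render condition can then branch on the returned workflow, so the retrieval-error state gets its own message rather than a misleading failure analysis.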
```tsx
case 'precheck': {
  const targetVersion = availableUpdates[0]?.version;
  const recentFailures =
    cv.status?.history?.slice(0, 5).filter((h) => h.state !== 'Completed').length || 0;

  return `I'm planning to update my ${environment} OpenShift cluster (${clusterSize}) from ${currentVersion} to ${targetVersion} via ${updateChannel} channel.

Recent update history shows ${recentFailures} non-successful attempts in the last 5 updates.

Please provide comprehensive pre-update guidance including:
1. Specific prerequisites and compatibility checks for this version jump
2. Resource requirements and capacity planning
3. Backup and rollback strategies
4. Risk assessment based on my cluster configuration
5. Estimated update duration and maintenance window planning
6. Any known issues or breaking changes for this update path

Focus on actionable steps I should take before starting the update.`;
```
Don't hard-code availableUpdates[0] as the precheck target.
This action is rendered before the user selects a release, but the prompt and attachment pretend a specific target version was chosen. On clusters with multiple supported updates, OLS can analyze the wrong upgrade path and return incorrect prerequisite guidance. Either make the precheck flow version-specific after selection, or keep it version-agnostic and send the full candidate set.
Also applies to: 1646-1649
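One way to implement the version-agnostic fallback is to isolate the target phrase in a small pure function. This is a sketch only: `selectedVersion` stands in for whatever user-selection state the component actually holds, and the wording is illustrative.

```typescript
// Sketch: build the target-version phrase for the precheck prompt without
// hard-coding availableUpdates[0]. `selectedVersion` is a hypothetical
// piece of state representing the user's chosen release, if any.
interface UpdateCandidate {
  version: string;
}

const precheckTargetPhrase = (
  selectedVersion: string | undefined,
  availableUpdates: UpdateCandidate[],
): string => {
  if (selectedVersion) {
    return `to ${selectedVersion}`;
  }
  // Version-agnostic fallback: list every candidate instead of guessing one
  const candidates = availableUpdates.map((u) => u.version).join(', ');
  return `to one of the available releases (${candidates}); no target has been selected yet`;
};
```

The precheck template literal would then interpolate `precheckTargetPhrase(...)` in place of the hard-coded `${targetVersion}`, so OLS either analyzes the release the user actually picked or is told explicitly that no target is chosen.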
```tsx
allOperatorsUpdated: clusterOperators.every(
  (op) => getClusterOperatorVersion(op) === getCurrentVersionSafe(),
),
operatorHealth: {
  total: clusterOperators.length,
  healthy: clusterOperators.filter((op) =>
    op.status?.conditions?.every(
      (c) =>
        (c.type === 'Available' && c.status === 'True') ||
        (c.type === 'Degraded' && c.status === 'False') ||
        (c.type === 'Progressing' && c.status === 'False'),
    ),
  ).length,
```
Limit the success health check to the relevant operator conditions.
Using status.conditions.every(...) across the whole condition array treats extra conditions like Upgradeable as failures, so a healthy cluster can be reported as unhealthy after a successful update. Look up Available, Progressing, and Degraded explicitly before computing allOperatorsHealthy and healthy.
Suggested fix

```diff
+const isOperatorHealthy = (op: ClusterOperator) => {
+  const available = op.status?.conditions?.find((c) => c.type === 'Available')?.status;
+  const progressing = op.status?.conditions?.find((c) => c.type === 'Progressing')?.status;
+  const degraded = op.status?.conditions?.find((c) => c.type === 'Degraded')?.status;
+  return available === 'True' && progressing === 'False' && degraded === 'False';
+};
+
 ...
-allOperatorsHealthy: clusterOperators.every((op) =>
-  op.status?.conditions?.every(
-    (c) =>
-      (c.type === 'Available' && c.status === 'True') ||
-      (c.type === 'Degraded' && c.status === 'False') ||
-      (c.type === 'Progressing' && c.status === 'False'),
-  ),
-),
+allOperatorsHealthy: clusterOperators.every(isOperatorHealthy),
 ...
-healthy: clusterOperators.filter((op) =>
-  op.status?.conditions?.every(
-    (c) =>
-      (c.type === 'Available' && c.status === 'True') ||
-      (c.type === 'Degraded' && c.status === 'False') ||
-      (c.type === 'Progressing' && c.status === 'False'),
-  ),
-).length,
+healthy: clusterOperators.filter(isOperatorHealthy).length,
```

Also applies to: 1979-1987
/test e2e-gcp-console
CI Status Update

All core builds are passing consistently. The e2e failures appear to be environment-related (resource constraints, networking issues, cluster instability) rather than code-related. The test logs show good initial execution but infrastructure failures. Requesting maintainer review for merge consideration given the clean core builds.

/cc @console-maintainers
Implements OpenShift Lightspeed AI assistance integration into cluster settings page to provide contextual help during cluster update workflows.

Features:
- UpdateWorkflowOLSButton component with 4 workflow phase support
  - Precheck: Pre-update validation and readiness assessment
  - Failure: Error analysis and troubleshooting guidance
  - Status: Real-time update progress monitoring assistance
  - Success: Post-update verification and validation help

The integration uses the official lightspeed-console plugin API to open the OLS chatbox with context-aware prompts and cluster data attachments. Each workflow phase provides tailored AI prompts and exports relevant cluster resources (ClusterVersion) as YAML for comprehensive analysis.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Signed-off-by: Fabricio Aguiar <fabricio.aguiar@gmail.com>
@fao89: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Implements OpenShift Lightspeed AI assistance integration into cluster settings page to provide contextual help during cluster update workflows.
Features:
The integration uses the official lightspeed-console plugin API to open the OLS chatbox with context-aware prompts and cluster data attachments. Each workflow phase provides tailored AI prompts and exports relevant cluster resources (ClusterVersion) as YAML for comprehensive analysis.
https://github.com/openshift/lightspeed-console?tab=readme-ov-file#opening-the-openshift-lightspeed-ui-from-other-console-pages-and-plugins
Summary by CodeRabbit
New Features
Localization