Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -346,6 +346,11 @@ generate-config:
verify-config: generate-config
./scripts/verify/verify-config.sh

.PHONY: generate-crds
generate-crds:
./scripts/fetch_tools.sh controller-gen && \
./scripts/generate-crds.sh
Comment on lines +349 to +352
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Wire generate-crds into the default generate/verify flow.

Right now this target is leaf-only, so the visible make, make verify, and make verify-config paths won't catch stale CRDs or generated clients after API changes.

Suggested change
-all: generate-config microshift etcd
+all: generate-config generate-crds microshift etcd
...
-verify-config: generate-config
+verify-config: generate-config generate-crds
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 349 - 352, The generate-crds Makefile target is
currently leaf-only and not invoked by the broader generate/verify flows; update
the top-level targets (e.g., the generate target and verify/verify-config
targets used by make and make verify) to depend on generate-crds so CRD and
client generation runs automatically—modify the prerequisites for the primary
generate target and the verify/verify-config targets to include generate-crds
(references: Makefile targets "generate-crds", "generate", "verify",
"verify-config") so stale CRDs/clients are caught by the normal make/verify
pipelines.


# Run all of the end to end tests
.PHONY: e2e
e2e:
Expand Down
28 changes: 28 additions & 0 deletions assets/components/c2cc/clusterrole.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: microshift-c2cc-probe
rules:
- apiGroups:
- microshift.io
resources:
- remoteclusters
verbs:
- get
- list
- watch
- apiGroups:
- microshift.io
resources:
- remoteclusters/status
verbs:
- update
- patch
- apiGroups:
- security.openshift.io
resources:
- securitycontextconstraints
verbs:
- use
resourceNames:
- privileged
12 changes: 12 additions & 0 deletions assets/components/c2cc/clusterrolebinding.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: microshift-c2cc-probe
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: microshift-c2cc-probe
subjects:
- kind: ServiceAccount
namespace: microshift-c2cc
name: c2cc-probe
68 changes: 68 additions & 0 deletions assets/components/c2cc/deployment.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: microshift-c2cc
name: c2cc-probe
labels:
app: c2cc-probe
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: c2cc-probe
template:
metadata:
labels:
app: c2cc-probe
annotations:
target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
openshift.io/required-scc: privileged
spec:
serviceAccountName: c2cc-probe
containers:
- name: c2cc-probe
image: '{{ .ReleaseImage.cli }}'
imagePullPolicy: IfNotPresent
command:
- /host/usr/bin/microshift
- c2cc-probe
ports:
- containerPort: 8080
name: probe
protocol: TCP
livenessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
Comment on lines +35 to +40
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a readiness probe for endpoint gating.

Without readiness, the Service can route traffic before the probe server is actually listening, which can cause avoidable startup failures.

Suggested change
         livenessProbe:
           httpGet:
             path: /
             port: 8080
           initialDelaySeconds: 10
           periodSeconds: 10
+        readinessProbe:
+          httpGet:
+            path: /
+            port: 8080
+          initialDelaySeconds: 2
+          periodSeconds: 5
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/components/c2cc/deployment.yaml` around lines 35 - 40, Add a
readinessProbe alongside the existing livenessProbe so the Service only routes
traffic after the app is ready: add a readinessProbe entry (mirroring the
livenessProbe's httpGet path: / and port: 8080) and configure sensible timings
(e.g., initialDelaySeconds, periodSeconds, successThreshold/failureThreshold) to
gate traffic until the probe server is listening; update the same Kubernetes
manifest section containing livenessProbe to include readinessProbe.

resources:
requests:
cpu: 50m
memory: 64Mi
volumeMounts:
- name: microshift-binary
mountPath: /host/usr/bin/microshift
readOnly: true
volumes:
- name: microshift-binary
hostPath:
path: /usr/bin/microshift
type: File
nodeSelector:
node-role.kubernetes.io/master: ""
priorityClassName: system-cluster-critical
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
Comment on lines +54 to +60
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="assets/components/c2cc/deployment.yaml"

echo "=== File exists? ==="
ls -l "$FILE" || exit 1

echo
echo "=== Lines 1-120 (context) ==="
nl -ba "$FILE" | sed -n '1,120p'

echo
echo "=== Lines 45-75 (scheduling snippet focus) ==="
nl -ba "$FILE" | sed -n '45,75p'

echo
echo "=== Search for scheduling/topology fields in this file ==="
rg -n --hidden --no-ignore-vcs "(nodeSelector|affinity|podAntiAffinity|topologySpreadConstraints|topologyKey|replicas|podDisruptionBudget|PDB|node-role.kubernetes.io/master|control-plane|master|kubernetes.io/hostname)" "$FILE" || true

echo
echo "=== Repo-wide usage of node-role.kubernetes.io/master ==="
rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/master" . || true

Repository: openshift/microshift

Length of output: 266


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="assets/components/c2cc/deployment.yaml"

echo "=== File exists? ==="
ls -l "$FILE" || exit 1

echo
echo "=== Lines 1-120 (context) ==="
awk 'NR>=1 && NR<=120 {printf "%6d  %s\n", NR, $0}' "$FILE"

echo
echo "=== Lines 45-75 (scheduling snippet focus) ==="
awk 'NR>=45 && NR<=75 {printf "%6d  %s\n", NR, $0}' "$FILE"

echo
echo "=== Search for scheduling/topology fields in this file ==="
rg -n --hidden --no-ignore-vcs "(nodeSelector|affinity|podAntiAffinity|topologySpreadConstraints|topologyKey|replicas|podDisruptionBudget|PDB|node-role.kubernetes.io/master|control-plane|master|kubernetes.io/hostname)" "$FILE" || true

echo
echo "=== Repo-wide usage of node-role.kubernetes.io/master ==="
rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/master" . || true

Repository: openshift/microshift

Length of output: 15624


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="assets/components/c2cc/deployment.yaml"

echo "=== File exists? ==="
ls -l "$FILE" || exit 1

echo
echo "=== Lines 1-120 (context) ==="
awk 'NR>=1 && NR<=120 {printf "%6d  %s\n", NR, $0}' "$FILE"

echo
echo "=== Lines 45-75 (scheduling snippet focus) ==="
awk 'NR>=45 && NR<=75 {printf "%6d  %s\n", NR, $0}' "$FILE"

echo
echo "=== Search for scheduling/topology fields in this file ==="
rg -n --hidden --no-ignore-vcs "(nodeSelector|affinity|podAntiAffinity|topologySpreadConstraints|topologyKey|replicas|podDisruptionBudget|PDB|node-role.kubernetes.io/master|control-plane|master|kubernetes.io/hostname)" "$FILE" || true

echo
echo "=== Repo-wide usage of node-role.kubernetes.io/master ==="
rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/master" . || true

Repository: openshift/microshift

Length of output: 15624


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Repo-wide: control-plane label usage ==="
rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/control-plane" . || true

echo
echo "=== Repo-wide: arbiter mentions ==="
rg -n --hidden --no-ignore-vcs "arbiter" . || true

echo
echo "=== Repo-wide: SNO mention ==="
rg -n --hidden --no-ignore-vcs "\bSNO\b|single[- ]node|single[- ]node" . || true

echo
echo "=== CI/topology job hints ==="
rg -n --hidden --no-ignore-vcs "topology|topological|spread|DoNotSchedule|hostname|podAntiAffinity" .github . || true

Repository: openshift/microshift

Length of output: 50378


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Repo-wide: control-plane label usage ==="
rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/control-plane" . || true

echo
echo "=== Repo-wide: arbiter mentions ==="
rg -n --hidden --no-ignore-vcs "arbiter" . || true

echo
echo "=== Repo-wide: SNO mention ==="
rg -n --hidden --no-ignore-vcs "\bSNO\b|single[- ]node|single[- ]node" . || true

echo
echo "=== CI/topology job hints ==="
rg -n --hidden --no-ignore-vcs "topology|topological|spread|DoNotSchedule|hostname|podAntiAffinity" .github . || true

Repository: openshift/microshift

Length of output: 50378


Avoid hard pinning c2cc-probe to master-labeled control-plane nodes
In assets/components/c2cc/deployment.yaml (lines 54-60), the workload uses nodeSelector: node-role.kubernetes.io/master: "" plus a NoSchedule toleration for the same taint, forcing placement onto master-labeled nodes only. This bakes in a specific control-plane topology and isn’t topology-aware for single-node/arbiter variants—make placement topology-aware and cover it with topology-specific CI.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/components/c2cc/deployment.yaml` around lines 54 - 60, The deployment
currently hard-pins c2cc-probe to master-labeled nodes via nodeSelector
node-role.kubernetes.io/master and a NoSchedule toleration; change this to a
topology-aware preference instead: remove the strict nodeSelector and NoSchedule
toleration and replace with a preferred node affinity
(preferredDuringSchedulingIgnoredDuringExecution) that targets control-plane
labels (e.g., node-role.kubernetes.io/master or
node-role.kubernetes.io/control-plane) so pods prefer control-plane nodes but
can run elsewhere for single-node/arbiter variants, and if a toleration is still
required use a non-blocking preference (e.g., avoid NoSchedule) or match the
exact taint/key semantics; update the c2cc deployment
(nodeSelector/tolerations/priorityClassName usage) and add topology-specific CI
to validate placement across single-node and multi-node topologies.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix — MicroShift is single-node. Every other deployment (service-ca, OVN, CSI, DNS) uses the identical nodeSelector: node-role.kubernetes.io/master + NoSchedule toleration pattern. This is the standard convention, not a topology concern.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmtk, understood — that's a valid convention for MicroShift's single-node topology. I'll defer to the established pattern.


✏️ Learnings added
Learnt from: pmtk
Repo: openshift/microshift PR: 6729
File: assets/components/c2cc/deployment.yaml:54-60
Timestamp: 2026-05-25T11:24:44.503Z
Learning: In MicroShift (openshift/microshift), all system-level deployments (service-ca, OVN, CSI, DNS, c2cc-probe, etc.) use the standard pattern of `nodeSelector: node-role.kubernetes.io/master: ""` plus a `NoSchedule` toleration for `node-role.kubernetes.io/master`. This is intentional and correct — MicroShift is a single-node distribution, so topology-aware scheduling is not applicable. Do not flag this pattern as a topology concern.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
tolerationSeconds: 120
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
tolerationSeconds: 120
11 changes: 11 additions & 0 deletions assets/components/c2cc/namespace.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
apiVersion: v1
kind: Namespace
metadata:
name: microshift-c2cc
labels:
pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/audit: privileged
pod-security.kubernetes.io/warn: privileged
annotations:
openshift.io/node-selector: ""
workload.openshift.io/allowed: "management"
14 changes: 14 additions & 0 deletions assets/components/c2cc/service.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
namespace: microshift-c2cc
name: c2cc-probe
spec:
clusterIP: '{{ .ProbeServiceClusterIP }}'
ports:
- name: probe
port: 8080
targetPort: 8080
protocol: TCP
selector:
app: c2cc-probe
5 changes: 5 additions & 0 deletions assets/components/c2cc/serviceaccount.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
namespace: microshift-c2cc
name: c2cc-probe
85 changes: 85 additions & 0 deletions assets/crd/microshift.io_remoteclusters.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.18.0
name: remoteclusters.microshift.io
spec:
group: microshift.io
names:
kind: RemoteCluster
listKind: RemoteClusterList
plural: remoteclusters
singular: remotecluster
scope: Cluster
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: |-
RemoteCluster represents a remote cluster's healthcheck probe target.
Created by the C2CC controller, read and updated by the probe pod.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
properties:
probeInterval:
default: 10s
description: Interval between probe attempts (e.g. "10s", "1m").
type: string
probeTarget:
Comment on lines +43 to +47
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Enforce spec.probeInterval constraints at admission.

probeInterval is documented as 1s5m, but the CRD schema currently accepts any string. Add schema-level validation so invalid values are rejected early.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/crd/microshift.io_remoteclusters.yaml` around lines 43 - 47, Add
schema-level admission validation for spec.probeInterval: update the CRD's
openAPIV3Schema for the spec.probeInterval property to include an
x-kubernetes-validations rule that enforces a duration between 1s and 5m (e.g. a
CEL expression that parses the duration and asserts seconds >= 1 && seconds <=
300) and a pattern fallback to only accept valid duration strings (digits + unit
like s, m). Target the spec.probeInterval property in the CRD schema and include
a clear message for rejection.

description: IP:port of the remote cluster's probe service (11th IP
in remote service CIDR, port 8080).
type: string
required:
- probeInterval
- probeTarget
type: object
status:
description: RemoteClusterStatus is populated by the probe pod with health
probe results.
properties:
errors:
items:
type: string
type: array
lastProbeTime:
format: date-time
type: string
lastSuccessfulProbe:
format: date-time
type: string
state:
default: NeverProbed
enum:
- NeverProbed
- Healthy
- Unhealthy
type: string
required:
- state
type: object
required:
- spec
type: object
served: true
storage: true
subresources:
status: {}
5 changes: 5 additions & 0 deletions cmd/generate-config/config/config-openapi-spec.json
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,11 @@
}
}
},
"probeInterval": {
"description": "Interval between healthcheck probe attempts to each remote cluster.\nParsed as a Go duration string (e.g. \"10s\", \"1m\"). Must be between 1s and 5m.",
"type": "string",
"default": "10s"
},
Comment thread
coderabbitai[bot] marked this conversation as resolved.
"remoteClusters": {
"description": "List of remote clusters to establish connectivity with.\nC2CC is disabled when this list is empty.",
"type": "array",
Expand Down
1 change: 1 addition & 0 deletions cmd/microshift/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,5 +42,6 @@ func newCommand() *cobra.Command {
cmd.AddCommand(cmds.NewRestoreCommand())
cmd.AddCommand(cmds.NewHealthcheckCommand())
cmd.AddCommand(cmds.NewAddNodeCommand())
cmd.AddCommand(cmds.NewC2CCProbeCommand())
return cmd
}
2 changes: 2 additions & 0 deletions docs/user/howto_config.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ clusterToCluster:
dns:
cacheNegativeTTL: 0
cacheTTL: 0
probeInterval: ""
remoteClusters:
- clusterNetwork: []
domain: ""
Expand Down Expand Up @@ -196,6 +197,7 @@ clusterToCluster:
dns:
cacheNegativeTTL: 10
cacheTTL: 10
probeInterval: 10s
remoteClusters:
- clusterNetwork: []
domain: ""
Expand Down
3 changes: 3 additions & 0 deletions packaging/microshift/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,9 @@ clusterToCluster:
# Maximum TTL (seconds) for positive DNS cache entries in CoreDNS server blocks
# generated for remote clusters. Must be >= 0. Setting to 0 disables positive caching.
cacheTTL: 10
# Interval between healthcheck probe attempts to each remote cluster.
# Parsed as a Go duration string (e.g. "10s", "1m"). Must be between 1s and 5m.
probeInterval: 10s
# List of remote clusters to establish connectivity with.
# C2CC is disabled when this list is empty.
remoteClusters:
Expand Down
5 changes: 5 additions & 0 deletions pkg/apis/microshift/v1alpha1/doc.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
// +kubebuilder:object:generate=true
// +groupName=microshift.io
// +k8s:deepcopy-gen=package

package v1alpha1
30 changes: 30 additions & 0 deletions pkg/apis/microshift/v1alpha1/groupversion_info.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/schema"
)

var (
GroupName = "microshift.io"
GroupVersion = schema.GroupVersion{Group: GroupName, Version: "v1alpha1"}

SchemeGroupVersion = GroupVersion

schemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
AddToScheme = schemeBuilder.AddToScheme
)

func Resource(resource string) schema.GroupResource {
return schema.GroupResource{Group: GroupName, Resource: resource}
}

func addKnownTypes(scheme *runtime.Scheme) error {
scheme.AddKnownTypes(GroupVersion,
&RemoteCluster{},
&RemoteClusterList{},
)
metav1.AddToGroupVersion(scheme, GroupVersion)
return nil
}
56 changes: 56 additions & 0 deletions pkg/apis/microshift/v1alpha1/types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +genclient:nonNamespaced
// +kubebuilder:object:root=true
// +kubebuilder:resource:scope=Cluster
// +kubebuilder:subresource:status
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// RemoteCluster represents a remote cluster's healthcheck probe target.
// Created by the C2CC controller, read and updated by the probe pod.
type RemoteCluster struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec RemoteClusterSpec `json:"spec"`
Status RemoteClusterStatus `json:"status,omitempty"`
}

type RemoteClusterSpec struct {
// IP:port of the remote cluster's probe service (11th IP in remote service CIDR, port 8080).
// +kubebuilder:validation:Required
ProbeTarget string `json:"probeTarget"`

// Interval between probe attempts (e.g. "10s", "1m").
// +kubebuilder:default="10s"
ProbeInterval metav1.Duration `json:"probeInterval"`
}

// RemoteClusterStatus is populated by the probe pod with health probe results.
type RemoteClusterStatus struct {
// +kubebuilder:validation:Enum=NeverProbed;Healthy;Unhealthy
// +kubebuilder:default="NeverProbed"
State string `json:"state"`
// +optional
LastSuccessfulProbe *metav1.Time `json:"lastSuccessfulProbe,omitempty"`
// +optional
LastProbeTime *metav1.Time `json:"lastProbeTime,omitempty"`
// +optional
Errors []string `json:"errors,omitempty"`
}

// +kubebuilder:object:root=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// RemoteClusterList contains a list of RemoteCluster resources.
type RemoteClusterList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`

Items []RemoteCluster `json:"items"`
}
Loading