Increase pathological event thresholds for API server rollout scenarios by gangwgr · Pull Request #31122 · openshift/origin

gangwgr · 2026-05-05T05:23:14Z

KMS encryption tests and similar tests that trigger multiple kube-apiserver rollouts cause cascading rollouts across openshift-apiserver, openshift-oauth-apiserver, and their operators. This generates ScalingReplicaSet, OperatorStatusChanged, and DeploymentUpdated events well above the default threshold of 20.

Add three new matchers with increased repeatThresholdOverride values:

APIServerScalingReplicaSetDuringRollout (threshold: 100)
APIServerOperatorStatusChangedDuringRollout (threshold: 50)
OperatorDeploymentUpdatedDuringRollout (threshold: 50)

Summary by CodeRabbit

Bug Fixes
- Monitoring now conditionally tolerates expected cascade events only during nearby KMS encryption test windows: scaling of API server replicas (limited to openshift-apiserver / openshift-oauth-apiserver apiserver), operator status changes (openshift-apiserver-operator), and deployment updates (console and authentication operators). Each allowance is time-scoped to buffered encryption test intervals and uses adjusted repeat thresholds to reduce false positives and improve test reliability.

openshift-merge-bot · 2026-05-05T05:23:17Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-05-05T05:23:37Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Replaces the prior unconditional ScalingReplicaSet allowance with three KMS-encryption–scoped pathological event matchers. Each matcher derives buffered KMS encryption e2e intervals from finalIntervals (locator contains EncryptionKMS or encryption-kms, expanded ±10 minutes) and only allows its pathological event when overlapping those intervals.

Changes

KMS-encryption Scoped Pathological Matchers

Layer / File(s)	Summary
Interval Extraction / Scoping `pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`	Derives buffered KMS encryption e2e-test intervals by filtering `finalIntervals` for intervals whose `LocatorE2ETestKey` contains `EncryptionKMS` or `encryption-kms`, then expanding each matched interval by ±10 minutes.
Matcher Constructors — ScalingReplicaSet `pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`	Adds `newKMSEncryptionTestScalingReplicaSetMatcher(finalIntervals)` that permits `ScalingReplicaSet` with `repeatThresholdOverride: 100`, restricted to namespaces `openshift-apiserver` and `openshift-oauth-apiserver`, deployment key `apiserver`, and only when overlapping the buffered KMS encryption intervals (via `OverlapOtherIntervalsPathologicalEventMatcher`).
Matcher Constructors — OperatorStatusChanged `pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`	Adds `newKMSEncryptionTestOperatorStatusChangedMatcher(finalIntervals)` that permits `OperatorStatusChanged` with `repeatThresholdOverride: 50`, restricted to `openshift-apiserver-operator`, and only when overlapping the buffered KMS encryption intervals.
Matcher Constructors — DeploymentUpdated `pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`	Adds `newKMSEncryptionTestDeploymentUpdatedMatcher(finalIntervals)` that permits `DeploymentUpdated` with `repeatThresholdOverride: 50`, restricted to deployments `console-operator` (namespace `openshift-console-operator`) and `authentication-operator` (namespace `openshift-authentication-operator`), and only when overlapping the buffered KMS encryption intervals.
Registration / Wiring `pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`	Removes the unconditional `ScalingReplicaSet` allowance in `NewUniversalPathologicalEventMatchers` and registers the three new KMS-encryption-scoped matchers so their allowances apply only when events overlap the derived buffered KMS encryption intervals.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 12

✅ Passed checks (12 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Increase pathological event thresholds for API server rollout scenarios' accurately summarizes the main objective of the PR, which is to raise repeat thresholds for pathological events during API server rollouts caused by KMS encryption tests.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR contains no Ginkgo test definitions. Changes are limited to internal pathological event matcher names which are hardcoded string literals with no dynamic content.
Test Structure And Quality	✅ Passed	Custom check for Ginkgo test quality is not applicable. PR modifies only duplicated_event_patterns.go, a library file with no test code. No Ginkgo tests are added or modified.
Microshift Test Compatibility	✅ Passed	This PR does not add any Ginkgo e2e tests. It only modifies test monitoring library code. The check for MicroShift compatibility is not applicable.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No new Ginkgo e2e tests added in this PR. Changes modify pathological event matchers only in library code used for test monitoring. SNO compatibility check does not apply.
Topology-Aware Scheduling Compatibility	✅ Passed	This PR modifies test event monitoring infrastructure, not deployment manifests, operator code, or controllers. No scheduling constraints or topology-dependent workload placement logic is introduced.
Ote Binary Stdout Contract	✅ Passed	Library code with no process-level entry points or stdout writes. Not an OTE binary entry point.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	This PR does not add new Ginkgo e2e tests. Changes are in a utility library containing event matcher functions, not test code. No IPv4 or external connectivity concerns.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ardaguclu · 2026-05-05T05:49:45Z

+	// KMS encryption tests (and similar tests that trigger multiple kube-apiserver rollouts)
+	// cause cascading rollouts across openshift-apiserver and openshift-oauth-apiserver,
+	// generating ScalingReplicaSet events well above the default threshold.
+	registry.AddPathologicalEventMatcherOrDie(&SimplePathologicalEventMatcher{


In which step enables this configuration only for KMS CI?. I think we shouldn't update for all configurations.

Agreed, I have uupdated matchers are now scoped to only apply when encryption tests are detected in the test intervals.

coderabbitai

🧹 Nitpick comments (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go (1)

446-482: ⚡ Quick win

Add focused unit tests for matcher scope + thresholds.

Please add tests asserting namespace/deployment/reason matching and repeat-threshold behavior for all three new matchers (100/50/50) so future edits don’t unintentionally broaden allowance.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`
around lines 446 - 482, Add unit tests that exercise the three new
SimplePathologicalEventMatcher entries (names:
APIServerScalingReplicaSetDuringRollout,
APIServerOperatorStatusChangedDuringRollout,
OperatorDeploymentUpdatedDuringRollout) by locating them via the registry
(registry.AddPathologicalEventMatcherOrDie/registry lookup) and verifying their
locatorKeyRegexes and messageReasonRegex behave as expected: include positive
tests where namespace, deployment and reason match the regexes and negative
tests where one of those fields does not match; additionally assert
repeatThresholdOverride equals 100 for APIServerScalingReplicaSetDuringRollout
and 50 for the other two matchers, and include boundary tests to ensure the
matcher does not falsely accept broader values.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`:
- Around line 446-482: Add unit tests that exercise the three new
SimplePathologicalEventMatcher entries (names:
APIServerScalingReplicaSetDuringRollout,
APIServerOperatorStatusChangedDuringRollout,
OperatorDeploymentUpdatedDuringRollout) by locating them via the registry
(registry.AddPathologicalEventMatcherOrDie/registry lookup) and verifying their
locatorKeyRegexes and messageReasonRegex behave as expected: include positive
tests where namespace, deployment and reason match the regexes and negative
tests where one of those fields does not match; additionally assert
repeatThresholdOverride equals 100 for APIServerScalingReplicaSetDuringRollout
and 50 for the other two matchers, and include boundary tests to ensure the
matcher does not falsely accept broader values.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 021666eb-9047-4696-acf6-e64f6dff4a6b

📥 Commits

Reviewing files that changed from the base of the PR and between f0f6117 and 9ff012b.

📒 Files selected for processing (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`:
- Around line 1385-1393: The three new SimplePathologicalEventMatcher entries
(including the one named "APIServerScalingReplicaSetDuringEncryption" and the
two adjacent matchers added alongside it) are missing explicit
repeatThresholdOverride values, leaving them unbounded under
allowIfWithinIntervals; add the repeatThresholdOverride field to each matcher
and set it to enforce the intended caps (set 100 for the
APIServerScalingReplicaSetDuringEncryption matcher and 50/50 for the other two
matchers) so the Allows logic respects those limits.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7bd1ea0a-7884-4eef-bc93-10a88c017895

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff012b and 60ccd43.

📒 Files selected for processing (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go

openshift-merge-bot · 2026-05-05T06:42:37Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

ardaguclu

Overall mechanism is nice to me and better than disabling the monitors. Dropped some comments.

ardaguclu · 2026-05-05T07:17:32Z

+	encryptionTestIntervals := finalIntervals.Filter(func(eventInterval monitorapi.Interval) bool {
+		testName := eventInterval.Locator.Keys[monitorapi.LocatorE2ETestKey]
+		return eventInterval.Source == monitorapi.SourceE2ETest &&
+			(strings.Contains(testName, "Encryption") || strings.Contains(testName, "encryption"))


Isn't encryption too generic that may overlap with some other tests. Is it possible to add KMS to ensure that this will cover our setup?.

Updated. Changed to match EncryptionKMS and encryption-kms specifically so it won't overlap with other encryption-related tests.

ardaguclu · 2026-05-05T07:18:26Z

+			(strings.Contains(testName, "Encryption") || strings.Contains(testName, "encryption"))
+	})
+	for i := range encryptionTestIntervals {
+		encryptionTestIntervals[i].To = encryptionTestIntervals[i].To.Add(time.Minute * 10)


What are we fixing by adding and subtracting 10 minutes?

Why 10 minutes?

After the encryption test finishes, the API server rollout it triggered is still in progress - pods are still restarting and operators are still reacting. The +-10 minute buffer gives time for these trailing events to settle. Same buffer value used by the vsphere configuration test matcher for the same reason

ardaguclu · 2026-05-05T07:20:08Z

+// Encryption tests trigger multiple kube-apiserver rollouts that cascade into
+// these namespaces, generating ScalingReplicaSet events (observed 68-82 times).
+func newEncryptionTestScalingReplicaSetMatcher(finalIntervals monitorapi.Intervals) EventMatcher {
+	encryptionIntervals := getEncryptionTestIntervals(finalIntervals)


I think we shouldn't depend on another function. We can carry the same filters here like others do.

ardaguclu · 2026-05-05T07:21:19Z

+				monitorapi.LocatorDeploymentKey: regexp.MustCompile(`^apiserver$`),
+			},
+			messageReasonRegex:      regexp.MustCompile(`^ScalingReplicaSet$`),
+			repeatThresholdOverride: 100,


We need to determine the expected threshold and update periodically after each new test.

Agreed. These thresholds come from what we've seen in the first few CI runs - for example ScalingReplicaSet fired 68-82 times, so we set the cap at 100 to give some room. We'll keep an eye on the periodic job results and tune them if needed

ardaguclu · 2026-05-05T07:21:36Z

+// Operators report status changes repeatedly during cascading rollouts
+// triggered by encryption configuration changes (observed 22-25 times).
+func newEncryptionTestOperatorStatusChangedMatcher(finalIntervals monitorapi.Intervals) EventMatcher {
+	encryptionIntervals := getEncryptionTestIntervals(finalIntervals)


Just carry the filters without depending on a function.

ardaguclu · 2026-05-05T07:22:24Z

+	encryptionIntervals := getEncryptionTestIntervals(finalIntervals)
+	return &OverlapOtherIntervalsPathologicalEventMatcher{
+		delegate: &SimplePathologicalEventMatcher{
+			name: "APIServerOperatorStatusChangedDuringEncryption",


Just a note for future readers: OperatorStatus is changed to progressing when roll out is needed.

ardaguclu · 2026-05-05T07:23:02Z

+		delegate: &SimplePathologicalEventMatcher{
+			name: "OperatorDeploymentUpdatedDuringEncryption",
+			locatorKeyRegexes: map[monitorapi.LocatorKey]*regexp.Regexp{
+				monitorapi.LocatorNamespaceKey:  regexp.MustCompile(`^(openshift-console-operator|openshift-authentication-operator)$`),


Why openshift-apiserver-operator is not listed here but openshift-console-operator is?

Different operators emit different event types. The apiserver-operator fires OperatorStatusChanged (covered by the second matcher), while console-operator and authentication-operator fire DeploymentUpdated when they react to the apiserver changes

Do we know that why console-operator is rolled out?. We only focus on aggregated apiservers.

Not sure I see in this job https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_clus[…]in-e2e-gcp-operator-encryption-kms-ote/2049754074668077056
namespace/openshift-console-operator deployment/console-operator - reason/DeploymentUpdated Updated Deployment.apps/console -n openshift-console because it changed (31 times)
that's why i added we can remove console-operator it has indirect effect of rollout.

As per I'm assumption he console-operator detects the auth config change and re-syncs the console deployment that's why it has rollouts

That makes sense.

coderabbitai

♻️ Duplicate comments (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go (1)

446-451: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the KMS-only scope out of MatchesAny.

These matchers are always registered, but OverlapOtherIntervalsPathologicalEventMatcher.Matches() ignores allowIfWithinIntervals. That means non-KMS jobs will still classify matching ScalingReplicaSet / OperatorStatusChanged / DeploymentUpdated events as matches, which is broader than the rollout-only scope described here. Gate the registrations behind the same KMS-test detection so the matchers are absent when no KMS interval exists.

Suggested patch

 	// KMS encryption tests trigger multiple kube-apiserver rollouts that cascade across
 	// apiserver, oauth-apiserver, and their operators. These matchers are scoped to only
 	// apply when KMS encryption tests are detected in the intervals.
-	registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestScalingReplicaSetMatcher(finalIntervals))
-	registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestOperatorStatusChangedMatcher(finalIntervals))
-	registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestDeploymentUpdatedMatcher(finalIntervals))
+	hasKMSEncryptionIntervals := len(finalIntervals.Filter(func(eventInterval monitorapi.Interval) bool {
+		testName := eventInterval.Locator.Keys[monitorapi.LocatorE2ETestKey]
+		return eventInterval.Source == monitorapi.SourceE2ETest &&
+			(strings.Contains(testName, "EncryptionKMS") || strings.Contains(testName, "encryption-kms"))
+	})) > 0
+	if hasKMSEncryptionIntervals {
+		registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestScalingReplicaSetMatcher(finalIntervals))
+		registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestOperatorStatusChangedMatcher(finalIntervals))
+		registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestDeploymentUpdatedMatcher(finalIntervals))
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`
around lines 446 - 451, The three KMS-specific matchers
(newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher,
newKMSEncryptionTestDeploymentUpdatedMatcher) are always registered via
registry.AddPathologicalEventMatcherOrDie but their Matches() ignores
allowIfWithinIntervals so they must only be registered when KMS intervals are
present; modify the registration site to check the same KMS-test detection
(i.e., the presence/contents of finalIntervals or the existing KMS detection
predicate used elsewhere) and only call
registry.AddPathologicalEventMatcherOrDie for these three matchers when a KMS
interval exists, leaving them absent for non-KMS runs.

🧹 Nitpick comments (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go (1)

1365-1451: ⚡ Quick win

Add focused tests for the KMS matcher boundaries.

These three builders now encode most of the PR’s behavior: KMS test detection, ±10 minute buffering, namespace/deployment scoping, and 100/50/50 caps. A small table-driven test around Allows would lock down the intended cases: inside-window vs outside-window, non-KMS test intervals, and one-over-threshold counts.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`
around lines 1365 - 1451, Add table-driven unit tests that exercise the Allows
behavior for newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher, and
newKMSEncryptionTestDeploymentUpdatedMatcher: create synthetic
monitorapi.Intervals that include a matching KMS e2e test interval and variants
(inside the ±10 minute buffered window, just outside the buffer, and a non-KMS
test interval), then assert that event candidates matching the
locator/namespace/deployment and reason are allowed only when inside the
buffered window and not allowed when outside or for non-KMS intervals; also
include a case that exceeds the repeatThresholdOverride by one to ensure the
matcher rejects beyond the cap. Use table-driven subtests to cover each matcher,
vary From/To to validate the ±10 minute logic, and construct events with the
exact locator keys and message reasons referenced in the functions to target the
code paths.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`:
- Around line 446-451: The three KMS-specific matchers
(newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher,
newKMSEncryptionTestDeploymentUpdatedMatcher) are always registered via
registry.AddPathologicalEventMatcherOrDie but their Matches() ignores
allowIfWithinIntervals so they must only be registered when KMS intervals are
present; modify the registration site to check the same KMS-test detection
(i.e., the presence/contents of finalIntervals or the existing KMS detection
predicate used elsewhere) and only call
registry.AddPathologicalEventMatcherOrDie for these three matchers when a KMS
interval exists, leaving them absent for non-KMS runs.

---

Nitpick comments:
In
`@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`:
- Around line 1365-1451: Add table-driven unit tests that exercise the Allows
behavior for newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher, and
newKMSEncryptionTestDeploymentUpdatedMatcher: create synthetic
monitorapi.Intervals that include a matching KMS e2e test interval and variants
(inside the ±10 minute buffered window, just outside the buffer, and a non-KMS
test interval), then assert that event candidates matching the
locator/namespace/deployment and reason are allowed only when inside the
buffered window and not allowed when outside or for non-KMS intervals; also
include a case that exceeds the repeatThresholdOverride by one to ensure the
matcher rejects beyond the cap. Use table-driven subtests to cover each matcher,
vary From/To to validate the ±10 minute logic, and construct events with the
exact locator keys and message reasons referenced in the functions to target the
code paths.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 41411c95-690f-4ea4-aa36-82683a054b07

📥 Commits

Reviewing files that changed from the base of the PR and between 8861ba0 and d70d6dd.

📒 Files selected for processing (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go

gangwgr · 2026-05-05T07:50:35Z

♻️ Duplicate comments (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go (1)> 446-451: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the KMS-only scope out of MatchesAny.
These matchers are always registered, but OverlapOtherIntervalsPathologicalEventMatcher.Matches() ignores allowIfWithinIntervals. That means non-KMS jobs will still classify matching ScalingReplicaSet / OperatorStatusChanged / DeploymentUpdated events as matches, which is broader than the rollout-only scope described here. Gate the registrations behind the same KMS-test detection so the matchers are absent when no KMS interval exists.

Suggested patch

 	// KMS encryption tests trigger multiple kube-apiserver rollouts that cascade across
 	// apiserver, oauth-apiserver, and their operators. These matchers are scoped to only
 	// apply when KMS encryption tests are detected in the intervals.
-	registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestScalingReplicaSetMatcher(finalIntervals))
-	registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestOperatorStatusChangedMatcher(finalIntervals))
-	registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestDeploymentUpdatedMatcher(finalIntervals))
+	hasKMSEncryptionIntervals := len(finalIntervals.Filter(func(eventInterval monitorapi.Interval) bool {
+		testName := eventInterval.Locator.Keys[monitorapi.LocatorE2ETestKey]
+		return eventInterval.Source == monitorapi.SourceE2ETest &&
+			(strings.Contains(testName, "EncryptionKMS") || strings.Contains(testName, "encryption-kms"))
+	})) > 0
+	if hasKMSEncryptionIntervals {
+		registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestScalingReplicaSetMatcher(finalIntervals))
+		registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestOperatorStatusChangedMatcher(finalIntervals))
+		registry.AddPathologicalEventMatcherOrDie(newKMSEncryptionTestDeploymentUpdatedMatcher(finalIntervals))
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`
around lines 446 - 451, The three KMS-specific matchers
(newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher,
newKMSEncryptionTestDeploymentUpdatedMatcher) are always registered via
registry.AddPathologicalEventMatcherOrDie but their Matches() ignores
allowIfWithinIntervals so they must only be registered when KMS intervals are
present; modify the registration site to check the same KMS-test detection
(i.e., the presence/contents of finalIntervals or the existing KMS detection
predicate used elsewhere) and only call
registry.AddPathologicalEventMatcherOrDie for these three matchers when a KMS
interval exists, leaving them absent for non-KMS runs.

🧹 Nitpick comments (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go (1)> 1365-1451: ⚡ Quick win

Add focused tests for the KMS matcher boundaries.
These three builders now encode most of the PR’s behavior: KMS test detection, ±10 minute buffering, namespace/deployment scoping, and 100/50/50 caps. A small table-driven test around Allows would lock down the intended cases: inside-window vs outside-window, non-KMS test intervals, and one-over-threshold counts.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`
around lines 1365 - 1451, Add table-driven unit tests that exercise the Allows
behavior for newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher, and
newKMSEncryptionTestDeploymentUpdatedMatcher: create synthetic
monitorapi.Intervals that include a matching KMS e2e test interval and variants
(inside the ±10 minute buffered window, just outside the buffer, and a non-KMS
test interval), then assert that event candidates matching the
locator/namespace/deployment and reason are allowed only when inside the
buffered window and not allowed when outside or for non-KMS intervals; also
include a case that exceeds the repeatThresholdOverride by one to ensure the
matcher rejects beyond the cap. Use table-driven subtests to cover each matcher,
vary From/To to validate the ±10 minute logic, and construct events with the
exact locator keys and message reasons referenced in the functions to target the
code paths.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`:
- Around line 446-451: The three KMS-specific matchers
(newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher,
newKMSEncryptionTestDeploymentUpdatedMatcher) are always registered via
registry.AddPathologicalEventMatcherOrDie but their Matches() ignores
allowIfWithinIntervals so they must only be registered when KMS intervals are
present; modify the registration site to check the same KMS-test detection
(i.e., the presence/contents of finalIntervals or the existing KMS detection
predicate used elsewhere) and only call
registry.AddPathologicalEventMatcherOrDie for these three matchers when a KMS
interval exists, leaving them absent for non-KMS runs.

---

Nitpick comments:
In
`@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go`:
- Around line 1365-1451: Add table-driven unit tests that exercise the Allows
behavior for newKMSEncryptionTestScalingReplicaSetMatcher,
newKMSEncryptionTestOperatorStatusChangedMatcher, and
newKMSEncryptionTestDeploymentUpdatedMatcher: create synthetic
monitorapi.Intervals that include a matching KMS e2e test interval and variants
(inside the ±10 minute buffered window, just outside the buffer, and a non-KMS
test interval), then assert that event candidates matching the
locator/namespace/deployment and reason are allowed only when inside the
buffered window and not allowed when outside or for non-KMS intervals; also
include a case that exceeds the repeatThresholdOverride by one to ensure the
matcher rejects beyond the cap. Use table-driven subtests to cover each matcher,
vary From/To to validate the ±10 minute logic, and construct events with the
exact locator keys and message reasons referenced in the functions to target the
code paths.

ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 41411c95-690f-4ea4-aa36-82683a054b07

📥 Commits
Reviewing files that changed from the base of the PR and between 8861ba0 and d70d6dd.

📒 Files selected for processing (1)

pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go

This is already handled by the OverlapOtherIntervalsPathologicalEventMatcher itself. When no KMS test intervals exist, allowIfWithinIntervals is empty, so Allows() always returns false - the matcher has no effect. This is the same unconditional registration pattern used by all other OverlapOtherIntervalsPathologicalEventMatcher matchers in this file (vsphere, imagepolicy, sigterm, etc.).

openshift-merge-bot · 2026-05-05T08:06:58Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

gangwgr · 2026-05-05T08:44:33Z

/test e2e-gcp-csi

gangwgr · 2026-05-05T08:44:52Z

/test e2e-gcp-ovn-upgrade

ardaguclu · 2026-05-05T08:57:25Z

/lgtm

openshift-ci · 2026-05-05T08:57:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ardaguclu, gangwgr
Once this PR has been reviewed and has the lgtm label, please assign miyadav for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

p0lyn0mial · 2026-05-05T09:48:42Z

 		messageHumanRegex: regexp.MustCompile(`message changed from "\\ufeff`),
 	})

+	// KMS encryption tests trigger multiple kube-apiserver rollouts that cascade across


would it make sense to link the analysis we did here in the doc ?

xref: https://docs.google.com/document/d/14EJEJ6Xi7DPRN9gIaNUdLet8BwjJlVWf16Q_md5H1xA/edit?tab=t.0

openshift-ci · 2026-05-05T10:38:23Z

New changes are detected. LGTM label has been removed.

openshift-merge-bot · 2026-05-05T11:06:10Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

ardaguclu · 2026-05-05T11:07:37Z

Changes look good to me

openshift-merge-bot · 2026-05-05T14:15:12Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

gangwgr · 2026-05-05T14:58:12Z

/test e2e-aws-ovn-fips

gangwgr · 2026-05-05T14:59:07Z

/test e2e-gcp-ovn-upgrade

gangwgr · 2026-05-05T15:37:37Z

/test e2e-aws-ovn-fips

KMS encryption tests trigger multiple kube-apiserver rollouts that cascade across openshift-apiserver, openshift-oauth-apiserver, and their operators, generating ScalingReplicaSet, OperatorStatusChanged, and DeploymentUpdated events well above the default threshold of 20. Add three new matchers scoped to only apply when encryption tests are detected in the test intervals using OverlapOtherIntervalsPathological EventMatcher, following the same pattern as vsphere configuration and imagepolicy signature validation matchers: - APIServerScalingReplicaSetDuringEncryption - APIServerOperatorStatusChangedDuringEncryption - OperatorDeploymentUpdatedDuringEncryption

openshift-ci · 2026-05-06T04:43:38Z

@gangwgr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-serial-2of2	`3f656ba`	link	true	`/test e2e-aws-ovn-serial-2of2`
ci/prow/e2e-aws-csi	`3f656ba`	link	true	`/test e2e-aws-csi`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-merge-bot · 2026-05-06T05:15:01Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

openshift-ci Bot requested review from deads2k and sjenning May 5, 2026 05:23

gangwgr mentioned this pull request May 5, 2026

CNTRLPLANE-3347: Migrate testKMSEncryptionOnOff to ote openshift/cluster-kube-apiserver-operator#2127

Open

coderabbitai Bot approved these changes May 5, 2026

View reviewed changes

openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 5, 2026

ardaguclu reviewed May 5, 2026

View reviewed changes

gangwgr force-pushed the kms-encryption-threshold branch from f0f6117 to 9ff012b Compare May 5, 2026 05:50

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

gangwgr force-pushed the kms-encryption-threshold branch from 9ff012b to 60ccd43 Compare May 5, 2026 05:59

coderabbitai Bot requested changes May 5, 2026

View reviewed changes

Comment thread pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go Outdated

gangwgr force-pushed the kms-encryption-threshold branch from 60ccd43 to 8861ba0 Compare May 5, 2026 06:07

coderabbitai Bot approved these changes May 5, 2026

View reviewed changes

ardaguclu reviewed May 5, 2026

View reviewed changes

gangwgr force-pushed the kms-encryption-threshold branch from 8861ba0 to d70d6dd Compare May 5, 2026 07:35

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

openshift-ci Bot assigned ardaguclu May 5, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 5, 2026

p0lyn0mial reviewed May 5, 2026

View reviewed changes

gangwgr force-pushed the kms-encryption-threshold branch from d70d6dd to 22f4314 Compare May 5, 2026 10:38

openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 5, 2026

gangwgr force-pushed the kms-encryption-threshold branch from 22f4314 to 3f656ba Compare May 5, 2026 13:49

gangwgr force-pushed the kms-encryption-threshold branch from 3f656ba to ff97901 Compare May 6, 2026 04:25

gangwgr force-pushed the kms-encryption-threshold branch from ff97901 to 5008288 Compare May 6, 2026 04:43

Conversation

gangwgr commented May 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented May 5, 2026

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

openshift-merge-bot Bot commented May 5, 2026

Uh oh!

ardaguclu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gangwgr commented May 5, 2026

Uh oh!

openshift-merge-bot Bot commented May 5, 2026

Uh oh!

gangwgr commented May 5, 2026

Uh oh!

gangwgr commented May 5, 2026

Uh oh!

ardaguclu commented May 5, 2026

Uh oh!

openshift-ci Bot commented May 5, 2026

gangwgr commented May 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading