Move AggregationRuleTransform to shared common package to fix eventing RBAC race #2316

Open
creydr wants to merge 1 commit into knative:main from creydr:fix/aggregation-rule-transform-eventing

creydr (Member) commented Jun 11, 2026

channelable-manipulator is a Kubernetes aggregated ClusterRole — its aggregationRule tells the
aggregation controller to populate the rules field automatically from matching ClusterRoles.
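
To make the aggregation mechanism concrete, here is a minimal, stdlib-only model of one pass of the aggregation controller. It is a sketch under stated assumptions, not the Kubernetes implementation: `Role`, `Rule`, `matches`, and `Aggregate` are hypothetical names, the selector is reduced to plain label equality, and the label key used in the usage note is made up for illustration.

```go
package main

// Rule and Role are simplified stand-ins for the RBAC API types
// (the real ones live in k8s.io/api/rbac/v1).
type Rule struct {
	Resources []string
	Verbs     []string
}

type Role struct {
	Name     string
	Labels   map[string]string
	Selector map[string]string // non-nil models an aggregationRule
	Rules    []Rule
}

// matches reports whether every selector key/value pair is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// Aggregate models one pass of the aggregation controller: it overwrites
// agg.Rules with the rules of every other role whose labels match the selector.
// Roles without a selector are left untouched.
func Aggregate(agg *Role, all []Role) {
	if agg.Selector == nil {
		return
	}
	var rules []Rule
	for _, r := range all {
		if r.Name != agg.Name && matches(agg.Selector, r.Labels) {
			rules = append(rules, r.Rules...)
		}
	}
	agg.Rules = rules
}
```

For example, an aggregated role with the hypothetical selector `example.dev/channelable: "true"` picks up the rules of every role carrying that label and ignores the rest. The role's own `rules` field is therefore owned by the controller, not by whoever wrote the manifest.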

The operator's manifestival reconciliation applies this ClusterRole from its manifest every ~16 seconds.
Because the manifest declares only the aggregationRule with rules: [], each apply temporarily
clears the aggregated rules. The Kubernetes aggregation controller re-fills them within ~50–200ms,
but during that window any RBAC check against the role returns 403 Forbidden.

This affects the eventing controller when it patches InMemoryChannels — roughly 0.6% of patch
attempts hit the race window, causing Subscription finalizer failures.
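
As a rough plausibility check on the 0.6% figure, the share of time during which the role has empty rules can be estimated as the window length divided by the apply interval. The sketch below assumes that patch attempts are spread uniformly over time and that each attempt performs one RBAC check; `ForbiddenFraction` is a hypothetical helper, not part of the operator.

```go
package main

// ForbiddenFraction estimates the share of time during which the aggregated
// role has empty rules: one window of windowMs per apply interval of intervalMs.
// The result is clamped to [0, 1].
func ForbiddenFraction(windowMs, intervalMs float64) float64 {
	if intervalMs <= 0 || windowMs <= 0 {
		return 0
	}
	f := windowMs / intervalMs
	if f > 1 {
		f = 1
	}
	return f
}
```

With a 16-second interval, a 50 ms window gives about 0.31% and a 200 ms window gives 1.25%. The observed 0.6% corresponds to an average window of roughly 96 ms, which sits inside the reported range.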

The root cause is that AggregationRuleTransform (which preserves the cluster's current rules
before applying) was only wired into KnativeServing but not KnativeEventing.

This PR addresses it by:

  • Moving AggregationRuleTransform from knativeserving/common/ to the shared reconciler/common/ package
  • Applying it inside the shared Transform() function, so all components (Serving, Eventing, Kafka) get it automatically
  • Removing the now-redundant call from the KnativeServing transformer chain
  • Adding an eventing-specific test case (channelable-manipulator) to the existing test
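
The idea behind the change can be sketched as follows. This is a simplified, stdlib-only model and not the operator's actual code: the real transform works on manifestival resources, whereas `ClusterRole`, `Getter`, `PreserveAggregatedRules`, `SharedTransformers`, and `Apply` here are hypothetical names chosen for illustration.

```go
package main

// PolicyRule and ClusterRole are simplified stand-ins for the rbac/v1 types.
type PolicyRule struct {
	Resources []string
	Verbs     []string
}

type ClusterRole struct {
	Name           string
	HasAggregation bool // models a non-nil aggregationRule
	Rules          []PolicyRule
}

// Getter is a hypothetical lookup of the live ClusterRole in the cluster.
type Getter func(name string) (*ClusterRole, bool)

// Transformer mutates a manifest resource before it is applied.
type Transformer func(cr *ClusterRole) error

// PreserveAggregatedRules copies the live rules into the manifest copy of an
// aggregated ClusterRole, so that applying the manifest does not clear them.
func PreserveAggregatedRules(get Getter) Transformer {
	return func(cr *ClusterRole) error {
		if !cr.HasAggregation {
			return nil // ordinary roles keep the rules from the manifest
		}
		live, found := get(cr.Name)
		if !found {
			return nil // first install: nothing to preserve yet
		}
		cr.Rules = append([]PolicyRule(nil), live.Rules...)
		return nil
	}
}

// SharedTransformers models the shared chain: the rule-preserving step is
// always included, followed by whatever the component adds.
func SharedTransformers(get Getter, extra ...Transformer) []Transformer {
	return append([]Transformer{PreserveAggregatedRules(get)}, extra...)
}

// Apply runs every transformer against the resource, stopping at the first error.
func Apply(cr *ClusterRole, ts []Transformer) error {
	for _, t := range ts {
		if err := t(cr); err != nil {
			return err
		}
	}
	return nil
}
```

Because the rule-preserving step lives in the shared chain rather than in one component's list, a component cannot forget to include it, which is the gap that affected KnativeEventing.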

Release Note

Fix intermittent 403 RBAC errors for aggregated ClusterRoles (e.g. channelable-manipulator) by
preserving Kubernetes-managed rules during reconciliation for all components, not just Serving.

AggregationRuleTransform was only applied to KnativeServing but not to
KnativeEventing. This caused the operator to continuously overwrite the
rules of aggregated ClusterRoles (e.g. channelable-manipulator) with
empty rules from the manifest, creating a race condition with the
Kubernetes aggregation controller. During the race window, RBAC lookups
against the aggregated role return 403 Forbidden.

Move the transform from knativeserving/common to the shared
reconciler/common package and apply it inside Transform(), so every
component (Serving, Eventing, Kafka) benefits automatically.

Signed-off-by: Christoph Stäbler <cstabler@redhat.com>

knative-prow Bot commented Jun 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: creydr
Once this PR has been reviewed and has the lgtm label, please assign aliok for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

knative-prow Bot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Jun 11, 2026
knative-prow Bot requested review from houshengbo and kahirokunn June 11, 2026 08:35

codecov Bot commented Jun 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.89%. Comparing base (6b0f9d8) to head (a7bf656).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2316      +/-   ##
==========================================
+ Coverage   63.84%   63.89%   +0.05%     
==========================================
  Files          55       55              
  Lines        2478     2479       +1     
==========================================
+ Hits         1582     1584       +2     
+ Misses        777      776       -1     
  Partials      119      119              

creydr (Member, Author) commented Jun 11, 2026

/cc @maschmid

knative-prow Bot requested a review from maschmid June 11, 2026 11:02

creydr (Member, Author) commented Jun 12, 2026

/cherry-pick release-1.22

knative-prow-robot (Contributor) commented

@creydr: once the present PR merges, I will cherry-pick it on top of release-1.22 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-1.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

creydr (Member, Author) commented Jun 12, 2026

/cherry-pick release-1.21

knative-prow-robot (Contributor) commented

@creydr: once the present PR merges, I will cherry-pick it on top of release-1.21 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-1.21

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
