Skip to content

NO-JIRA: Add thread-safe AddSchemes() to fix concurrent map writes#1401

Open
hongkailiu wants to merge 1 commit into
openshift:mainfrom
hongkailiu:fix-e2e
Open

NO-JIRA: Add thread-safe AddSchemes() to fix concurrent map writes#1401
hongkailiu wants to merge 1 commit into
openshift:mainfrom
hongkailiu:fix-e2e

Conversation

@hongkailiu

@hongkailiu hongkailiu commented Jun 9, 2026

Copy link
Copy Markdown
Member

Fixes "fatal error: concurrent map writes" in e2e-agnostic-operator [1, 2].

The issue occurred because multiple tests (e.g., controller_test.go and
start_integration_test.go) calling addSchemes() would
modified the global scheme.Scheme map without synchronization.

Changes:

  • Created pkg/internal/schema.go with AddSchemes() using sync.Once
  • Moved addSchemes() from pkg/start/start.go to avoid import cycles
    (proposal <- cvo <- start creates cycle when proposal imports start)
  • Updated pkg/start/start.go to call internal.AddSchemes()
  • Updated pkg/proposal/controller_test.go to call internal.AddSchemes()
  • Added pkg/internal/schema_test.go with concurrent safety test

The sync.Once pattern ensures scheme registration executes exactly once
across all concurrent callers, preventing race conditions.

[1]. https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1386/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/2063953653860929536

[2]. https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1399/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/2064194902241054720

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com

Summary by CodeRabbit

  • Refactor

    • Centralized API scheme registration into a single internal utility to remove duplicated registration logic.
  • Tests

    • Added a concurrent test to ensure scheme registration is safe and reliable when invoked from multiple goroutines.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 9, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@hongkailiu: This pull request explicitly references no jira issue.

Details

In response to this:

Fixes "fatal error: concurrent map writes" when e2e integration tests run in parallel.

The issue occurred because multiple tests calling t.Parallel() would concurrently execute NewControllerContext() -> addSchemes(), which modified the global scheme.Scheme map without synchronization.

Changes:

  • Created pkg/internal/schema.go with AddSchemes() using sync.Once
  • Moved addSchemes() from pkg/start/start.go to avoid import cycles (proposal <- cvo <- start creates cycle when proposal imports start)
  • Updated pkg/start/start.go to call internal.AddSchemes()
  • Updated pkg/proposal/controller_test.go to call internal.AddSchemes()
  • Added pkg/internal/schema_test.go with concurrent safety test

The sync.Once pattern ensures scheme registration executes exactly once across all concurrent callers, preventing race conditions.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Walkthrough

This PR centralizes registration of the proposalv1alpha1 API into a concurrency-safe helper internal.AddSchemes() (uses sync.Once and caches the first error), adds a concurrent test that calls it from multiple goroutines, and updates tests and startup code to call the new helper instead of performing direct registration.

Changes

Centralized scheme registration

Layer / File(s) Summary
Centralized scheme registration with concurrency safety
pkg/internal/schema.go, pkg/internal/schema_test.go
New AddSchemes() registers proposalv1alpha1 into the global Kubernetes scheme exactly once via sync.Once, caches the first error, and a concurrent test invokes AddSchemes() from 10 goroutines to ensure thread-safety.
Migration of callers to centralized registration
pkg/proposal/controller_test.go, pkg/start/start.go
Test init and startup code remove direct proposalv1alpha1 registration and the local helper, switching to internal.AddSchemes() and returning its error.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed No Ginkgo tests found in PR. Only standard Go test function TestAddSchemesConcurrent added with stable, deterministic name containing no dynamic values.
Test Structure And Quality ✅ Passed The custom check addresses Ginkgo test quality, but this PR adds standard Go tests (testing.T), not Ginkgo BDD tests. The codebase uses only standard Go testing, making this check not applicable.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests were added in this PR. The added TestAddSchemesConcurrent is a standard Go unit test, not a Ginkgo test, and thus the MicroShift compatibility check does not apply.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added; only standard Go unit test TestAddSchemesConcurrent for concurrent safety, which tests internal schema registration and makes no SNO topology assumptions.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces no deployment manifests, scheduling constraints, or workload definitions—only internal refactoring to fix concurrent map writes in Kubernetes API scheme registration.
Ote Binary Stdout Contract ✅ Passed PR introduces no process-level stdout writes; no fmt.Print/log.Print/klog.Print found; uses sync.Once safely for scheme registration without corrupting OTE JSON communication.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR adds only a standard Go unit test (TestAddSchemesConcurrent), not Ginkgo e2e tests, so the IPv6/disconnected check doesn't apply.
No-Weak-Crypto ✅ Passed No weak crypto patterns found. PR only adds concurrency-safe scheme registration with sync.Once; no MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, custom crypto, or insecure secret comparisons.
Container-Privileges ✅ Passed PR contains only Go source/test files with no container or Kubernetes manifests to evaluate for security privileges.
No-Sensitive-Data-In-Logs ✅ Passed No logging of sensitive data (passwords, tokens, APIs keys, PII, session IDs, or customer data) found. Error messages are generic and safe.
Title check ✅ Passed The title accurately summarizes the main change: adding a thread-safe AddSchemes() function to resolve concurrent map write issues in e2e tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/internal/schema_test.go (1)

11-35: 💤 Low value

Test effectively validates concurrent safety.

The test correctly verifies that AddSchemes() can be called concurrently without panics or errors, which addresses the reported "fatal error: concurrent map writes."

💡 Optional enhancement to verify scheme registration

Consider adding an assertion after the concurrent calls to confirm the scheme is actually registered:

 	// Check if any goroutine encountered an error
 	for err := range errChan {
 		t.Errorf("AddSchemes failed: %v", err)
 	}
+
+	// Verify the scheme is actually registered
+	if !scheme.Scheme.Recognizes(schema.GroupVersionKind{
+		Group:   "lightspeed.openshift.io",
+		Version: "v1alpha1",
+		Kind:    "Proposal",
+	}) {
+		t.Error("proposalv1alpha1 scheme was not registered")
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/internal/schema_test.go` around lines 11 - 35, After the concurrent
AddSchemes() calls in TestAddSchemesConcurrent, add an assertion that the
expected scheme/GVK is actually registered to ensure registration happened (not
just absence of errors); locate the check after wg.Wait() and close(errChan)
and, using the same scheme registry used by AddSchemes (e.g., the global scheme
or the registry type manipulated by AddSchemes), lookup the expected
type/GroupVersionKind and fail the test with t.Fatalf/t.Errorf if the lookup
returns not found or nil. Ensure this assertion uses the same identifying
symbols as AddSchemes so the test verifies both concurrency safety and
successful registration.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/internal/schema_test.go`:
- Around line 11-35: After the concurrent AddSchemes() calls in
TestAddSchemesConcurrent, add an assertion that the expected scheme/GVK is
actually registered to ensure registration happened (not just absence of
errors); locate the check after wg.Wait() and close(errChan) and, using the same
scheme registry used by AddSchemes (e.g., the global scheme or the registry type
manipulated by AddSchemes), lookup the expected type/GroupVersionKind and fail
the test with t.Fatalf/t.Errorf if the lookup returns not found or nil. Ensure
this assertion uses the same identifying symbols as AddSchemes so the test
verifies both concurrency safety and successful registration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 59bf3fcf-20a5-4eed-b457-fee44959795f

📥 Commits

Reviewing files that changed from the base of the PR and between 7d7dea4 and e6a2115.

📒 Files selected for processing (4)
  • pkg/internal/schema.go
  • pkg/internal/schema_test.go
  • pkg/proposal/controller_test.go
  • pkg/start/start.go

Fixes "fatal error: concurrent map writes" in e2e-agnostic-operator [1, 2].

The issue occurred because multiple tests (e.g., controller_test.go and
start_integration_test.go) calling addSchemes() would
modified the global scheme.Scheme map without synchronization.

Changes:
- Created pkg/internal/schema.go with AddSchemes() using sync.Once
- Moved addSchemes() from pkg/start/start.go to avoid import cycles
  (proposal <- cvo <- start creates cycle when proposal imports start)
- Updated pkg/start/start.go to call internal.AddSchemes()
- Updated pkg/proposal/controller_test.go to call internal.AddSchemes()
- Added pkg/internal/schema_test.go with concurrent safety test

The sync.Once pattern ensures scheme registration executes exactly once
across all concurrent callers, preventing race conditions.

[1]. https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1386/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/2063953653860929536

[2]. https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1399/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator/2064194902241054720

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@hongkailiu hongkailiu changed the title [wip]NO-JIRA: Add thread-safe AddSchemes() to fix concurrent map writes NO-JIRA: Add thread-safe AddSchemes() to fix concurrent map writes Jun 9, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2026
@hongkailiu

Copy link
Copy Markdown
Member Author

/test e2e-agnostic-ovn-techpreview-serial-2of3

@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

@hongkailiu: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants