OCPBUGS-77078: fix(mtu): handle MTU defaults for all HyperShift platforms#2933
OCPBUGS-77078: fix(mtu): handle MTU defaults for all HyperShift platforms#2933jparrill wants to merge 1 commit intoopenshift:masterfrom
Conversation
WalkthroughThreads an Changes
Sequence DiagramsequenceDiagram
participant Controller as Controller\n(operconfig_controller)
participant Renderer as Renderer\n(render)
participant OVN as OVN\n(ovn_kubernetes)
participant MTUProbe as MTUProbe\n(mtu_probe)
Controller->>Controller: Determine isHyperShift (HostedControlPlane != nil)
Controller->>Renderer: FillDefaults(conf, previous, hostMTU, isHyperShift)
Renderer->>OVN: fillOVNKubernetesDefaults(conf, previous, hostMTU, isHyperShift)
alt sc.MTU is nil
OVN->>OVN: Check previous MTU
alt previous MTU exists
OVN->>OVN: Use previous MTU
else hostMTU > 0
OVN->>OVN: Use hostMTU - encapOverhead
else isHyperShift == true
OVN->>OVN: Use safe default (1500) - encapOverhead
else
OVN->>MTUProbe: GetDefaultMTU()
MTUProbe-->>OVN: Return probed MTU
OVN->>OVN: Use probed MTU or error if invalid
end
end
OVN->>Renderer: Return spec with MTU set
Renderer->>Controller: Return configured spec
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.38.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope ... [truncated 17231 characters] ... ired in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20250922181213-ec3ebc5fd46b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.34.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.34.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
@jparrill: This pull request references Jira Issue OCPBUGS-77078, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jparrill The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/network/ovn_kubernetes_test.go (1)
887-965: Add a HyperShift+IPsec safety-net test case.The table already covers most branches; adding
isHyperShift=true,hostMTU=0, andIPsecConfigenabled would lock in the expected1500 - (geneve + ipsec)behavior.➕ Suggested table entry
{ name: "When HyperShift with no user MTU and no probed MTU it should use safe default minus geneve overhead", setup: func(conf *operv1.NetworkSpec) { conf.DefaultNetwork.OVNKubernetesConfig = nil }, hostMTU: 0, isHyperShift: true, expectedMTU: 1400, // 1500 (safe default) - 100 (geneve overhead) }, + { + name: "When HyperShift safety-net is used with IPsec it should include IPsec overhead", + setup: func(conf *operv1.NetworkSpec) { + conf.DefaultNetwork.OVNKubernetesConfig.IPsecConfig = &operv1.IPsecConfig{} + }, + hostMTU: 0, + isHyperShift: true, + expectedMTU: 1354, // 1500 - 100 (geneve) - 46 (IPsec) + }, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 887 - 965, Add a new table entry in TestFillOVNKubernetesDefaults to cover HyperShift + IPsec: create a test case where setup sets conf.DefaultNetwork.OVNKubernetesConfig.IPsecConfig = &operv1.IPsecConfig{} (or ensures it's non-nil), hostMTU = 0, isHyperShift = true, and expectedMTU = 1354 (1500 - 100 geneve - 46 ipsec); this ensures fillOVNKubernetesDefaults (called in the test) validates the safe-default minus combined geneve and IPsec overhead behavior for OVNKubernetesConfig when OVNKubernetesConfig and IPsecConfig are present on HyperShift.pkg/network/ovn_kubernetes.go (1)
1085-1091: Extract the HyperShift safe-default MTU to a named constant.Using a named constant here avoids drift with other MTU default paths and makes future tuning safer.
♻️ Suggested cleanup
+const defaultHyperShiftHostMTU uint32 = 1500 ... case isHyperShift: // In HyperShift, GetDefaultMTU() reads the management cluster's pod // network interface, not the hosted cluster's worker node uplink. // This path should not be reached because probeMTU always returns a // valid value for HyperShift, but guard against it as a safety measure. log.Printf("WARNING: No probed MTU available for HyperShift cluster, using safe default of 1500") - mtu = uint32(1500) - getOVNEncapOverhead(conf) + mtu = defaultHyperShiftHostMTU - getOVNEncapOverhead(conf)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes.go` around lines 1085 - 1091, Replace the hard-coded 1500 literal used for the HyperShift safe-default MTU with a named constant (e.g., defaultHyperShiftMTU) to avoid drift and make tuning explicit; in the isHyperShift branch (where probeMTU can be absent) declare the constant near related MTU defaults, then change the assignment mtu = uint32(1500) - getOVNEncapOverhead(conf) to use that constant (e.g., mtu = uint32(defaultHyperShiftMTU) - getOVNEncapOverhead(conf)) and update the log.Printf message if desired to reference the constant name for clarity; keep references to isHyperShift, probeMTU, getOVNEncapOverhead, and mtu so the change is localized and consistent with other MTU default paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 887-965: Add a new table entry in TestFillOVNKubernetesDefaults to
cover HyperShift + IPsec: create a test case where setup sets
conf.DefaultNetwork.OVNKubernetesConfig.IPsecConfig = &operv1.IPsecConfig{} (or
ensures it's non-nil), hostMTU = 0, isHyperShift = true, and expectedMTU = 1354
(1500 - 100 geneve - 46 ipsec); this ensures fillOVNKubernetesDefaults (called
in the test) validates the safe-default minus combined geneve and IPsec overhead
behavior for OVNKubernetesConfig when OVNKubernetesConfig and IPsecConfig are
present on HyperShift.
In `@pkg/network/ovn_kubernetes.go`:
- Around line 1085-1091: Replace the hard-coded 1500 literal used for the
HyperShift safe-default MTU with a named constant (e.g., defaultHyperShiftMTU)
to avoid drift and make tuning explicit; in the isHyperShift branch (where
probeMTU can be absent) declare the constant near related MTU defaults, then
change the assignment mtu = uint32(1500) - getOVNEncapOverhead(conf) to use that
constant (e.g., mtu = uint32(defaultHyperShiftMTU) - getOVNEncapOverhead(conf))
and update the log.Printf message if desired to reference the constant name for
clarity; keep references to isHyperShift, probeMTU, getOVNEncapOverhead, and mtu
so the change is localized and consistent with other MTU default paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 67632cbd-22a0-46be-b1c6-73980a18680f
📒 Files selected for processing (7)
pkg/controller/operconfig/mtu_probe.gopkg/controller/operconfig/mtu_probe_test.gopkg/controller/operconfig/operconfig_controller.gopkg/network/ovn_kubernetes.gopkg/network/ovn_kubernetes_test.gopkg/network/render.gopkg/network/render_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/network/ovn_kubernetes_test.go (1)
1202-1210: Add a HyperShift+IPsec MTU defaulting test case.This table now covers HyperShift safe-default MTU, but it does not assert the IPsec overhead path (
1500 - 100 - 46), which is part of the new MTU rules.🧪 Suggested test-case addition
+ { + name: "When HyperShift with IPsec and no user/probed MTU it should use safe default minus geneve+IPsec overhead", + setup: func(conf *operv1.NetworkSpec) { + conf.DefaultNetwork.OVNKubernetesConfig = &operv1.OVNKubernetesConfig{ + IPsecConfig: &operv1.IPsecConfig{Mode: operv1.IPsecModeFull}, + } + }, + hostMTU: 0, + isHyperShift: true, + expectedMTU: 1354, // 1500 - 100 - 46 + },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 1202 - 1210, Add a new table test case alongside the existing HyperShift case in ovn_kubernetes_test.go to assert the IPsec overhead path: create a case named something like "When HyperShift with no user MTU and no probed MTU with IPsec it should use safe default minus geneve and ipsec overhead" that leaves DefaultNetwork.OVNKubernetesConfig nil for HyperShift but ensures IPsec is enabled in the cluster config used by the test, set hostMTU to 0 and isHyperShift to true, and set expectedMTU to 1354 (1500 safe default − 100 geneve − 46 IPsec); ensure the test harness that computes MTU (the function exercised by these table cases) picks up the IPsec flag so this path is validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 1175-1177: The test's previous function currently returns the same
pointer as conf which aliases the current spec; change it to return an
independent copy to avoid masking defaulting mutations—e.g. in the anonymous
previous func that takes (conf *operv1.NetworkSpec) return conf.DeepCopy() if
operv1.NetworkSpec implements DeepCopy(), otherwise allocate and return a new
operv1.NetworkSpec value copied from *conf (e.g. ns := *conf; return &ns) so
previous no longer points to the same object as conf in the IPsec test case.
---
Nitpick comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 1202-1210: Add a new table test case alongside the existing
HyperShift case in ovn_kubernetes_test.go to assert the IPsec overhead path:
create a case named something like "When HyperShift with no user MTU and no
probed MTU with IPsec it should use safe default minus geneve and ipsec
overhead" that leaves DefaultNetwork.OVNKubernetesConfig nil for HyperShift but
ensures IPsec is enabled in the cluster config used by the test, set hostMTU to
0 and isHyperShift to true, and set expectedMTU to 1354 (1500 safe default − 100
geneve − 46 IPsec); ensure the test harness that computes MTU (the function
exercised by these table cases) picks up the IPsec flag so this path is
validated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c6435c97-08ed-480a-87d3-73632557155d
📒 Files selected for processing (7)
pkg/controller/operconfig/mtu_probe.gopkg/controller/operconfig/mtu_probe_test.gopkg/controller/operconfig/operconfig_controller.gopkg/network/ovn_kubernetes.gopkg/network/ovn_kubernetes_test.gopkg/network/render.gopkg/network/render_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/network/ovn_kubernetes.go
- pkg/network/render_test.go
- pkg/controller/operconfig/operconfig_controller.go
| previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec { | ||
| return conf | ||
| }, |
There was a problem hiding this comment.
Avoid aliasing previous and conf in the IPsec test case.
previous currently returns the same pointer as conf, which can hide regressions when defaulting mutates the current spec.
💡 Proposed fix
- previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec {
- return conf
- },
+ previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec {
+ return conf.DeepCopy()
+ },📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec { | |
| return conf | |
| }, | |
| previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec { | |
| return conf.DeepCopy() | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/network/ovn_kubernetes_test.go` around lines 1175 - 1177, The test's
previous function currently returns the same pointer as conf which aliases the
current spec; change it to return an independent copy to avoid masking
defaulting mutations—e.g. in the anonymous previous func that takes (conf
*operv1.NetworkSpec) return conf.DeepCopy() if operv1.NetworkSpec implements
DeepCopy(), otherwise allocate and return a new operv1.NetworkSpec value copied
from *conf (e.g. ns := *conf; return &ns) so previous no longer points to the
same object as conf in the IPsec test case.
…7078) In HyperShift the MTU prober job cannot run because CNO has no access to guest worker nodes. Previously only AWS (9001) and Azure (1500) had hardcoded defaults; other platforms fell through to the prober and failed silently. MTU resolution flow: 1. User sets tunnel MTU (--ovn-kubernetes-mtu) → preserved as-is 2. No user MTU → probeMTU returns platform default (AWS=9001, Azure=1500, others=1500), FillDefaults subtracts encap overhead 3. Re-reconcile → FillDefaults carries forward the previous value Changes: - probeMTU: add default case returning 1500 for non-AWS/Azure HyperShift platforms, fix Azure log message that incorrectly said "AWS cluster" - fillOVNKubernetesDefaults: accept isHyperShift bool, use switch with 4 cases (previous, probed, hypershift-safe-default, standalone-fallback) to prevent GetDefaultMTU() from returning the management cluster MTU - FillDefaults/fillDefaultNetworkDefaults: propagate isHyperShift - Tests: table-driven tests for fillOVNKubernetesDefaults covering all scenarios, platform coverage for probeMTU Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/network/ovn_kubernetes_test.go (1)
1175-1177:⚠️ Potential issue | 🟡 MinorAvoid aliasing
previouswith the spec under test.Line 1175 returns
confdirectly, sopreviousis mutated alongside the object being defaulted. That can hide regressions in the preserved-MTU path; return a copy instead.💡 Proposed fix
previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec { - return conf + return conf.DeepCopy() },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 1175 - 1177, The test helper that sets previous currently returns the same pointer (previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec { return conf }), which aliases the spec under test and allows mutations to hide regressions; change that function to return an independent copy of conf (e.g., use conf.DeepCopy() or otherwise clone the NetworkSpec) so previous is not mutated during defaulting and the preserved-MTU path is correctly tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 1175-1177: The test helper that sets previous currently returns
the same pointer (previous: func(conf *operv1.NetworkSpec) *operv1.NetworkSpec {
return conf }), which aliases the spec under test and allows mutations to hide
regressions; change that function to return an independent copy of conf (e.g.,
use conf.DeepCopy() or otherwise clone the NetworkSpec) so previous is not
mutated during defaulting and the preserved-MTU path is correctly tested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 85715fb7-b78c-4c76-ba36-c4a05f6f90c1
📒 Files selected for processing (7)
pkg/controller/operconfig/mtu_probe.gopkg/controller/operconfig/mtu_probe_test.gopkg/controller/operconfig/operconfig_controller.gopkg/network/ovn_kubernetes.gopkg/network/ovn_kubernetes_test.gopkg/network/render.gopkg/network/render_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/controller/operconfig/mtu_probe.go
- pkg/controller/operconfig/operconfig_controller.go
- pkg/network/render.go
|
just FWIW, the 26 Security checks were failing before our PR. |
|
/retest-required |
1 similar comment
|
/retest-required |
|
Hey @kyrtapz do you mind review? |
| } | ||
|
|
||
| func fillOVNKubernetesDefaults(conf, previous *operv1.NetworkSpec, hostMTU int) { | ||
| func fillOVNKubernetesDefaults(conf, previous *operv1.NetworkSpec, hostMTU int, isHyperShift bool) { |
There was a problem hiding this comment.
I don't see a need for changing this, it should be enough to change the prober logic.
There was a problem hiding this comment.
Humm maybe, the scenario we need to address with this PR is when the prober fails. other ones I think are mostly covered. Maybe we can set a default if the prober fails from the beginning?
|
@jparrill: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
In HyperShift the MTU prober job cannot run because CNO has no access to guest worker nodes. Previously only AWS (9001) and Azure (1500) had hardcoded defaults in
probeMTU; all other platforms fell through to the prober job and failed silently. Additionally, theGetDefaultMTU()fallback infillOVNKubernetesDefaultswould return the management cluster's pod network MTU — not the hosted cluster's worker node uplink MTU.This PR adds proper handling for all three MTU scenarios and guards against the incorrect
GetDefaultMTU()fallback in HyperShift.Related PRs
Fixes
MTU Resolution Flow
Three scenarios resolved
User explicitly sets MTU (e.g.
--ovn-kubernetes-mtu 1300):NeedMTUProbereturnsfalse,probeMTUis never called,FillDefaultspreserves the user value as-is. Works on all platforms.No user MTU, commercial AWS (jumbo frames):
probeMTUreturns 9001,FillDefaultscomputes9001 - 100 = 8901. Same behavior as before.No user MTU, ISO/classified AWS or other platforms: Previously broken — prober job couldn't run and
GetDefaultMTU()returned the wrong MTU. NowprobeMTUreturns a safe default (1500 for non-AWS/Azure), andfillOVNKubernetesDefaultsguards against theGetDefaultMTU()fallback in HyperShift. Users in ISO regions should set MTU explicitly via--ovn-kubernetes-mtu.Changes
pkg/controller/operconfig/mtu_probe.godefaultcase in HyperShiftswitchreturningdefaultHCPMTU = 1500for all platforms without a known uplink MTUpkg/controller/operconfig/mtu_probe_test.gopkg/controller/operconfig/operconfig_controller.goisHyperShiftfrominfraStatus.HostedControlPlane != nilisHyperShifttoFillDefaultscallspkg/network/ovn_kubernetes.goisHyperShift boolparameter tofillOVNKubernetesDefaultsif-elsewithswitch(4 cases: previous, probed, hypershift-safe-default, standalone-fallback)GetDefaultMTU()in HyperShift — it returns the management cluster's pod MTU, not the hosted cluster's worker node uplinkpkg/network/render.goisHyperShift boolthroughFillDefaultsandfillDefaultNetworkDefaultspkg/network/ovn_kubernetes_test.goTestFillOVNKubernetesDefaultscovering 5 scenarios:pkg/network/render_test.gofillDefaultshelper for newisHyperShiftparameterTest Plan
CGO_ENABLED=0 GOOS=linux go test -c ./pkg/network/ ./pkg/controller/operconfig/--ovn-kubernetes-mtu→ MTU should be 8901--ovn-kubernetes-mtu 1300→ MTU should be 1300🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests