OCPBUGS-59176: fix several failing tests in custom-dns jobs#31129
OCPBUGS-59176: fix several failing tests in custom-dns jobs#31129sadasu wants to merge 2 commits intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@sadasu: This pull request references Jira Issue OCPBUGS-59176, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughTests and helpers were changed to detect whether cluster DNS is managed, prefer the cluster base domain, resolve load‑balancer addresses only when DNS is unmanaged, and route HTTP/gRPC client connections either via DNS or by dialing the LB address. ChangesDNS-aware routing, LB resolution, and client dialing
Sequence Diagram(s)sequenceDiagram
participant Test as Router Test
participant DNS as DNS API
participant Ingress as IngressController
participant LB as Load Balancer Service
participant Client as HTTP/gRPC Client
participant Route as Router Endpoint
Test->>DNS: getClusterBaseDomainName()
DNS-->>Test: baseDomain
Test->>Ingress: isDNSManaged()
Ingress-->>Test: DNSManaged status
alt DNS unmanaged
Test->>LB: getLoadBalancerAddress(serviceName)
LB-->>Test: lbAddress (IP/hostname)
else DNS managed
Note over Test: lbAddress = ""
end
alt lbAddress provided
Test->>Client: makeHTTPClient(..., lbAddress) / DialParams.Target=lbAddress
Note over Client: custom dialer uses lbAddress:port
Client->>LB: tcp connect to lbAddress:port
LB-->>Client: connection
else DNS resolution
Client->>DNS: resolve route hostname
DNS-->>Client: route IP
Client->>Route: connect to route IP:port
Route-->>Client: connection
end
Client->>Route: HTTP/gRPC request
Route-->>Client: response
Test->>Client: verify success
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sadasu The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
This is basically the fix added via #30131 with rebase conflicts fixed. |
|
@sadasu: This pull request references Jira Issue OCPBUGS-59176, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@sadasu: This pull request references Jira Issue OCPBUGS-59176, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/router/gatewayapicontroller.go`:
- Around line 406-407: The test currently always runs assertHttpRouteConnection
which for unmanaged-DNS paths uses getLoadBalancerAddress and therefore requires
a LoadBalancer; guard these checks by first verifying the platform supports load
balancers (check loadBalancerSupported) and only call assertHttpRouteConnection
/ getLoadBalancerAddress when loadBalancerSupported is true (while preserving
the test coverage for unmanaged-DNS on platforms where an LB exists); apply the
same guard around the other occurrences referenced (the block around lines
1079-1097) so tests on platforms without LB support skip the connectivity check
instead of hanging.
In `@test/extended/router/http2.go`:
- Around line 644-671: The new isDNSManaged(oc *exutil.CLI, timeout
time.Duration) conflicts with the existing isDNSManaged(oc *exutil.CLI) in
package router; rename the new function (e.g., isDNSManagedWithTimeout or
isDNSManagedTimeout) and update all callers to the new name and signature:
change calls at http2.go:504, grpc-interop.go:333, and gatewayapicontroller.go
lines referenced (around 807 and 1080) to pass the timeout argument and call the
renamed function; ensure the function body and returned types remain the same
and only the identifier and call sites are adjusted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d894b75a-2375-4917-96ed-96cf3881c44f
📒 Files selected for processing (6)
test/extended/router/gatewayapicontroller.gotest/extended/router/grpc-interop.gotest/extended/router/grpc-interop/clientconn.gotest/extended/router/h2spec.gotest/extended/router/http2.gotest/extended/router/idle.go
| assertHttpRouteConnection(oc, gw+"-openshift-default", defaultRoutename) | ||
| }) |
There was a problem hiding this comment.
Unmanaged-DNS path now implicitly requires a LoadBalancer.
This always executes connectivity checks, and for unmanaged DNS it always resolves through getLoadBalancerAddress(...). On platforms where loadBalancerSupported == false, this can hang/fail waiting for an LB that never appears.
Suggested guard (preserve unmanaged-DNS coverage where LB exists)
-assertHttpRouteConnection(oc, gw+"-openshift-default", defaultRoutename)
+if managedDNS || loadBalancerSupported {
+ assertHttpRouteConnection(oc, gw+"-openshift-default", defaultRoutename)
+} else {
+ g.Skip("Skipping HTTPRoute connectivity check: unmanaged DNS and no LoadBalancer support")
+}Also applies to: 1079-1097
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/router/gatewayapicontroller.go` around lines 406 - 407, The
test currently always runs assertHttpRouteConnection which for unmanaged-DNS
paths uses getLoadBalancerAddress and therefore requires a LoadBalancer; guard
these checks by first verifying the platform supports load balancers (check
loadBalancerSupported) and only call assertHttpRouteConnection /
getLoadBalancerAddress when loadBalancerSupported is true (while preserving the
test coverage for unmanaged-DNS on platforms where an LB exists); apply the same
guard around the other occurrences referenced (the block around lines 1079-1097)
so tests on platforms without LB support skip the connectivity check instead of
hanging.
| func isDNSManaged(oc *exutil.CLI, timeout time.Duration) (bool, error) { | ||
| var status operatorv1.ConditionStatus | ||
| conditionFound := false | ||
|
|
||
| if err := wait.PollUntilContextTimeout(context.Background(), time.Second, timeout, true, func(ctx context.Context) (bool, error) { | ||
| ic, err := oc.AdminOperatorClient().OperatorV1().IngressControllers("openshift-ingress-operator").Get(ctx, "default", metav1.GetOptions{}) | ||
| if err != nil { | ||
| e2e.Logf("Failed to get default ingresscontroller: %v, retrying...", err) | ||
| return false, nil | ||
| } | ||
| for _, condition := range ic.Status.Conditions { | ||
| if condition.Type == string(operatorv1.DNSManagedIngressConditionType) { | ||
| status = condition.Status | ||
| conditionFound = true | ||
| return true, nil | ||
| } | ||
| } | ||
| // DNSManaged condition not present, assume DNS is managed (default behavior) | ||
| e2e.Logf("DNSManaged condition not found on default ingresscontroller, assuming DNS is managed") | ||
| return true, nil | ||
| }); err != nil { | ||
| return false, err | ||
| } | ||
| if !conditionFound { | ||
| return true, nil | ||
| } | ||
| return status != operatorv1.ConditionFalse, nil | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify duplicate symbol definitions and mixed call signatures.
set -euo pipefail
echo "== Function definitions named isDNSManaged =="
rg -n '^\s*func\s+isDNSManaged\s*\(' test/extended/router
echo
echo "== Call sites of isDNSManaged =="
rg -n '\bisDNSManaged\s*\(' test/extended/router/http2.go test/extended/router/grpc-interop.go test/extended/router/gatewayapicontroller.goRepository: openshift/origin
Length of output: 998
isDNSManaged is redeclared in package router and will break compilation.
Line 644 introduces func isDNSManaged(oc *exutil.CLI, timeout time.Duration) (bool, error), but test/extended/router/gatewayapicontroller.go:620 already defines func isDNSManaged(oc *exutil.CLI) bool. Go does not support function overloading—both function definitions in the same package cause a redeclaration compile error.
Rename the new function to avoid collision:
-func isDNSManaged(oc *exutil.CLI, timeout time.Duration) (bool, error) {
+func getDefaultIngressDNSManagedStatus(oc *exutil.CLI, timeout time.Duration) (bool, error) {Update call sites in http2.go:504, grpc-interop.go:333, and gatewayapicontroller.go:807, 1080.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/router/http2.go` around lines 644 - 671, The new
isDNSManaged(oc *exutil.CLI, timeout time.Duration) conflicts with the existing
isDNSManaged(oc *exutil.CLI) in package router; rename the new function (e.g.,
isDNSManagedWithTimeout or isDNSManagedTimeout) and update all callers to the
new name and signature: change calls at http2.go:504, grpc-interop.go:333, and
gatewayapicontroller.go lines referenced (around 807 and 1080) to pass the
timeout argument and call the renamed function; ensure the function body and
returned types remain the same and only the identifier and call sites are
adjusted.
Some e2e tests are failing with the job "gcp-custom-dns" for featuregate "GCPClusterHostedDNSInstall" which is promoted to GA in 4.20. In the "custom-dns" cluster OpenShift will start static CoreDNS pods to provide DNS resolution for API, Internal API and Ingress services that are essential for cluster creation. After cluster deployment is completed, the customer will update their external DNS solution with the same assigned LB IP addresses used for the configuration of the internal CoreDNS instance. The failing tests like http2 and grpc tests use dedicated ingresscontrollers, and gateway also has separated LB and dnsrecord, so the default wildcard created by the new static CoreDNS won't work for those tests. Make below changes to fix the failing tests: - Update makeHTTPClient() DialContext to target LoadBalancer IP address directly when DNS is unmanaged, fixing both http2 and gatewayapi httproute tests. - Update grpc Dial() to target LoadBalancer IP address directly when DNS is unmanaged, fixing the grpc test. - Extract getLoadBalancerAddress() as a reusable helper from assertGatewayLoadbalancerReady() for use across http2, grpc and gatewayapi tests. - Update gatewayapi assertDNSRecordStatus() to check the Published condition per-zone based on whether DNS is managed or unmanaged. - Add isDNSManaged() helper that checks the DNSManaged condition on the default ingresscontroller, defaulting to managed=true when the condition is absent. - Add getClusterBaseDomainName() and update http2/grpc/h2spec shard ingresscontroller to generate domain based on cluster baseDomain instead of ingress domain (e.g. "e2e-test-xxx.apps.baseDomain"), avoiding custom ingresscontroller DNS overlapping with default wildcard "*.apps.baseDomain".
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/router/gatewayapi_upgrade.go`:
- Around line 141-143: The test currently gates connectivity checks behind
t.managedDNS so the unmanaged-DNS code path in assertHttpRouteConnection never
runs; change the condition around the pre- and post-upgrade HTTP checks
(currently checking t.loadBalancerSupported && t.managedDNS) to always run when
t.loadBalancerSupported is true, and call assertHttpRouteConnection with the
appropriate gwServiceName for the unmanaged case (use
gatewayName+"-openshift-default" for managed and pass the gwServiceName argument
when t.managedDNS is false) so both managed and unmanaged DNS paths are
validated; update both occurrences near where assertHttpRouteConnection is
invoked (lines around the current calls that build
gatewayName+"-openshift-default" and the later call that should pass
gwServiceName) accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5038e62e-a26c-4934-bb58-fc38fb07f58e
📒 Files selected for processing (7)
test/extended/router/gatewayapi_upgrade.gotest/extended/router/gatewayapicontroller.gotest/extended/router/grpc-interop.gotest/extended/router/grpc-interop/clientconn.gotest/extended/router/h2spec.gotest/extended/router/http2.gotest/extended/router/idle.go
✅ Files skipped from review due to trivial changes (1)
- test/extended/router/h2spec.go
🚧 Files skipped from review as they are similar to previous changes (3)
- test/extended/router/grpc-interop/clientconn.go
- test/extended/router/gatewayapicontroller.go
- test/extended/router/grpc-interop.go
| if t.loadBalancerSupported && t.managedDNS { | ||
| g.By("Verifying HTTP connectivity before upgrade") | ||
| assertHttpRouteConnection(t.hostname) | ||
| assertHttpRouteConnection(t.oc, t.gatewayName+"-openshift-default", t.hostname) |
There was a problem hiding this comment.
Unmanaged-DNS connectivity validation is still gated out
Line 141 and Line 185 still require t.managedDNS, so the new unmanaged-DNS path in assertHttpRouteConnection (enabled by passing gwServiceName on Line 143/187) never runs in this upgrade test. This leaves the exact scenario from custom-dns jobs unvalidated here.
Suggested fix
- if t.loadBalancerSupported && t.managedDNS {
+ if t.loadBalancerSupported {
g.By("Verifying HTTP connectivity before upgrade")
assertHttpRouteConnection(t.oc, t.gatewayName+"-openshift-default", t.hostname)
e2e.Logf("HTTPRoute connectivity verified before upgrade")
}
...
- if t.loadBalancerSupported && t.managedDNS {
+ if t.loadBalancerSupported {
g.By("Verifying HTTP connectivity after upgrade")
assertHttpRouteConnection(t.oc, t.gatewayName+"-openshift-default", t.hostname)
}Also applies to: 185-187
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/router/gatewayapi_upgrade.go` around lines 141 - 143, The test
currently gates connectivity checks behind t.managedDNS so the unmanaged-DNS
code path in assertHttpRouteConnection never runs; change the condition around
the pre- and post-upgrade HTTP checks (currently checking
t.loadBalancerSupported && t.managedDNS) to always run when
t.loadBalancerSupported is true, and call assertHttpRouteConnection with the
appropriate gwServiceName for the unmanaged case (use
gatewayName+"-openshift-default" for managed and pass the gwServiceName argument
when t.managedDNS is false) so both managed and unmanaged DNS paths are
validated; update both occurrences near where assertHttpRouteConnection is
invoked (lines around the current calls that build
gatewayName+"-openshift-default" and the later call that should pass
gwServiceName) accordingly.
Remove duplicate 'var err error' declaration that caused compilation error. The err variable is already declared at line 601, so reuse it at line 612.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/extended/router/gatewayapicontroller.go (1)
406-406:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSkip the unmanaged-DNS connectivity path when the platform has no LoadBalancer support.
This now falls back to
getLoadBalancerAddress(...)for every unmanaged-DNS run. On VSphere/BareMetal/Equinix,loadBalancerSupportedis false, so the test waits for a LoadBalancer ingress that will never appear instead of skipping cleanly.🛠️ Minimal guard
- assertHttpRouteConnection(oc, gw+"-openshift-default", defaultRoutename) + if managedDNS || loadBalancerSupported { + assertHttpRouteConnection(oc, gw+"-openshift-default", defaultRoutename) + } else { + g.Skip("Skipping HTTPRoute connectivity check: unmanaged DNS and no LoadBalancer support") + }Also applies to: 1072-1090
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/router/gatewayapicontroller.go` at line 406, The test calls assertHttpRouteConnection which triggers getLoadBalancerAddress for the unmanaged-DNS path and blocks on a LoadBalancer ingress even on platforms without LoadBalancer support; update the test to check the platform capability (the loadBalancerSupported flag or equivalent) before invoking the unmanaged-DNS connectivity path and skip calling assertHttpRouteConnection / getLoadBalancerAddress when loadBalancerSupported is false (also apply the same guard to the other occurrences around lines 1072-1090) so the test cleanly skips the LoadBalancer-dependent checks on VSphere/BareMetal/Equinix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@test/extended/router/gatewayapicontroller.go`:
- Line 406: The test calls assertHttpRouteConnection which triggers
getLoadBalancerAddress for the unmanaged-DNS path and blocks on a LoadBalancer
ingress even on platforms without LoadBalancer support; update the test to check
the platform capability (the loadBalancerSupported flag or equivalent) before
invoking the unmanaged-DNS connectivity path and skip calling
assertHttpRouteConnection / getLoadBalancerAddress when loadBalancerSupported is
false (also apply the same guard to the other occurrences around lines
1072-1090) so the test cleanly skips the LoadBalancer-dependent checks on
VSphere/BareMetal/Equinix.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 74cc852c-6939-4f81-8656-465de3a70ce0
📒 Files selected for processing (1)
test/extended/router/gatewayapicontroller.go
|
Scheduling required tests: |
|
@sadasu: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Some e2e tests are failing with the job "gcp-custom-dns" for featuregate "GCPClusterHostedDNSInstall" which is promoted to GA in 4.20. In the "custom-dns" cluster OpenShift will start static CoreDNS pods to provide DNS resolution for API, Internal API and Ingress services that are essential for cluster creation. After cluster deployment is completed, the customer will update their external DNS solution with the same assigned LB IP addresses used for the configuration of the internal CoreDNS instance.
The failing tests like http2 and grpc tests use dedicated ingresscontrollers, and gateway also has separated LB and dnsrecord, so the default wildcard created by the new static CoreDNS won't work for those tests.
Make below changes to fix the failing tests:
Summary by CodeRabbit