NE-2422: Add dual-stack ingress e2e tests for AWSDualStackInstall featuregate#30904
NE-2422: Add dual-stack ingress e2e tests for AWSDualStackInstall featuregate#30904alebedev87 wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@alebedev87: This pull request references NE-2422 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a new Ginkgo e2e test for AWS dual-stack router Route HTTPS reachability (NLB and CLB scenarios) and extends router shard Config to accept Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/router/dualstack.go`:
- Around line 58-59: The test currently creates shardFQDN under defaultDomain so
the created host can be served by the default ingress controller; change the
admission/probing logic to ensure we exercise the shard controller by (1)
requiring the admitted ingress entry’s ingress.RouterName equals
shardIngressCtrl.Name when waiting for admission (check the ingress object's
RouterName field) and (2) pinning the subsequent HTTP probe to the shard ingress
entry’s canonical hostname returned in the admitted ingress status (use the
admitted status.host/canonical name instead of the generic host/shardFQDN).
Ensure references to shardFQDN, shardIngressCtrl and ingress.RouterName are
updated accordingly so the probe targets the shard ingress entry only.
- Around line 217-236: The shell snippet built in function
waitForDualStackRouteResponse uses Bash-only [[ ... ]] tests which will fail
under /bin/sh; update the cmd string to use POSIX-compatible [ ... ] tests
instead (replace [[ "${rc:-0}" -eq 0 ]] and [[ $code -eq 200 ]] with equivalent
[ ... ] checks), ensure proper quoting/spacing for the single-bracket tests and
parameter expansion, and keep the same logic around rc and code checks so
RunHostCmd continues to detect curl errors and 200 responses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2b52642d-eba4-405a-95df-f79dce04d4d5
📒 Files selected for processing (2)
test/extended/router/dualstack.gotest/extended/router/shard/shard.go
734f097 to
bbeb835
Compare
|
@alebedev87: This pull request references NE-2422 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
3185a90 to
80ca2cf
Compare
|
@alebedev87: This pull request references NE-2422 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling required tests: |
|
@alebedev87: This pull request references NE-2422 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
cf53d48 to
3dbb657
Compare
|
@alebedev87: This pull request references NE-2422 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
41b285c to
63f33fb
Compare
|
Scheduling required tests: |
|
@alebedev87: This pull request references NE-2422 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/assign @davidesalerno |
| g.By("Disabling client IP preservation on the NLB target group to avoid hairpin issues (OCPBUGS-63219)") | ||
| routerSvcName := "router-" + shardIngressCtrl.Name | ||
| err = oc.AsAdmin().Run("annotate").Args("service", "-n", "openshift-ingress", routerSvcName, | ||
| "service.beta.kubernetes.io/aws-load-balancer-target-group-attributes=preserve_client_ip.enabled=false").Execute() |
There was a problem hiding this comment.
Noted 👍 I know this is pretty obvious, but the IP will not be preserved, because we are not enabling proxy protocol as the alternative solution for IP preservation (I don't think it's possible to override the CIO logic that enables proxy protocol). I think that is okay for this test.
When the solution to OCPBUGS-63219 merges, I think this will become a NO-OP (the plan is to set service.beta.kubernetes.io/aws-load-balancer-target-group-attributes=preserve_client_ip.enabled=false anyways), but I will update that this test either way.
There was a problem hiding this comment.
I know this is pretty obvious, but the IP will not be preserved, because we are not enabling proxy protocol as the alternative solution for IP preservation (I don't think it's possible to override the CIO logic that enables proxy protocol). I think that is okay for this test.
Right, as of now accept-proxy instruction is not added to HAProxy frontends if the LB type is NLB. So to avoid false negatives without additional code changes, I had to go the hacky way.
davidesalerno
left a comment
There was a problem hiding this comment.
There is no major issue, I shared only some question that could eventually help to improve a little bit the code.
In general I would like to highlight also the fact that the NLB and Classic LB test cases are sharing a lot of code. The main differences are:
- LB type (NLB vs Classic)
- shard subdomain prefix (nlb. vs clb.)
- The NLB annotation for client IP preservation
- The NLB test verifies both IPv4 and IPv6, Classic only IPv4
We could consider a table-driven approach or a shared helper to reduce duplication.
Other tests in the repo (e.g., http2.go) use table-driven patterns for similar variations.
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| } | ||
|
|
||
| func waitForRouteAdmitted(ctx context.Context, oc *exutil.CLI, ns, name, host string) { |
There was a problem hiding this comment.
Is there a specific reason we used a different approach of the other wait functions?
Here the interval is 5 seconds (half than 10) and the timeout is not configurable.
There was a problem hiding this comment.
Updated to 10 seconds and configurable timeout.
test/extended/router/dualstack.go
Outdated
| } | ||
|
|
||
| func waitForRouteAdmitted(ctx context.Context, oc *exutil.CLI, ns, name, host string) { | ||
| err := wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) { |
There was a problem hiding this comment.
wait.PollImmediate is deprecated in favor of wait.PollUntilContextTimeout if it isn't a huge deal I'll migrate to the new one.
in this specific case something like:
err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 5*time.Minute, true, func(ctx context.Context) (bool, error) { ... })
There was a problem hiding this comment.
Changed to wait.PollUntilContextTimeout.
test/extended/router/dualstack.go
Outdated
| o.Expect(err).NotTo(o.HaveOccurred(), "route was not admitted") | ||
| } | ||
|
|
||
| func waitForDNSResolution(ns, execPodName, host string, timeout time.Duration) error { |
There was a problem hiding this comment.
This function and the waitForRouteResponse are not accepting the context, if we would like to use the wait.PollUntilContextTimeout we need to add it
|
|
||
| g.It("should be reachable via IPv4 and IPv6 through a dual-stack ingress controller", func() { | ||
| ctx := context.Background() | ||
|
|
There was a problem hiding this comment.
Most router tests register an AfterEach hook to dump logs on failure — e.g., exutil.DumpPodLogsStartingWith(...) or dumping route ingress status. This test doesn't have one. On failure in CI, there would be no automatic log collection to help debug.
Is there any specific reason we decided to not use the same approach?
There was a problem hiding this comment.
Added AfterEach with a dump of router logs.
…turegate Add e2e tests that verify ingress connectivity on AWS dual-stack clusters gated by the AWSDualStackInstall featuregate. Two test cases are included: - NLB: Creates an IngressController shard with a Network Load Balancer and validates that routes are accessible over both IPv4 and IPv6. - Classic LB: Creates an IngressController shard with a Classic Load Balancer and validates that routes are accessible over IPv4. Each test deploys a backend pod/service, an edge-terminated route, and uses curl from an exec pod to verify connectivity. DNS propagation is validated separately before connectivity checks. Route stability is confirmed by requiring 3 consecutive HTTP 200 responses. Client IP preservation is disabled on the NLB target group to avoid IPv6 hairpin traffic issues (OCPBUGS-63219). The shard.Config is extended with an optional LoadBalancer field to configure the IngressController. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
63f33fb to
29b6fcf
Compare
Right, this might have been a possibility. However, I like the readability of 2 separate test cases as long as there are only 2. I think it's still acceptable to not go the table test way. I extracted some more common parts (domain and dualstack check) though to reduce the boilerplate. |
|
Scheduling required tests: |
|
/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/cloud-provider-aws#135 openshift/installer#10380 openshift/cluster-kube-apiserver-operator#2079 openshift/cluster-ingress-operator#1376 |
|
@alebedev87: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
The two added tests passed but execution time is larger than 5m. I guess the LB DNS propagation might be culprit. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87, davidesalerno The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by CI |
|
@lihongan: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Add e2e tests that verify ingress connectivity on AWS dual-stack clusters gated by
AWSDualStackInstall.Two test cases are included:
Each test deploys a backend pod/service, an edge-terminated route, and uses curl from an exec pod to verify connectivity. The
shard.Configis extended with an optionalLoadBalancerfield to configure the IngressController's load balancer type.DNS propagation is validated separately before connectivity checks. Route stability is confirmed by requiring 3 consecutive HTTP 200 responses. Client IP preservation is disabled on the NLB target group to avoid IPv6 hairpin traffic issues (OCPBUGS-63219).
Manual test
Clusterbot command to build custom payload:
Install config: