Add some more ExpectNoError checks to the e2e tests #4623
base: master
Conversation
Force-pushed from ae664b6 to 1a0c8d8
ping me when CI runs, I can merge this LGTM
The control-plane tests are failing.
```diff
@@ -789,7 +789,8 @@ var _ = ginkgo.Describe("e2e control plane", func() {
 			pod.Name != "etcd-ovn-control-plane" &&
 			!strings.HasPrefix(pod.Name, "ovs-node") {
 			framework.Logf("%q", pod.Namespace)
-			e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)
+			err = e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)
+			framework.ExpectNoError(err, fmt.Sprintf("failed to delete pod %s", pod.Name))
```
LOL, this is causing e2e failures because of a bug in e2epod.DeletePodWithWaitByName: it deletes the pod, and then loops until no pod with the given name exists. But if something else recreates the pod immediately, it doesn't notice that the pod that exists isn't the same one as the one it deleted.

(So, without the ExpectNoError check, the failure mode would be "it deletes the pod successfully, then gets confused and loops for 5 minutes and then the function returns an error, but we ignore the error and keep going (which is fine, because the pod we meant to delete was deleted)", whereas with this PR, the failure mode is "it deletes the pod successfully, then gets confused and loops for 5 minutes and then the function returns an error, which causes us to fail the test".)

Filed kubernetes/kubernetes#126785. Let's see what happens there. This bug may be affecting other places where we kill control-plane pods...
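To make the failure mode concrete, here is a minimal sketch of a delete-then-wait-by-name loop of the kind described above (illustrative only, not the actual upstream helper; it assumes cs is a client-go kubernetes.Interface, and the usual apimachinery imports for metav1, wait, and apierrors):

```go
// Illustrative only: delete a pod, then poll until no pod with that *name*
// exists. If a controller recreates the pod under the same name right away,
// IsNotFound never fires, so the loop spins until the timeout and returns
// an error even though the original pod was deleted successfully.
if err := cs.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
	return err
}
return wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
	func(ctx context.Context) (bool, error) {
		_, err := cs.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		return apierrors.IsNotFound(err), nil
	})
```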
No consensus on a short-term fix, so I'll just fix this locally.
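One plausible shape for such a local workaround, sketched here as a hypothetical helper (deletePodAndWaitByUID is illustrative, not necessarily what landed in the PR), is to capture the pod's UID before deleting, so that a same-name pod with a different UID counts as the deleted pod being gone:

```go
package e2e // hypothetical placement; all names below are illustrative

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deletePodAndWaitByUID records the pod's UID before deleting, so a
// recreated pod with the same name (but a new UID) counts as "the pod we
// deleted is gone" instead of keeping the loop spinning until the timeout.
func deletePodAndWaitByUID(ctx context.Context, cs kubernetes.Interface, name, ns string) error {
	pod, err := cs.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return nil // already gone
	} else if err != nil {
		return err
	}
	uid := pod.UID

	if err := cs.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
		return err
	}

	// Done when the name is free *or* the name now belongs to a different UID.
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			p, err := cs.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil
			} else if err != nil {
				return false, err
			}
			return p.UID != uid, nil
		})
}
```

Keying the wait on the UID rather than the name distinguishes "the pod I deleted still exists" from "a replacement pod now exists", which is exactly the distinction the name-only loop misses.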
Force-pushed from 1a0c8d8 to 55c72d9
Force-pushed from 9f75c2d to 514f3a6
…t restarted. Signed-off-by: Dan Winship <danwinship@redhat.com>
Signed-off-by: Dan Winship <danwinship@redhat.com> Co-authored-by: Martin Kennelly <mkennell@redhat.com>
Force-pushed from 514f3a6 to 46f2293
What this PR does and why is it needed
More fixes pulled out of @martinkennelly's branch to start adding support for running ovn-k e2e downstream in OCP. This just takes a bunch of places where the test cases were doing something while ignoring possible errors, and changes them to assert that no error occurred instead.
(In theory, this should not result in us discovering any new bugs in the existing code, since if something actually went wrong, the test would presumably fail later on anyway. It just means that in case of future bugs, the tests should fail closer to the point where something actually went wrong, rather than failing more mysteriously later.)
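Concretely, the pattern applied throughout is the one visible in the diff above: capture the previously ignored error and assert on it with framework.ExpectNoError.

```go
// Before: any error from the helper was silently dropped.
e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)

// After: the test fails at the call site if the deletion fails.
err = e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)
framework.ExpectNoError(err, fmt.Sprintf("failed to delete pod %s", pod.Name))
```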
Does this PR introduce a user-facing change?