Add some more ExpectNoError checks to the e2e tests #4623
base: master
Conversation
Force-pushed from ae664b6 to 1a0c8d8
ping me when CI runs, I can merge this LGTM
The control-plane tests are failing.
```diff
@@ -789,7 +789,8 @@ var _ = ginkgo.Describe("e2e control plane", func() {
 			pod.Name != "etcd-ovn-control-plane" &&
 			!strings.HasPrefix(pod.Name, "ovs-node") {
 			framework.Logf("%q", pod.Namespace)
-			e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)
+			err = e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)
+			framework.ExpectNoError(err, fmt.Sprintf("failed to delete pod %s", pod.Name))
```
LOL, this is causing e2e failures because of a bug in e2epod.DeletePodWithWaitByName: it deletes the pod, and then loops until no pod with the given name exists. But if something else recreates the pod immediately, it doesn't notice that the pod that exists isn't the same one as the one it deleted.

(So, without the ExpectNoError check, the failure mode would be "it deletes the pod successfully, then gets confused and loops for 5 minutes and then the function returns an error, but we ignore the error and keep going (which is fine, because the pod we meant to delete was deleted)", whereas with this PR, the failure mode is "it deletes the pod successfully, then gets confused and loops for 5 minutes and then the function returns an error, which causes us to fail the test".)

Filed kubernetes/kubernetes#126785. Let's see what happens there. This bug may be affecting other places where we kill control-plane pods...
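To make the failure mode concrete, here is a minimal sketch of a delete-then-wait-by-name loop of the kind described above (illustrative only, not the actual upstream helper; it assumes cs is a client-go kubernetes.Interface, and the usual apimachinery imports for metav1, wait, and apierrors):

```go
// Illustrative only: delete a pod, then poll until no pod with that *name*
// exists. If a controller recreates the pod under the same name right away,
// IsNotFound never fires, so the loop spins until the timeout and returns
// an error even though the original pod was deleted successfully.
if err := cs.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
	return err
}
return wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
	func(ctx context.Context) (bool, error) {
		_, err := cs.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		return apierrors.IsNotFound(err), nil
	})
```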
No consensus on a short-term fix, so I'll just fix this locally.
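One plausible shape for such a local workaround, sketched here as a hypothetical helper (deletePodAndWaitByUID is illustrative, not necessarily what landed in the PR), is to capture the pod's UID before deleting, so that a same-name pod with a different UID counts as the deleted pod being gone:

```go
package e2e // hypothetical placement; all names below are illustrative

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deletePodAndWaitByUID records the pod's UID before deleting, so a
// recreated pod with the same name (but a new UID) counts as "the pod we
// deleted is gone" instead of keeping the loop spinning until the timeout.
func deletePodAndWaitByUID(ctx context.Context, cs kubernetes.Interface, name, ns string) error {
	pod, err := cs.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return nil // already gone
	} else if err != nil {
		return err
	}
	uid := pod.UID

	if err := cs.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
		return err
	}

	// Done when the name is free *or* the name now belongs to a different UID.
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			p, err := cs.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil
			} else if err != nil {
				return false, err
			}
			return p.UID != uid, nil
		})
}
```

Keying the wait on the UID rather than the name distinguishes "the pod I deleted still exists" from "a replacement pod now exists", which is exactly the distinction the name-only loop misses.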
Force-pushed from 1a0c8d8 to 55c72d9
Force-pushed from 9f75c2d to 514f3a6
…t restarted. Signed-off-by: Dan Winship <danwinship@redhat.com>
Signed-off-by: Dan Winship <danwinship@redhat.com> Co-authored-by: Martin Kennelly <mkennell@redhat.com>
Force-pushed from 514f3a6 to 46f2293
What this PR does and why is it needed
More fixes pulled out of @martinkennelly's branch to start adding support for running ovn-k e2e downstream in OCP. This just takes a bunch of places where the test cases were doing something while ignoring possible errors, and changes them to assert that no error occurred instead.
(In theory, this should not result in us discovering any new bugs in the existing code, since if something actually went wrong, the test would presumably fail later on anyway. It just means that in case of future bugs, the tests should fail closer to the point where something actually went wrong, rather than failing more mysteriously later.)
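Concretely, the pattern applied throughout is the one visible in the diff above: capture the previously ignored error and assert on it with framework.ExpectNoError.

```go
// Before: any error from the helper was silently dropped.
e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)

// After: the test fails at the call site if the deletion fails.
err = e2epod.DeletePodWithWaitByName(context.TODO(), f.ClientSet, pod.Name, ovnNs)
framework.ExpectNoError(err, fmt.Sprintf("failed to delete pod %s", pod.Name))
```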
Does this PR introduce a user-facing change?