-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make GetActiveNetworkForNamespace use a controller #4662
Conversation
16c7a46
to
4765fad
Compare
4765fad
to
8dade70
Compare
aefe4f8
to
716d844
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just comments on the NAD controller first.
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
Hey @trozet maybe labeling the NAD when it's created by UserDefinedNetwork controller with role=primary alow for fast listing instead of iterating ?. |
not sure what iterating you are referring to |
716d844
to
175c1ad
Compare
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Outdated
Show resolved
Hide resolved
go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go
Show resolved
Hide resolved
Full review, so no more comments. |
I have one extra comment, sorry. I think it would be useful to have a method It would be directly usable there, because we already have a NetInfo, to be aware of what is the NAD serving in that namespace. Otherwise we would have to use The alternative implementation for those methods would be to parse the pod annotation and find the matching NAD from |
Wait, there is an alternative implementation which is to parse the namespace from the nad namespaced names from |
b38aeb3
to
be9d053
Compare
flake?
https://github.com/ovn-org/ovn-kubernetes/actions/runs/10932941999/job/30350584438?pr=4662 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
@@ -72,6 +75,7 @@ func NewCNIServer(factory factory.NodeWatchFactory, kclient kubernetes.Interface | |||
} | |||
|
|||
if util.IsNetworkSegmentationSupportEnabled() { | |||
s.nadController = nadController |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still have a use for the nad lister below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think so. I'll see if I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
} | ||
|
||
// NetAttachDefinitionController handles namespaced scoped NAD events and | ||
// manages cluster scoped networks defined in those NADs. NADs are mostly | ||
// referred from pods to give them access to the network. Different NADs can | ||
// define the same network as long as those definitions are actually equal. | ||
// Unexpected situations are handled on best effort basis but improper NAD | ||
// adminstration can lead to undefined behavior in referred from running pods. | ||
// administration can lead to undefined behavior in referred from running pods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/in/if
👼
for _, nadName := range networkNADs { | ||
nadNs, _, err := cache.SplitMetaNamespaceKey(nadName) | ||
if err != nil { | ||
klog.Errorf("Cannot parse NAD reference: %q on network info: %q", nadName, netName) | ||
continue | ||
} | ||
if nadNs == namespace { | ||
filteredNADs = append(filteredNADs, nadName) | ||
} | ||
} | ||
n := util.CopyNetInfo(network) | ||
n.SetNADs(filteredNADs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we already had the NADs per namespace in NetInfo, which is an information we already come by naturally in sync method, we wouldn't have to do this search every time GetActiveNetworkForNamespace is called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also wondering...should we disallow multiple NADs per namespace on a primary network?
So there could be multiple NADs per namespace for a primary network as long as only one of them is flagged as primary. That would be an interesting use case if, let's say, selected pods want to customize how they attach to the primary network, for example use sr-iov. The one flagged as primary would be used by default. The others would be used through the network selection element annotation.
We currently disallow the network selection element annotation for primary networks so there's actually no use for that. However we can't prevent the mere existence of multiple NADs per namespace for a primary network. So we either have a way to tell around which of the NADs is really primary or we don't make anyone aware of those NADs since they have no use for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To disallow, we would just have to check against the ensured network instead of the nad network:
if ensureNetwork.IsPrimaryNetwork() && primaryNAD != "" && primaryNAD != key
Then in this method you would just have to do
n.SetNADs(primaryNAD)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we have a way to really tell which is the active one, since we have the primaryNADs map? Why don't i just use that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For GetActiveNetworkForNamespace
that is enough.
But we also have the network controllers which have a NetInfo potentially containing references for multiple NADs per namespace for a primary network.
If GetActiveNetworkForNamespace
just returns the one, the NetInfo that network controllers are using should just contain the one for the namespace, for consistency I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make that change and then we can avoid having to loop. We can just return that primary NAD until someone complains ;)
go-controller/pkg/util/util.go
Outdated
klog.Warningf("Skipping nad '%s/%s' as active network after failing parsing it with %v", nad.Namespace, nad.Name, err) | ||
continue | ||
} | ||
func NewUnknownActiveNetworkError(namespace string) *UnknownActiveNetworkError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not anymore after we guarantee the controller only renders a single primary network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
func (a *PodAllocator) getActiveNetworkForPod(pod *corev1.Pod) (util.NetInfo, error) { | ||
activeNetwork, err := util.GetActiveNetworkForNamespace(pod.Namespace, a.nadLister) | ||
activeNetwork, err := a.nadController.GetActiveNetworkForNamespace(pod.Namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to flag some use cases where, if we had per-namespace NAD information in NetInfo, we probably wouldn't need access to the nad controller to query this information. This is one of them.
We would lose the protection of actually querying UDNs, true.
Just for awareness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like this was intentional for some reason:
func (a *PodAllocator) GetNetworkRole(pod *corev1.Pod) (string, error) {
if !util.IsNetworkSegmentationSupportEnabled() {
// if user defined network segmentation is not enabled
// then we know pod's primary network is "default" and
// pod's secondary network is NOT its primary network
if a.netInfo.IsDefault() {
return types.NetworkRolePrimary, nil
}
return types.NetworkRoleSecondary, nil
}
activeNetwork, err := a.getActiveNetworkForPod(pod)
if err != nil {
return "", err
}
if activeNetwork.GetNetworkName() == a.netInfo.GetNetworkName() {
return types.NetworkRolePrimary, nil
}
if a.netInfo.IsDefault() {
// if default network was not the primary network,
// then when UDN is turned on, default network is the
// infrastructure-locked network forthis pod
return types.NetworkRoleInfrastructure, nil
}
return types.NetworkRoleSecondary, nil
}
It is comparing the network name of the network found for the pod and the netinfo held in the allocator. @qinqon maybe you wrote the original code and can explain why it is like this? We are wondering why you cant just use the NetInfo on the allocator itself to know whether or not this is a primary or secondary role?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trozet activeNetwork
is the pod's namespace primary network so this checks if the pod allocator network is the pod's namespace primary network (default or UDN) and terturn Primary according to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
However this can be equally achieved if we could query the controller NetInfo for the namespaces served by the network and checking if the one we are concerned about is among them.
Anyway, I will follow up with this on a different PR maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense too, thanks @jcaamano
return util.GetActiveNetworkForNamespace(namespace, nadLister) | ||
// and is a wrapper around GetActiveNetworkForNamespace | ||
func (bnc *BaseNetworkController) getActiveNetworkForNamespace(namespace string) (util.NetInfo, error) { | ||
return bnc.nadController.GetActiveNetworkForNamespace(namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to flag some use cases where, if we had per-namespace NAD information in NetInfo, we probably wouldn't need access to the nad controller to query this information. This is one of them.
We would lose the protection of actually querying UDNs, true.
Just for awareness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a result of me changing this function to be a receiver for bnc rather than the common network controller (that doesn't have netInfo). However, it looks like this function is used to determine pod wiring for the default controller, so I think makes sense to have the UDN protection here.
go-controller/pkg/network-attach-def-controller/network_attach_def_controller_test.go
Show resolved
Hide resolved
be9d053
to
5bf1b7c
Compare
Leverages the NAD Controller to also track primary networks. This should be faster than listing all NADs in a namespace and iterating through them to find the primary network for a namespace. Refactors network manager and nad controller. Changes the network cache to only be stored in network manager, rather than having a duplicate cache in NAD controller. NAD Controller is modified to only hold references to network names for NADs, and then queries network manager to get the networks. NAD Controller and Network Manager are now thread safe. Tests are also modified: - NAD Controller now implemented as an interface so that unit tests can implement a fake NAD controller for getting active network - Fully functioning OVN controller unit tests still leverage a fully functioning NAD controller. NCM may be faked out in these cases. - Fake entities moved to testing/nad package Co-authored-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> Signed-off-by: Tim Rozet <trozet@redhat.com>
Previously the function would check the informer cache which provides (in practical terms) real time access to if there is a primary network for this namespace as a NAD. This did not cover the case where a UDN could be generating the NAD, but the NAD has not been created yet. This commit addresses it by checking for a UDN (only when a NAD is not found for the namespace). This should help in UDN cases (the case we really care about) to ensure that the function returns an accurate result. Especially during times of delay where perhaps a pod handler and UDN handler are firing almost simultaneously. Right now the function will return an error indicating there is a primary network that hasn't been processed yet, letting the caller know to retry. We have seen in the past with time sensitive handlers that they may give up after 5 or so retries before the dependent controller has finished. In those cases, we usually add a wait in the handler itself and have around a 200-300ms timeout while we wait for the dependent handler (in this case the NAD handler) to finish. This commit may be expanded on in the future to accomodate that where needed. Signed-off-by: Tim Rozet <trozet@redhat.com>
5bf1b7c
to
8024aed
Compare
rebased |
In case this PR is about improving performance when looking for the namespace's primary UDN, we can label de NAD to improve listing performance. |
Leverages the NAD Controller to also track primary networks. This should be faster than listing all NADs in a namespace and iterating through them to find the primary network for a namespace.