Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDN: Design routes and policies on L2's GR correctly. #4694

Merged
merged 12 commits into from
Sep 18, 2024

Conversation

tssurya
Copy link
Member

@tssurya tssurya commented Sep 3, 2024

Depends on #4554

What this PR does and why is it needed

Fixes: #4686
Fixes: #4682

Design:

What are the routes we need on L2 GR? It needs to be a combo of routes on L3's GR and ovn_cluster_router

Today this is what we have:

sh-5.2# ovn-nbctl lr-route-list GR_l2.network_ovn-worker                                                                                                                     
IPv4 Routes                                                                                                                                                                  
Route Table <main>:                                                                                                                                                          
           169.254.0.0/17               169.254.0.4 dst-ip rtoe-GR_l2.network_ovn-worker                                                                                     
                0.0.0.0/0                172.18.0.1 dst-ip rtoe-GR_l2.network_ovn-worker                                                                                     
                                                                                                                                                                             
IPv6 Routes                                                                                                                                                                  
Route Table <main>:                                                                                                                                                          
               fd69::/112                   fd69::4 dst-ip rtoe-GR_l2.network_ovn-worker                                                                                     
                     ::/0     fc00:f853:ccd:e793::1 dst-ip rtoe-GR_l2.network_ovn-worker        

which won't work for pod2Egress on LGW because we need to steer that traffic via mpX not via br-ex

In L3, we have the following in ovn_cluster_router:

               100.65.0.2                100.88.0.2 dst-ip --> not needed in L2 because SWITCH is directly connected to each of the GR
               100.65.0.3                100.65.0.3 dst-ip --> not needed in L2 because SWITCH is directly connected to each of the GR
               100.65.0.4                100.88.0.4 dst-ip --> not needed in L2 because SWITCH is directly connected to each of the GR
            10.128.0.0/24                100.88.0.2 dst-ip --> not needed in L2 because pod subnet is directly connected to GR
            10.128.2.0/24                100.88.0.4 dst-ip --> not needed in L2 because pod subnet is directly connected to GR
            10.128.1.0/24                10.128.1.2 src-ip --> NEEDED IN L2 on GR

So L2 routes will be:

sh-5.2# ovn-nbctl lr-route-list GR_l2.network_ovn-worker
IPv4 Routes
Route Table <main>:
          10.100.200.0/24              10.100.200.2 src-ip ----> this one
           169.254.0.0/17               169.254.0.4 dst-ip rtoe-GR_l2.network_ovn-worker
                0.0.0.0/0                172.18.0.1 dst-ip rtoe-GR_l2.network_ovn-worker

IPv6 Routes
Route Table <main>:
               fd69::/112                   fd69::4 dst-ip rtoe-GR_l2.network_ovn-worker
        2010:100:200::/60           2010:100:200::2 src-ip ---> this one
                     ::/0     fc00:f853:ccd:e793::1 dst-ip rtoe-GR_l2.network_ovn-worker

We also need this LRP:

        99 ip4.dst == 172.18.0.0/16 && ip4.src == 10.100.200.0/24         reroute              10.100.200.2
        99 ip6.dst == fc00:f853:ccd:e793::/64 && ip6.src == 2010:100:200::/60         reroute           2010:100:200::2

for pod2Egress to work properly on L2 if its pod2OtherNode traffic in addition to the LRSR we are adding because primary node subnet is a directly attached network to the GR

EIP on L2 will need appropriate changes to accommodate that.

TODO:

  • Evaluate if SNATing to 100.65 is needed for services reply of ingress traffic to work properly -> will be another PR

@github-actions github-actions bot added area/unit-testing Issues related to adding/updating unit tests area/e2e-testing labels Sep 3, 2024
@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch 4 times, most recently from d7fd0f3 to 74debe3 Compare September 3, 2024 20:13
@tssurya tssurya marked this pull request as ready for review September 3, 2024 20:46
@tssurya tssurya requested a review from a team as a code owner September 3, 2024 20:46
@tssurya
Copy link
Member Author

tssurya commented Sep 3, 2024

This PR works for almost all scenarios except:

14. lr_in_ip_routing (northd.c:11686): ip4.dst == 172.18.0.0/16, priority 50, uuid 3d4e30cd                                                                                                                                            
    ip.ttl--;                                                                                                                                                                                                                          
    reg8[0..15] = 0;                                                                                                                                                                                                                   
    reg0 = ip4.dst;                                                                                                                                                                                                                    
    reg1 = 172.18.0.4;                                                                                                                                                                                                                 
    eth.src = 02:42:ac:12:00:04;                                                                                                                                                                                                       
    outport = "rtoe-GR_mzvwk_hogwarts_ovn-worker2";                                                                                                                                                                                    
    flags.loopback = 1;                                                                                                                                                                                                                
    next;       

need to revisit maybe the planB instead

Copy link
Contributor

@pperiyasamy pperiyasamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments.

// snat eth.dst == 76:23:f 169.254.0.12 10.128.0.0/14
// snat eth.dst == 76:23:f 169.254.0.12 2010:100:200::/64
// these SNATs are required for pod2Egress traffic in LGW mode and pod2SameNode traffic in SGW mode to function properly on UDNs
func (oc *SecondaryLayer2NetworkController) addUDNClusterSubnetEgressSNAT(localPodSubnets []*net.IPNet, routerName string, node *kapi.Node) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move addUDNClusterSubnetEgressSNAT into BaseSecondaryNetworkController and use it from secondary L2 and L3 controllers ? Looks like outputPort and routerName are different for L3 and L2, can be taken as arguments and remaining logic are same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I had that initially because it avoided code duplication.
then I decided to split it out because it was neater to track it specially because the L3 one is on cluster-router and this one is on GR and L3 is per node subnet this one is for entire cluster subnet
and in my previous prs.. I was encourage to not put things into BSNC since localnet doesn't require many of these things and hence decided to split it into l3 and l2 controllres.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have a strong preference I can do either ways... I don't really mind.

@@ -456,7 +458,14 @@ func (oc *SecondaryLayer2NetworkController) addUpdateLocalNodeEvent(node *corev1
errs = append(errs, err)
oc.gatewaysFailed.Store(node.Name, true)
} else {
oc.gatewaysFailed.Delete(node.Name)
if util.IsNetworkSegmentationSupportEnabled() && oc.IsPrimaryNetwork() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about deleteNodeEvent ? do we need to delete local pod subnet SNATs explicitly there ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a node delete I thought about it.. in IC will totally delete the database and switches and routers and everything.. so ovnkube-node is totally gone.. and given we don't support L2 on nonIC we should be good.

go-controller/pkg/ovn/base_network_controller_secondary.go Outdated Show resolved Hide resolved
@tssurya
Copy link
Member Author

tssurya commented Sep 4, 2024

This PR works for almost all scenarios except:

14. lr_in_ip_routing (northd.c:11686): ip4.dst == 172.18.0.0/16, priority 50, uuid 3d4e30cd                                                                                                                                            
    ip.ttl--;                                                                                                                                                                                                                          
    reg8[0..15] = 0;                                                                                                                                                                                                                   
    reg0 = ip4.dst;                                                                                                                                                                                                                    
    reg1 = 172.18.0.4;                                                                                                                                                                                                                 
    eth.src = 02:42:ac:12:00:04;                                                                                                                                                                                                       
    outport = "rtoe-GR_mzvwk_hogwarts_ovn-worker2";                                                                                                                                                                                    
    flags.loopback = 1;                                                                                                                                                                                                                
    next;       

need to revisit maybe the planB instead

      2000                             ip4.dst==172.18.0.0/16         reroute                10.128.0.2

does the trick I think let's see if this works

@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch 2 times, most recently from a27c7c9 to f83490c Compare September 6, 2024 09:54
@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch from f83490c to ae5e2cf Compare September 7, 2024 15:25
@tssurya tssurya changed the title WIP: UDN: Design routes and policies on L2's GR correctly. UDN: Design routes and policies on L2's GR correctly. Sep 7, 2024
@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch from ae5e2cf to e7a7fb5 Compare September 10, 2024 16:52
@tssurya tssurya added feature/user-defined-network-segmentation All PRs related to User defined network segmentation and removed do-not-merge/hold labels Sep 10, 2024
@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch 3 times, most recently from c50cd3d to 9beda9a Compare September 16, 2024 18:50
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were not adding the re-route to mpX route for L2, this commit adds that

@@ -66,6 +67,41 @@ func (pbr *PolicyBasedRoutesManager) AddSameNodeIPPolicy(nodeName, mgmtPortIP st
return nil
}

// AddHostCIDRPolicy adds the following policy in local-gateway-mode for L2 topology
// 2000 ip4.dst==172.18.0.0/16 reroute 10.128.0.2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh 1500... not 2000

// AddHostCIDRPolicy adds the following policy in local-gateway-mode for L2 topology
// 2000 ip4.dst==172.18.0.0/16 reroute 10.128.0.2
// Since rtoe of GR is directly connected to the hostCIDR range in LGW even with the
// reroute to mp0 src-ip route the dst-ip based default OVN route takes precedence and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so its the directly connected route in the GR connected to 172.18.0.0 that is overriding your src-ip route? Should you match on src traffic being the pod subnet? I'm wondering if ovn-controller injecting control plane traffic back into OVS if it would reroute it towards mp0. There is an issue from the past similar to this. I'll try to remember it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you print the routes from ovn_cluster_router in this scenario? Would also be good to add that and more explanation to the description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes let me give a snapshot of this on the GR (since its L2)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so its the directly connected route in the GR connected to 172.18.0.0 that is overriding your src-ip route?

yeah exactly!

Should you match on src traffic being the pod subnet?

yea I thought about this as well, do you think just have a single LRP rerouting all of the traffic with podIP==srcIP is better than having a LRSR? I was trying to keep it as close as possible to L3 and wasn't sure if doing the LRSR will break other things or not

Copy link
Member Author

@tssurya tssurya Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sh-5.2# ovn-nbctl lr-route-list GR_l2.network_ovn-worker2
IPv4 Routes
Route Table <main>:
          10.100.200.0/24              10.100.200.2 src-ip --> added this in this PR
           169.254.0.0/17               169.254.0.4 dst-ip rtoe-GR_l2.network_ovn-worker2
                0.0.0.0/0                172.18.0.1 dst-ip rtoe-GR_l2.network_ovn-worker2
sh-5.2# ovn-nbctl lr-policy-list GR_l2.network_ovn-worker2                                                                                                                   
Routing Policies
      1500                           ip4.dst == 172.18.0.0/16         reroute              10.100.200.2 --> added this in this PR
      1004 inport == "rtos-l2.network_ovn_layer2_switch" && ip4.dst == 172.18.0.4 /* l2.network_ovn_layer2_switch */         reroute              10.100.200.2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made the explanation better lmk how it is now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we discussed, ovn-controller generated traffic may be affected so we need to match the src ip subnet as pod subnet. Also, we need to lower the priority to <100 so that we dont override egress IP LRPs. Some of those will need to be updated to reroute to mp0 once egress IP is implemented for lgw mode @martinkennelly fyi

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done in the recent push

@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch from 9beda9a to 25d0368 Compare September 17, 2024 11:31
@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch 2 times, most recently from 9591858 to 0e0cc00 Compare September 17, 2024 19:33
@tssurya
Copy link
Member Author

tssurya commented Sep 17, 2024

in the latest push I have addressed: #4694 (comment)

@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch from 0e0cc00 to 081047d Compare September 18, 2024 09:32
kyrtapz and others added 12 commits September 18, 2024 11:46
In Layer2 networks there is no join switch, only the cluster switch.
Ensure that the ports are named appropriately, this is important for
the logical router policies created for local node access.

Signed-off-by: Patryk Diak <pdiak@redhat.com>
(cherry picked from commit 967923e)
For some reason we were not adding the reroutes
to mpX interface required in LGW in L2 topology
on the GR. This commit fixes that.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Co-Authored-by: Patryk Diak <pdiak@redhat.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
after this PR:

 unable to get logical router GR_l2.network_ovn-worker2: object not found

before this PR:

object not found

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
syncNodeManagementPort should be called after the gwManager
because we want to add the routes to the GR which is only
created when the gwManager is run

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
See https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/default_network_controller.go#L920
In UDN L3/L2 controllers we were missing
checking for mgmtPortFailed and accounting for that
as part of nodeupdates. So retry for any errors from
syncManagementPort was not happening.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@tssurya tssurya force-pushed the udn-design-l2-routes-policies branch from 081047d to 12b9838 Compare September 18, 2024 09:46
@tssurya
Copy link
Member Author

tssurya commented Sep 18, 2024

@trozet trozet merged commit 8017e9a into ovn-org:master Sep 18, 2024
38 of 39 checks passed
@tssurya tssurya added the kind/bug All issues that are bugs and PRs opened to fix bugs label Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/e2e-testing area/unit-testing Issues related to adding/updating unit tests feature/user-defined-network-segmentation All PRs related to User defined network segmentation kind/bug All issues that are bugs and PRs opened to fix bugs
Projects
Status: Done
4 participants