Feature request: ability to configure weave #1171

Closed
sunghospark opened this issue Dec 15, 2016 · 38 comments

Labels
area/api, area/networking, Feature Request, good first issue, lifecycle/rotten, P0

Comments

@sunghospark

It would be nice if we could configure weave through the kops edit cluster command.
For example

networking:
  weave:
    ipalloc-range: 10.2.0.0/16

to set a custom weave CIDR range and avoid possible IP clashes when the cluster is launched in an existing VPC.

@krisnova
Contributor

I have heard this before; we would have to change the API for weave and pass the value through to the manifest.

Maybe we should wait for the API machinery to be in place before looking at another API change?

@sunghospark
Author

sunghospark commented Dec 16, 2016

I'm wondering if we can templatize the weave daemonset yaml file and add the IPALLOC_RANGE environment variable to the daemonset configuration.
Something like

      containers:
          env:
            - name: IPALLOC_RANGE
              value: {{ WeaveConfig.IPALLOC_RANGE }}

I got this configuration from the Weave documentation.
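
For illustration, the rendered section of the weave DaemonSet might then look roughly like this (the container name and the concrete value are only an example; the real weave-net manifest has more fields):

      containers:
        - name: weave
          env:
            - name: IPALLOC_RANGE
              value: 10.2.0.0/16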

@chrislovecnm
Contributor

This is a great idea... @caseeker are you open to contributing this functionality?

Per @kris-nova's note, this needs to wait on the API machinery. We are going through a BIG refactor, but the design work could start today.

@sunghospark
Author

@chrislovecnm
I would love to contribute if some guidance is provided :)
I did try updating the daemonset yaml file in the source and testing it, but that didn't work.

@chrislovecnm
Contributor

So this is the first place I would point you to -> #1194

Get the use cases on here, and then work on a high-level design. With this feature the design can be super short, like a paragraph. If you look at the PR for weave you will see the work I did there. That needs to be extended with a yaml replacement as well. I think we are doing that for a couple of other yaml templates, but we will need to walk you through it. In a nutshell, the yaml files can use go templates for replacement, basically calling a go function. We are going to need to expand the API for this a bit as well. I would prefer a value in the yaml cluster spec instead of a command line switch.

Questions?

@sunghospark
Author

@chrislovecnm sounds good, I'll prepare the PR.
Quick question about the implementation: I found that this weave yaml file is not the one that gets used in the cluster.
The cluster state bucket addons/networking.weave still contains v1.8.0.20161116.yaml even when I built the source with v1.8.1.20161130.yaml.

Can you tell me if this is the correct behavior? If so, I'm wondering where v1.8.0.20161116.yaml is coming from, as I could not locate it in the source code.

@chrislovecnm
Contributor

chrislovecnm commented Dec 19, 2016

It is the correct file :) We have a bug in the versioning, which may be impacting you. Are you running on master? To work that out do you mind starting on another issue?

@chrislovecnm
Contributor

chrislovecnm commented Dec 19, 2016

#1081 <- no design needed. We need to rename the weave file to v1.8.1.yaml.

@sunghospark
Author

I could make it work by updating the weave yaml file.
The problem was not with kops; I was updating the wrong src directory 😮

@chrislovecnm
Contributor

Well you figured it out!!!!

@justinsb
Member

I think this should use the nonMasqueradeCIDR, right?

@justinsb justinsb added this to the 1.5.0 milestone Dec 28, 2016
@chrislovecnm
Contributor

On my phone, but you have to configure a component in the weave yaml.

@justinsb justinsb added the P0 label Jan 3, 2017
@justinsb
Member

Is there a reason why weave isn't using the nonMasqueradeCIDR value @chrislovecnm ?

@chrislovecnm
Contributor

@justinsb I don't have enough understanding to answer that. Should we be?

@justinsb
Member

I think so. Non-masquerade CIDR is the name we decided on for the overlay network CIDR: https://github.com/kubernetes/kubernetes/blob/master/cluster/aws/options.md

@justinsb
Member

I agree that we should support this configuration, and we may already have a suitable field in the form of nonMasqueradeCidr. But, as per #1514, this isn't going to get fixed in 1.5.0, so I'm going to move this issue to 1.5.1. (We will hopefully up the release cadence substantially once we get these big 1.5.0 features out). Workarounds if you have a conflicting CIDR are to use kopeio-networking or calico.

And of course, contributions to fix this are welcome :-)
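
For anyone needing that workaround, switching providers in the cluster spec looks roughly like this (exact keys may vary by kops version):

  networking:
    calico: {}

or, for kopeio-networking:

  networking:
    kopeio: {}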

@chrislovecnm
Contributor

@austinmoore- this is a problem that I introduced when I first implemented weave / CNI :) I did not implement using the non-masquerade range with weave.

We need to figure out how to make a kubernetes upgrade work for users who are running the default IPALLOC_RANGE and moving to the nonMasqueradeCidr range. Upgrades are the one thing blocking this. @jordanjennings tested upgrades a bunch and was not successful at getting weave to upgrade properly.

@bboreham what information can we provide so you can give us more guidance? When we have weave running with the default IP range and we upgrade the cluster and set IPALLOC_RANGE to 100.64.0.0/10, weave stops functioning.

@jordanjennings can you provide @bboreham more information?

@jordanjennings
Contributor

@chrislovecnm Not much info to provide, all networking was 100% broken when I tried upgrading. I didn't capture any logs or anything because I wasn't really sure where to look. I can certainly give it a shot again and try to capture more information.

But my main confusion here still remains around how it's OK for any networking provider to use the same range for pod IPs as it does for service IPs. What am I not understanding here?

@justinsb What is nonMasqueradeCIDR? I am very confused by the name, and why this field should map to pod IPs. I'm also confused why in the default kops configuration this overlaps with service IPs. Any additional info here would be great.

@chrislovecnm
Contributor

@jordanjennings the nonMasqueradeCIDR is the network range set aside for pod IP ranges. I think the problem we are seeing is that the service range and the pod IP range are overlapping. They cannot overlap with CNI, and I am not certain why they are overlapping.

@bboreham / @caseydavenport are experts on these IP ranges with CNI; if we could get them to comment, that would be awesome.

In a nutshell, we seem to have service IP ranges and pod IP ranges overlapping, which I think is really bad with CNI. I think we have a bug in how kops sets up IP ranges with CNI. If that is the case, I am not certain how Calico is functioning either. I think weave is failing because it expects service and pod IP ranges to be separate.

@chrislovecnm
Contributor

Here are the values that we are setting up on a standard CNI install. These values are part of the kops API, and typically get applied as flags to different components.

  kubeAPIServer:
    serviceClusterIPRange: 100.64.0.0/13
  kubeControllerManager:
    # I do not think we need this with CNI
    clusterCIDR: 100.96.0.0/11
  kubeDNS:
    serverIP: 100.64.0.10
  kubeProxy:
    clusterCIDR: 100.96.0.0/11
  kubelet:
    clusterDNS: 100.64.0.10
    networkPluginName: cni
    nonMasqueradeCIDR: 100.64.0.0/10
  # these values are the base subnets that everything is setup from
  # I do not believe that these should overlap
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceClusterIPRange: 100.64.0.0/13
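
For orientation, here are the same base values with their address ranges spelled out (plain CIDR arithmetic): the /13 service range and the /11 clusterCIDR are both subsets of the /10, and they do not overlap each other; an overlap only appears if a provider uses the full /10 as its pod range.

  nonMasqueradeCIDR: 100.64.0.0/10      # 100.64.0.0 - 100.127.255.255
  serviceClusterIPRange: 100.64.0.0/13  # 100.64.0.0 - 100.71.255.255  (services)
  clusterCIDR: 100.96.0.0/11            # 100.96.0.0 - 100.127.255.255 (pods, no overlap with the /13)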

@caseydavenport
Member

    # I do not think we need this with CNI
    clusterCIDR: 100.96.0.0/11

Not strictly orthogonal with CNI - some CNI plugins / configuration still respect these allocations.

the nonMasqueradeCIDR is the network range set aside for pod IP ranges. I think the problem we are seeing is that the service range and the pod IP range are overlapping. They cannot overlap with CNI, and I am not certain why they are overlapping.

Yeah, can confirm - generally not a good idea to have overlapping Service / Pod / Node IP ranges.

@chrislovecnm
Contributor

@caseydavenport the calico template is using nonMasqueradeCIDR directly, which can cause calico pod IP ranges to overlap with service IP ranges.

@caseydavenport
Member

Yeah, I think that should be considered a misconfiguration. The default values definitely shouldn't overlap. We might want some soft validation that warns if you've picked values that could cause problems (I'm not aware of any implementations where overlap is desired, but perhaps they're out there...)

@chrislovecnm
Contributor

chrislovecnm commented Jun 12, 2017

So here is the word from @justinsb. The pod IP range is:

  kubeControllerManager:
    clusterCIDR: 100.96.0.0/11

All CNI providers should be using that CIDR for the pod IP range. @jordanjennings is going to implement and test for weave.

Talking with a couple of users we may want:

  nonMasqueradeCIDR: 100.64.0.0/10
  serviceClusterIPRange: 100.64.0.0/13
  # new API value to setup the pod IP range.
  podClusterIPRange: 100.96.0.0/11

Thanks everyone, this has been bugging me for AWHILE.

@caseydavenport
Member

Ah yes, I think that is right. I missed the distinction between nonMasqueradeCIDR and clusterCIDR.

The Calico manifests should be updated to use clusterCIDR :)

@chrislovecnm
Contributor

@jordanjennings here are some ideas on how we can approach this implementation:

  1. Implement weave using .KubeControllerManager.ClusterCIDR in the weave manifest (a rough sketch follows at the end of this comment). This code change will give you an easy way to do the difficult testing.
  2. Implement a new API value podClusterIPRange with the proper validation. The IP validation is going to require a review from someone who understands IP math. If @justinsb / @bboreham or @caseydavenport could assist, that would be awesome. I know my limitations, and IP math is something I have never figured out.
  3. Migrate weave to use the PodClusterIPRange.

Thoughts anyone? We could break up step 2 into API implementation and then validation, but I think it is simpler to implement together.
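
A rough sketch of what step 1 could look like in the weave manifest (the surrounding DaemonSet fields are omitted, and the exact template context is assumed rather than confirmed):

      containers:
        - name: weave
          env:
            - name: IPALLOC_RANGE
              value: "{{ .KubeControllerManager.ClusterCIDR }}"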

@jordanjennings
Contributor

@chrislovecnm 👍 I'm testing out step 1 right now

@chrislovecnm
Contributor

chrislovecnm commented Jun 12, 2017

@ottoyiu, since you have been doing a lot of the calico work as of late, wanted to loop you in on this issue.

TLDR - Calico should not be using the non-masq range, but should be using .KubeControllerManager.ClusterCIDR, since that range does not overlap with the service IP range. A pod IP range overlapping with the service IP range is NOT good.

cc: @blakebarnett

@ottoyiu
Contributor

ottoyiu commented Jun 12, 2017

@chrislovecnm thanks for the heads up. I'll test with the ClusterCIDR and report back. This ticket might explain the weird one-off service IP issues I've been experiencing.

@jamesbucher

@austinmoore- I am not 100% sure that having a Weave network outside of the nonMasqueradeCidr actually results in requests being MASQUERADED. I had 2 pods up and running in a cluster with nonMasqueradeCidr set to 100.64.0.0/10 and the IPALLOC_RANGE for weave-net set to 192.168.240.0/21. With pods on separate nodes I am able to ping across pods, and the requests appear to be coming through directly. I am able to use the pod IP address (for the server) from a client (curl) to connect to the server (nginx). The server also sees the correct client pod IP (verified from the nginx logs).

Correct me if I am wrong, but wouldn't masqueraded traffic disallow this, as it is a form of NAT (either the server or the client would appear to have the wrong IP address)?

Regardless of whether or not this works, it is probably a really bad idea to have the internal weave network operate outside of the nonMasqueradeCidr range.

@chrislovecnm
Contributor

@jamesbucher we are not planning on having weave outside of the range. We need to run it in the pod range. We worked out which subnet is for services and which subnet is for pods; both are subsets of the non-masquerade CIDR.

Make sense?

Btw the first PR is in for weave. The other providers require updates and testing as well.

@jamesbucher

@chrislovecnm Makes sense. I just wanted to note that it does actually work. I actually went in today and updated our clusters to ensure that Weave's CIDR corresponds to the podClusterIPRange so that we don't have to do some crazy network migration at a later date when our clusters are taking more traffic.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 26, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@rifelpet rifelpet added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. and removed good-starter-issue labels Apr 18, 2020