Feature request: ability to configure weave #1171

Closed
sunghospark opened this issue Dec 15, 2016 · 38 comments

Labels
area/api, area/networking, Feature Request, good first issue, lifecycle/rotten, P0

Comments

@sunghospark

It would be nice if we could configure weave through the kops edit cluster command.
For example

networking:
  weave:
    ipalloc-range: 10.2.0.0/16

to set a custom weave CIDR range and avoid possible IP clashes when the cluster is launched in an existing VPC.

@krisnova
Contributor

I have heard this before; we would have to change the API for weave and pass the value through to the manifest.

Maybe we should wait for the API machinery to be in place before looking at another API change?

@sunghospark
Author

sunghospark commented Dec 16, 2016

I'm wondering if we can templatize the weave daemonset yaml file and add the IPALLOC_RANGE environment variable to the daemonset configuration.
Something like

      containers:
          env:
            - name: IPALLOC_RANGE
              value: {{ WeaveConfig.IPALLOC_RANGE }}

I got this configuration from the Weave documentation.
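
For illustration, the rendered section of the weave DaemonSet might then look roughly like this (the container name and the concrete value are only an example; the real weave-net manifest has more fields):

      containers:
        - name: weave
          env:
            - name: IPALLOC_RANGE
              value: 10.2.0.0/16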

@chrislovecnm
Contributor

This is a great idea... @caseeker are you open to contributing this functionality?

Per @kris-nova's note, this needs to wait on the API machinery. We are going through a BIG refactor, but the design work could start today.

@sunghospark
Author

@chrislovecnm
I would love to contribute if some guidance is provided :)
I did try updating the daemonset yaml file in the source and testing it, but that didn't work.

@chrislovecnm
Contributor

So this is the first place I would point you to -> #1194

Get the use cases on here, and then work on a high-level design. With this feature the design can be super short, like a paragraph. If you look at the PR for weave you will see the work I did there. That needs to be extended with a yaml replacement as well. I think we are doing that for a couple of other yaml templates, but we will need to walk you through it. In a nutshell, the yaml files can use go templates for replacement, basically calling a go function. We are going to need to expand the API for this a bit as well. I would prefer a value in the yaml cluster spec instead of a command line switch.

Questions?

@sunghospark
Author

@chrislovecnm sounds good, I'll prepare the PR.
Quick question about the implementation: I found that this weave yaml file is not the one that gets used in the cluster.
The cluster state bucket addons/networking.weave still contains v1.8.0.20161116.yaml even when I built the source with v1.8.1.20161130.yaml.

Can you tell me if this is the correct behavior? If so, I'm wondering where v1.8.0.20161116.yaml is coming from, as I could not locate it in the source code.

@chrislovecnm
Contributor

chrislovecnm commented Dec 19, 2016

It is the correct file :) We have a bug in the versioning, which may be impacting you. Are you running on master? To work that out do you mind starting on another issue?

@chrislovecnm
Contributor

chrislovecnm commented Dec 19, 2016

#1081 <- no design needed. We need to rename the weave file to v1.8.1.yaml.

@sunghospark
Author

I could make it work by updating the weave yaml file.
The problem was not with kops; I was updating the wrong src directory 😮

@chrislovecnm
Contributor

Well you figured it out!!!!

@justinsb
Member

I think this should use the nonMasqueradeCIDR, right?

@justinsb justinsb added this to the 1.5.0 milestone Dec 28, 2016
@chrislovecnm
Contributor

On my phone, but you have to configure a component in the weave yaml.

@justinsb justinsb added the P0 label Jan 3, 2017
@justinsb
Member

Is there a reason why weave isn't using the nonMasqueradeCIDR value @chrislovecnm ?

@chrislovecnm
Contributor

@justinsb I don't have enough understanding to answer that. Should we be?

@justinsb
Member

I think so. Non-masquerade CIDR is the name we decided on for the overlay network CIDR: https://github.com/kubernetes/kubernetes/blob/master/cluster/aws/options.md

@justinsb
Member

I agree that we should support this configuration, and we may already have a suitable field in the form of nonMasqueradeCidr. But, as per #1514, this isn't going to get fixed in 1.5.0, so I'm going to move this issue to 1.5.1. (We will hopefully up the release cadence substantially once we get these big 1.5.0 features out). Workarounds if you have a conflicting CIDR are to use kopeio-networking or calico.

And of course, contributions to fix this are welcome :-)
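
For anyone needing that workaround, switching providers in the cluster spec looks roughly like this (exact keys may vary by kops version):

  networking:
    calico: {}

or, for kopeio-networking:

  networking:
    kopeio: {}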

@chrislovecnm
Contributor

@austinmoore- this is a problem that I introduced when I first implemented weave / CNI :) I did not implement using the non-masquerade range with weave.

We need to figure out how to make a kubernetes upgrade work for users who are running the default IPALLOC_RANGE and moving to the nonMasqueradeCidr range. Upgrades are the one thing blocking this. @jordanjennings tested upgrades a bunch and was not successful at getting weave to upgrade properly.

@bboreham what information can we provide so you can give us more guidance? When we have weave running with the default IP range and we upgrade the cluster and set IPALLOC_RANGE to 100.64.0.0/10, weave stops functioning.

@jordanjennings can you provide @bboreham more information?

@jordanjennings
Contributor

@chrislovecnm Not much info to provide, all networking was 100% broken when I tried upgrading. I didn't capture any logs or anything because I wasn't really sure where to look. I can certainly give it a shot again and try to capture more information.

But my main confusion here still remains around how it's OK for any networking provider to use the same range for pod IPs as it does for service IPs. What am I not understanding here?

@justinsb What is nonMasqueradeCIDR? I am very confused by the name, and why this field should map to pod IPs. I'm also confused why in the default kops configuration this overlaps with service IPs. Any additional info here would be great.

@chrislovecnm
Contributor

@jordanjennings the nonMasqueradeCIDR is the network range set aside for pod IP ranges. I think the problem we are seeing is that the service range and the pod IP range are overlapping. They cannot overlap with CNI, and I am not certain why they are overlapping.

@bboreham / @caseydavenport are experts on these IP ranges with CNI; if we could get them to comment, that would be awesome.

In a nutshell, we seem to have service IP ranges and pod IP ranges overlapping, which I think is really bad with CNI. I think we have a bug in how kops sets up IP ranges with CNI. If that is the case, I am not certain how Calico is functioning either. I think weave is failing because it expects service and pod IP ranges to be separate.

@chrislovecnm
Contributor

Here are the values that we are setting up on a standard CNI install. These values are part of the kops API, and typically get applied as flags to different components.

  kubeAPIServer:
    serviceClusterIPRange: 100.64.0.0/13
  kubeControllerManager:
    # I do not think we need this with CNI
    clusterCIDR: 100.96.0.0/11
  kubeDNS:
    serverIP: 100.64.0.10
  kubeProxy:
    clusterCIDR: 100.96.0.0/11
  kubelet:
    clusterDNS: 100.64.0.10
    networkPluginName: cni
    nonMasqueradeCIDR: 100.64.0.0/10
  # these values are the base subnets that everything is setup from
  # I do not believe that these should overlap
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceClusterIPRange: 100.64.0.0/13
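
For orientation, here are the same base values with their address ranges spelled out (plain CIDR arithmetic): the /13 service range and the /11 clusterCIDR are both subsets of the /10, and they do not overlap each other; an overlap only appears if a provider uses the full /10 as its pod range.

  nonMasqueradeCIDR: 100.64.0.0/10      # 100.64.0.0 - 100.127.255.255
  serviceClusterIPRange: 100.64.0.0/13  # 100.64.0.0 - 100.71.255.255  (services)
  clusterCIDR: 100.96.0.0/11            # 100.96.0.0 - 100.127.255.255 (pods, no overlap with the /13)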

@caseydavenport
Member

    # I do not think we need this with CNI
    clusterCIDR: 100.96.0.0/11

Not strictly orthogonal with CNI - some CNI plugins / configuration still respect these allocations.

the nonMasqueradeCIDR is the network range set aside for pod IP ranges. I think the problem we are seeing is that the service range and the pod IP range are overlapping. They cannot overlap with CNI, and I am not certain why they are overlapping.

Yeah, can confirm - generally not a good idea to have overlapping Service / Pod / Node IP ranges.

@chrislovecnm
Contributor

@caseydavenport the calico template is using nonMasqueradeCIDR directly, which can cause calico pod IP ranges to overlap with service IP ranges.

@caseydavenport
Member

Yeah, I think that should be considered a misconfiguration. The default values definitely shouldn't overlap. We might want some soft validation that warns if you've picked values that could cause problems (I'm not aware of any implementations where overlap is desired, but perhaps they're out there...)

@chrislovecnm
Contributor

chrislovecnm commented Jun 12, 2017

So here is the word from @justinsb. The pod IP range is:

  kubeControllerManager:
    clusterCIDR: 100.96.0.0/11

All CNI providers should be using that CIDR for the pod IP range. @jordanjennings is going to implement and test for weave.

Talking with a couple of users we may want:

  nonMasqueradeCIDR: 100.64.0.0/10
  serviceClusterIPRange: 100.64.0.0/13
  # new API value to setup the pod IP range.
  podClusterIPRange: 100.96.0.0/11

Thanks everyone, this has been bugging me for AWHILE.

@caseydavenport
Member

Ah yes, I think that is right. I missed the distinction between nonMasqueradeCIDR and clusterCIDR.

The Calico manifests should be updated to use clusterCIDR :)

@chrislovecnm
Contributor

@jordanjennings here are some ideas on how we can approach this implementation:

  1. Implement weave using .KubeControllerManager.ClusterCIDR in the weave manifest (a rough sketch follows at the end of this comment). This code change will give you an easy way to do the difficult testing.
  2. Implement a new API value podClusterIPRange with the proper validation. The IP validation is going to require a review from someone who understands IP math. If @justinsb / @bboreham or @caseydavenport could assist, that would be awesome. I know my limitations, and IP math is something I have never figured out.
  3. Migrate weave to use the PodClusterIPRange.

Thoughts anyone? We could break up step 2 into API implementation and then validation, but I think it is simpler to implement together.
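
A rough sketch of what step 1 could look like in the weave manifest (the surrounding DaemonSet fields are omitted, and the exact template context is assumed rather than confirmed):

      containers:
        - name: weave
          env:
            - name: IPALLOC_RANGE
              value: "{{ .KubeControllerManager.ClusterCIDR }}"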

@jordanjennings
Contributor

@chrislovecnm 👍 I'm testing out step 1 right now

@chrislovecnm
Contributor

chrislovecnm commented Jun 12, 2017

@ottoyiu, since you have been doing a lot of the calico work as of late, wanted to loop you in on this issue.

TLDR - Calico should not be using the non-masq range, but should be using .KubeControllerManager.ClusterCIDR, since that range does not overlap with the service IP range. A pod IP range overlapping with the service IP range is NOT good.

cc: @blakebarnett

@ottoyiu
Contributor

ottoyiu commented Jun 12, 2017

@chrislovecnm thanks for the heads up. I'll test with the ClusterCIDR and report back. This ticket might explain the weird one-off service IP issues I've been experiencing.

@jamesbucher

@austinmoore- I am not 100% sure that having a Weave network outside of the nonMasqueradeCidr actually results in requests being MASQUERADED. I had 2 pods up and running in a cluster with nonMasqueradeCidr set to 100.64.0.0/10 and the IPALLOC_RANGE for weave-net set to 192.168.240.0/21. With pods on separate nodes I am able to ping across pods, and the requests appear to be coming through directly. I am able to use the pod IP address (for the server) from a client (curl) to connect to the server (nginx). The server also sees the correct client pod IP (verified from the nginx logs).

Correct me if I am wrong, but wouldn't masqueraded traffic disallow this, as it is a form of NAT (either the server or the client would appear to have the wrong IP address)?

Regardless of whether or not this works, it is probably a really bad idea to have the internal weave network operate outside of the nonMasqueradeCidr range.

@chrislovecnm
Contributor

@jamesbucher we are not planning on having weave outside of the range. We need to run it in the pod range. We worked out which subnet is for services and which subnet is for pods; both are subsets of the non-masquerade CIDR.

Make sense?

Btw the first PR is in for weave. The other providers require updates and testing as well.

@jamesbucher

@chrislovecnm Makes sense. I just wanted to note that it does actually work. I actually went in today and updated our clusters to ensure that Weave's CIDR corresponds to the podClusterIPRange so that we don't have to do some crazy network migration at a later date when our clusters are taking more traffic.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 26, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@rifelpet rifelpet added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. and removed good-starter-issue labels Apr 18, 2020