This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

unable to fetch node CPU metrics #2605

Closed
texasbobs opened this issue Nov 12, 2019 · 9 comments · Fixed by #2606
Comments

@texasbobs

We are using Flux 1.13.2 and, in some clusters, it no longer clones the repos to be synced. The error indicates that Flux is unable to execute a query against the Prometheus API.

ts=2019-11-12T17:00:47.74181943Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2019-11-12T17:00:54.041654513Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2019-11-12T17:04:33.083254717Z caller=loop.go:111 component=sync-loop event=refreshed url=git@github.myrepo.git branch=master HEAD=3e9
ts=2019-11-12T17:06:05.738649349Z caller=loop.go:85 component=sync-loop err="collating resources in cluster for sync: Error while fetching node metrics for selector : unable to fetch node CPU metrics: unable to execute query: Get http://prometheus-k8s.monitoring.svc:9090/api/v1/query?query=sum%281+-+rate%28node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B1m%5D%29+%2A+on%28namespace%2C+pod%29+group_left%28node%29+node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D~%22url%7C url%7C url%22%7D%29+by+%28node%29&time=1573578364.541: dial tcp 10.233.9.37:9090: connect: connection refused"
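As an aside (not part of the original report), the URL-encoded query in the log can be decoded with Python's standard library to see the PromQL the metrics backend was asked to run. The node matcher values ("url| url| url") appear redacted in the log and are kept as-is:

```python
from urllib.parse import unquote_plus

# The query parameter exactly as it appears in the Flux log above.
encoded = (
    "sum%281+-+rate%28node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B1m%5D%29"
    "+%2A+on%28namespace%2C+pod%29+group_left%28node%29"
    "+node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D~%22url%7C url%7C url%22%7D%29"
    "+by+%28node%29"
)

# unquote_plus also turns '+' into spaces, matching query-string encoding.
print(unquote_plus(encoded))
```

This prints a per-node CPU utilisation query of the kind prometheus-adapter runs on behalf of the resource metrics API.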

How can we prevent this from stopping the sync?

Why does Flux care about CPU metrics at all?

It looked like there was a PR to ignore these types of errors back in 1.12.2. #2009

@texasbobs texasbobs added the blocked-needs-validation (issue is waiting to be validated before we can proceed) and bug labels Nov 12, 2019
@squaremo
Member

Why does Flux care about CPU metrics at all?

It doesn't. Possibly this is an error reported by the Kubernetes API client, either from something it is attempting, or something attempted by an intermediary, or the API server itself.

@stefanprodan
Member

@squaremo it looks like our use of the discovery API triggers the metrics queries. This is very troubling, since a busy cluster with many metrics will add a huge delay to the Flux sync.

@squaremo
Member

it looks like our use of the discovery API triggers the metrics queries

How .. even ... I don't ...

@stefanprodan
Member

I think we should be using ServerGroups() and then call ServerResourcesForGroupVersion for each group, excluding the metrics ones, to avoid querying the metrics providers.
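The idea above can be sketched as follows. This is a minimal, self-contained illustration (not the actual Flux patch): the `isMetricsGroup` helper name is hypothetical, and the real code would call the client-go discovery methods `ServerGroups()` and `ServerResourcesForGroupVersion(gv)` where the comments indicate:

```go
package main

import (
	"fmt"
	"strings"
)

// isMetricsGroup reports whether an API group is served by a metrics
// aggregated API server (metrics-server, prometheus-adapter); discovery
// calls against those groups fan out to the metrics backend.
// Hypothetical helper name, covering metrics.k8s.io plus the
// custom.metrics.k8s.io / external.metrics.k8s.io variants.
func isMetricsGroup(group string) bool {
	return group == "metrics.k8s.io" ||
		strings.HasSuffix(group, ".metrics.k8s.io")
}

func main() {
	// In Flux these names would come from discovery.ServerGroups().
	groups := []string{"apps", "batch", "metrics.k8s.io", "custom.metrics.k8s.io"}
	for _, g := range groups {
		if isMetricsGroup(g) {
			continue // skip: would trigger queries against the metrics provider
		}
		// Here the real code would call ServerResourcesForGroupVersion
		// for each version of the group.
		fmt.Println("discover resources for group:", g)
	}
}
```

With this filter, an unreachable Prometheus no longer blocks resource discovery, because the metrics groups are never queried during sync.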

@stefanprodan
Member

@texasbobs can you please test stefanprodan/flux:fix-discovery.1 in your cluster and let me know if the sync works? Thanks

@texasbobs
Author

What version is that based on, @stefanprodan ? We have not tested beyond 1.13.2 and this cluster is in a prod environment.

@stefanprodan
Member

It's based on master. If you have a dev cluster, can you please scale Prometheus to zero and test it out?

@texasbobs
Author

I confirmed that it fails in my test cluster on both 1.13.2 and 1.15.0. The image above works correctly.

@stefanprodan
Member

@texasbobs thanks a lot for testing it. I'm also running my own tests with prometheus-adapter and metrics-server.

@stefanprodan stefanprodan removed the blocked-needs-validation (issue is waiting to be validated before we can proceed) label Nov 12, 2019
3 participants