Excessive egress data (gcp cloud storage) after update from 0.34.0 to 0.35.0 or 0.36.0 #7720

Open
FranAguiar opened this issue Sep 10, 2024 · 6 comments

@FranAguiar

Thanos version: 0.34.0 - bitnami/thanos:0.34.0-debian-11-r3

Object Storage Provider: Google Cloud Storage

What happened: After updating the Thanos version, the egress network data increased dramatically. I log all egress traffic in Microsoft Sentinel, and there I see a lot of requests to storage.googleapis.com with any version greater than 0.34.0.

What you expected to happen: Egress traffic similar to version 0.34.0.

How to reproduce it (as minimally and precisely as possible): I installed Thanos with the community Helm chart from Bitnami; the images attached below show how the egress data drops after rolling back from 0.35.0 to 0.34.0.

(Screenshots attached: egress traffic dropping after the rollback, taken 2024-09-10 at 15:46:01 and 15:45:53)


dosubot bot commented Sep 10, 2024

Hello, @FranAguiar! I'm here to help you with any bugs, questions, or contributions. Let's tackle this issue together!

The dramatic increase in egress network data in Google Cloud Storage after updating Thanos from version 0.34.0 to 0.35.0 or 0.36.0 is likely due to changes in the block listing mechanism. Specifically, the recursive block listing mechanism introduced in version 0.35.0 was reverted in version 0.36.0, and the same strategy as in version 0.31 was used instead. This change can affect the amount of data being transferred.

In version 0.36.0, a --block-discovery-strategy flag was introduced to control the listing strategy, allowing the use of a recursive lister if the tradeoff of slower but cheaper discovery is preferred [1].

Additionally, other significant changes between versions 0.34.0 and 0.36.0 could impact egress network data, such as splitting remote-write HTTP requests via tenant labels and fixing high latency in the ingestion path [1].

To mitigate the increased egress data, you might consider adjusting the --block-discovery-strategy flag to use the recursive strategy, which could reduce the amount of data transferred during block discovery.
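
A minimal sketch of how that flag could be passed through the Bitnami chart's per-component extra flags (the values.yaml layout and extraFlags keys below are assumptions about the chart, not verified settings):

```yaml
# values.yaml for the bitnami/thanos chart -- sketch only; the extraFlags keys
# are assumed to exist for each component and may differ by chart version.
storegateway:
  extraFlags:
    - --block-discovery-strategy=recursive   # slower but cheaper bucket listing
compactor:
  extraFlags:
    - --block-discovery-strategy=recursive
```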


dosubot added the bug label Sep 10, 2024
@MichaHoffmann
Contributor

As the bot mentioned, can you try --block-discovery-strategy=recursive?

@FranAguiar
Author

FranAguiar commented Sep 11, 2024

Hi, thanks, I'll try.

I added it (in the compactor and in the store gateway), but I still see high egress traffic. To be precise, the charts in the GCloud dashboards look fine, but I still see lots of requests in Sentinel...

Number of requests to storage.googleapis.com per version over 30 minutes:

  • 0.34.0: 5
  • 0.36.1: 6000

@MichaHoffmann
Contributor

Do you use the defaults for the discovery loop intervals?

@FranAguiar
Author

FranAguiar commented Sep 11, 2024

I think so. Where can I check? Which service should I review?

Do you mean this flag, @MichaHoffmann?

Thanos Querier (https://thanos.io/tip/thanos/service-discovery.md/#thanos-querier-1)
The repeatable flag --store.sd-files=<path> can be used to specify the path to files that contain addresses of StoreAPI servers. The <path> can be a glob pattern so you can specify several files using a single flag.

The flag --store.sd-interval=<5m> can be used to change the fallback re-read interval from the default 5 minutes.
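
Or do you mean the bucket-listing loops on the store gateway and compactor? A sketch of what I could try there, if so (the interval values are only illustrative, not necessarily the defaults):

```yaml
# Sketch only -- guessing at the flags meant by "discovery loop intervals";
# the extraFlags keys and interval values are assumptions, not chart defaults.
storegateway:
  extraFlags:
    - --sync-block-duration=15m   # how often the store gateway re-syncs blocks from the bucket
compactor:
  extraFlags:
    - --wait-interval=5m          # pause between compaction iterations (each one lists the bucket)
```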

@FranAguiar
Author

Hello @MichaHoffmann, I tried the --store.sd-interval flag in the querier service, but I still see a high number of requests in Sentinel...

Is it a different flag? Thanks for your time.
