DEVPROD-8323 Add query timeout to evergreen db client #8148

Merged (1 commit) on Aug 5, 2024

@hadjri commented Aug 1, 2024

DEVPROD-8323

Description

The jobs attached to this ticket all appear to have timed out after running excessively long queries against the MCI cluster. For this job, for example, two consecutive queries to the tasks collection each ran for 60 minutes before the socket was closed, which is why the job had a 2-hour runtime. These queries stick out as large outliers in the DB cluster's query insights. From what I can tell (the DB cluster logs don't go back that far), the same is true for the other examples linked in the ticket.

I dug into this for a while, and the 1-hour timeout doesn't appear to come from anything in our DB client or Kanopy settings; it is most likely configured somewhere in our network infrastructure that isn't visible to us.

We could add MaxTime values to the individual jobs that were getting stuck, but as an initial step I think we can set a maximum query time on the client for the MCI cluster and see if that resolves the issue. Based on DB activity for the cluster, 5 minutes seems like a reasonably conservative limit, but I'm happy to tweak it if there are concerns about workflows that might legitimately run a query that long.

This seems to have been effective in #7879.

@hadjri requested a review from a team on August 1, 2024 at 17:13
Review thread on environment.go (resolved)
@hadjri requested a review from malikchaya2 on August 1, 2024 at 18:34
@Kimchelly left a comment


Re the 5-minute limit, I think it's fine for us to just try it; we can revert it or push it higher if necessary. Setting a reasonable limit on how long queries can run is also a good step toward preventing expensive queries!

@hadjri merged commit c0b4dec into evergreen-ci:main on Aug 5, 2024 (9 checks passed)