Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add context reasons to notifications failed counter #3631

Conversation

wallee94
Copy link
Contributor

@wallee94 wallee94 commented Dec 5, 2023

The alertmanager cancels the Dispatcher context when it receives a reload request, which cancels in-flight notifications and counts them as failed with reason "other".

This change adds two new reasons for context cancellation and deadline exceeding. It sets the reason in the counter if no other error was raised in previous attempts. It is helpful to monitor for failed requests, but at the same time, be able to discard failures from contexts canceled due to reloads or sigterms.

Signed-off-by: Walther Lee <walther.lee@reddit.com>
@wallee94 wallee94 force-pushed the add-context-reasons-to-notifications-failed-counter branch from 849cfb1 to 1aedbee Compare December 5, 2023 20:26
Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we didn't increment the failed counter if the notification is "failed" but rescheduled. But this should help with managing alerts in case of more reload cancels/reschedules on busy Alertmanagers.

@SuperQ
Copy link
Member

SuperQ commented Dec 6, 2023

Odd, this doesn't seem right to me.

notify/notify.go:838:6: ineffectual assignment to iErr (ineffassign)
					iErr = NewErrorWithReason(ContextCanceledReason, iErr)
					^
notify/notify.go:840:6: ineffectual assignment to iErr (ineffassign)
					iErr = NewErrorWithReason(ContextDeadlineExceededReason, iErr)
					^

Signed-off-by: Walther Lee <walther.lee@reddit.com>
@wallee94
Copy link
Contributor Author

wallee94 commented Dec 6, 2023

@SuperQ yep, typo, had added a colon by accident and created a new var 🤦

Copy link
Member

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@simonpasquier simonpasquier merged commit 3416d5a into prometheus:main Dec 8, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants