zet

Monitoring Alerts

Alerts that trigger call-out should be urgent, important, actionable, and real. They should represent either ongoing or imminent problems with your service.
Err on the side of removing noisy alerts – over-monitoring is a more challenging problem to solve than under-monitoring.
Classify each problem into one of the following categories:
- Availability & Basic Functionality
- Latency
- Correctness (completeness, freshness, and durability of data)
- Feature-Specific Problems
Use symptoms to capture problems more comprehensively and robustly with less effort.
Include cause-based information in symptom-based pages or on dashboards, but avoid alerting directly on causes.
The further up your serving stack you go, the more distinct problems you catch in a single rule. However, ensure that you can sufficiently distinguish what is going on.
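A minimal sketch of a symptom-based check that carries cause information as context rather than alerting on it. The names here (`Alert`, `check_error_ratio`, the 1% threshold, the `causes` dictionary) are hypothetical and chosen only for illustration; a real setup would express this in whatever rule language the monitoring system provides.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    name: str
    summary: str
    # Cause-based details ride along as context; they never trigger the page.
    context: dict = field(default_factory=dict)


def check_error_ratio(total_requests, failed_requests, threshold=0.01, causes=None):
    """Return an Alert if the user-visible error ratio exceeds threshold, else None.

    The threshold and the shape of `causes` are assumptions for this sketch.
    """
    if total_requests == 0:
        return None  # no traffic, so there is no symptom to measure here
    ratio = failed_requests / total_requests
    if ratio <= threshold:
        return None
    return Alert(
        name="HighErrorRatio",
        summary=f"{ratio:.1%} of requests failing (threshold {threshold:.1%})",
        context=dict(causes or {}),
    )


# The rule fires on the symptom (failed requests seen by users);
# the replica lag is attached only to help whoever gets paged.
alert = check_error_ratio(1000, 50, causes={"db_replica_lag_s": 42})
```

Measuring at the front of the serving stack means one rule covers many underlying failures, while the attached context helps distinguish which one is actually happening.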
If you want a quiet on-call rotation, it’s imperative to have a system for dealing with things that need a timely response but aren’t imminently critical.
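One way to picture such a system is a small routing step that separates pages from tickets. This is a sketch under assumptions, not a prescribed design: the `Route` values, the `route_alert` function, and the choice to drop non-actionable alerts are all hypothetical.

```python
from enum import Enum


class Route(Enum):
    PAGE = "page"      # wakes someone up now
    TICKET = "ticket"  # needs a timely response, handled in working hours
    DROP = "drop"      # not actionable, so a candidate for removal


def route_alert(urgent, actionable):
    """Decide where an alert goes based on two yes/no judgments."""
    if not actionable:
        # Assumption for this sketch: alerts nobody can act on are noise.
        return Route.DROP
    return Route.PAGE if urgent else Route.TICKET


# An ongoing user-facing outage pages; a disk that fills in three days files a ticket.
outage = route_alert(urgent=True, actionable=True)
disk_filling = route_alert(urgent=False, actionable=True)
```

Keeping the ticket path separate is what lets the paging path stay reserved for urgent, actionable problems.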