Alerting Best Practices

Build an effective alerting system that informs without overwhelming.

A good alerting system strikes a balance: enough alerts that nothing important is missed, but few enough to avoid fatigue. With too many alerts, the team starts ignoring them. With too few, real problems go unnoticed.

These best practices come from the experience of hundreds of teams. Apply them for calmer monitoring and more effective incident response.

Alerting Strategy

  • Define criticalities: P1 (critical), P2 (important), P3 (normal), P4 (info). Each level gets its own notification channels.
  • Channel per criticality: SMS for P1, email for P2-P3, dashboard only for P4.
  • Clear owner: Each alert must have a defined owner who can act.
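
The strategy above can be sketched as a small routing table. This is a minimal illustration, not a reference implementation: the names Criticality, Alert, CHANNELS, and route are hypothetical, and the channel labels are placeholders for whatever notification integrations are actually in use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Criticality(Enum):
    P1 = "critical"
    P2 = "important"
    P3 = "normal"
    P4 = "info"


# Mapping taken from the list above; the channel labels are illustrative.
CHANNELS = {
    Criticality.P1: ["sms"],
    Criticality.P2: ["email"],
    Criticality.P3: ["email"],
    Criticality.P4: ["dashboard"],
}


@dataclass(frozen=True)
class Alert:
    name: str
    criticality: Criticality
    owner: str  # the person or team who can act on the alert


def route(alert: Alert) -> List[str]:
    """Return the channels for an alert, rejecting alerts without an owner."""
    if not alert.owner:
        raise ValueError(f"alert {alert.name!r} has no owner")
    return list(CHANNELS[alert.criticality])
```

For example, route(Alert("db-down", Criticality.P1, "db-team")) returns ["sms"], while an alert with an empty owner is rejected before it can be sent.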

Channel Selection

  • Email: Good for P2-P3. Detailed, traceable, but can be ignored.
  • SMS: Reserved for P1. Expensive but unmissable.
  • Slack/Teams: Excellent for team visibility. Less intrusive than SMS.

Mistakes to Avoid

  • Alert on everything: Each alert must correspond to a possible action. Otherwise, it's noise.
  • Ignore false positives: Every false positive erodes trust. Invest time in eliminating them.
  • No rotation: Without on-call rotation, the same people burn out.
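
As one possible way to avoid the last mistake, a weekly round-robin rotation can be computed from a start date. This is a sketch under assumptions: the function name on_call, the weekly cadence, and the team names are all hypothetical.

```python
from datetime import date
from typing import List


def on_call(team: List[str], day: date, start: date) -> str:
    """Return who is on call on `day`, rotating weekly from `start`."""
    if not team:
        raise ValueError("team must not be empty")
    weeks_elapsed = (day - start).days // 7
    return team[weeks_elapsed % len(team)]


# Hypothetical team; the rotation starts on a Monday.
team = ["ana", "ben", "chloe"]
start = date(2024, 1, 1)
current = on_call(team, date(2024, 1, 8), start)  # second week of the rotation
```

A weekly cadence is only one choice. Shorter shifts or a separate schedule for P1 alerts may suit some teams better.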

Alerting Checklist

  • Define a P1-P4 criticality matrix
  • Associate each criticality with specific channels
  • Configure cooldown and deduplication
  • Enable double verification
  • Regularly review received alerts
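
The cooldown and deduplication item can be illustrated with a short sketch. It assumes alerts are deduplicated by a string key (for example monitor name plus failure type) and that a repeat within the cooldown window is suppressed. The class name AlertSuppressor and its parameters are hypothetical.

```python
from typing import Dict


class AlertSuppressor:
    """Suppress repeats of the same alert key within a cooldown window."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_sent: Dict[str, float] = {}

    def should_notify(self, key: str, now: float) -> bool:
        """Return True if a notification for `key` should go out at time `now`."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # duplicate inside the cooldown window
        self._last_sent[key] = now
        return True
```

Timestamps are passed in explicitly so the behavior is easy to test. In a real system, `now` would typically come from a monotonic clock.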

Frequently Asked Questions

How can we reduce the number of alerts?

Enable double verification, raise thresholds, configure a cooldown, and remove unnecessary monitors.
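
As an illustration, double verification is assumed here to mean confirming a failed check with at least one more check before alerting. The sketch below shows that idea only. The function name confirmed_failure and the retries parameter are hypothetical, and the way a real monitoring tool implements this may differ.

```python
from typing import Callable


def confirmed_failure(check: Callable[[], bool], retries: int = 1) -> bool:
    """Return True only if `check` fails on the first attempt and on every retry.

    `check` returns True when the monitored target is healthy.
    """
    for _ in range(1 + retries):
        if check():
            return False  # one success is enough to cancel the alert
    return True
```

In practice, a short delay between attempts, or a second attempt from a different location, helps filter out transient network errors.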

Should I alert on warnings?

Rarely. A warning should be visible on the dashboard but not trigger a notification.

How many alerts per day is acceptable?

Ideally fewer than 5 alerts requiring human action per day. More than that leads to alert fatigue.

How do we know if our alerting is effective?

Measure the percentage of false positives, the average response time, and the number of missed incidents. Review these metrics monthly.
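
These three metrics can be computed from a simple alert log. The sketch below assumes each alert is recorded with a false-positive flag and a response time in minutes. The names AlertRecord and review are hypothetical, and the sample records are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertRecord:
    false_positive: bool
    response_minutes: Optional[float]  # None if nobody responded


def review(alerts: List[AlertRecord], missed_incidents: int) -> dict:
    """Summarize one review period of alerts."""
    total = len(alerts)
    false_positives = sum(a.false_positive for a in alerts)
    times = [a.response_minutes for a in alerts if a.response_minutes is not None]
    return {
        "false_positive_pct": 100.0 * false_positives / total if total else 0.0,
        "avg_response_minutes": sum(times) / len(times) if times else None,
        "missed_incidents": missed_incidents,
    }


# Illustrative records only, not real measurements.
sample = [
    AlertRecord(False, 10.0),
    AlertRecord(True, None),
    AlertRecord(False, 20.0),
    AlertRecord(False, 6.0),
]
report = review(sample, missed_incidents=1)
```

Missed incidents are passed in separately because, by definition, they never appear in the alert log and have to be counted from postmortems or user reports.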
