The 6th episode walks through adaptive alerting in time series anomaly detection.
Adaptive Alerting
Adaptive Alerting by Expedia is built around one simple, clear goal: reducing MTTD (mean time to discovery).
- “Discovery” means an anomaly has been noticed by a human being or an automated system that can then drive recovery. If an anomaly is flagged but no notification goes out, or the notification is drowned out by other signals, it has not been discovered.
- To make fault discovery possible, we need to observe and monitor as many time series as we can. An industrial-scale distributed system can have thousands or even millions of time series worth monitoring, so the anomaly detection system must scale well.
- For anomaly detection at scale, reducing false positives is the primary task, because even a tiny per-series false positive rate adds up to a flood of alerts. Sophisticated, well-tuned detection models come to the rescue.
- The conclusion is clear: nobody can hand-pick and hand-tune models for that many time series, so model selection and tuning should be fully automated.
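To get a feel for why false positives dominate at scale, here is a rough back-of-the-envelope sketch. The function name and all of the numbers (fleet size, evaluation frequency, false positive rates) are hypothetical and chosen only for illustration. They are not taken from Adaptive Alerting or from Expedia.

```python
def expected_false_alerts(num_series, checks_per_day, false_positive_rate):
    """Expected number of false alerts per day across all monitored series.

    Assumes every series is evaluated independently with the same
    per-evaluation false positive rate (a simplification).
    """
    return num_series * checks_per_day * false_positive_rate


# Hypothetical fleet: 100,000 time series, each evaluated once per minute.
num_series = 100_000
checks_per_day = 24 * 60

for fpr in (1e-3, 1e-5):
    alerts = expected_false_alerts(num_series, checks_per_day, fpr)
    print(f"false positive rate {fpr:g}: ~{alerts:,.0f} false alerts per day")
```

Under these assumptions, a detector that is wrong only once in a thousand evaluations would still produce on the order of a hundred thousand false alerts per day, far more than any on-call team could triage. Real anomalies would then be buried among them and stay undiscovered.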