The fifth episode centers on the idea of AIOps.
AIOps at Meituan
Meituan has a nice write-up about implementing AIOps in production. An AIOps system requires input from three different job models.
- Machine learning algorithms. Machine learning engineers curate a set of algorithms for time series anomaly detection, time series classification, and multivariate root cause analysis (a small sketch follows this list).
- Learnwares. SREs fit solutions into automated pipelines. Such a pipeline, powered by machine learning algorithms and called a learnware, is designed to handle a series of similar faults without repetitive manual input (which often leads to misoperation).
- AIOps platform. Backend engineers design and develop a platform that facilitates the learnware lifecycle: design, development, validation, integration, and maintenance.
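To give a flavor of the first job model, here is a minimal sketch of a rolling z-score anomaly detector for a metric time series. The window size and threshold are illustrative assumptions, not Meituan's actual production algorithms.

```python
import numpy as np

def rolling_zscore_anomalies(series, window=60, threshold=3.0):
    """Flag points whose deviation from the trailing-window mean exceeds
    `threshold` standard deviations (a simple baseline, not Meituan's
    production algorithm)."""
    series = np.asarray(series, dtype=float)
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        win = series[i - window:i]
        mu, sigma = win.mean(), win.std()
        if sigma == 0:
            continue  # flat window: skip to avoid division by zero
        flags[i] = abs(series[i] - mu) / sigma > threshold
    return flags

# Example: a steady metric with one injected spike around index 200.
metric = np.concatenate([
    np.random.normal(100, 1, 200), [150], np.random.normal(100, 1, 50)
])
print(np.where(rolling_zscore_anomalies(metric))[0])
```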
MLOps
MLOps is a set of practices that focuses on model deployment and management at scale. What differentiates it from AIOps is the object of the "Ops": machine learning models themselves. It is nevertheless a vital part of AIOps, since the idea of a learnware requires that its ML models be deployed systematically and reliably. A benchmark of MLOps is Michelangelo by Uber.
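As a toy illustration of the kind of bookkeeping a platform like Michelangelo automates, here is a file-based registry sketch with versioned artifacts and a production pointer. The names (`ModelRegistry`, `register`, `promote`) are hypothetical, not any platform's real API.

```python
import hashlib
import json
import time
from pathlib import Path

class ModelRegistry:
    """Toy file-based model registry: versioned artifacts plus a
    'production' pointer. Real MLOps platforms add metadata stores,
    validation gates, and serving integration on top of this idea."""

    def __init__(self, root="registry"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def register(self, name, artifact_bytes, metrics):
        # Each registration gets a timestamped version directory.
        version = time.strftime("%Y%m%d%H%M%S")
        path = self.root / name / version
        path.mkdir(parents=True, exist_ok=True)
        (path / "model.bin").write_bytes(artifact_bytes)
        (path / "meta.json").write_text(json.dumps({
            "metrics": metrics,
            "sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        }))
        return version

    def promote(self, name, version):
        # Point 'production' at a validated version; serving reads this file.
        (self.root / name / "PRODUCTION").write_text(version)

registry = ModelRegistry()
v = registry.register("latency-anomaly", b"fake-weights", {"auc": 0.93})
registry.promote("latency-anomaly", v)
```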
Feature store
The term feature is popular among data scientists, but the idea of a feature store is even more important in parallel systems, especially those backed by heterogeneous databases. Feast is a standalone, open-source solution, but it still has a long journey ahead.
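To make the idea concrete, here is a minimal sketch of declaring a feature view with Feast, loosely following its quickstart. The driver-statistics source and field names are placeholders, and the exact API differs across Feast versions.

```python
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity: the key that features are joined on.
driver = Entity(name="driver", join_keys=["driver_id"])

# Offline source backing the features (placeholder parquet file).
source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

# Feature view: a named, reusable group of features that Feast can
# materialize to an online store and serve consistently at training
# and inference time.
driver_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=source,
)
```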