From f6ba1f10c95ea0a1f0371f1c207660b07e2c68dc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?M=C3=A9d=C3=A9ric=20Hurier=20=28Fmind=29?=
Date: Sun, 28 Jul 2024 18:45:06 +0200
Subject: [PATCH] add cross links and fix observability information

---
 README.md                            | 5 +++++
 docs/0. Overview/0.0. Course.md      | 1 +
 docs/7. Observability/2. Alerting.md | 2 ++
 docs/7. Observability/index.md       | 2 +-
 docs/index.md                        | 6 +++++-
 pyproject.toml                       | 2 +-
 6 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index b64f6f8..b1ec08f 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,10 @@ This course is designed to dive deep into the intersection of software developme
 
 Whether you are a beginner eager to explore or an experienced professional seeking to enhance your skill set, this course offers valuable insights and hands-on experience.
 
+**Related Resources:**
+- **[MLOps Python Package (Example)](https://github.com/fmind/mlops-python-package)**: Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
+- **[Cookiecutter MLOps Package (Template)](https://github.com/fmind/cookiecutter-mlops-package)**: Build and deploy Python packages and Docker images for MLOps tasks.
+
 ## Key Features
 
 - **Hands-on Python Coding**: Learn to code with Python in a way that's directly applicable to real-world AI projects.
@@ -26,6 +30,7 @@ Whether you are a beginner eager to explore or an experienced professional seeki
 4. **Validating**: Focus on code quality with typing, linting, testing, and debugging to ensure your ML projects are robust and maintainable.
 5. **Refining**: Dive into advanced MLOps techniques including CI/CD workflows, software containers, and model registries to streamline your operations.
 6. **Sharing**: Learn how to effectively organize and document your MLOps projects to ensure they are accessible and collaborative.
+7. **Observability**: Gain comprehensive insights into the behavior and performance of your deployed models and infrastructure.
 
 ## Installation
 
diff --git a/docs/0. Overview/0.0. Course.md b/docs/0. Overview/0.0. Course.md
index e4428cc..f521aa4 100644
--- a/docs/0. Overview/0.0. Course.md
+++ b/docs/0. Overview/0.0. Course.md
@@ -44,6 +44,7 @@ The course is divided into six in-depth chapters, each focusing on different fac
 4. **[Validating](../../4. Validating/)**: Adopt practices like typing, linting, testing, and logging to refine code quality.
 5. **[Refining](../../5. Refining/)**: Leverage advanced software development techniques and tools to polish your project.
 6. **[Sharing](../../6. Sharing/)**: Foster a productive team environment for effective contributions and communication.
+7. **[Observability](../../7. Observability/)**: Implement tools and practices for monitoring your data, models, and infrastructure.
 
 ## What's beyond the scope of this course?
 
diff --git a/docs/7. Observability/2. Alerting.md b/docs/7. Observability/2. Alerting.md
index 13c9152..5287d8a 100644
--- a/docs/7. Observability/2. Alerting.md
+++ b/docs/7. Observability/2. Alerting.md
@@ -42,6 +42,7 @@ To implement an effective alerting system, you need to choose communication chan
 - **[Slack](https://slack.com/) and [Discord](https://discord.com/)**: Suitable for real-time team communication, these messaging platforms allow for instant notifications, discussions, and collaboration among team members.
 - **[Datadog](https://www.datadoghq.com/)**: A popular monitoring and observability platform, it provides comprehensive alerting capabilities for various system and application metrics, including those related to AI/ML models.
 - **[Statuspal](https://statuspal.io/)**: This platform specializes in status page monitoring and incident communication, making it useful for notifying users about any disruptions or downtime related to AI/ML services.
+- **[PagerDuty](https://www.pagerduty.com/)**: A popular incident management platform that can be used for routing AI/ML alerts to the right team members, escalating issues if necessary, and ensuring that incidents are addressed promptly.
 
 ## How can you implement Alerting (local demo)?
 
@@ -92,5 +93,6 @@ Here's how you can use the alerting service:
 - [Alerting in Datadog](https://docs.datadoghq.com/monitors/manage/status/#alerts)
 - [Slack API Documentation](https://api.slack.com/)
 - [Discord Developer Documentation](https://discord.com/developers/docs/intro)
+- [PagerDuty](https://www.pagerduty.com/)
 - [Statuspal](https://statuspal.io/)
 - [Plyer](https://plyer.readthedocs.io/)
diff --git a/docs/7. Observability/index.md b/docs/7. Observability/index.md
index d10edcd..ce047d3 100644
--- a/docs/7. Observability/index.md
+++ b/docs/7. Observability/index.md
@@ -8,7 +8,7 @@ Observability in Machine Learning Operations (MLOps) is crucial for gaining insi
 
 - **[7.0. Reproducibility](./0. Reproducibility.md)**: Explore how to make machine learning experiments and pipelines more reproducible using MLflow Projects, enabling others to verify findings, share knowledge, and build upon existing work.
 - **[7.1. Monitoring](./1. Monitoring.md)**: Learn the fundamental principles and tools for monitoring AI/ML models, focusing on tracking key metrics, setting up alerts, and understanding changes in model behavior using MLflow Evaluate API and Evidently.
-- **[7.2. Alerting](./2. Alerting.md)**: Understand how to design effective alert systems to promptly notify stakeholders of potential issues with models or infrastructure using tools like Slack, Discord, Datadog, and Statuspal.
+- **[7.2. Alerting](./2. Alerting.md)**: Understand how to design effective alert systems to promptly notify stakeholders of potential issues with models or infrastructure using tools like Slack, Discord, Datadog, and PagerDuty.
 - **[7.3. Lineage](./3. Lineage.md)**: Delve into data and model lineage, discovering how to track the origin and transformation of data and models throughout the ML lifecycle using MLflow Dataset.
 - **[7.4. Costs and KPIs](./4. Costs-KPIs.md)**: Explore techniques for managing costs associated with running AI/ML workloads and for defining and tracking key performance indicators (KPIs) aligned with business goals, using MLflow Tracking for analysis.
 - **[7.5. Explainability](./5. Explainability.md)**: Explore the concept of explainable AI, focusing on techniques like SHAP to understand model predictions and build trust in AI systems.
diff --git a/docs/index.md b/docs/index.md
index 298d86e..f887140 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -38,7 +38,11 @@ In this chapter, we delve into refining MLOps projects to enhance their efficien
 
 ## [Chapter 6: Sharing](./6. Sharing/)
 
-The final chapter focuses on sharing and distributing MLOps projects. We explore tools and practices that enhance collaboration, promote reuse, and facilitate the scaling of machine learning solutions. You will learn how to effectively organize, document, and disseminate your projects to make them more accessible and beneficial to others.
+The chapter focuses on sharing and distributing MLOps projects. We explore tools and practices that enhance collaboration, promote reuse, and facilitate the scaling of machine learning solutions. You will learn how to effectively organize, document, and disseminate your projects to make them more accessible and beneficial to others.
+
+## [Chapter 7: Observability](./7. Observability/)
+
+This chapter dives into the essential aspects of observability in MLOps, equipping you with the knowledge and strategies to gain comprehensive insights into the performance, behavior, and health of your deployed models and infrastructure. You'll learn how to ensure reproducibility, implement monitoring and alerting systems, track data and model lineage, manage costs and KPIs, understand model explainability, and monitor infrastructure performance.
 
 ## Let's journey together!
 
diff --git a/pyproject.toml b/pyproject.toml
index 26347f6..44bbf4b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@
 
 [tool.poetry]
 name = "mlops-coding-course"
-version = "3.0.1"
+version = "3.1.0"
 description = "Learn how to create, develop, and maintain an MLOps code base."
 repository = "https://github.com/MLOps-Courses/mlops-coding-course"
 documentation = "https://mlops-coding-course.fmind.dev/"