This repository shares an interactive Power BI report analyzing open-source MLOps projects. It provides the following insights:
- Top 100 Contributors
- Traffic
- Contributor Commits
- Pull Requests
- Punch Card
- Issues
Demo video: overview.mp4
Name | Overview | Dashboard | Others |
---|---|---|---|
Determined | Determined is an open-source deep learning training platform that makes building models fast and easy. | Dashboard | N/A |
Flyte | Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for machine learning and data processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. | Dashboard | N/A |
Kubeflow | Kubeflow is the cloud-native platform for machine learning operations - pipelines, training, and deployment. | Dashboard | official docs at kubeflow.org, slack community |
OpenPAI | OpenPAI is an open-source platform that provides complete AI model training and resource management capabilities. It is easy to extend and supports on-premise, cloud, and hybrid environments at various scales. | Dashboard | official docs at OpenPAI Handbook |
Orchest | Build data pipelines, the easy way! No framework. No YAML. Just write Python and R code in Notebooks. | Dashboard | official docs at Orchest |
Ploomber | Ploomber is the fastest way to build data pipelines ⚡️. Use your favorite editor (Jupyter, VSCode, PyCharm) to develop interactively and deploy ☁️ without code changes (Kubernetes, Airflow, AWS Batch, and SLURM). | Dashboard | official docs at Ploomber, slack community |
Spock | spock is a framework that helps users easily define, manage, and use complex parameter configurations within Python applications. It lets you focus on the code you need to write instead of re-implementing boilerplate code such as creating ArgParsers, reading configuration files, handling dependencies, implementing type validation, maintaining traceability, etc. | Dashboard | official docs at spock |
Stoke | stoke is a lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices (e.g. CPU, GPU), distributed modes, mixed-precision, and PyTorch extensions. It places no restrictions on code structure for model architecture, training/inference loops, loss functions, optimizer algorithm, etc. stoke simply 'wraps' your existing PyTorch code to automatically handle the necessary underlying wiring for all of the supported backends. This allows you to switch from local full-precision CPU to mixed-precision distributed multi-GPU with extensions (like optimizer state sharding) by simply changing a few declarative flags. Additionally, stoke exposes configuration settings for every underlying backend for those that want configurability and raw access to the underlying libraries. | Dashboard | official docs at stoke |
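Several of the pipeline tools above are driven by small declarative files rather than framework code. As a rough illustration of the style, a minimal Ploomber `pipeline.yaml` might look like the sketch below (the script names and products are hypothetical, not taken from any project in this table):

```yaml
# Hypothetical Ploomber pipeline: two tasks chained by their products.
tasks:
  - source: scripts/get_data.py      # first task: fetch/clean data
    product: output/data.csv
  - source: scripts/train.py         # second task: train on the data
    product: output/model.pickle
```

Running `ploomber build` in the project directory would then execute the tasks in dependency order.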
Name | Overview | Dashboard | Others |
---|---|---|---|
deepchecks | Deepchecks is the leading tool for testing and validating your machine learning models and data, and it enables doing so with minimal effort. Deepchecks accompanies you through various validation and testing needs such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing different models. | Dashboard | official docs at deepchecks, slack community |
Evidently AI | Evidently helps analyze and track data and ML model quality throughout the model lifecycle. You can think of it as an evaluation layer that fits into the existing ML stack. Evidently has a modular approach with three interfaces on top of the shared functionality. | Dashboard | official docs at Evidently AI, discord community |
MLRun | MLRun enables production pipeline design using a modular strategy, where the different parts contribute to a continuous, automated, and far simpler path from research and development to scalable production pipelines, without refactoring code, adding glue logic, or spending significant effort on data and ML engineering. | Dashboard | official docs at MLRun, includes a Feature Store |
whylogs | whylogs is an open source library for logging any kind of data. With whylogs, users are able to generate summaries of their datasets (called whylogs profiles) which they can use to: 1. Track changes in their dataset. 2. Create data constraints to know whether their data looks the way it should. 3. Quickly visualize key summary statistics about their datasets. | Dashboard | official docs at whylogs, slack community |
- (Coming soon...)
Name | Overview | Dashboard | Others |
---|---|---|---|
BentoML | BentoML simplifies ML model deployment and serves your models at production scale. | Dashboard | official docs at BentoML, slack community |
Bodywork | Bodywork is a command line tool that deploys machine learning pipelines to Kubernetes. It takes care of everything to do with containers and orchestration, so that you don't have to. | Dashboard | official docs at Bodywork |
Cortex | Cortex makes it simple to deploy machine learning models in production. | Dashboard | official docs at Cortex, slack community |
KFServing (now KServe) | KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, Scikit-learn, PyTorch, and ONNX. | Dashboard | official docs at KFServing |
OpenVINO™ Model Server | OpenVINO™ Model Server (OVMS) is a high-performance system for serving machine learning models. It is based on C++ for high scalability and optimized for Intel solutions, so that you can take advantage of all the power of the Intel® Xeon® processor or Intel’s AI accelerators and expose it over a network interface. | Dashboard | official docs at OpenVINO™ Model Server |
Seldon Core | Seldon Core converts your ML models (TensorFlow, PyTorch, H2O, etc.) or language wrappers (Python, Java, etc.) into production REST/gRPC microservices. | Dashboard | official docs at Seldon Core, slack community |
Tensorflow Serving | TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data. | Dashboard | official docs at TensorFlow.org, slack community |
TorchServe | TorchServe is a flexible and easy to use tool for serving and scaling PyTorch models in production. | Dashboard | official docs at TorchServe |
Triton Inference Server | Triton Inference Server is open source inference serving software that streamlines AI inferencing. Triton enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and Arm CPUs, or AWS Inferentia. Triton delivers optimized performance for many query types, including real-time, batched, ensembles, and audio/video streaming. | Dashboard | official docs at Triton Inference Server |
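Most of the Kubernetes-based servers in this table share a declarative deployment style. As one illustration, a minimal KServe `InferenceService` manifest (the model URI below is from KServe's public examples; treat the exact fields as a sketch, not a definitive spec) looks roughly like:

```yaml
# Sketch of a KServe InferenceService serving a scikit-learn model.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    sklearn:
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Applied with `kubectl apply -f`, this asks KServe to stand up a versioned HTTP endpoint for the model; the other servers (Seldon Core, Triton, etc.) have analogous manifests or config files.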
Name | Overview | Dashboard | Others |
---|---|---|---|
DVC | Data Version Control or DVC helps you develop reproducible machine learning projects. | Dashboard | official docs at DVC, discord community |
Pachyderm | Pachyderm is the leader in data versioning and pipelines for MLOps. It provides the data foundation that allows data science teams to automate and scale their machine learning lifecycle while guaranteeing reproducibility. | Dashboard | official docs at Pachyderm, slack community |
lakeFS | lakeFS is an open-source tool that transforms your object storage into a Git-like repository. It enables you to manage your data lake the way you manage your code. With lakeFS you can build repeatable, atomic, and versioned data lake operations - from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage, and Google Cloud Storage as its underlying storage service. It is API compatible with S3 and works seamlessly with all modern data frameworks such as Spark, Hive, AWS Athena, Presto, etc. | Dashboard | official docs at lakeFS, slack community |
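To make the data-versioning workflow concrete, here is a minimal DVC `dvc.yaml` pipeline sketch (the script and file names are hypothetical):

```yaml
# Hypothetical two-stage DVC pipeline: prepare data, then train.
stages:
  prepare:
    cmd: python prepare.py
    deps:
      - data/raw.csv
      - prepare.py
    outs:
      - data/prepared.csv
  train:
    cmd: python train.py
    deps:
      - data/prepared.csv
      - train.py
    outs:
      - model.pkl
```

Running `dvc repro` executes only the stages whose dependencies changed, and `dvc push` versions the tracked outputs in remote storage.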
Name | Overview | Dashboard | Others |
---|---|---|---|
Feast | Feast is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and online inference. | Dashboard | official docs at Feast, slack community |
Hopsworks | Hopsworks and its Feature Store form an open-source, data-intensive AI platform used for the development and operation of machine learning models at scale. | Dashboard | official docs at Hopsworks, slack community |
Feathr | Feathr is a feature store that has been used in production at LinkedIn for many years and was open-sourced in April 2022. | Dashboard | official docs at Feathr, slack community |
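A feature store is typically bootstrapped from a small config file. As a sketch, a minimal local Feast `feature_store.yaml` (project name and paths are hypothetical) might look like:

```yaml
# Hypothetical local Feast configuration.
project: driver_stats           # project namespace
registry: data/registry.db      # where feature definitions are registered
provider: local                 # local dev provider
online_store:
  type: sqlite                  # lightweight online store for serving
  path: data/online_store.db
```

With this in place, `feast apply` registers the feature definitions and `feast materialize` loads features into the online store for low-latency serving.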
Name | Overview | Dashboard | Others |
---|---|---|---|
Aim | Aim is an open-source, self-hosted ML experiment tracking tool. It's good at tracking lots (1000s) of training runs and it allows you to compare them with a performant and beautiful UI. You can use not only the great Aim UI but also its SDK to query your runs' metadata programmatically. That's especially useful for automation and additional analysis in a Jupyter Notebook. | Dashboard | official docs at Aim, slack community |
CML | Continuous Machine Learning (CML) is an open-source CLI tool for implementing continuous integration & delivery (CI/CD) with a focus on MLOps. Use it to automate development workflows — including machine provisioning, model training and evaluation, comparing ML experiments across project history, and monitoring changing datasets. | Dashboard | official docs at CML |
ClearML | ClearML is an ML/DL development and production suite. It contains four main modules: Experiment Manager, MLOps, Data-Management, and Model-Serving. | Dashboard | official docs at ClearML, includes Model-Serving, slack community |
MLFlow | MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). | Dashboard | official docs at MLFlow, slack community |
Neptune | Neptune is a lightweight solution designed for experiment tracking, model registry, and live monitoring of ML runs. | Dashboard | official docs at Neptune |
Weights & Biases | Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production models. | Dashboard | official docs at Weights & Biases |
Name | Overview | Dashboard | Others |
---|---|---|---|
ELI5 | ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions. | Dashboard | official docs at ELI5 |