hogaku/MLOps-OSS-Analytics

MLOps-OSS-Analytics

The Repository

This repository shares an interactive Power BI report that analyzes open-source projects in MLOps. The report provides the following insights for each project:

  • Top 100 Contributors
  • Traffic
  • Contributor Commits
  • Pull Requests
  • Punch Card
  • Issues
overview.mp4
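
As an illustration of what two of the insights listed above measure, here is a minimal Python sketch. It is not how the Power BI report is built; the sample data and the function names `top_contributors` and `punch_card` are hypothetical and exist only for this example. It assumes that "Top 100 Contributors" ranks authors by commit count and that "Punch Card" counts commits per weekday and hour.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data: (author, ISO-8601 commit timestamp).
# Real data would come from a repository's commit history.
SAMPLE_COMMITS = [
    ("alice", "2022-05-02T09:15:00"),
    ("bob", "2022-05-02T09:45:00"),
    ("alice", "2022-05-03T14:30:00"),
    ("alice", "2022-05-07T22:05:00"),
    ("carol", "2022-05-09T09:00:00"),
]


def top_contributors(commits, n=100):
    """Return the n authors with the most commits, most active first."""
    return Counter(author for author, _ in commits).most_common(n)


def punch_card(commits):
    """Count commits per (weekday, hour) bucket; weekday 0 is Monday."""
    buckets = Counter()
    for _, timestamp in commits:
        moment = datetime.fromisoformat(timestamp)
        buckets[(moment.weekday(), moment.hour)] += 1
    return buckets


if __name__ == "__main__":
    print(top_contributors(SAMPLE_COMMITS, n=3))
    for (weekday, hour), count in sorted(punch_card(SAMPLE_COMMITS).items()):
        print(f"weekday={weekday} hour={hour:02d} commits={count}")
```

A punch card is usually drawn as a weekday-by-hour grid, so the `(weekday, hour)` keys map directly onto the cells of such a chart.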

Training Orchestration

| Name | Overview | Dashboard | Others |
| --- | --- | --- | --- |
| Determined | Determined is an open-source deep learning training platform that makes building models fast and easy. | Dashboard | N/A |
| Flyte | Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. | Dashboard | N/A |
| Kubeflow | Kubeflow is the cloud-native platform for machine learning operations: pipelines, training, and deployment. | Dashboard | official docs at kubeflow.org, Slack community |
| OpenPAI | OpenPAI is an open-source platform that provides complete AI model training and resource management capabilities. It is easy to extend and supports on-premise, cloud, and hybrid environments at various scales. | Dashboard | official docs at openPAI Handbook |
| Orchest | Build data pipelines, the easy way! No framework. No YAML. Just write Python and R code in Notebooks. | Dashboard | official docs at Orchest |
| Ploomber | Ploomber is the fastest way to build data pipelines ⚡️. Use your favorite editor (Jupyter, VSCode, PyCharm) to develop interactively and deploy ☁️ without code changes (Kubernetes, Airflow, AWS Batch, and SLURM). | Dashboard | official docs at Ploomber, Slack community |
| Spock | spock is a framework that helps users easily define, manage, and use complex parameter configurations within Python applications. It lets you focus on the code you need to write instead of re-implementing boilerplate code such as creating ArgParsers, reading configuration files, handling dependencies, implementing type validation, maintaining traceability, etc. | Dashboard | official docs at spock |
| Stoke | stoke is a lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices (e.g. CPU, GPU), distributed modes, mixed-precision, and PyTorch extensions. It places no restrictions on code structure for model architecture, training/inference loops, loss functions, optimizer algorithm, etc. Stoke simply 'wraps' your existing PyTorch code to automatically handle the necessary underlying wiring for all of the supported backends. This allows you to switch from local full-precision CPU to mixed-precision distributed multi-GPU with extensions (like optimizer state sharding) by simply changing a few declarative flags. Additionally, it exposes configuration settings for every underlying backend for those who want configurability and raw access to the underlying libraries. | Dashboard | official docs at stoke |

Model Monitoring

| Name | Overview | Dashboard | Others |
| --- | --- | --- | --- |
| deepchecks | Deepchecks is the leading tool for testing and validating your machine learning models and data, and it enables doing so with minimal effort. Deepchecks accompanies you through various validation and testing needs such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing different models. | Dashboard | official docs at deepchecks, Slack community |
| Evidently AI | Evidently helps analyze and track data and ML model quality throughout the model lifecycle. You can think of it as an evaluation layer that fits into the existing ML stack. Evidently has a modular approach with 3 interfaces on top of the shared functionality. | Dashboard | official docs at Evidently AI, Discord community |
| MLRun | MLRun enables production pipeline design using a modular strategy, where the different parts contribute to a continuous, automated, and far simpler path from research and development to scalable production pipelines, without refactoring code, adding glue logic, or spending significant effort on data and ML engineering. | Dashboard | official docs at MLRun; includes a Feature Store |
| whylogs | whylogs is an open-source library for logging any kind of data. With whylogs, users are able to generate summaries of their datasets (called whylogs profiles) which they can use to: 1. Track changes in their dataset. 2. Create data constraints to know whether their data looks the way it should. 3. Quickly visualize key summary statistics about their datasets. | Dashboard | official docs at whylogs, Slack community |

Model Testing

  • (Coming soon...)

Model Serving

| Name | Overview | Dashboard | Others |
| --- | --- | --- | --- |
| BentoML | BentoML simplifies ML model deployment and serves your models at production scale. | Dashboard | official docs at BentoML, Slack community |
| Bodywork | Bodywork is a command line tool that deploys machine learning pipelines to Kubernetes. It takes care of everything to do with containers and orchestration, so that you don't have to. | Dashboard | official docs at Bodywork |
| Cortex | Cortex makes it simple to deploy machine learning models in production. | Dashboard | official docs at Cortex, Slack community |
| KFServing | KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, ScikitLearn, PyTorch, and ONNX. | Dashboard | official docs at KFServing |
| OpenVINO™ Model Server | OpenVINO™ Model Server (OVMS) is a high-performance system for serving machine learning models. It is based on C++ for high scalability and optimized for Intel solutions, so that you can take advantage of all the power of the Intel® Xeon® processor or Intel’s AI accelerators and expose it over a network interface. | Dashboard | official docs at OpenVINO™ Model Server |
| Seldon Core | Seldon Core converts your ML models (TensorFlow, PyTorch, H2O, etc.) or language wrappers (Python, Java, etc.) into production REST/gRPC microservices. | Dashboard | official docs at Seldon Core, Slack community |
| TensorFlow Serving | TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data. | Dashboard | official docs at TensorFlow.org, Slack community |
| TorchServe | TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production. | Dashboard | official docs at TorchServe |
| Triton Inference Server | Triton Inference Server is open-source inference serving software that streamlines AI inferencing. Triton enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and ARM CPUs, or AWS Inferentia. Triton delivers optimized performance for many query types, including real time, batched, ensembles, and audio/video streaming. | Dashboard | official docs at Triton Inference Server |

Data Versioning

| Name | Overview | Dashboard | Others |
| --- | --- | --- | --- |
| DVC | Data Version Control or DVC helps you develop reproducible machine learning projects. | Dashboard | official docs at DVC, Discord community |
| Pachyderm | Pachyderm is the leader in data versioning and pipelines for MLOps. We provide the data foundation that allows data science teams to automate and scale their machine learning lifecycle while guaranteeing reproducibility. | Dashboard | official docs at Pachyderm, Slack community |
| lakeFS | lakeFS is an open-source tool that transforms your object storage into a Git-like repository. It enables you to manage your data lake the way you manage your code. With lakeFS you can build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage, and Google Cloud Storage as its underlying storage service. It is API compatible with S3 and works seamlessly with all modern data frameworks such as Spark, Hive, AWS Athena, Presto, etc. | Dashboard | official docs at lakeFS, Slack community |

Feature Store

| Name | Overview | Dashboard | Others |
| --- | --- | --- | --- |
| Feast | Feast is an open-source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and online inference. | Dashboard | official docs at Feast, Slack community |
| Hopsworks | Hopsworks and its Feature Store are an open-source data-intensive AI platform used for the development and operation of machine learning models at scale. | Dashboard | official docs at Hopsworks, Slack community |
| Feathr | Feathr is the feature store that has been used in production at LinkedIn for many years and was open-sourced in April 2022. | Dashboard | official docs at Feathr, Slack community |

Experiment Tracking

| Name | Overview | Dashboard | Others |
| --- | --- | --- | --- |
| Aim | Aim is an open-source, self-hosted ML experiment tracking tool. It's good at tracking lots (1000s) of training runs and it allows you to compare them with a performant and beautiful UI. You can use not only the great Aim UI but also its SDK to query your runs' metadata programmatically. That's especially useful for automations and additional analysis in a Jupyter Notebook. | Dashboard | official docs at Aim, Slack community |
| CML | Continuous Machine Learning (CML) is an open-source CLI tool for implementing continuous integration & delivery (CI/CD) with a focus on MLOps. Use it to automate development workflows, including machine provisioning, model training and evaluation, comparing ML experiments across project history, and monitoring changing datasets. | Dashboard | official docs at CML |
| ClearML | ClearML is an ML/DL development and production suite. It contains four main modules: Experiment Manager, MLOps, Data-Management, and Model-Serving. | Dashboard | official docs at ClearML, Slack community; includes Model-Serving |
| MLFlow | MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.), wherever you currently run ML code (e.g. in notebooks, standalone applications, or the cloud). | Dashboard | official docs at MLFlow, Slack community |
| Neptune | Neptune is a lightweight solution designed for experiment tracking, model registry, and live monitoring of ML runs. | Dashboard | official docs at Neptune |
| Weights & Biases | Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production models. | Dashboard | official docs at Weights & Biases |

Explainability

| Name | Overview | Dashboard | Others |
| --- | --- | --- | --- |
| ELI5 | ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions. | Dashboard | official docs at ELI5 |
