- Provide free help to all MLCommons members and the community to prepare, run, optimize and compare MLPerf benchmarks (training, inference and tiny) and to submit Pareto-optimal results using the MLCommons CM workflow automation language and the MLCommons CK playground (reproducibility and optimization challenges), thus reducing benchmarking, optimization and reproducibility costs. We are very glad that more than 80% of all recent MLPerf inference benchmark submissions were automated with our open-source technology - don't hesitate to get in touch with the task force via our public Discord server.
- Automatically run any MLPerf benchmark out of the box with any software, hardware, model and data from any vendor: a prototype is available and was validated during the MLPerf inference v3.0 submission (see the CM sketch after this list).
- Automate optimization experiments and the visualization of results with derived metrics: prototype available.
- Generate Pareto-optimal end-to-end applications based on reproducible MLPerf results: under development.
- Organize reproducibility, replicability and optimization challenges to improve MLPerf results across diverse software, hardware, models and data: ongoing (see the adopted terminology here).
- Bridge the growing gap between research and production.
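The sketch below illustrates how such an out-of-the-box MLPerf run can be scripted via the CM (Collective Mind) Python API. It is a minimal sketch, assuming CM is installed (`pip install cmind`) and the `mlcommons@ck` repository has been pulled; the script tags, model variation and flags are illustrative and may differ between CM versions.

```python
# Minimal sketch, assuming `pip install cmind` and `cm pull repo mlcommons@ck`.
# The script tags, model variation and flags below are illustrative only and
# may differ between versions of the mlcommons@ck repository.
import cmind

r = cmind.access({
    'action': 'run',
    'automation': 'script',
    # Illustrative tags selecting an MLPerf inference benchmark script:
    'tags': 'app,mlperf,inference,_resnet50,_onnxruntime,_cpu',
    'quiet': True  # assumed pass-through flag to suppress interactive questions
})

# CM calls return a dictionary; 'return' > 0 indicates an error described in 'error'.
if r['return'] > 0:
    raise RuntimeError(r.get('error', 'CM script run failed'))

print('MLPerf inference benchmark run finished')
```

The equivalent command-line form is `cm run script --tags=...` with the same tags and flags.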
This task force was established by MLCommons and the cTuning foundation in 2022 to apply the automation and reproducibility methodology and open-source tools from ACM, IEEE and the cTuning foundation to run MLPerf benchmarks out of the box across any software, hardware, models and data from any vendor with the help of the MLCommons CM automation language and the MLCommons CK playground.
We use this open-source technology to organize reproducibility, replicability and optimization challenges to reproduce results from research papers and MLPerf submissions, improve/optimize them in terms of accuracy, performance, power consumption, size, costs and other metrics, and validate them in real-world applications.
We successfully validated the latest version of this open-source technology during the 1st collaborative challenge to run the MLPerf inference v3.0 benchmark across diverse models, software and hardware from Neural Magic, Qualcomm, Nvidia, Intel, AMD, Microsoft, Amazon, Google, Krai, cKnowledge, cTuning foundation, OctoML, Deelvin, DELL, HPE, Lenovo, Hugging Face and Apple - CK and CM have helped automate more than 80% of all recent MLPerf inference benchmark submissions (and 98% of all power results), make them more reproducible and reusable, and obtain record inference performance on the latest Qualcomm and Nvidia devices.
Our ultimate mission is to help all MLCommons members and the community slash their benchmarking, development, optimization and operational costs and accelerate innovation. They should be able to use the CK playground and CM language to automatically generate the most efficient, reproducible and deployable application from the most suitable combination of software, hardware and models based on their requirements, constraints and MLPerf results.
- Join our public Discord server to discuss developments and challenges.
- Check our upcoming reproducibility, replicability and optimization challenges.
- Join our public conf-calls.
- Check our news.
- DONE: prototyped the CM (CK2) automation to let the community submit MLPerf inference v3.0 results across any software and hardware stack (our technology powered 4K+ results (!) across diverse cloud and edge platforms with different versions of PyTorch, ONNX, TFLite, TF and TVM targeting diverse CPUs and GPUs, to be announced at the beginning of April)!
- Prototype an open-source on-prem CK platform with a public API to automate SW/HW co-design for AI, ML and other emerging workloads based on user requirements and constraints.
- Collaborative CK challenge for the community to reproduce, optimize and submit results to MLPerf inference v3.0 - 98% of all results were automated by the MLCommons CK technology!
- New CK challenge to help MLCommons organizations and the community use our platform to prepare, optimize and compare their MLPerf inference v3.1 submissions on any SW/HW stack.
- Enhance the MLCommons CK2/CM automation meta-framework to support our platform across any SW/HW stacks from MLCommons members and the community.
- Enhance the MLPerf C++ inference template library (MITL) to run and optimize MLPerf inference across any AI/ML/SW/HW stack.
- Enhance the light MLPerf inference application to benchmark any ML model on any SW/HW stack without datasets and accuracy checks (see the loadgen sketch after this list).
- Enhance our platform and automation framework to support reproducibility initiatives and studies at conferences and journals across rapidly evolving software, hardware and data (collaboration with the cTuning foundation, ACM, IEEE and NeurIPS).
- DONE: Prototype of the 2nd version of the MLCommons CK framework to solve dependency hell and run any application on any hardware/software stack.
- DONE: GUI prototype to run MLPerf inference benchmarks on any software and hardware.
- DONE: GUI prototype to prepare MLPerf inference submission.
- DONE: GUI prototype to visualize AI/ML benchmarking results.
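To make the "without datasets and accuracy" item above more concrete, here is a minimal sketch of a performance-only harness built directly on the MLPerf loadgen Python bindings; the dummy responses, sample counts and scenario are illustrative placeholders rather than the task force's actual implementation.

```python
# Minimal sketch of a dataset-free, performance-only MLPerf loadgen harness.
# The model invocation is a placeholder and the sample counts are illustrative.
import mlperf_loadgen as lg

def issue_queries(query_samples):
    # In a real harness, the ML model would be invoked here for each sample.
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

def load_samples(sample_indices):
    # No real dataset: nothing to load into host memory.
    pass

def unload_samples(sample_indices):
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)  # illustrative counts

lg.StartTest(sut, qsl, settings)

lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

Because no dataset is loaded and responses are dummies, such a harness only measures raw benchmark throughput/latency; accuracy runs require a real query sample library and model outputs.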
Motivation:
Tools:
Reproducibility and replicability studies:
This task force is supported by MLCommons, the cTuning foundation, cKnowledge and individual contributors.