[ Back to index ]

MLCommons Task Force on Automation and Reproducibility

News (May 2024): our task force has successfully accomplished its mission and our on-going developments will be funded by MLCommons and integrated with several MLCommons Working Groups - please stay tuned for more details!

Mission

Extend MLCommons CM workflow automation framework and reusable automation recipes (CM scripts) to automate MLCommons projects and make it easier to assemble, run, reproduce, customize and optimize ML(Perf) benchmarks in a unified and automated way across diverse models, data sets, software and hardware from different vendors.
Extend CM workflows to automate and reproduce MLPerf inference submissions from different vendors starting from v3.1.
Encode MLPerf rules and best practices in the CM automation recipes and workflows for MLPerf to help MLPerf submitters avoid going through many README files and track all the latest MLPerf changes and updates.

Current projects

Continue improving CM to support different MLCommons projects for universal benchmarking and optimization across different platforms.
Extend CM workflows to reproduce MLPerf inference v4.0 submissions (Intel, Nvidia, Qualcomm, Google, Red Hat, etc) via a unified interface.
Prepare tutorial for MLPerf inference v4.1 submissions via CM.
Discuss how to introduce the CM automation badge to MLPerf inference v4.1 submission similar to ACM/IEEE/NeurIPS reproducibility badges to make it easier for all submitters to re-run and reproduce each others’ results before the publication date.
Develop a more universal Python and C++ wrapper for the MLPerf loadgen with the CM automation to support different models, data sets, software and hardware: Python prototype; C++ prototype.
Collaborate with system vendors and cloud providers to help them benchmark their platforms using the best available MLPerf inference implementation.
Collaborate with other MLCommons working groups to autoamte, modularize and unify their benchmarks using CM automation recipes.
Use CM to modularize and automate the upcoming automotive benchmark.
Use MLCommons Croissant to unify MLPerf datasets.

Current tasks

Improving CM workflow automation framework: GitHub ticket
Updating/refactoring CM docs (framework and MLPef workflows): GitHub ticket
Improving CM scripts to support MLPerf: GitHub ticket
Adding profiling and performance analysis during benchmarking: GitHub ticket
Improving universal build and run scripts to support cross-platform compilation: GitHub ticket
Automate ABTF benchmarking via CM: GitHub ticket
Help automate MLPerf inference benchmark at the Student Cluster Competition'24: GitHub ticket

Completed deliverables

Developed reusable and technology-agnostic automation recipes and workflows with a common and human-friendly interface (MLCommons Collective Mind aka CM) to modularize MLPerf inference benchmarks and run them in a unified and automated way across diverse models, data sets, software and hardware from different vendors.
Added GitHub actions to test MLPerf inference benchmarks using CM.
Encoded MLPerf inference rules and best practices in the CM automation recipes and workflows for MLPerf and reduced the burden for submitters to go through numerous README files and track all the latest changes and reproduce results.
Automated MLPerf inference submissions and made it easier to re-run and reproduce results (see submitters orientation and CM-MLPerf documentation).
Started developing an open-source platform to automatically compose high-performance and cost-effective AI applications and systems using MLPerf and CM (see our presentation at MLPerf-Bench at HPCA’24).
Supported AI, ML and Systems conferences to automate artifact evaluation and reproducibility initiatives (see CM at ACM/IEEE MICRO’23 and SCC’23/SuperComputing’23).

Resources

CM GitHub project
CM concept (keynote at ACM REP'23)
CM Getting Started Guide
CM-MLPerf commands
CM-MLPerf GUI
ACM artifact review and badging methodology
Artifact Evaluation at ML and systems conferences
Terminology (ACM/NISO): Repeatability, Reproducibility and Replicability
CM motivation (ACM TechTalk about reproducing 150+ research papers and validating them in the real world)

Acknowledgments

This task force was established by Grigori Fursin after he donated his CK and CM automation technology to MLCommons in 2022 to benefit everyone. Since then, this open-source technology is being developed as a community effort based on user feedback. We would like to thank all our volunteers, collaborators and contributors for their support, fruitful discussions, and useful feedback!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

taskforce.md

taskforce.md

MLCommons Task Force on Automation and Reproducibility

Mission

Sponsors

Current projects

Current tasks

Completed deliverables

Resources

Acknowledgments

Files

taskforce.md

Latest commit

History

taskforce.md

File metadata and controls

MLCommons Task Force on Automation and Reproducibility

Mission

Sponsors

Current projects

Current tasks

Completed deliverables

Resources

Acknowledgments