MLPerf™ Training Reference Implementations

This is a repository of reference implementations for the MLPerf training benchmarks. These implementations are valid as starting points for benchmark implementations but are not fully optimized and are not intended to be used for "real" performance measurements of software frameworks or hardware.

Please see the MLPerf Training Benchmark paper for a detailed description of the motivation and guiding principles behind the benchmark suite. If you use any part of this benchmark (e.g., reference implementations, submissions, etc.) in academic work, please cite the following:

@misc{mattson2019mlperf,
    title={MLPerf Training Benchmark},
    author={Peter Mattson and Christine Cheng and Cody Coleman and Greg Diamos and Paulius Micikevicius and David Patterson and Hanlin Tang and Gu-Yeon Wei and Peter Bailis and Victor Bittorf and David Brooks and Dehao Chen and Debojyoti Dutta and Udit Gupta and Kim Hazelwood and Andrew Hock and Xinyuan Huang and Atsushi Ike and Bill Jia and Daniel Kang and David Kanter and Naveen Kumar and Jeffery Liao and Guokai Ma and Deepak Narayanan and Tayo Oguntebi and Gennady Pekhimenko and Lillian Pentecost and Vijay Janapa Reddi and Taylor Robie and Tom St. John and Tsuguchika Tabaru and Carole-Jean Wu and Lingjie Xu and Masafumi Yamazaki and Cliff Young and Matei Zaharia},
    year={2019},
    eprint={1910.01500},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

These reference implementations are still very much "alpha" or "beta" quality. They could be improved in many ways. Please file issues or pull requests to help us improve quality.

Running Benchmarks

These benchmarks have been tested on the following machine configuration:

16 CPUs, one Nvidia P100.
Ubuntu 16.04, including docker with nvidia support.
600GB of disk (though many benchmarks do require less disk).
Either CPython 2 or CPython 3, depending on benchmark (see Dockerfiles for details).

Generally, a benchmark can be run with the following steps:

Set up docker and dependencies. There is a shared script (install_cuda_docker.sh) to do this. Some benchmarks have additional setup steps, described in their READMEs.
Download the dataset using ./download_dataset.sh. Run it outside of docker, on your host machine, and from the directory it is in (it may make assumptions about the current working directory).
Optionally, run verify_dataset.sh to ensure the dataset was successfully downloaded.
Build and run the docker image. The command to do this is included with each benchmark.
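
The host-side steps above can be sketched as a small driver script. This is only an illustrative sketch, not part of the repository: the helper names `plan_steps` and `run_steps` are hypothetical, and it assumes that download_dataset.sh and verify_dataset.sh sit at the top level of the benchmark directory, which may not hold for every benchmark. The docker build and run step is left out on purpose, because that command differs per benchmark and is given in each benchmark's README.

```python
import subprocess
from pathlib import Path


def plan_steps(repo_root, benchmark_dir, verify=True):
    """Return (command, working_directory) pairs for the host-side steps.

    Assumption: the dataset scripts live directly in the benchmark directory.
    """
    repo_root = Path(repo_root)
    bench = repo_root / benchmark_dir
    steps = [
        # Shared setup script at the repository root.
        (["./install_cuda_docker.sh"], repo_root),
        # Run from the script's own directory, since it may assume the CWD.
        (["./download_dataset.sh"], bench),
    ]
    if verify:
        # Optional check that the dataset was downloaded successfully.
        steps.append((["./verify_dataset.sh"], bench))
    return steps


def run_steps(steps, dry_run=True):
    """Print each step and, unless dry_run is set, execute it on the host."""
    for cmd, cwd in steps:
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands that would be executed.
    run_steps(plan_steps(".", "image_classification"), dry_run=True)
```

Keeping the plan separate from the execution makes it possible to inspect the commands before anything touches the machine. Pass `dry_run=False` only after confirming the paths match the benchmark you intend to run.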

Each benchmark will run until the target quality is reached and then stop, printing timing results.
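
The stopping rule described above (train until the target quality is reached, then report the elapsed time) can be illustrated with a minimal sketch. This is not the code used by any reference implementation: `train_to_target`, its callback arguments, and the `max_epochs` safety limit are all hypothetical names introduced here for illustration, and real benchmarks define their own quality metric and evaluation schedule.

```python
import time


def train_to_target(train_one_epoch, evaluate, target_quality, max_epochs=100):
    """Train until evaluate() reaches target_quality, then report the time.

    train_one_epoch and evaluate are caller-supplied callables (hypothetical
    stand-ins for a benchmark's training and evaluation code).
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        quality = evaluate()
        if quality >= target_quality:
            elapsed = time.perf_counter() - start
            print(f"reached quality {quality} after {epoch} epochs "
                  f"in {elapsed:.2f} s")
            return epoch, quality, elapsed
    # Assumption: give up after max_epochs instead of looping forever.
    raise RuntimeError("target quality not reached within max_epochs")
```

For example, with an `evaluate` callable that reports 0.5, 0.7, and then 0.76 on successive epochs and a target of 0.75, the loop stops after the third epoch.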

Some of these benchmarks take a long time to run on the reference hardware (i.e. 16 CPUs and one P100). We expect to see significant performance improvements with more hardware and optimized implementations.

