downloads and prepares various mnist-compatible datasets.
files are downloaded to ~/.mnist
and their integrity is verified with SHA-256 hashes.
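as a rough illustration of what a SHA-256 integrity check involves, here is a minimal sketch using only the standard library. this is not necessarily how mnists does it internally: the helper names `sha256_of_file` and `verify_file` are hypothetical, and the expected hash would come from whatever list of known-good hashes you trust.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path, expected_hex):
    """return True if the file's digest matches the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()
```

a file that fails such a check is usually a truncated or corrupted download, and the simplest remedy is to delete it and download it again.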
requires python 3.5 (or later) and numpy.
pip install --upgrade 'https://github.com/notwa/mnists/tarball/master#egg=mnists'
I recommend adding --upgrade-strategy only-if-needed
to the command
so that you don't accidentally "upgrade" numpy to
a version not compiled specifically for your environment.
This can happen when using e.g. Anaconda.
import mnists
dataset = "emnist_balanced"
train_images, train_labels, test_images, test_labels = mnists.prepare(dataset)
by default, images have the shape (n, 1, 28, 28) and are scaled to the range [0, 1]. labels are returned in one-hot encoding.
pass flatten=True
to get a flattened (n, 784) image shape.
pass return_floats=False
to get the raw [0, 255] integer range of images.
pass return_onehot=False
to get the raw [0, M-1] integer encoding of labels.
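to make the relationship between these formats concrete, here is a small numpy sketch on stand-in data (no download involved). it does not call mnists at all. it assumes the float scaling is a plain division by 255, which the text above does not state, and the array names are made up for illustration.

```python
import numpy as np

# stand-in data: 4 fake "images" in the raw [0, 255] range,
# plus integer labels for a dataset with M = 10 classes.
raw_images = (np.arange(4 * 28 * 28) % 256).astype(np.uint8)
raw_images = raw_images.reshape(4, 1, 28, 28)
raw_labels = np.array([3, 0, 9, 3])
num_classes = 10

# raw integers -> floats in [0, 1] (assumed: division by 255).
float_images = raw_images.astype(np.float32) / 255

# (n, 1, 28, 28) -> (n, 784), the flattened image shape.
flat_images = float_images.reshape(len(float_images), -1)

# integer labels in [0, M-1] -> one-hot rows of length M.
onehot_labels = np.eye(num_classes, dtype=np.float32)[raw_labels]

# one-hot rows -> integer labels again.
recovered_labels = onehot_labels.argmax(axis=1)
```

going from one-hot back to integer labels with `argmax` is handy when a loss function or metric expects class indices.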
you will notice that, by default, the images shape contains a dimension of size 1: (n, 1, 28, 28). this is for compatibility with programs that expect the number of color channels in that position. since mnist-like datasets are (as of writing) all grayscale, there is only one color channel, and thus the size of this dimension is 1.
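if a program expects a different layout, the channel dimension can be dropped or moved with plain numpy. the sketch below uses a zero-filled stand-in array in place of real images, and the variable names are only for illustration.

```python
import numpy as np

# stand-in for images in the default (n, 1, 28, 28) layout.
images = np.zeros((5, 1, 28, 28), dtype=np.float32)

# drop the channel dimension: (n, 1, 28, 28) -> (n, 28, 28).
no_channel = images[:, 0]

# move channels to the end: (n, 1, 28, 28) -> (n, 28, 28, 1).
channels_last = np.moveaxis(images, 1, -1)
```

both results are views of the original array, so no image data is copied.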
the available datasets, in alphabetical order, with the shapes returned when using the default mnists.prepare arguments:
subdirectory | dataset | train images shape | train labels shape | test images shape | test labels shape |
---|---|---|---|---|---|
emnist | emnist_balanced | (112800, 1, 28, 28) | (112800, 47) | (18800, 1, 28, 28) | (18800, 47) |
emnist | emnist_byclass | (697932, 1, 28, 28) | (697932, 62) | (116323, 1, 28, 28) | (116323, 62) |
emnist | emnist_bymerge | (697932, 1, 28, 28) | (697932, 47) | (116323, 1, 28, 28) | (116323, 47) |
emnist | emnist_digits | (240000, 1, 28, 28) | (240000, 10) | (40000, 1, 28, 28) | (40000, 10) |
emnist | emnist_letters | (124800, 1, 28, 28) | (124800, 26) | (20800, 1, 28, 28) | (20800, 26) |
emnist | emnist_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
fashion-mnist | fashion_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
mnist | mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
qmnist | qmnist | (60000, 1, 28, 28) | (60000, 10) | (60000, 1, 28, 28) | (60000, 10) |