MNIST Database Access API for Bob
The MNIST database is a database of handwritten digits, consisting of a training set of 60,000 examples and a test set of 10,000 examples. It was made available by Yann LeCun and Corinna Cortes (MNIST database). The data was originally extracted from a larger set made available by NIST, before being size-normalized and centered in a fixed-size image (28x28 pixels).
You would normally not install this package unless you are maintaining it. Instead, you would declare it as a dependency of the package that needs it. There are a few ways to achieve this:
- You can add this package as a requirement in the setup.py of your own satellite package or in your Buildout .cfg file, if you prefer it that way. With this method, the package is automatically downloaded and installed in your working environment, or
- You can manually download and install this package using commands like easy_install or pip.
The package is available in two different distribution formats: from PyPI, or in source form from its git repository.
The database raw files must be installed somewhere in your environment.
You can mix and match the installation methods and distribution formats above based on your requirements. Here are some examples:
Modify your setup.py and download from PyPI
This is the easiest option. Edit the setup.py of your satellite package and add the following entry to the install_requires section (note: ... stands for whatever other entries you may have in between; do not put it in your script):
    install_requires=[
        ...
        "xbob.db.mnist",
    ],
Proceed normally with your bootstrap/buildout steps and you should be all set. That means you can now import the namespace xbob.db.mnist into your scripts.
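Before relying on the import in a larger script, it can help to check that the namespace actually resolves in the current environment. The helper below is a minimal sketch using only the standard library; the function name is_importable is hypothetical and not part of xbob.db.mnist.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be found in the current environment.

    Hypothetical helper for illustration; not part of xbob.db.mnist.
    """
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised (as ModuleNotFoundError) when a parent package is missing.
        return False


if __name__ == "__main__":
    print(is_importable("xbob.db.mnist"))
```

If this prints False, the bootstrap/buildout steps most likely did not complete in the environment you are running from.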
Modify your buildout.cfg and download from git
You will need to add a dependency on mr.developer to be able to install from our git repositories. Your buildout.cfg file should contain the following lines:
    [buildout]
    ...
    extensions = mr.developer
    auto-checkout = *
    eggs = bob
           ...
           xbob.db.mnist

    [sources]
    xbob.db.mnist = git https://github.com/bioidiap/xbob.db.mnist.git
    ...
How to use this database API
After launching the Python interpreter (assuming that the environment is properly set up), you can get the training set as follows:
    >>> import xbob.db.mnist
    >>> db = xbob.db.mnist.Database('PATH_TO_DATA_FROM_YANN_LECUN_WEBSITE') # 4 binary .gz compressed files
    >>> images, labels = db.data(groups='train', labels=[0,1,2,3,4,5,6,7,8,9])
In this case, this should return two NumPy arrays:
- images contains the raw data (60,000 samples of dimension 784, i.e. flattened 28x28-pixel images)
- labels contains the corresponding classes (digits 0 to 9) for each of the 60,000 samples
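Since each row of images is a flattened 28x28 picture, a common next step is to reshape rows back into 2D and to select a subset of digits. The sketch below uses randomly generated stand-in arrays with the layout described above (rows of 784 values, one label per row), so it runs without the database. The helper names are hypothetical, and the uint8 dtype is an assumption rather than something this package guarantees.

```python
import numpy as np


def as_pictures(flat_images, side=28):
    """Reshape (n, side*side) rows into (n, side, side) pictures."""
    return flat_images.reshape(-1, side, side)


def select_digits(flat_images, labels, wanted):
    """Keep only the samples whose label is in `wanted`."""
    mask = np.isin(labels, wanted)
    return flat_images[mask], labels[mask]


# Stand-in data with the documented layout (not real MNIST samples).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 784), dtype=np.uint8)
labels = np.arange(100) % 10  # ten samples of each digit 0..9

pictures = as_pictures(images)
subset_images, subset_labels = select_digits(images, labels, [3, 7])
print(pictures.shape, subset_images.shape)
```

With real data, the same two helpers should apply unchanged to the arrays returned by db.data(), provided they follow the layout described above.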
If you don’t have the data installed on your machine, you can also use the following set of commands that will:
- first look for the database in the xbob/db/mnist/ subdirectory and use it if it is available
- otherwise, when no directory is given, automatically download it from Yann LeCun’s website into a temporary folder that will be erased when the destructor of the xbob.db.mnist database is called, or
- when a directory is given, automatically download it into the provided directory, which will not be deleted.
    >>> import xbob.db.mnist
    >>> db = xbob.db.mnist.Database() # Check for the data files locally, and download them if required
    >>> images, labels = db.data(groups='train', labels=[0,1,2,3,4,5,6,7,8,9])
    >>> del db # delete the temporary downloaded files if any

    >>> db = xbob.db.mnist.Database("Directory") # Persistently downloads files into the folder "Directory"
    >>> images, labels = db.data(groups='train', labels=[0,1,2,3,4,5,6,7,8,9])
    >>> del db # The download directory stays
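The difference between the two modes above comes down to who owns the download folder. The following is a minimal sketch of that ownership pattern using only the standard library. The class DownloadFolder and its methods are hypothetical, no download is performed, and this is not the package's actual implementation.

```python
import os
import shutil
import tempfile


class DownloadFolder:
    """Hypothetical illustration of temporary vs. persistent data folders."""

    def __init__(self, directory=None):
        if directory is None:
            # No directory given: create a temporary one that we own.
            self.path = tempfile.mkdtemp(prefix="mnist_")
            self._owned = True
        else:
            # Directory given: create it if needed, but never delete it.
            os.makedirs(directory, exist_ok=True)
            self.path = directory
            self._owned = False

    def close(self):
        """Remove the folder only if it was created as a temporary one."""
        if getattr(self, "_owned", False) and os.path.isdir(self.path):
            shutil.rmtree(self.path)
        self._owned = False

    def __del__(self):
        # Mirrors the "erased when the destructor is called" behavior.
        self.close()
```

In CPython, del usually triggers __del__ immediately once no other references remain, but calling an explicit close() as in this sketch makes the cleanup deterministic.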