MNIST Database Access API for Bob

Project description

The MNIST database is a database of handwritten digits consisting of a training set of 60,000 examples and a test set of 10,000 examples. It was made available by Yann LeCun and Corinna Cortes (MNIST database). The digits were originally extracted from a larger set made available by NIST, then size-normalized and centered in fixed-size images (28x28 pixels).

The actual raw data for the database should be downloaded from the original website. This package only contains the Bob accessor methods to use this database directly from Python.
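For orientation, the raw data consists of four gzip-compressed files in the IDX format, typically named train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz and t10k-labels-idx1-ubyte.gz. The following is a minimal sketch of reading one such file with the standard library only. Note that read_idx is a hypothetical helper written for illustration and is not part of this package; it assumes the header layout documented on the original website.

```python
import gzip
import struct


def read_idx(path):
    """Return (shape, data) for a gzip-compressed IDX file of unsigned bytes."""
    with gzip.open(path, "rb") as f:
        # 4-byte header: two zero bytes, a type code (0x08 = unsigned byte),
        # and the number of dimensions
        zeros, type_code, ndim = struct.unpack(">HBB", f.read(4))
        if zeros != 0 or type_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file: %s" % path)
        # One big-endian 32-bit size per dimension, e.g. (60000, 28, 28)
        shape = struct.unpack(">%dI" % ndim, f.read(4 * ndim))
        data = f.read()
    # Sanity check: the payload must hold exactly one byte per element
    expected = 1
    for size in shape:
        expected *= size
    if len(data) != expected:
        raise ValueError("truncated IDX payload in %s" % path)
    return shape, data
```

If NumPy is available, numpy.frombuffer(data, dtype=numpy.uint8).reshape(shape) turns the returned bytes into an array.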

You would normally not install this package unless you are maintaining it. Instead, declare it as a dependency of the package that needs it. There are a few ways to achieve this:

  1. You can add this package as a requirement in the setup.py of your own satellite package or in your Buildout .cfg file, if you prefer it that way. With this method, this package is automatically downloaded and installed in your working environment, or
  2. You can manually download and install this package using commands like easy_install or pip.

The package is available in two different distribution formats:

  1. You can download it from PyPI, or
  2. You can download it in its source form from its git repository.

The database raw files must be installed somewhere in your environment.

You can mix and match points 1/2 above based on your requirements. Here are some examples:

Modify your setup.py and download from PyPI

That is the easiest option. Edit the setup.py of your satellite package and add the following entry to the install_requires section (note: ... stands for whatever other entries you already have in between; don’t put it in your script):
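A minimal sketch of such an entry, assuming the requirement string is simply the package name. The other entry, the package name my.satellite.package and the variable layout are placeholders for illustration:

```python
# setup.py fragment of a hypothetical satellite package
install_requires = [
    "setuptools",     # ... stands in for whatever requirements you already list
    "xbob.db.mnist",  # assumed requirement string: the package name on PyPI
]

# The list is then handed to setuptools, e.g.
# setup(name="my.satellite.package", install_requires=install_requires)
```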


Proceed normally with your bootstrap/buildout steps and you should be all set. That means you can now import the namespace xbob.db.mnist into your scripts.

Modify your buildout.cfg and download from git

You will need to add a dependency on mr.developer to be able to install from our git repositories. Your buildout.cfg file should contain the following lines (in the [sources] section, the repository URL follows git):

[buildout]
extensions = mr.developer
auto-checkout = *
eggs = bob

[sources]
xbob.db.mnist = git

How to use this database API

After launching the Python interpreter (assuming that the environment is properly set up), you can get the training set as follows:

>>> import xbob.db.mnist
>>> db = xbob.db.mnist.Database('PATH_TO_DATA_FROM_YANN_LECUN_WEBSITE') # 4 binary .gz compressed files
>>> images, labels = db.data(groups='train', labels=[0,1,2,3,4,5,6,7,8,9])

In this case, this should return two NumPy arrays:

  1. images contains the raw data (60,000 samples of dimension 784 [28x28 pixel images])
  2. labels contains the corresponding classes (digits 0 to 9) for each of the 60,000 samples
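Because each sample is a flat vector of 784 values, it usually has to be reshaped before it can be viewed as an image. A minimal sketch with NumPy, using small stand-in arrays instead of the real data and assuming that pixels are stored row by row (row-major) and that both arrays hold unsigned bytes:

```python
import numpy as np

# Stand-ins for the arrays returned by the database: 6 fake samples instead of 60,000
images = np.zeros((6, 784), dtype=np.uint8)
labels = np.array([0, 1, 2, 1, 0, 1], dtype=np.uint8)

# View every flat 784-vector as a 28x28 image (no data is copied)
pictures = images.reshape(-1, 28, 28)

# Boolean mask: keep only the samples whose class is the digit 1
ones = pictures[labels == 1]
```

With the real training set, pictures would have the shape (60000, 28, 28).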

If you don’t have the data installed on your machine, you can also use the following set of commands, which will:

  1. first look for the database in the xbob/db/mnist/ subdirectory and use it if it is available, or
  2. automatically download it from Yann LeCun’s website into a temporary folder, which is erased when the destructor of the xbob.db.mnist database is called:

>>> import xbob.db.mnist
>>> db = xbob.db.mnist.Database() # Check for the data files locally, and download them if required
>>> images, labels = db.data(groups='train', labels=[0,1,2,3,4,5,6,7,8,9])
>>> del db # delete the temporary downloaded files if any
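The cleanup-on-destruction behaviour described above can be pictured with a small stand-alone sketch. TemporaryDataDir is a hypothetical class written for illustration, not the package's actual implementation; it only shows how a temporary folder can be tied to the lifetime of an object:

```python
import os
import shutil
import tempfile


class TemporaryDataDir:
    """Hypothetical helper: owns a scratch directory and removes it on destruction."""

    def __init__(self):
        # A real accessor would download the data files into this folder
        self.path = tempfile.mkdtemp(prefix="mnist_")

    def __del__(self):
        # ignore_errors keeps the destructor from raising if the folder is already gone
        shutil.rmtree(self.path, ignore_errors=True)


holder = TemporaryDataDir()
scratch = holder.path
existed_before = os.path.isdir(scratch)
del holder  # CPython runs __del__ as soon as the last reference goes away
exists_after = os.path.isdir(scratch)
```

The exact moment a destructor runs depends on the interpreter, which is why the example above deletes db explicitly once the arrays have been loaded.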

Files for xbob.db.mnist, version 1.0.0g: source distribution (20.3 kB).
