Machine Learning dataset loaders

Loaders for various machine learning datasets for testing and example scripts. Previously in thinc.extra.datasets.

Setup and installation

The package can be installed via pip:

pip install ml-datasets


Loaders can be imported directly or looked up by their string name, which is useful when the dataset is chosen via a command-line argument. Some loaders take arguments; see the source for details.

# Import directly
from ml_datasets import imdb
train_data, dev_data = imdb()
# Load via registry
from ml_datasets import loaders
imdb_loader = loaders.get("imdb")
train_data, dev_data = imdb_loader()
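
The string-name lookup pairs naturally with a command-line flag. The following is a minimal sketch of that pattern, not part of the package: `parse_args`, `resolve_loader`, the `--dataset` flag, and the stand-in registry are all hypothetical names introduced for illustration. A plain dict stands in for the registry so the example runs without downloading any data.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical --dataset flag naming the loader to use."""
    parser = argparse.ArgumentParser(description="Pick a dataset loader by name.")
    parser.add_argument("--dataset", default="imdb", help="String name of the loader")
    return parser.parse_args(argv)


def resolve_loader(registry, name):
    """Look up a loader by name in any object exposing .get(name)."""
    loader = registry.get(name)
    if loader is None:
        # A plain dict returns None for unknown keys; fail loudly instead.
        raise KeyError(f"Unknown loader: {name}")
    return loader


# Stand-in registry with made-up data (assumption: real code would pass
# the package's own registry object here instead).
stand_in_registry = {"imdb": lambda: (["train example"], ["dev example"])}

args = parse_args(["--dataset", "imdb"])
train_data, dev_data = resolve_loader(stand_in_registry, args.dataset)()
```

In a real script, the stand-in dict would presumably be replaced by `ml_datasets.loaders`, which is queried with `loaders.get("imdb")` as shown above.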

Available loaders

ID / Function        Description                                                   From URL
imdb                 IMDB sentiment dataset.
mnist                MNIST data.
quora_questions      Quora question answer dataset.
reuters              Reuters dataset.
snli                 Stanford Natural Language Inference corpus.
stack_exchange       Stack Exchange dataset.
ud_ancora_pos_tags   Universal Dependencies Spanish AnCora corpus (POS tagging).
ud_ewtb_pos_tags     Universal Dependencies English EWT corpus (POS tagging).
wikiner              WikiNER data.
dbpedia              DBPedia ontology dataset via

Registering loaders

Loaders can be registered externally using the loaders registry as a decorator. For example:

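import ml_datasets

# Register the function under a string name by using the registry as a decorator
@ml_datasets.loaders("my_custom_loader")
def my_custom_loader():
    # load_some_data is a placeholder for your own loading code
    return load_some_data()

assert "my_custom_loader" in ml_datasets.loaders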
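
To show what decorator-based registration does in general, here is a toy registry written from scratch. It is a sketch under stated assumptions, not the package's actual implementation: `LoaderRegistry`, `toy_loaders`, and `toy_numbers` are hypothetical names, and the data is made up.

```python
from typing import Callable, Dict


class LoaderRegistry:
    """Toy name-to-function registry (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable] = {}

    def __call__(self, name: str) -> Callable[[Callable], Callable]:
        # Calling the registry with a name returns a decorator that stores
        # the decorated function under that name and returns it unchanged.
        def decorator(func: Callable) -> Callable:
            self._loaders[name] = func
            return func

        return decorator

    def get(self, name: str) -> Callable:
        # Return the function registered under `name`, or raise KeyError.
        if name not in self._loaders:
            raise KeyError(f"No loader registered as {name!r}")
        return self._loaders[name]

    def __contains__(self, name: object) -> bool:
        # Supports the `"name" in registry` membership check.
        return name in self._loaders


toy_loaders = LoaderRegistry()


@toy_loaders("toy_numbers")
def toy_numbers():
    # Hypothetical loader returning (train, dev) splits of made-up data.
    return [1, 2, 3], [4]


assert "toy_numbers" in toy_loaders
train_data, dev_data = toy_loaders.get("toy_numbers")()
```

Because the decorator returns the function unchanged, a registered loader can still be imported and called directly as well as looked up by name.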
