Machine Learning dataset loaders
Project description
Machine learning dataset loaders
Loaders for various machine learning datasets for testing and example scripts.
Previously in thinc.extra.datasets
.
Setup and installation
The package can be installed via pip:
pip install ml-datasets
Loaders
Loaders can be imported directly or used via their string name (which is useful if they're set via command line arguments). Some loaders may take arguments – see the source of details.
# Import directly
from ml_datasets import imdb
train_data, dev_data = imdb()
# Load via registry
from ml_datasets import loaders
imdb_loader = loaders.get("imdb")
train_data, dev_data = imdb_loader()
Available loaders
ID / Function | Description | From URL |
---|---|---|
imdb |
IMDB sentiment dataset. | ✓ |
mnist |
MNIST data. | ✓ |
quora_questions |
Quora question answer dataset. | ✓ |
reuters |
Reuters dataset. | ✓ |
snli |
Stanford Natural Language Inference corpus. | ✓ |
stack_exchange |
Stack Exchange dataset. | |
ud_ancora_pos_tags |
Universal Dependencies Spanish AnCora corpus (POS tagging). | ✓ |
ud_ewtb_pos_tags |
Universal Dependencies English EWT corpus (POS tagging). | ✓ |
wikiner |
WikiNER data. |
Registering loaders
Loaders can be registered externally using the loaders
registry as a decorator. For example:
@ml_datasets.loaders("my_custom_loader")
def my_custom_loader():
return load_some_data()
assert "my_custom_loader" in ml_datasets.loaders
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
ml_datasets-0.1.1.tar.gz
(7.3 kB
view hashes)