Machine Learning dataset loaders
Loaders for various machine learning datasets for testing and example scripts.
Previously in `thinc.extra.datasets`.
Setup and installation
The package can be installed via pip:
```bash
pip install ml-datasets
```
Loaders
Loaders can be imported directly or used via their string name (which is useful if they're set via command-line arguments). Some loaders may take arguments; see the source for details.
```python
# Import directly
from ml_datasets import imdb
train_data, dev_data = imdb()

# Load via registry
from ml_datasets import loaders
imdb_loader = loaders.get("imdb")
train_data, dev_data = imdb_loader()
```
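Because loaders can be looked up by name, a script can let the user choose the dataset on the command line. The sketch below is one way to wire that up with `argparse`. The `load_from_args` helper is hypothetical, and the name-to-loader lookup is passed in as a parameter so the snippet runs without the package installed; in a real script you would pass `loaders.get` from `ml_datasets`.

```python
import argparse
from typing import Any, Callable, Optional, Sequence


def load_from_args(
    get_loader: Callable[[str], Callable[[], Any]],
    argv: Optional[Sequence[str]] = None,
) -> Any:
    """Parse a dataset name from the command line and call the matching loader.

    `get_loader` maps a registered name to a loader function,
    e.g. `ml_datasets.loaders.get` (hypothetical helper, not part of the package).
    """
    parser = argparse.ArgumentParser(description="Load a dataset by name.")
    parser.add_argument("dataset", help="registered loader name, e.g. 'imdb'")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv
    loader = get_loader(args.dataset)  # look the loader up by its string name
    return loader()  # e.g. imdb returns (train_data, dev_data)
```

With the package installed, `train_data, dev_data = load_from_args(loaders.get)` inside a script run as `python train.py imdb` loads the IMDB splits. Passing the lookup in, rather than importing it inside the helper, keeps the helper easy to test with a plain dictionary.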
Available loaders
ID / Function | Description | From URL |
---|---|---|
`imdb` | IMDB sentiment dataset. | ✓ |
`mnist` | MNIST data. | ✓ |
`quora_questions` | Quora question answer dataset. | ✓ |
`reuters` | Reuters dataset. | ✓ |
`snli` | Stanford Natural Language Inference corpus. | ✓ |
`stack_exchange` | Stack Exchange dataset. | |
`ud_ancora_pos_tags` | Universal Dependencies Spanish AnCora corpus (POS tagging). | ✓ |
`ud_ewtb_pos_tags` | Universal Dependencies English EWT corpus (POS tagging). | ✓ |
`wikiner` | WikiNER data. | |
`dbpedia` | DBPedia ontology dataset via fast.ai. | ✓ |
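After loading, a quick look at the splits can catch an empty or badly skewed download. The helpers below are a hypothetical sketch that assumes each split is a sequence of `(input, label)` pairs with hashable labels; not every loader necessarily returns data in that shape (image data such as MNIST may use arrays), so check the loader's source first.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Sequence, Tuple


def summarize_split(split: Sequence[Tuple[Any, Hashable]]) -> Dict[str, Any]:
    """Count examples and label frequencies in one split of (input, label) pairs."""
    labels = Counter(label for _, label in split)
    return {"n_examples": len(split), "label_counts": dict(labels)}


def summarize(train_data, dev_data) -> Dict[str, Dict[str, Any]]:
    """Summarize both splits returned by a two-split loader such as imdb()."""
    return {"train": summarize_split(train_data), "dev": summarize_split(dev_data)}
```

For example, `summarize(*imdb())` reports how many training and development examples were loaded and how the labels are distributed, assuming the pair format above holds for that loader.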
Registering loaders
Loaders can be registered externally using the `loaders` registry as a decorator. For example:
```python
import ml_datasets

@ml_datasets.loaders("my_custom_loader")
def my_custom_loader():
    # load_some_data() is a placeholder for your own loading code
    return load_some_data()

assert "my_custom_loader" in ml_datasets.loaders
```
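The examples above use the registry in three ways: calling it with a name to get a decorator, `get` to look a loader up by name, and `in` to test membership. The toy class below is a minimal sketch of that decorator-registration pattern for readers who want to see how it works; it is not the package's own implementation, and the `Registry` class and `toy_loader` function are hypothetical.

```python
from typing import Callable, Dict, TypeVar

F = TypeVar("F", bound=Callable)


class Registry:
    """Toy name-to-function registry mimicking the calls used with ml_datasets.loaders."""

    def __init__(self) -> None:
        self._funcs: Dict[str, Callable] = {}

    def __call__(self, name: str) -> Callable[[F], F]:
        # Calling the registry with a name returns a decorator that stores the function.
        def register(func: F) -> F:
            self._funcs[name] = func
            return func  # unchanged, so the function can still be called directly
        return register

    def get(self, name: str) -> Callable:
        # Look a function up by its registered name; unknown names raise KeyError.
        if name not in self._funcs:
            raise KeyError(f"no loader registered as {name!r}")
        return self._funcs[name]

    def __contains__(self, name: str) -> bool:
        return name in self._funcs


loaders = Registry()


@loaders("toy_loader")
def toy_loader():
    # Hypothetical loader returning (train, dev) splits of (text, label) pairs.
    train = [("good", 1), ("bad", 0)]
    dev = [("fine", 1)]
    return train, dev
```

Because the decorator returns the original function, `toy_loader()` works when called directly as well as through `loaders.get("toy_loader")`, mirroring how `imdb` can be both imported and looked up by name.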
Download files
Source Distribution
ml_datasets-0.1.4.tar.gz (8.0 kB)
File details
Details for the file ml_datasets-0.1.4.tar.gz.
File metadata
- Download URL: ml_datasets-0.1.4.tar.gz
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.2
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 8328e7d1c2e8d1331ff188474b26dac9ce129ab9538299b03c1626b627c7a8cc |
MD5 | 39d5eecdf57df4120e40536b5e131a9d |
BLAKE2b-256 | 91fb6f070c81003a8d540c2c4027d98b6faed8841253110b7bac2265a1e003bf |
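To check that a downloaded `ml_datasets-0.1.4.tar.gz` matches the published SHA256 digest listed in the file hashes, you can hash the file locally and compare. A minimal sketch using the standard library's `hashlib`; the `sha256_of` and `verify` helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for ml_datasets-0.1.4.tar.gz (from the file hashes table).
EXPECTED_SHA256 = "8328e7d1c2e8d1331ff188474b26dac9ce129ab9538299b03c1626b627c7a8cc"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time in hash-checking mode, for example with the requirements line `ml-datasets==0.1.4 --hash=sha256:8328e7d1c2e8d1331ff188474b26dac9ce129ab9538299b03c1626b627c7a8cc`; in that mode pip then expects pinned versions and hashes for every requirement, including dependencies.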