Machine Learning dataset loaders

Loaders for various machine learning datasets for testing and example scripts. Previously in thinc.extra.datasets.

Setup and installation

The package can be installed via pip:

pip install ml-datasets

Loaders

Loaders can be imported directly or looked up by their string name, which is useful when the dataset is chosen via command-line arguments. Some loaders accept arguments; see the source for details.

# Import directly
from ml_datasets import imdb
train_data, dev_data = imdb()
# Load via registry
from ml_datasets import loaders
imdb_loader = loaders.get("imdb")
train_data, dev_data = imdb_loader()
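
As a minimal sketch of the command-line use case, the snippet below reads a loader name from a `--dataset` option and resolves it through a registry. The option name, the `imdb` default, and the helper names `build_parser` and `resolve_loader` are illustrative assumptions, not part of ml-datasets; the only registry call used is `get`, as shown above.

```python
import argparse


def build_parser():
    # Hypothetical CLI: "--dataset" is an illustrative option name.
    parser = argparse.ArgumentParser(description="Pick a dataset loader by name")
    parser.add_argument(
        "--dataset",
        default="imdb",
        help="String name of a registered loader, e.g. imdb or mnist",
    )
    return parser


def resolve_loader(registry, argv=None):
    """Parse argv and return the loader registered under the chosen name.

    `registry` is any object with a `get(name)` method, such as
    `ml_datasets.loaders`. With argv=None, argparse reads sys.argv.
    """
    args = build_parser().parse_args(argv)
    return registry.get(args.dataset)
```

In a script, this could be called as `resolve_loader(loaders)` after `from ml_datasets import loaders`; the returned function is then called like any other loader, for example `train_data, dev_data = loader()` for `imdb`.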

Available loaders

ID / Function        Description                                                  From URL
imdb                 IMDB sentiment dataset.
mnist                MNIST data.
quora_questions      Quora question answer dataset.
reuters              Reuters dataset.
snli                 Stanford Natural Language Inference corpus.
stack_exchange       Stack Exchange dataset.
ud_ancora_pos_tags   Universal Dependencies Spanish AnCora corpus (POS tagging).
ud_ewtb_pos_tags     Universal Dependencies English EWT corpus (POS tagging).
wikiner              WikiNER data.

Registering loaders

Loaders can be registered externally using the loaders registry as a decorator. For example:

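import ml_datasets

# Register a custom loader under a string name
@ml_datasets.loaders("my_custom_loader")
def my_custom_loader():
    # load_some_data is a placeholder for your own loading logic
    return load_some_data()

assert "my_custom_loader" in ml_datasets.loaders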
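
For illustration, here is one possible body for such a custom loader. It is a sketch under assumptions: the `EXAMPLES` list, the function name `toy_sentiment`, and the `dev_fraction` and `seed` parameters are all hypothetical, and the `(train_data, dev_data)` return shape simply mirrors the `imdb` usage shown earlier rather than a documented requirement.

```python
import random

# Hypothetical in-memory examples standing in for a real dataset file.
EXAMPLES = [
    ("great film", 1),
    ("terrible plot", 0),
    ("loved it", 1),
    ("would not recommend", 0),
    ("a masterpiece", 1),
]


def toy_sentiment(dev_fraction=0.2, seed=0):
    """Return (train_data, dev_data), each a list of (text, label) pairs."""
    examples = list(EXAMPLES)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(examples)
    # Always hold out at least one example for the dev set.
    n_dev = max(1, int(len(examples) * dev_fraction))
    return examples[n_dev:], examples[:n_dev]
```

Decorating `toy_sentiment` with `@ml_datasets.loaders("toy_sentiment")`, in the same way as the example above, would make it retrievable by that string name.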
