nonechucks

nonechucks is a library for PyTorch that provides wrappers for PyTorch's Dataset and Sampler objects to allow for dropping unwanted or invalid samples dynamically during dataset iteration.


Introduction

What if you have a dataset of thousands of images, of which a few dozen are unreadable because the image files are corrupted? Or what if your dataset is a folder full of scanned PDFs that you have to run OCR on and then pass through a language detector, because you want only the ones that are in English? Or maybe you have an AlternateIndexSampler, and you want to be able to move on to dataset[6] after dataset[4] fails to load!

PyTorch's data processing module expects you to rid your dataset of any unwanted or invalid samples before you feed them into its pipeline, and provides no easy way to define a "fallback policy" in case such samples are encountered during dataset iteration.
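To make the idea of a "fallback policy" concrete, here is a minimal pure-Python sketch of the AlternateIndexSampler scenario described above. It does not use PyTorch or nonechucks; `alternate_indices`, `iter_with_fallback`, and `FlakyDataset` are hypothetical names invented for this illustration, and the policy shown (skip to the next index on any loading error) is only one possible choice.

```python
def alternate_indices(n):
    """Hypothetical stand-in for an AlternateIndexSampler: yields 0, 2, 4, ..."""
    return range(0, n, 2)


def iter_with_fallback(dataset, indices):
    """Yield (index, sample) pairs, moving on to the next index when loading fails."""
    for i in indices:
        try:
            sample = dataset[i]
        except Exception:
            # Assumed fallback policy: drop the failing index and continue.
            continue
        yield i, sample


class FlakyDataset:
    """Toy dataset (an assumption for this example) in which index 4 cannot be loaded."""

    def __len__(self):
        return 10

    def __getitem__(self, index):
        if index == 4:
            raise IOError("corrupted sample")
        return index * 10


dataset = FlakyDataset()
loaded = list(iter_with_fallback(dataset, alternate_indices(len(dataset))))
print(loaded)  # [(0, 0), (2, 20), (6, 60), (8, 80)]
```

Note that the sampler moves straight from index 2 to index 6 once dataset[4] fails, without anyone having filtered the dataset beforehand.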

Why do I need it?

You might be wondering why this is such a big deal when you could simply filter out samples before sending them to your PyTorch dataset or sampler! Well, it turns out that it can be a huge deal in many cases:

  1. When you have a small fraction of undesirable samples in a large dataset, or
  2. When your sample-loading operation is expensive, or
  3. When you want to let downstream consumers know that a sample is undesirable, or
  4. When you want your dataset and sampler to be decoupled.

In such cases, it's either simply too expensive to have a separate step to weed out bad samples, or it's just plain impossible because you don't even know in advance what constitutes "bad", or worse, both!

nonechucks allows you to wrap your existing datasets and samplers with "safe" versions of them, which can fix all these problems for you!
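As a rough illustration of what a "safe" wrapper might look like, the sketch below wraps any object that supports `len()` and integer indexing and drops samples lazily during iteration. This is not the nonechucks implementation or its API: `SkippingDataset` and `ToyDataset` are hypothetical names, and treating both a raised exception and a `None` return value as "invalid" is an assumption made for this example.

```python
class SkippingDataset:
    """Illustrative wrapper (not the nonechucks API) that skips invalid samples."""

    def __init__(self, dataset):
        self.dataset = dataset
        # Remember indices that failed, so an expensive load is not retried.
        self.bad_indices = set()

    def __len__(self):
        # Upper bound only: the number of valid samples is not known up front.
        return len(self.dataset)

    def __iter__(self):
        for index in range(len(self.dataset)):
            if index in self.bad_indices:
                continue
            try:
                sample = self.dataset[index]
            except Exception:
                sample = None  # assumed convention: a load error means "invalid"
            if sample is None:
                self.bad_indices.add(index)
                continue
            yield sample


class ToyDataset:
    """Toy dataset (an assumption for this example) with two undesirable items."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        item = self.items[index]
        if item == "corrupt":
            raise ValueError("unreadable sample")
        if item == "unwanted":
            return None  # signals that the sample should be dropped
        return item.upper()


safe = SkippingDataset(ToyDataset(["a", "corrupt", "b", "unwanted", "c"]))
first_pass = list(safe)
second_pass = list(safe)
print(first_pass)                 # ['A', 'B', 'C']
print(sorted(safe.bad_indices))   # [1, 3]
```

Because bad samples are discovered while iterating, no separate filtering pass is needed, and the set of recorded bad indices means a later pass does not pay the loading cost for them again.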

Use Cases

Coming soon

Installation

To install nonechucks, simply use pip:

$ pip install nonechucks

or clone this repo, and build from source with:

$ python setup.py install

Examples

Coming soon

Contributing

All PRs are welcome.

Licensing

nonechucks is MIT licensed.


