A collection of popular datasets for deep learning.

dbcollection is a library for downloading/parsing/managing datasets via simple methods.
It was built from the ground up to be cross-platform (Windows, Linux, macOS) and
cross-language (Python, Lua, Matlab, etc.). This is achieved by using the popular HDF5
file format to store (meta)data of manually parsed datasets and Python for scripting.
By doing so, this library can target any platform that supports Python and any language
that has bindings for HDF5.

This package makes it easy to manage and load datasets by using HDF5 files as
metadata storage. Because all the necessary metadata is stored on disk, huge datasets
can be used on systems with limited memory. Also, once a dataset is set up, it is set
up forever! Users can reuse it as many times as they need for a myriad of tasks without
having to set up the dataset again each time they hack on some code. This lets users
focus on more important tasks, such as fast prototyping, without having to spend time
managing datasets or creating/modifying scripts to load/fetch data from disk.
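
To make the disk-backed idea concrete, here is a minimal sketch that uses the h5py package directly rather than dbcollection's own API. The file layout (a `train` group with `images` and `labels` fields), the helper names `write_metadata` and `fetch`, and the random data are all assumptions made for illustration; the files dbcollection actually produces may be organised differently.

```python
import os
import tempfile

import h5py
import numpy as np


def write_metadata(path):
    """Set up a toy dataset once by storing its arrays in an HDF5 file."""
    rng = np.random.default_rng(0)
    with h5py.File(path, "w") as f:
        train = f.create_group("train")
        # 1000 made-up 28x28 grayscale images and their labels.
        train.create_dataset(
            "images",
            data=rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8),
        )
        train.create_dataset(
            "labels",
            data=rng.integers(0, 10, size=(1000,), dtype=np.int64),
        )


def fetch(path, set_name, field, index):
    """Read one record; h5py reads only the requested slice from disk."""
    with h5py.File(path, "r") as f:
        return f[set_name][field][index]


# One-time setup (hypothetical file name in a temporary directory).
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.h5")
write_metadata(path)

# Later runs only need the file path; the full arrays never sit in RAM.
image = fetch(path, "train", "images", 42)
label = fetch(path, "train", "labels", 42)
print(image.shape, image.dtype)
```

Because the file is plain HDF5, the same records could in principle be read from any language with HDF5 bindings, which is the property the description above relies on.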

Main features
-------------

Here are some of the key features dbcollection provides:

- Simple API to load/download/setup/manage datasets
- Simple API to fetch data of a dataset
- All data is stored on disk, resulting in reduced RAM usage (useful for large datasets)
- Datasets only need to be set up once
- Cross-platform (Windows, Linux, macOS)
- Easily extensible to other languages that have support for HDF5 files
- Concurrent/parallel data access is possible thanks to the HDF5 file format
- A diverse list of popular datasets is available for use
- All datasets were manually parsed by someone, so some of their quirks have
already been handled for you

Download files

Source distribution: dbcollection-0.1.11.tar.gz (11.5 MB)

Built distribution: dbcollection-0.1.11-py2.py3-none-any.whl (11.7 MB, Python 2 and Python 3)
