A framework-agnostic datasets library for Machine Learning research and education.

Dataget

Dataget is an easy-to-use, framework-agnostic dataset library that gives you quick access to a collection of Machine Learning datasets through a simple API.

Main features:

  • Minimal: Downloads entire datasets with just 1 line of code.
  • Compatible: Loads data as numpy arrays or pandas dataframes, which can be used directly with the majority of Machine Learning frameworks.
  • Transparent: By default, stores the data in your current project so you can easily inspect it.
  • Memory Efficient: When a dataset doesn't fit in memory, it returns metadata instead so you can load the data iteratively.
  • Integrates with Kaggle: Supports loading datasets directly from Kaggle in a variety of formats.
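
The "Memory Efficient" feature can be pictured with a small sketch. The metadata format is an assumption here: the rows below use hypothetical `filepath` and `label` keys, and the stand-in files are written to a temporary directory rather than downloaded by dataget. The point is only that files are read batch by batch instead of all at once.

```python
import tempfile
from pathlib import Path


def iter_batches(metadata, batch_size):
    """Yield (contents, labels) batches, reading files only when needed."""
    for start in range(0, len(metadata), batch_size):
        rows = metadata[start:start + batch_size]
        # Only the files of the current batch are read into memory.
        contents = [Path(row["filepath"]).read_bytes() for row in rows]
        labels = [row["label"] for row in rows]
        yield contents, labels


def make_demo_metadata(directory, n_files):
    """Write small stand-in files and return hypothetical metadata rows."""
    metadata = []
    for i in range(n_files):
        path = Path(directory) / f"sample_{i}.bin"
        path.write_bytes(bytes([i]) * 4)
        # "filepath" and "label" are illustrative keys, not dataget's schema.
        metadata.append({"filepath": str(path), "label": i % 2})
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    demo_metadata = make_demo_metadata(tmp, n_files=5)
    batch_sizes = [len(contents) for contents, _ in iter_batches(demo_metadata, 2)]
```

A generator keeps peak memory proportional to the batch size, which is the usual reason to work from metadata instead of a fully loaded array.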

Check out the documentation for the list of available datasets.

Getting Started

In dataget you just have to do two things:

  • Instantiate a Dataset from our collection.
  • Call the get method to download the data to disk and load it into memory.

Both are usually done in one line:

import dataget


X_train, y_train, X_test, y_test = dataget.image.mnist().get()

This example downloads the MNIST dataset to ./data/image_mnist and loads it as numpy arrays.
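
Because the data arrives as plain numpy arrays, it can be prepared with numpy alone before being handed to any framework. The sketch below is illustrative: it assumes the images are `uint8` arrays of shape `(N, 28, 28)`, which the text above does not confirm, and it uses small synthetic stand-ins so it runs without downloading anything.

```python
import numpy as np

# Synthetic stand-ins for the arrays returned by .get().
# Shape (N, 28, 28) and dtype uint8 are assumptions, not confirmed by dataget.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
y_train = rng.integers(0, 10, size=(8,))


def prepare(images):
    """Flatten each image to a vector and scale pixel values to [0, 1]."""
    flat = images.reshape(len(images), -1)
    return flat.astype(np.float32) / 255.0


features = prepare(X_train)
```

The resulting `features` array has one row per image and can be passed to whichever framework you prefer.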

Kaggle Support

Kaggle promotes the use of CSV files and dataget loves it! With dataget you can quickly download any dataset from the platform and have immediate access to the data:

import dataget

df_train, df_test = dataget.kaggle("cristiangarcia/pointcloudmnist2d").get(
    files=["train.csv", "test.csv"]
)

To start using Kaggle datasets, make sure you have installed and configured the Kaggle API. In the future we want to expand Kaggle support in the following ways:

  • Be able to load any file that numpy or pandas can read.
  • Have generic support for other types of datasets like images, audio, video, etc.
    • e.g. dataget.data.kaggle(..., type="image").get(...)
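
Since Kaggle downloads depend on a configured Kaggle API, it can help to check for credentials before calling dataget. The helper below is a hypothetical sketch and not part of dataget. It assumes the Kaggle API's usual conventions: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`).

```python
import os
from pathlib import Path


def kaggle_credentials_available(environ=None, config_dir=None):
    """Return True if Kaggle credentials appear to be configured.

    Hypothetical helper: it only checks that credentials exist,
    not that they are valid.
    """
    environ = os.environ if environ is None else environ

    # Option 1: credentials supplied through environment variables.
    if environ.get("KAGGLE_USERNAME") and environ.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json token file in the config directory.
    if config_dir is None:
        config_dir = environ.get("KAGGLE_CONFIG_DIR", Path.home() / ".kaggle")
    return (Path(config_dir) / "kaggle.json").is_file()
```

Calling `kaggle_credentials_available()` with no arguments inspects the current environment and home directory.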

Installation

dataget is available on PyPI, so you can use your favorite package manager.

pip
pip install dataget
pipenv
pipenv install dataget
poetry
poetry add dataget
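
After installing, one way to confirm that the package is visible to your interpreter is to query its installed version through the standard library. This is a small sketch that assumes Python 3.8 or newer; `installed_version` is an illustrative helper name, not part of dataget.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# None here means the package is not installed in this environment.
dataget_version = installed_version("dataget")
```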

Contributing

Read our guide on Creating a Dataset if you are interested in adding a dataset to dataget.

License

MIT License
