A framework-agnostic datasets library for Machine Learning research and education.
Dataget
Dataget is an easy-to-use, framework-agnostic dataset library that gives you quick access to a collection of Machine Learning datasets through a simple API.
Main features:
- Minimal: Downloads entire datasets with just 1 line of code.
- Compatible: Loads data as numpy arrays or pandas dataframes, which can be easily used with the majority of Machine Learning frameworks.
- Transparent: By default stores the data in your current project so you can easily inspect it.
- Memory Efficient: When a dataset doesn't fit in memory it will return metadata instead so you can iteratively load it.
- Integrates with Kaggle: Supports loading datasets directly from Kaggle in a variety of formats.
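The memory-efficient behaviour described in the list above can be pictured with a small sketch. It assumes the returned metadata is a sequence of rows that each point at one sample on disk; the `path` and `label` fields, the `iter_batches` helper, and the stand-in loader are all hypothetical and not part of dataget, whose actual metadata layout may differ.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

Row = TypeVar("Row")
Item = TypeVar("Item")


def iter_batches(
    rows: Iterable[Row],
    load: Callable[[Row], Item],
    batch_size: int = 32,
) -> Iterator[List[Item]]:
    """Yield lists of loaded items, consuming at most `batch_size` rows at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # Only the samples of this chunk are loaded; nothing else is held here.
        yield [load(row) for row in chunk]


# Hypothetical metadata: each row points at a sample on disk (assumed layout).
metadata = [{"path": f"sample_{i}.bin", "label": i % 2} for i in range(5)]

# Stand-in loader; a real one would read the file at row["path"].
batches = list(iter_batches(metadata, load=lambda row: row["path"].upper(), batch_size=2))
```

Because the helper is a generator, a training loop can consume one batch, discard it, and move on without the full dataset ever being resident in memory.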
Check out the documentation for the list of available datasets.
Getting Started
In dataget you just have to do two things:
- Instantiate a Dataset from our collection.
- Call the get method to download the data to disk and load it into memory.
Both are usually done in one line:
import dataget as dg
X_train, y_train, X_test, y_test = dg.vision.mnist().get()
This example downloads the MNIST dataset to ./data/vision_mnist and loads it as numpy arrays.
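Since the files land inside the project directory, they can be inspected with ordinary tools. The following is a minimal sketch using only the standard library; the helper name `list_dataset_files` is hypothetical and not part of dataget, and the file layout inside the directory is not specified here.

```python
from pathlib import Path
from typing import List


def list_dataset_files(root: str = "data/vision_mnist") -> List[str]:
    """Return the relative paths of all files under `root`, sorted."""
    base = Path(root)
    if not base.is_dir():
        return []  # nothing has been downloaded to this location yet
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    for name in list_dataset_files():
        print(name)
```

Running it before the first call to get should print nothing; afterwards it lists whatever dataget stored for the dataset.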
Kaggle Support
Kaggle promotes the use of csv files and dataget loves it! With dataget you can quickly download any dataset from the platform and have immediate access to the data:
import dataget as dg
df_train, df_test = dg.kaggle("cristiangarcia/pointcloudmnist2d").get(
    files=["train.csv", "test.csv"]
)
To start using Kaggle datasets, just make sure you have properly installed and configured the Kaggle API. In the future we want to expand Kaggle support in the following ways:
- Be able to load any file that numpy or pandas can read.
- Have generic support for other types of datasets like images, audio, video, etc.
  - e.g. dg.data.kaggle(..., type="vision").get(...)
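Before downloading from Kaggle, it can help to confirm that credentials are discoverable. The sketch below checks the conventional places the Kaggle API client documents: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`). The function `kaggle_credentials_available` is a hypothetical helper, not part of dataget or the Kaggle client, and newer client versions may accept additional mechanisms that it does not check.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kaggle_credentials_available(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> bool:
    """Return True if Kaggle credentials appear to be configured."""
    env = os.environ if env is None else env

    # Option 1: credentials passed through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json token file in the configuration directory.
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    base = Path(config_dir) if config_dir else (home or Path.home()) / ".kaggle"
    return (base / "kaggle.json").is_file()


if __name__ == "__main__":
    if not kaggle_credentials_available():
        print("Kaggle credentials not found; configure the Kaggle API first.")
```

The `env` and `home` parameters exist only so the check can be exercised without touching the real environment.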
Installation
dataget is available on PyPI, so you can install it with your favorite package manager.
pip
pip install dataget
pipenv
pipenv install dataget
poetry
poetry add dataget
Contributing
Read our guide on Creating a Dataset if you are interested in adding a dataset to dataget.
License
MIT License