Unified downloading and management of real-world retail datasets for analysis and benchmarking.
RetailData
A unified interface for fetching and preparing retail datasets for benchmarking and analysis.
Features
- Unified API: Fetch datasets from various providers (HTTP, Kaggle, Hugging Face, UCI, OpenML) with a single command.
- Secure Credentials: Integrated support for Kaggle and Hugging Face API keys.
- Data Benchmark Pack: Curated retail datasets (Favorita, Rossmann, Instacart, M5, Olist, and more).
- Processing Pipeline: Automatic conversion to high-performance Parquet optimized for Polars.
- Cache Management: Programmatic disk usage tracking and clearing.
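The disk usage tracking mentioned under Cache Management can be approximated with the standard library alone. The sketch below is not the retaildata API: `cache_usage_bytes` and `format_size` are hypothetical helpers, and the default root `~/.local/share/retaildata` is an assumption inferred from the path used in the Quick Start example.

```python
from pathlib import Path

# Assumed cache root, inferred from the Quick Start path; the real location may differ.
DEFAULT_ROOT = Path("~/.local/share/retaildata").expanduser()


def cache_usage_bytes(root: Path = DEFAULT_ROOT) -> int:
    """Return the total size in bytes of all regular files under root (0 if missing)."""
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def format_size(num_bytes: int) -> str:
    """Render a byte count using decimal units (B, kB, MB, GB)."""
    size = float(num_bytes)
    units = ("B", "kB", "MB", "GB")
    for unit in units[:-1]:
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} {units[-1]}"
```

Calling `format_size(cache_usage_bytes())` gives a human-readable total for the assumed cache directory.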
Installation
pip install retaildata
Or, for development, install from a local checkout in editable mode using uv (recommended):
uv pip install -e .
Quick Start
CLI
- List available datasets:
  retaildata list
- Download a dataset:
  retaildata get test_http
- Download with Preparation (Parquet):
  retaildata get online_retail_ii --prepare
- Manage Credentials (e.g. Kaggle):
  retaildata auth set kaggle --file ~/.kaggle/kaggle.json
- Clean Up:
  retaildata rm test_http
  retaildata purge --all
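For scripted workflows, the CLI can also be driven from Python. The following is a minimal sketch: `get_command` and `run_get` are hypothetical helper names, and only the `retaildata get <name> [--prepare]` form shown above is used.

```python
import subprocess
from typing import List


def get_command(dataset: str, prepare: bool = False) -> List[str]:
    """Build the argument list for `retaildata get <dataset> [--prepare]`."""
    cmd = ["retaildata", "get", dataset]
    if prepare:
        cmd.append("--prepare")
    return cmd


def run_get(dataset: str, prepare: bool = False) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(get_command(dataset, prepare), check=True)
```

Passing the arguments as a list avoids shell quoting issues with dataset names.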
Python API
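import retaildata.api as rd
import polars as pl
from pathlib import Path
# Download the dataset and convert it to Parquet
rd.download("online_retail_ii", prepare=True)
# Resolve the prepared-data directory ("~" expands to the home directory)
prepared_dir = Path("~/.local/share/retaildata/prepared/online_retail_ii").expanduser()
# Scan lazily with Polars, then materialize the result
df = pl.scan_parquet(str(prepared_dir / "*.parquet")).collect()
print(df.head())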
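Because `scan_parquet` is lazy, filters and column selections can be applied before `collect()` so that only the needed data is read. The sketch below assumes the prepared-data layout implied by the path above (`<root>/prepared/<dataset>/*.parquet`); `prepared_files` and `scan_prepared` are hypothetical helpers, not part of retaildata.

```python
from pathlib import Path
from typing import List, Optional

# Assumed data root, taken from the path in the example above.
DEFAULT_ROOT = Path("~/.local/share/retaildata").expanduser()


def prepared_dir(dataset: str, root: Optional[Path] = None) -> Path:
    """Directory expected to hold the prepared Parquet files of a dataset."""
    return (root or DEFAULT_ROOT) / "prepared" / dataset


def prepared_files(dataset: str, root: Optional[Path] = None) -> List[Path]:
    """List the prepared Parquet files of a dataset, sorted by path."""
    return sorted(prepared_dir(dataset, root).glob("*.parquet"))


def scan_prepared(dataset: str, root: Optional[Path] = None):
    """Return a Polars LazyFrame over all prepared files of a dataset."""
    import polars as pl  # imported here so the path helpers work without Polars

    if not prepared_files(dataset, root):
        raise FileNotFoundError(f"no prepared Parquet files for {dataset!r}")
    return pl.scan_parquet(str(prepared_dir(dataset, root) / "*.parquet"))
```

For example, `scan_prepared("online_retail_ii").head(5).collect()` returns the first five rows without first loading the whole dataset into memory.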
Supported Datasets
- online_retail_ii: UK-based online retail transactions.
- olist: Brazilian e-commerce dataset.
- m5: Walmart time-series forecasting.
- store_sales: Corporación Favorita (Ecuador) store sales.
- rossmann: Rossmann store sales benchmarks.
- instacart: Online grocery basket analysis.
- online_retail_uci: Classical transactions dataset (UCI).
- credit_approval_openml: Financial benchmarking (OpenML).
See retaildata list for the full registry.
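To fetch several datasets from the benchmark pack in one go, a small loop is enough. In the sketch below, `download_all` is a hypothetical helper that takes the download function as a parameter. It assumes the function is called as `download(name, prepare=True)`, as in the Python API example, and that failures surface as exceptions (an assumption about the package's behavior).

```python
from typing import Callable, Dict, Iterable, Optional


def download_all(
    names: Iterable[str],
    download: Callable[..., object],
    prepare: bool = True,
) -> Dict[str, Optional[Exception]]:
    """Call download(name, prepare=...) for each dataset name.

    Returns a mapping from name to None on success, or to the exception raised.
    """
    results: Dict[str, Optional[Exception]] = {}
    for name in names:
        try:
            download(name, prepare=prepare)
            results[name] = None
        except Exception as exc:  # keep going so one failure does not stop the batch
            results[name] = exc
    return results
```

With retaildata installed, the package's download function would be passed as the second argument. Kaggle-hosted datasets presumably require credentials to be configured first.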
License
This package is licensed under the MIT License. Individual datasets may have their own licenses.
File details
Details for the file retaildata-0.1.5.tar.gz.
File metadata
- Download URL: retaildata-0.1.5.tar.gz
- Upload date:
- Size: 25.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 94c5b719fa92afe9f9d0632f0669fb0edc6946eef3d84515956652a0496915c4 |
| MD5 | 3a8e77f037d4b4c866e964a1f04026a3 |
| BLAKE2b-256 | 88d6f6ad0b2993faaee062d080e6e1f58ecbf9a5a5d1b7e70694a58e2284de68 |
File details
Details for the file retaildata-0.1.5-py3-none-any.whl.
File metadata
- Download URL: retaildata-0.1.5-py3-none-any.whl
- Upload date:
- Size: 32.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ce4195018c718e2fe2e80f1aeed8dda1a55b775909af1812ece99deec2bc113 |
| MD5 | 811c0ed22717f26bc0ce7f7de59af940 |
| BLAKE2b-256 | 81ff0e786f51eee64161337cfcad9dee213b10efc4313cd0e27e4f265bc70355 |
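The SHA256 digests listed above can be checked against a downloaded file before installing it. This is a minimal sketch using only the standard library; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

To check a download, call `verify` with the path to the file and the SHA256 value from the table for that file.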