Secure CLI + Python API to download and manage real-world retail benchmark datasets.

RetailData

A unified interface for fetching and preparing retail datasets for benchmarking and analysis.

Features

  • Unified API: Fetch datasets from various providers (HTTP, Kaggle, Hugging Face, UCI, OpenML) with a single command.
  • Secure Credentials: Integrated support for Kaggle and Hugging Face API keys.
  • Data Benchmark Pack: Curated retail datasets (Favorita, Rossmann, Instacart, M5, Olist, and more).
  • Processing Pipeline: Automatic conversion to high-performance Parquet optimized for Polars.
  • Cache Management: Programmatic disk usage tracking and clearing.
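
The Cache Management feature is described only at a high level here. As a rough illustration of what disk usage tracking involves, the sketch below sums file sizes under a directory using only the standard library. The function name `dir_size_bytes` is hypothetical and is not part of the retaildata API, and the cache location in the usage comment is an assumption based on the path shown in the Python API example.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Return the total size in bytes of all regular files under root."""
    if not root.exists():
        return 0
    # rglob("*") walks the tree recursively; directories are skipped.
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


# Example usage (assumed cache location, not confirmed by the package):
#   cache_root = Path.home() / ".local" / "share" / "retaildata"
#   print(f"{dir_size_bytes(cache_root) / 1e6:.1f} MB")
```

For the package's own reporting and clearing, the `rm` and `purge` commands shown under Quick Start are the documented route.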

Installation

pip install retaildata

Or, for development, install an editable copy from a source checkout with uv:

uv pip install -e .

Quick Start

CLI

  1. List available datasets:

    retaildata list
    
  2. Download a dataset:

    retaildata get test_http
    
  3. Download and prepare (convert to Parquet):

    retaildata get online_retail_ii --prepare
    
  4. Manage Credentials (e.g. Kaggle):

    retaildata auth set kaggle --file ~/.kaggle/kaggle.json
    
  5. Clean Up:

    retaildata rm test_http
    retaildata purge --all
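
Step 4 above points the CLI at a Kaggle credentials file. Before registering it, it can help to confirm that the file is well formed. The sketch below checks for the `username` and `key` fields that a standard `kaggle.json` contains. The helper name `check_kaggle_credentials` is hypothetical; it is not part of retaildata and does not contact Kaggle.

```python
import json
from pathlib import Path


def check_kaggle_credentials(path: Path) -> list[str]:
    """Return a list of problems found in a kaggle.json file (empty if none)."""
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} should contain a JSON object"]
    # A standard kaggle.json holds exactly these two fields.
    return [
        f"missing or empty field: {field}"
        for field in ("username", "key")
        if not data.get(field)
    ]


# Example usage:
#   problems = check_kaggle_credentials(Path.home() / ".kaggle" / "kaggle.json")
#   print(problems or "kaggle.json looks fine")
```

This only validates the file's shape; whether the key is accepted can only be confirmed by an actual download.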
    

Python API

import retaildata.api as rd
import polars as pl
from pathlib import Path

# Download the dataset and convert it to Parquet
rd.download("online_retail_ii", prepare=True)

# Load with Polars; build the path from Path.home() rather than relying on "~" expansion
prepared_dir = Path.home() / ".local/share/retaildata/prepared/online_retail_ii"
df = pl.scan_parquet(prepared_dir / "*.parquet").collect()
print(df.head())
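
Calling `.collect()` on the whole scan materializes every row. For a quick look at a large dataset, a lazy preview is usually enough. The sketch below is one way to do that, assuming Polars 1.x (for `LazyFrame.collect_schema`) and assuming the prepared files live under the directory used in the example above. The names `DEFAULT_ROOT`, `prepared_glob`, and `preview` are hypothetical helpers, not part of retaildata.

```python
from pathlib import Path

# Assumed data directory, taken from the path in the example above.
DEFAULT_ROOT = Path.home() / ".local" / "share" / "retaildata"


def prepared_glob(dataset: str, root: Path = DEFAULT_ROOT) -> str:
    """Build the glob pattern for a dataset's prepared Parquet files."""
    return str(root / "prepared" / dataset / "*.parquet")


def preview(dataset: str, n: int = 5, root: Path = DEFAULT_ROOT):
    """Return (column names, first n rows) without collecting the full dataset."""
    import polars as pl  # imported here so the path helper works without Polars

    lf = pl.scan_parquet(prepared_glob(dataset, root))
    # collect_schema() resolves column names and types from file metadata;
    # head(n) limits the rows requested before collect() runs the query.
    return lf.collect_schema().names(), lf.head(n).collect()


# Example usage (requires a prepared dataset on disk):
#   columns, sample = preview("online_retail_ii")
#   print(columns)
#   print(sample)
```

Keeping the path logic in its own function makes it easy to point the same code at a different data directory if the package stores files elsewhere on your platform.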

Supported Datasets

  • online_retail_ii: UK-based online retail transactions.
  • olist: Brazilian e-commerce dataset.
  • m5: Walmart time-series forecasting.
  • store_sales: Corporación Favorita (Ecuador) store sales.
  • rossmann: Rossmann store sales benchmarks.
  • instacart: Online grocery basket analysis.
  • online_retail_uci: Classical transactions dataset (UCI).
  • credit_approval_openml: Financial benchmarking (OpenML).

Run retaildata list to see the full registry.

License

This package is licensed under the MIT License. Individual datasets may have their own licenses.

