
Unified downloading and management of real-world retail datasets for analysis and benchmarking.

Project description

RetailData

A unified interface for fetching and preparing retail datasets for benchmarking and analysis.

Features

  • Unified API: Fetch datasets from multiple sources (HTTP, Kaggle, Hugging Face, UCI, OpenML) with a single command.
  • Secure Credentials: Integrated support for Kaggle and Hugging Face API keys.
  • Data Benchmark Pack: Curated retail datasets (Favorita, Rossmann, Instacart, M5, Olist, and more).
  • Processing Pipeline: Conversion to Parquet (enabled with --prepare), ready for fast loading with Polars.
  • Cache Management: Programmatic disk usage tracking and clearing.
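The README does not document how the unified API maps a dataset name to its provider. One common way to structure this is a registry plus per-provider dispatch. The sketch below is a hypothetical illustration under that assumption: DatasetSpec, FETCHERS, get and the fetcher functions are invented names, and the fetchers only return a description of what they would do instead of downloading anything. It is not retaildata's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical registry entry: which provider hosts a dataset and under what identifier."""
    name: str
    provider: str  # e.g. "http", "kaggle", "huggingface", "uci", "openml"
    source: str    # provider-specific identifier (URL, dataset slug, ...)


# Hypothetical fetchers; a real implementation would call each provider's client here.
def fetch_http(spec: DatasetSpec) -> str:
    return f"GET {spec.source}"


def fetch_kaggle(spec: DatasetSpec) -> str:
    return f"kaggle download {spec.source}"


# Dispatch table: provider name -> fetcher function.
FETCHERS: Dict[str, Callable[[DatasetSpec], str]] = {
    "http": fetch_http,
    "kaggle": fetch_kaggle,
}


def get(name: str, registry: Dict[str, DatasetSpec]) -> str:
    """Look up a dataset by name and dispatch to the fetcher for its provider."""
    spec = registry[name]
    try:
        fetcher = FETCHERS[spec.provider]
    except KeyError:
        raise ValueError(f"no fetcher registered for provider {spec.provider!r}") from None
    return fetcher(spec)
```

Keeping provider logic behind a single lookup table means adding a new source only requires registering one more fetcher, while the user-facing command stays the same.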

Installation

pip install retaildata

For development, install from a local clone in editable mode with uv (recommended):

uv pip install -e .

Quick Start

CLI

  1. List available datasets:

    retaildata list
    
  2. Download a dataset:

    retaildata get test_http
    
  3. Download and prepare (convert to Parquet):

    retaildata get online_retail_ii --prepare
    
  4. Manage credentials (e.g., Kaggle):

    retaildata auth set kaggle --file ~/.kaggle/kaggle.json
    
  5. Clean up:

    retaildata rm test_http
    retaildata purge --all
    
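Before running retaildata auth set kaggle --file ~/.kaggle/kaggle.json, it can help to sanity-check the token file. Kaggle's API token (kaggle.json) is a JSON object with username and key fields, and the Kaggle client warns when the file is readable by other users. The helper below, check_kaggle_token, is a hypothetical stdlib-only sketch: it does not call retaildata and assumes nothing about how retaildata stores the credentials afterwards.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(path: Path) -> list[str]:
    """Return a list of problems found in a Kaggle API token file (empty if none)."""
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read {path}: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    # Both fields must be present and non-empty strings.
    for field in ("username", "key"):
        if not isinstance(data.get(field), str) or not data[field]:
            problems.append(f"missing or empty field {field!r}")

    # On POSIX systems, the token should not be accessible by group or others.
    if os.name == "posix" and path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file is accessible by group/others; consider chmod 600")
    return problems
```

An empty list means the file looks usable; otherwise each entry names one problem to fix before storing the credentials.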

Python API

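import retaildata.api as rd
import polars as pl
from pathlib import Path

# Download the dataset and convert it to Parquet
rd.download("online_retail_ii", prepare=True)

# Default location of prepared files on Linux (assumed; may differ on other platforms)
prepared_dir = Path.home() / ".local/share/retaildata/prepared/online_retail_ii"

# Scan lazily with Polars, then materialize into a DataFrame
df = pl.scan_parquet(str(prepared_dir / "*.parquet")).collect()
print(df.head())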
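The example above hard-codes ~/.local/share/retaildata, which is the XDG default data directory on Linux. A small helper can locate the prepared files and report how much disk space each dataset's Parquet output uses. Both functions below (data_root, prepared_sizes) are hypothetical, and honouring XDG_DATA_HOME is an assumption, not documented retaildata behaviour.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def data_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Guess the retaildata data directory.

    Assumption: follows the XDG convention, i.e. $XDG_DATA_HOME/retaildata,
    falling back to ~/.local/share/retaildata.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / "retaildata"


def prepared_sizes(root: Path) -> Dict[str, int]:
    """Sum the size in bytes of the Parquet files under root/prepared/<dataset>/."""
    sizes: Dict[str, int] = {}
    prepared = root / "prepared"
    if not prepared.is_dir():
        return sizes
    for dataset_dir in sorted(p for p in prepared.iterdir() if p.is_dir()):
        # Only count Parquet output; other files in the folder are ignored.
        sizes[dataset_dir.name] = sum(f.stat().st_size for f in dataset_dir.glob("*.parquet"))
    return sizes
```

For example, prepared_sizes(data_root()) returns a mapping such as {"online_retail_ii": <bytes>} once that dataset has been prepared.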

Supported Datasets

  • online_retail_ii: UK-based online retail transactions.
  • olist: Brazilian e-commerce dataset.
  • m5: Walmart time-series forecasting.
  • store_sales: Corporación Favorita (Ecuador) store sales.
  • rossmann: Rossmann store sales benchmarks.
  • instacart: Online grocery basket analysis.
  • online_retail_uci: Classic online retail transactions dataset (UCI).
  • credit_approval_openml: Financial benchmarking (OpenML).

Run retaildata list to see the full registry.

License

This package is licensed under the MIT License. Individual datasets may have their own licenses.


Download files


Source Distribution

retaildata-0.1.1.tar.gz (18.1 kB)

Uploaded Source

Built Distribution


retaildata-0.1.1-py3-none-any.whl (20.6 kB)

Uploaded Python 3
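The built distribution's filename follows the wheel naming convention from PEP 427: name-version(-build)?-pythontag-abitag-platformtag.whl. The sketch below splits a filename into those parts; WheelName and parse_wheel_filename here are illustrative names, and for production use packaging.utils.parse_wheel_filename also normalizes and validates the components.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # Dashes inside names are escaped to underscores, so '-' only separates fields.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)
```

For retaildata-0.1.1-py3-none-any.whl this yields python tag py3, ABI tag none and platform tag any, i.e. a pure-Python wheel that installs on any platform with Python 3.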

File details

Details for the file retaildata-0.1.1.tar.gz.

File metadata

  • Download URL: retaildata-0.1.1.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.5

File hashes

Hashes for retaildata-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b4c0f5d4d0324bc0c6bb66dcb85b2ca5f63468646c39a310ac26bcd8b20a64c8
MD5 6b5bab07a6985595ad54cc77a276635b
BLAKE2b-256 885519719bbec8f4e79463e8e3fb05cdc153311bc4acb4926072e13fe77c78a6
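The published digests let you check a downloaded file before installing it. A minimal sketch using only hashlib (sha256_of and verify are illustrative names):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify(Path("retaildata-0.1.1.tar.gz"), "b4c0f5d4d0324bc0c6bb66dcb85b2ca5f63468646c39a310ac26bcd8b20a64c8") should return True for an intact download. pip can enforce the same check at install time with its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).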


File details

Details for the file retaildata-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: retaildata-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 20.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.5

File hashes

Hashes for retaildata-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e17d7098970e9527b399cd0ee2d176fffb0bcff1179e71c7ae0242d53d4ad9f9
MD5 26ca58ab9e659c0139583bddac21cfbb
BLAKE2b-256 95f9e256abbbddf9f03dbeac9b7095a4544c9675c50755b74c42932612433038

