
Unified downloading and management of real-world retail datasets for analysis and benchmarking.


RetailData

A unified interface for fetching and preparing retail datasets for benchmarking and analysis.

Features

  • Unified API: Fetch datasets from various providers (HTTP, Kaggle, Hugging Face, UCI, OpenML) with a single command.
  • Secure Credentials: Integrated support for Kaggle and Hugging Face API keys.
  • Data Benchmark Pack: Curated retail datasets (Favorita, Rossmann, Instacart, M5, Olist, and more).
  • Processing Pipeline: Automatic conversion to high-performance Parquet optimized for Polars.
  • Cache Management: Programmatic disk usage tracking and clearing.
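As an illustration of what disk usage tracking involves, the sketch below sums file sizes under a data directory using only the standard library. It does not call retaildata's own cache API, which is not documented here. The function name dir_usage_bytes is hypothetical, and the data root is an assumption based on the prepared-data path used in the Python API example.

```python
from pathlib import Path


def dir_usage_bytes(root: Path) -> int:
    """Return the total size in bytes of all regular files under root."""
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumed location, taken from the prepared-data path in the Python API example.
    data_root = Path.home() / ".local" / "share" / "retaildata"
    print(f"{data_root}: {dir_usage_bytes(data_root) / 1e6:.1f} MB")
```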

Installation

pip install retaildata

Or, for a development install from a local checkout, using uv:

uv pip install -e .

Quick Start

CLI

  1. List available datasets:

    retaildata list
    
  2. Download a dataset:

    retaildata get test_http
    
  3. Download with Preparation (Parquet):

    retaildata get online_retail_ii --prepare
    
  4. Manage Credentials (e.g. Kaggle):

    retaildata auth set kaggle --file ~/.kaggle/kaggle.json
    
  5. Clean Up:

    retaildata rm test_http
    retaildata purge --all
    
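The same steps can be scripted. The sketch below wraps the retaildata get command shown above with subprocess. The helper names build_get_command and fetch are hypothetical; only the get subcommand and the --prepare flag from this README are used.

```python
import subprocess


def build_get_command(dataset: str, prepare: bool = False) -> list[str]:
    """Build the argv for `retaildata get <dataset> [--prepare]`."""
    cmd = ["retaildata", "get", dataset]
    if prepare:
        cmd.append("--prepare")
    return cmd


def fetch(dataset: str, prepare: bool = False) -> None:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_get_command(dataset, prepare), check=True)


if __name__ == "__main__":
    # Only prints the command; call fetch(...) to actually download.
    print(" ".join(build_get_command("online_retail_ii", prepare=True)))
```

Passing the arguments as a list rather than a shell string avoids quoting problems if a dataset name ever contains unusual characters.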

Python API

from pathlib import Path

import polars as pl
from retaildata import api

# Download the dataset and convert it to Parquet
api.download("online_retail_ii", prepare=True)

# Load with Polars; expand the home directory explicitly rather than relying on "~"
prepared = Path.home() / ".local/share/retaildata/prepared/online_retail_ii"
df = pl.scan_parquet(str(prepared / "*.parquet")).collect()
print(df.head())
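
Before loading, it can help to confirm that preparation actually produced Parquet files. The following is a minimal sketch using only the standard library, assuming the prepared/<dataset> directory layout implied by the path above. prepared_files and DEFAULT_ROOT are hypothetical names, not part of retaildata.

```python
from pathlib import Path

# Assumed default, based on the path in the example above.
DEFAULT_ROOT = Path.home() / ".local" / "share" / "retaildata" / "prepared"


def prepared_files(dataset: str, root: Path = DEFAULT_ROOT) -> list[Path]:
    """Return the dataset's prepared Parquet files, sorted by name."""
    return sorted((root / dataset).glob("*.parquet"))


if __name__ == "__main__":
    files = prepared_files("online_retail_ii")
    if files:
        print(f"{len(files)} Parquet file(s) ready to scan")
    else:
        print("No prepared files found; run the download with prepare=True first")
```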

Supported Datasets

  • online_retail_ii: UK-based online retail transactions.
  • olist: Brazilian e-commerce dataset.
  • m5: Walmart time-series forecasting.
  • store_sales: Corporación Favorita (Ecuador) store sales.
  • rossmann: Rossmann store sales benchmarks.
  • instacart: Online grocery basket analysis.
  • online_retail_uci: Classical transactions dataset (UCI).
  • credit_approval_openml: Financial benchmarking (OpenML).

Run retaildata list to see the full registry.

License

This package is licensed under the MIT License. Individual datasets may have their own licenses.
