
Unified downloading and management of real-world retail datasets for analysis and benchmarking.


RetailData

A unified interface for fetching and preparing retail datasets for benchmarking and analysis.

Features

  • Unified API: Fetch datasets from various providers (HTTP, Kaggle, Hugging Face, UCI, OpenML) with a single command.
  • Secure Credentials: Integrated support for Kaggle and Hugging Face API keys.
  • Data Benchmark Pack: Curated retail datasets (Favorita, Rossmann, Instacart, M5, Olist, and more).
  • Processing Pipeline: Automatic conversion to Parquet files that load efficiently with Polars.
  • Cache Management: Programmatic disk usage tracking and clearing.
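
To make the cache-management feature concrete, here is a minimal sketch of what disk usage tracking and clearing involve, written with the Python standard library only. It is not RetailData's own API: the helper names `disk_usage` and `clear` are hypothetical, and the directory passed in is whichever location your installation stores data in.

```python
import shutil
from pathlib import Path


def disk_usage(root: Path) -> int:
    """Return the total size in bytes of all regular files under root."""
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def clear(root: Path) -> int:
    """Delete root and everything below it; return the number of bytes freed."""
    freed = disk_usage(root)
    if root.exists():
        shutil.rmtree(root)
    return freed
```

For everyday use, the `retaildata rm` and `retaildata purge` commands shown in the Quick Start are the supported route.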

Installation

pip install retaildata

Or, for development, install a local checkout in editable mode with uv (recommended):

uv pip install -e .

Quick Start

CLI

  1. List available datasets:

    retaildata list
    
  2. Download a dataset:

    retaildata get test_http
    
  3. Download and prepare a dataset (convert to Parquet):

    retaildata get online_retail_ii --prepare
    
  4. Manage Credentials (e.g. Kaggle):

    retaildata auth set kaggle --file ~/.kaggle/kaggle.json
    
  5. Clean Up:

    retaildata rm test_http
    retaildata purge --all
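
For scripted workflows, the download commands above can be assembled from Python. The following is a minimal sketch that uses only the `get` subcommand and `--prepare` flag shown above; the helper names `build_get_command` and `fetch_all` are hypothetical and not part of the package. By default it only renders the commands, so nothing is downloaded unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def build_get_command(dataset: str, prepare: bool = False) -> list[str]:
    """Build the argument list for `retaildata get <dataset> [--prepare]`."""
    cmd = ["retaildata", "get", dataset]
    if prepare:
        cmd.append("--prepare")
    return cmd


def fetch_all(datasets: list[str], prepare: bool = False, dry_run: bool = True) -> list[str]:
    """Return the shell-quoted commands; execute them only when dry_run is False."""
    rendered = []
    for name in datasets:
        cmd = build_get_command(name, prepare=prepare)
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the CLI exits non-zero
            subprocess.run(cmd, check=True)
    return rendered


# Dry run: print the commands that would be executed
for line in fetch_all(["test_http", "online_retail_ii"], prepare=True):
    print(line)
```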
    

Python API

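import retaildata.api as rd
import polars as pl
from pathlib import Path

# Download the dataset and convert it to Parquet
rd.download("online_retail_ii", prepare=True)

# Build the prepared-data path explicitly instead of relying on "~" expansion
prepared = Path.home() / ".local/share/retaildata/prepared/online_retail_ii"

# Scan lazily with Polars, then materialize the result
df = pl.scan_parquet(str(prepared / "*.parquet")).collect()
print(df.head())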
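
Before loading, it can help to confirm that the prepared files actually exist. The sketch below is a hypothetical helper (`prepared_files` is not part of the package) that lists the Parquet files for a dataset. It assumes the prepared-data location used in the example above, which may differ by platform or configuration.

```python
from pathlib import Path

# Assumption: this default mirrors the path used in the example above;
# the actual location may differ by platform or configuration.
DEFAULT_PREPARED_ROOT = Path.home() / ".local" / "share" / "retaildata" / "prepared"


def prepared_files(dataset: str, root: Path = DEFAULT_PREPARED_ROOT) -> list[Path]:
    """Return the sorted Parquet files for a prepared dataset.

    Raises FileNotFoundError if the dataset folder has no Parquet files.
    """
    folder = Path(root) / dataset
    files = sorted(folder.glob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"No Parquet files found in {folder}")
    return files
```

A missing or empty folder then fails early with a clear message rather than surfacing as an error inside the Polars scan.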

Supported Datasets

  • online_retail_ii: UK-based online retail transactions.
  • olist: Brazilian e-commerce dataset.
  • m5: Walmart time-series forecasting.
  • store_sales: Corporación Favorita (Ecuador) store sales.
  • rossmann: Rossmann store sales benchmarks.
  • instacart: Online grocery basket analysis.
  • online_retail_uci: Classic online retail transactions dataset (UCI).
  • credit_approval_openml: Financial benchmarking (OpenML).

See retaildata list for the full registry.

License

This package is licensed under the MIT License. Individual datasets may have their own licenses.
