
Unified downloading and management of real-world retail datasets for analysis and benchmarking.


RetailData

A unified interface for fetching and preparing retail datasets for benchmarking and analysis.

Features

  • Unified API: Fetch datasets from various providers (HTTP, Kaggle, Hugging Face, UCI, OpenML) with a single command.
  • Secure Credentials: Integrated support for Kaggle and Hugging Face API keys.
  • Data Benchmark Pack: Curated retail datasets (Favorita, Rossmann, Instacart, M5, Olist, and more).
  • Processing Pipeline: Automatic conversion of downloaded data to Parquet (on request, via --prepare), ready for fast loading with Polars.
  • Cache Management: Programmatic disk usage tracking and clearing.
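To illustrate what disk usage tracking over a dataset cache involves, here is a minimal, stdlib-only sketch. It is not retaildata's implementation: directory_size and format_size are hypothetical helpers, and the cache location ~/.local/share/retaildata is an assumption taken from the Python API example in this README.

```python
from pathlib import Path


def directory_size(root: Path) -> int:
    """Return the total size in bytes of all regular files under root.

    Returns 0 if root does not exist (e.g. before anything was downloaded).
    """
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def format_size(num_bytes: int) -> str:
    """Render a byte count with binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    # Assumed cache location, taken from the Python API example in this README.
    cache_root = Path.home() / ".local" / "share" / "retaildata"
    print(f"{cache_root}: {format_size(directory_size(cache_root))}")
```

Walking the tree with rglob keeps the sketch portable; for very large caches, a per-dataset breakdown (one directory_size call per subdirectory) is a natural extension.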

Installation

pip install retaildata

Or, from a source checkout, install in editable mode with uv (recommended for development):

uv pip install -e .

Quick Start

CLI

  1. List available datasets:

    retaildata list
    
  2. Download a dataset:

    retaildata get test_http
    
  3. Download and prepare (convert to Parquet):

    retaildata get online_retail_ii --prepare
    
  4. Manage credentials (e.g. Kaggle):

    retaildata auth set kaggle --file ~/.kaggle/kaggle.json
    
  5. Clean up:

    retaildata rm test_http
    retaildata purge --all
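Step 4 points retaildata at a Kaggle API token file. A token downloaded from Kaggle's account settings is a JSON object with "username" and "key" fields, and the official Kaggle CLI warns when the file is readable by other users. The sketch below is a stdlib-only pre-flight check for both conditions; check_kaggle_token is a hypothetical helper, not part of retaildata, and it says nothing about how retaildata itself validates the file.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(path: Path) -> list[str]:
    """Return a list of problems found in a Kaggle API token file (empty if none)."""
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} should contain a JSON object"]

    problems: list[str] = []
    for field in ("username", "key"):
        if not isinstance(data.get(field), str) or not data[field]:
            problems.append(f"missing or empty field: {field!r}")

    # On POSIX systems, the token should not be accessible to group or others.
    if os.name == "posix":
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"permissions are {oct(mode)}; consider chmod 600 {path}")
    return problems


if __name__ == "__main__":
    token = Path.home() / ".kaggle" / "kaggle.json"
    issues = check_kaggle_token(token)
    print("\n".join(issues) if issues else f"{token} looks usable")
```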
    

Python API

import retaildata.api as rd
import polars as pl
from pathlib import Path

# Download the dataset and convert it to Parquet
rd.download("online_retail_ii", prepare=True)

# Lazily scan the prepared Parquet files, then materialize them with Polars
prepared_dir = Path.home() / ".local/share/retaildata/prepared/online_retail_ii"
df = pl.scan_parquet(str(prepared_dir / "*.parquet")).collect()
print(df.head())

Supported Datasets

  • online_retail_ii: UK-based online retail transactions.
  • olist: Brazilian e-commerce dataset.
  • m5: Walmart sales time-series forecasting (M5 competition).
  • store_sales: Corporación Favorita (Ecuador) store sales.
  • rossmann: Rossmann store sales benchmarks.
  • instacart: Online grocery basket analysis.
  • online_retail_uci: Classic transactions dataset (UCI).
  • credit_approval_openml: Credit approval data for financial benchmarking (OpenML).

Run retaildata list to see the full registry.
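A dataset registry with a single fetch entry point can be modeled as a mapping from dataset key to a spec naming its provider, plus one handler per provider. The sketch below is purely illustrative: DatasetSpec, REGISTRY, HANDLERS and fetch are hypothetical names, not retaildata's API, and the stub handlers return a description instead of downloading anything. Only two entries are included, with providers taken from the dataset names above (UCI, OpenML).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DatasetSpec:
    name: str
    provider: str  # must be a key of HANDLERS
    description: str


# Illustrative entries only; the real registry is shown by `retaildata list`.
REGISTRY: dict[str, DatasetSpec] = {
    spec.name: spec
    for spec in (
        DatasetSpec("online_retail_uci", "uci", "Classic transactions dataset (UCI)"),
        DatasetSpec("credit_approval_openml", "openml", "Financial benchmarking (OpenML)"),
    )
}

# Stub handlers: in a real tool, each would wrap one provider's download logic.
HANDLERS: dict[str, Callable[[DatasetSpec], str]] = {
    "http": lambda spec: f"HTTP download of {spec.name}",
    "kaggle": lambda spec: f"Kaggle download of {spec.name}",
    "huggingface": lambda spec: f"Hugging Face download of {spec.name}",
    "uci": lambda spec: f"UCI download of {spec.name}",
    "openml": lambda spec: f"OpenML download of {spec.name}",
}


def fetch(name: str) -> str:
    """Look up a dataset by key and dispatch to its provider's handler."""
    try:
        spec = REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown dataset {name!r}; known: {sorted(REGISTRY)}") from None
    return HANDLERS[spec.provider](spec)
```

With this layout the caller-facing command stays the same as sources are added; supporting a new provider only means registering another handler.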

License

This package is licensed under the MIT License. Individual datasets may have their own licenses.


Download files


Source Distribution

retaildata-0.1.5.tar.gz (25.0 kB)

Built Distribution


retaildata-0.1.5-py3-none-any.whl (32.3 kB)

File details

Details for the file retaildata-0.1.5.tar.gz.

File metadata

  • Download URL: retaildata-0.1.5.tar.gz
  • Size: 25.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.4

File hashes

Hashes for retaildata-0.1.5.tar.gz
Algorithm     Hash digest
SHA256        94c5b719fa92afe9f9d0632f0669fb0edc6946eef3d84515956652a0496915c4
MD5           3a8e77f037d4b4c866e964a1f04026a3
BLAKE2b-256   88d6f6ad0b2993faaee062d080e6e1f58ecbf9a5a5d1b7e70694a58e2284de68
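To check a downloaded file against the SHA256 digest listed above, hash it in chunks with Python's hashlib and compare. (pip can also enforce this itself in hash-checking mode, via a --hash=sha256:... option on a requirements line.) The helper names sha256_of and verify are illustrative.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Digest for retaildata-0.1.5.tar.gz, copied from the table above.
    expected = "94c5b719fa92afe9f9d0632f0669fb0edc6946eef3d84515956652a0496915c4"
    archive = Path("retaildata-0.1.5.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
```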


File details

Details for the file retaildata-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: retaildata-0.1.5-py3-none-any.whl
  • Size: 32.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.4

File hashes

Hashes for retaildata-0.1.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        2ce4195018c718e2fe2e80f1aeed8dda1a55b775909af1812ece99deec2bc113
MD5           811c0ed22717f26bc0ce7f7de59af940
BLAKE2b-256   81ff0e786f51eee64161337cfcad9dee213b10efc4313cd0e27e4f265bc70355

