Data from R package completejourney

The Complete Journey (Python)

A Python data package providing access to grocery store shopping transaction data from 84.51°. This is a Python equivalent of the R package completejourney, using the more portable Parquet format for cross-platform compatibility.

Important: This package contains simulated data based on real grocery shopping patterns. It is intended for educational and exploratory data analysis purposes only. This data should not be used for academic research, commercial decision-making, or any purpose requiring authentic consumer behavior data.

Overview

The Complete Journey dataset represents grocery store shopping transactions over one year from a group of 801 households. The data includes detailed purchase information, household demographics, marketing campaigns, and coupon usage, which together give a broad view of retail shopping behavior.

Key Statistics:

1,469,307 transaction records
801 households
8 comprehensive datasets
1 year of shopping data

Installation

pip install completejourney_py

Development Installation

# Clone the repository
git clone https://github.com/cunningjames/completejourney_py.git
cd completejourney_py

# Install in development mode
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

Quick Start

from completejourney_py import get_data

# Load all datasets
data = get_data()

# Access individual datasets
transactions = data["transactions"]
demographics = data["demographics"]
products = data["products"]

print(f"Loaded {len(transactions):,} transaction records")
print(f"Covering {len(demographics):,} households")

📚 Documentation

Comprehensive documentation including analysis notebooks is available at: completejourney-py.readthedocs.io

Cookbook Examples

The documentation includes detailed analysis notebooks:

Dataset Summary Analysis - Overview of all 8 datasets
Top Selling Products - Product performance analysis
Shopping Frequency Analysis - Customer behavior patterns
Coupon Analysis - Promotional effectiveness
Traffic Patterns - Store visit timing and trends
Demographic Product Analysis - Purchase behavior by customer segments
Market Basket Analysis - Product associations and cross-selling
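
The Market Basket Analysis notebook listed above looks at product associations. As a rough illustration of the underlying idea (not the notebook's actual code), the sketch below counts how often pairs of products appear in the same basket. The baskets and product names are toy values invented for this example, not package data.

```python
from collections import Counter
from itertools import combinations

# Toy baskets (hypothetical product names, not taken from the package data).
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(pair, baskets):
    """Share of baskets that contain both products in the pair."""
    return pair_counts(baskets)[tuple(sorted(pair))] / len(baskets)


counts = pair_counts(baskets)
print(counts.most_common(3))
print(support(("milk", "bread"), baskets))
```

With real data, each basket would presumably be built by grouping transaction rows by a basket or trip identifier; that column is not listed in this README, so check the dataset before relying on it.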

Datasets

Core Transaction Data

transactions - Complete purchase records (1.47M records)
products - Product metadata and categories
demographics - Household demographic information

Marketing & Promotions

campaigns - Marketing campaigns received by households
campaign_descriptions - Campaign metadata and details
promotions - Product placement in mailers and stores
coupons - Coupon metadata (UPC codes, campaigns)
coupon_redemptions - Detailed coupon usage records

Usage Examples

Load Specific Datasets

from completejourney_py import get_data

# Load single dataset
transactions = get_data("transactions")["transactions"]

# Load multiple datasets
sales_data = get_data(["transactions", "products", "demographics"])

Basic Analysis

import pandas as pd
from completejourney_py import get_data

# Load data
data = get_data(["transactions", "demographics", "products"])
transactions = data["transactions"]
demographics = data["demographics"]
products = data["products"]

# Basic transaction analysis
print("Transaction Summary:")
print(f"Total transactions: {len(transactions):,}")
print(f"Total households: {transactions['household_id'].nunique():,}")
print(f"Date range: {transactions['transaction_timestamp'].dt.date.min()} to {transactions['transaction_timestamp'].dt.date.max()}")

# Household spending analysis
household_spending = transactions.groupby('household_id')['sales_value'].sum()
print(f"\nAverage household spending: ${household_spending.mean():.2f}")
print(f"Median household spending: ${household_spending.median():.2f}")

Marketing Analysis

from completejourney_py import get_data

# Analyze campaign effectiveness
campaign_data = get_data(["campaigns", "campaign_descriptions", "transactions"])
campaigns = campaign_data["campaigns"]
descriptions = campaign_data["campaign_descriptions"]
transactions = campaign_data["transactions"]

# Join campaign data on the campaign key (named `campaign_id` in the Data Dictionary)
campaign_analysis = campaigns.merge(descriptions, on='campaign_id')
print("Campaign Types:")
print(campaign_analysis['campaign_type'].value_counts())

Data Dictionary

Key Variables

Dataset	Key Variables	Description
`transactions`	`household_id`, `product_id`, `sales_value`, `quantity`	Purchase records
`demographics`	`household_id`, `age`, `income`, `household_size`	Household characteristics
`products`	`product_id`, `department`, `product_category`, `brand`	Product information
`campaigns`	`household_id`, `campaign_id`	Marketing campaigns
`coupons`	`coupon_upc`, `product_id`, `campaign_id`	Coupon details
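
The key variables above suggest how the tables combine. The following is a minimal sketch of joining `transactions` to `products` on `product_id` and totaling sales by department. It uses small toy DataFrames with made-up values so it runs without the package; only the column names come from the table above.

```python
import pandas as pd

# Toy stand-ins for the `transactions` and `products` tables (made-up values).
transactions = pd.DataFrame({
    "household_id": [1, 1, 2, 3],
    "product_id": [10, 11, 10, 12],
    "sales_value": [2.50, 4.00, 2.50, 7.25],
    "quantity": [1, 2, 1, 1],
})
products = pd.DataFrame({
    "product_id": [10, 11, 12],
    "department": ["GROCERY", "PRODUCE", "GROCERY"],
})

# Attach product attributes to each purchase, then total sales by department.
sales_by_department = (
    transactions.merge(products, on="product_id", how="left")
    .groupby("department")["sales_value"]
    .sum()
    .sort_values(ascending=False)
)
print(sales_by_department)
```

The same pattern should apply to the DataFrames returned by `get_data(["transactions", "products"])`, assuming they carry the columns listed in the table.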

Data Relationships

households (demographics) 
    ↓
transactions ← products
    ↓
campaigns → campaign_descriptions
    ↓
coupons → coupon_redemptions
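
The household-to-transaction link in the diagram can be sketched as a join on `household_id`. The example below assumes, for illustration only, that some households in `transactions` may have no matching row in `demographics`, and uses a left join with pandas' `indicator` option to flag them. All values, including the income labels, are invented.

```python
import pandas as pd

# Toy stand-ins for `transactions` and `demographics` (made-up values).
transactions = pd.DataFrame({
    "household_id": [1, 1, 2, 3],
    "sales_value": [2.50, 4.00, 2.50, 7.25],
})
demographics = pd.DataFrame({
    "household_id": [1, 2],
    "income": ["35-49K", "50-74K"],  # hypothetical bracket labels
})

# Total spend per household, one row per household.
spend = transactions.groupby("household_id", as_index=False)["sales_value"].sum()

# Left join keeps every household; `_merge` records whether a match was found.
joined = spend.merge(demographics, on="household_id", how="left", indicator=True)
has_demo = joined["_merge"] == "both"

print(joined)
print(f"Households with demographics: {has_demo.sum()} of {len(joined)}")
```

A left join is used here so that households without demographic rows are kept and counted rather than silently dropped, which an inner join would do.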

Data Source & Important Notice

⚠️ Simulated Data Notice: This dataset contains simulated grocery shopping data created for educational purposes. While based on realistic shopping patterns, it is not real consumer data.

Appropriate Uses:

✅ Learning data analysis techniques
✅ Teaching retail analytics concepts
✅ Prototyping data science workflows
✅ Educational coursework and tutorials

Not Appropriate For:

❌ Academic research requiring real consumer data
❌ Commercial business decisions
❌ Market research or consumer insights
❌ Publication in academic journals

The original concept and data structure are from 84.51°, with additional insights available at the Complete Journey project page.

Citation for Educational Use:

84.51°. (2015). The Complete Journey: A comprehensive view of household shopping behavior [Dataset concept]. 84.51°. http://www.8451.com/area51/
[Note: This implementation contains simulated data for educational purposes]

Requirements

Python 3.8-3.14
pandas >= 1.0.0
pyarrow >= 1.0.0

Development

Running Tests

# Install test dependencies
pip install -e ".[test]"

# Run tests
pytest

# Run with coverage
pytest --cov=completejourney_py

Code Quality

# Install development dependencies
pip install -e ".[dev]"

# Format code
black completejourney_py/ tests/
isort completejourney_py/ tests/

# Lint code
flake8 completejourney_py/ tests/

# Type checking
mypy completejourney_py/

License

This package is released under the MIT License. The original data concept and structure come from 84.51°; as noted under Data Source & Important Notice, the data in this package is simulated and intended for educational use.

Related Projects

completejourney (R) - Original R package
Complete Journey Analysis - Detailed data exploration

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Release history

0.1.0 (this version) - Nov 11, 2025
0.0.3 - Aug 23, 2019
0.0.2 - Aug 14, 2019
0.0.1 - Aug 14, 2019

Download files

Source Distribution

completejourney_py-0.1.0.tar.gz (31.5 MB) - uploaded Nov 11, 2025, Source

Built Distribution

completejourney_py-0.1.0-py3-none-any.whl (31.6 MB) - uploaded Nov 11, 2025, Python 3

File details

Details for the file completejourney_py-0.1.0.tar.gz.

File metadata

Download URL: completejourney_py-0.1.0.tar.gz
Upload date: Nov 11, 2025
Size: 31.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for completejourney_py-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`77fd4189406506fdff6a82c9aa025de36f34f6452b16c71086f15f65c1397b07`
MD5	`c613f663ff247a5f546030e1b7f1c62a`
BLAKE2b-256	`b09cab8a34b8a765e6c902ef05188a0466ef2a43558a7461ab9d5026e60d17af`
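
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do that; the helper names are invented for this example, and the file path in the commented usage line is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for completejourney_py-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "77fd4189406506fdff6a82c9aa025de36f34f6452b16c71086f15f65c1397b07"
)

# Example call (adjust the path to wherever the file was saved):
# matches_published_hash("completejourney_py-0.1.0.tar.gz", EXPECTED_SDIST_SHA256)
```

The same helper works for the wheel by passing its SHA256 digest instead.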

File details

Details for the file completejourney_py-0.1.0-py3-none-any.whl.

File metadata

Download URL: completejourney_py-0.1.0-py3-none-any.whl
Upload date: Nov 11, 2025
Size: 31.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for completejourney_py-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b8aa74688b37522ef2ef6f0623dd8b394636655c60cfc9adae18f0dff96c80b`
MD5	`68f0239b086115a265acf2739a68ee4b`
BLAKE2b-256	`9ac269282c24aa063bbfdb41b0c7053c2c403fe2d9eb5029e14a64e9cdf2354b`
