Generate realistic enterprise payroll test data with chaos patterns

Synthetic Payroll Lab

Version: 0.1.0
License: MIT
Status: MVP Development

Overview

Generate realistic, enterprise-messy payroll and timekeeping test data with configurable "chaos" knobs to stress-test data pipelines.

Why This Module?

Existing synthetic data tools (Faker, Mockaroo) do not model payroll domain semantics, and they do not simulate common enterprise chaos patterns such as:

  • Late-arriving facts
  • Schema drift
  • Foreign key orphans
  • Timezone errors
  • Duplicate records
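
To make two of these patterns concrete, here is a minimal standalone sketch that injects duplicate rows and late-arriving timecards into a clean list of records. It is not the library's implementation: the helper name `inject_chaos`, the field names `work_date` and `arrival_date`, and the default rates are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta


def inject_chaos(rows, duplicate_rate=0.02, late_rate=0.15, late_days=2, seed=42):
    """Return a copy of rows with duplicates and late arrivals injected.

    Hypothetical helper for illustration; not part of synthetic-payroll-lab.
    """
    rng = random.Random(seed)  # local RNG, so the same seed gives the same output
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is untouched
        # By default a record arrives on the day it was worked.
        row["arrival_date"] = row["work_date"]
        if rng.random() < late_rate:
            # Late-arriving fact: the record lands some days after the event.
            row["arrival_date"] = row["work_date"] + timedelta(days=late_days)
        out.append(row)
        if rng.random() < duplicate_rate:
            # Duplicate record: emit an identical second copy.
            out.append(dict(row))
    return out


clean = [
    {"timecard_id": i, "employee_id": i % 10, "work_date": date(2024, 1, 1)}
    for i in range(1000)
]
messy = inject_chaos(clean)
late = sum(1 for r in messy if r["arrival_date"] > r["work_date"])
print(len(clean), len(messy), late)
```

A pipeline under test should deduplicate the extra rows and still place late records in the correct pay period, which is the kind of behavior these patterns are meant to exercise.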

Features (v0.1.0 MVP)

  • ✅ Generate 6 core payroll domains (employees, jobs, schedules, timecards, payroll runs, cost centers)
  • ✅ CSV/JSON output with Hive-style partitioning
  • ✅ Configurable chaos patterns (duplicates, nulls, late arrivals, schema drift)
  • ✅ Deterministic mode (seed for reproducibility)
  • ✅ CLI + Python API
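
For readers unfamiliar with Hive-style partitioning, the sketch below shows the general `key=value` directory convention using only the standard library. The partition key (`work_date`), the file name `part-0000.csv`, and the helper `write_hive_partitioned` are assumptions; the layout this package actually writes may differ.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_hive_partitioned(rows, root, domain, partition_key):
    """Write rows to root/domain/<key>=<value>/part-0000.csv and return the paths.

    Hypothetical helper for illustration; not part of synthetic-payroll-lab.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    written = []
    for value, group in sorted(groups.items()):
        part_dir = Path(root) / domain / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0000.csv"
        # The partition column lives in the directory name, so drop it from the file.
        fields = [k for k in group[0] if k != partition_key]
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


rows = [
    {"timecard_id": 1, "hours": 8.0, "work_date": "2024-01-01"},
    {"timecard_id": 2, "hours": 7.5, "work_date": "2024-01-01"},
    {"timecard_id": 3, "hours": 8.0, "work_date": "2024-01-02"},
]
with tempfile.TemporaryDirectory() as tmp:
    for p in write_hive_partitioned(rows, tmp, "timecards", "work_date"):
        print(p.relative_to(tmp))
```

Query engines that understand this convention can prune whole directories when a filter names the partition column, which is why landing zones are often laid out this way.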

Quick Start

# Install
pip install synthetic-payroll-lab

# Generate test data
synthetic-payroll generate \
    --config payroll_config.yaml \
    --output-dir ./landing \
    --start-date 2024-01-01 \
    --end-date 2024-12-31 \
    --employees 50000

Python API

from synthetic_payroll_lab import PayrollGenerator, ChaosConfig

gen = PayrollGenerator(
    employees=50000,
    start_date="2024-01-01",
    chaos=ChaosConfig(
        duplicate_rate=0.02,      # 2% duplicate rows
        null_spike_rate=0.01,     # 1% random null injection
        late_arrival_pct=0.15,    # 15% of timecards arrive at T+2 days
        schema_drift_days=90,     # Column added every 90 days
        timezone_error_rate=0.03  # 3% wrong timezone
    )
)

gen.generate_all_domains(output_path="./landing", format="csv")
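
Once data has been generated, the pipeline under test needs checks that catch the injected problems. The following is a minimal sketch of two such checks, for duplicate keys and foreign key orphans. The helpers `find_duplicates` and `find_orphans` and the column names `timecard_id` and `employee_id` are hypothetical; the real column names come from the generated files.

```python
from collections import Counter


def find_duplicates(rows, key):
    """Return key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def find_orphans(child_rows, parent_rows, fk, pk):
    """Return child rows whose foreign key has no matching parent."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


# Tiny hand-written sample standing in for generated output.
employees = [{"employee_id": 1}, {"employee_id": 2}]
timecards = [
    {"timecard_id": 10, "employee_id": 1},
    {"timecard_id": 10, "employee_id": 1},   # duplicate row
    {"timecard_id": 11, "employee_id": 99},  # orphan: no employee 99
]

print(find_duplicates(timecards, "timecard_id"))  # [10]
print(find_orphans(timecards, employees, "employee_id", "employee_id"))
```

In practice the rows would be loaded from the generated CSV or JSON files, for example with `csv.DictReader`.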

Configuration

See config_reference.md for the full YAML schema.

Roadmap

  • v0.1.0 (MVP): Core domains + basic chaos
  • v0.2.0: SCD2 dimension changes, retro adjustments
  • v1.0.0: Multi-region support, PII variants, Parquet output

Contributing

Issues and PRs welcome! See CONTRIBUTING.md.

License

MIT - see LICENSE
