Generate realistic enterprise payroll test data with chaos patterns
Synthetic Payroll Lab
Version: 0.1.0
License: MIT
Status: MVP Development
Overview
Generate realistic, enterprise-messy payroll and timekeeping test data with configurable "chaos" knobs to stress-test data pipelines.
Why This Module?
Existing synthetic data tools (Faker, Mockaroo) don't model payroll domain semantics, nor do they simulate common enterprise chaos patterns like:
- Late-arriving facts
- Schema drift
- Foreign-key orphans
- Timezone errors
- Duplicate records
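Each of these patterns is something a downstream pipeline has to detect or tolerate. As a rough illustration of what stress-testing means here, the sketch below uses only the standard library to flag two of them: duplicate records and foreign-key orphans. The function names and the `timecard_id`/`employee_id` fields are hypothetical, chosen for illustration; they are not part of this package.

```python
from collections import Counter


def find_duplicates(rows: list[dict], key: str) -> set:
    """Return the key values that appear more than once (duplicate records)."""
    counts = Counter(row[key] for row in rows)
    return {value for value, n in counts.items() if n > 1}


def find_orphans(rows: list[dict], fk: str, parent_keys: set) -> list[dict]:
    """Return rows whose foreign key has no matching parent (foreign-key orphans)."""
    return [row for row in rows if row[fk] not in parent_keys]


# Hypothetical sample data; field names are illustrative only.
employees = {"E001", "E002"}
timecards = [
    {"timecard_id": "T1", "employee_id": "E001", "hours": 8.0},
    {"timecard_id": "T2", "employee_id": "E002", "hours": 7.5},
    {"timecard_id": "T2", "employee_id": "E002", "hours": 7.5},  # duplicate record
    {"timecard_id": "T3", "employee_id": "E999", "hours": 8.0},  # orphan: no E999 employee
]

if __name__ == "__main__":
    print(find_duplicates(timecards, "timecard_id"))  # {'T2'}
    print([r["timecard_id"] for r in find_orphans(timecards, "employee_id", employees)])  # ['T3']
```

A pipeline that passes clean data but fails checks like these on chaos-injected data is exactly the kind of gap this module is meant to expose.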
Features (v0.1.0 MVP)
- ✅ Generate 6 core payroll domains (employees, jobs, schedules, timecards, payroll runs, cost centers)
- ✅ CSV/JSON output with Hive-style partitioning
- ✅ Configurable chaos patterns (duplicates, nulls, late arrivals, schema drift)
- ✅ Deterministic mode (seed for reproducibility)
- ✅ CLI + Python API
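Hive-style partitioning encodes partition keys in directory names as `key=value` segments (for example `year=2024/month=01`), a layout that engines such as Hive, Spark, and pyarrow can use to prune partitions. The stdlib-only sketch below combines that layout with seeded generation to show what partitioned output and a deterministic mode amount to in principle. The `timecards` domain name, the year/month partition keys, the `part-0000.csv` file name, and all function names are assumptions for illustration, not the package's actual layout.

```python
import csv
import random
import tempfile
from datetime import date, timedelta
from pathlib import Path


def hive_partition_dir(base: Path, domain: str, d: date) -> Path:
    """Hive-style partition directory: <base>/<domain>/year=YYYY/month=MM."""
    return base / domain / f"year={d.year:04d}" / f"month={d.month:02d}"


def make_timecards(seed: int, n: int, start: date) -> list[dict]:
    """Generate n toy timecard rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    rows = []
    for i in range(n):
        work_date = start + timedelta(days=rng.randrange(60))
        rows.append({
            "timecard_id": f"T{i:05d}",
            "employee_id": f"E{rng.randrange(1, 51):03d}",
            "work_date": work_date.isoformat(),
            "hours": round(rng.uniform(4.0, 10.0), 2),
        })
    return rows


def write_partitioned_csv(rows: list[dict], base: Path, domain: str) -> list[Path]:
    """Group rows by partition and write one CSV file into each partition directory."""
    groups: dict[Path, list[dict]] = {}
    for row in rows:
        part = hive_partition_dir(base, domain, date.fromisoformat(row["work_date"]))
        groups.setdefault(part, []).append(row)
    written = []
    for part, part_rows in sorted(groups.items()):
        part.mkdir(parents=True, exist_ok=True)
        out = part / "part-0000.csv"
        with out.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(part_rows[0]))
            writer.writeheader()
            writer.writerows(part_rows)
        written.append(out)
    return written


if __name__ == "__main__":
    rows = make_timecards(seed=42, n=100, start=date(2024, 1, 1))
    assert rows == make_timecards(seed=42, n=100, start=date(2024, 1, 1))  # deterministic
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_partitioned_csv(rows, Path(tmp), "timecards"):
            print(path.relative_to(tmp))
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps reproducibility intact even if other code in the same process also draws random numbers.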
Quick Start
```bash
# Install
pip install synthetic-payroll-lab

# Generate test data
synthetic-payroll generate \
  --config payroll_config.yaml \
  --output-dir ./landing \
  --start-date 2024-01-01 \
  --end-date 2024-12-31 \
  --employees 50000
```
Python API
```python
from synthetic_payroll_lab import PayrollGenerator, ChaosConfig

gen = PayrollGenerator(
    employees=50000,
    start_date="2024-01-01",
    chaos=ChaosConfig(
        duplicate_rate=0.02,       # 2% duplicate rows
        null_spike_rate=0.01,      # 1% random null injection
        late_arrival_pct=0.15,     # 15% of timecards arrive at T+2 days
        schema_drift_days=90,      # a column is added every 90 days
        timezone_error_rate=0.03,  # 3% of records carry the wrong timezone
    ),
)

gen.generate_all_domains(output_path="./landing", format="csv")
```
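The one-line comments in the example leave the exact semantics of each knob open. The stdlib-only sketch below shows one plausible reading of them; `ChaosKnobs`, `apply_chaos`, `drift_columns`, and the `work_date`/`clock_in` fields are hypothetical and are not the package's `ChaosConfig` implementation.

```python
import random
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass
class ChaosKnobs:
    """Hypothetical stand-in mirroring the ChaosConfig fields in the example above."""
    duplicate_rate: float = 0.02
    null_spike_rate: float = 0.01
    late_arrival_pct: float = 0.15
    schema_drift_days: int = 90
    timezone_error_rate: float = 0.03


def drift_columns(base_cols: list[str], start: date, as_of: date, every_days: int) -> list[str]:
    """Schema as of `as_of`: one extra column per full `every_days` period since `start`."""
    n_extra = (as_of - start).days // every_days
    return base_cols + [f"drift_col_{i + 1}" for i in range(n_extra)]


def apply_chaos(rows: list[dict], knobs: ChaosKnobs, seed: int = 0) -> list[dict]:
    """Return a messy copy of `rows` (each needs `work_date` and `clock_in`).

    The same seed reproduces the same mess.
    """
    rng = random.Random(seed)
    out: list[dict] = []
    for clean in rows:
        row = dict(clean)  # shallow copy so the clean input rows are left untouched
        # Late arrival: the row lands in the T+2 load instead of the same-day load.
        work_date = date.fromisoformat(row["work_date"])
        lag = 2 if rng.random() < knobs.late_arrival_pct else 0
        row["arrival_date"] = (work_date + timedelta(days=lag)).isoformat()
        # Timezone error: keep the local wall-clock time but relabel it as UTC.
        if rng.random() < knobs.timezone_error_rate:
            local = datetime.fromisoformat(row["clock_in"])
            row["clock_in"] = local.replace(tzinfo=timezone.utc).isoformat()
        # Null spike: blank out one randomly chosen non-key field.
        if rng.random() < knobs.null_spike_rate:
            candidates = [k for k in row if k not in ("timecard_id", "arrival_date")]
            row[rng.choice(candidates)] = None
        out.append(row)
        # Duplicate: emit an identical copy of the (already messy) record.
        if rng.random() < knobs.duplicate_rate:
            out.append(dict(row))
    return out


if __name__ == "__main__":
    sample = [{"timecard_id": "T1", "employee_id": "E001", "work_date": "2024-01-02",
               "clock_in": "2024-01-02T09:00:00-05:00", "hours": 8.0}]
    print(apply_chaos(sample * 5, ChaosKnobs(), seed=1))
    print(drift_columns(["timecard_id", "hours"], date(2024, 1, 1), date(2024, 7, 1), 90))
    # ['timecard_id', 'hours', 'drift_col_1', 'drift_col_2']
```

Driving every decision from a single seeded `random.Random` instance is one straightforward way to make chaos injection reproducible. The timezone bug modeled here (relabeling local time as UTC without converting it) is only one of several common variants.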
Configuration
See config_reference.md for the full YAML schema.
Roadmap
- v0.1.0 (MVP): Core domains + basic chaos
- v0.2.0: SCD2 dimension changes, retro adjustments
- v1.0.0: Multi-region support, PII variants, Parquet output
Contributing
Issues and PRs welcome! See CONTRIBUTING.md.
License
MIT - see LICENSE
Download files
- Source distribution: synthetic_payroll_lab-0.1.0.tar.gz
- Built distribution: synthetic_payroll_lab-0.1.0-py3-none-any.whl
File details
Details for the file synthetic_payroll_lab-0.1.0.tar.gz.
File metadata
- Download URL: synthetic_payroll_lab-0.1.0.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 312dc306f00239c4c5e8f99262c5e3da5a34431a72db2c36b2d4d80b12577c63 |
| MD5 | 7232b2f88bdbc237435d1902260835c8 |
| BLAKE2b-256 | d4fd449fa6b690f51fffebc21c37e1559841dbd35072451c319bed2034867598 |
File details
Details for the file synthetic_payroll_lab-0.1.0-py3-none-any.whl.
File metadata
- Download URL: synthetic_payroll_lab-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ac2ec5067049895dcd0e8fda5283f380a32408f09e2c013df6a303dda598009c |
| MD5 | efe81c042e2855389d6c670d25ce9576 |
| BLAKE2b-256 | 2329ae7c53ca074ef563b55cf810e12f4c41894db47bf19a4e5a41b21978f35f |