plotsim
Generate realistic multi-table datasets from behavioral archetypes.
plotsim generates multi-table CSV datasets in which entity behavior follows configurable trajectory curves. Metrics like engagement, revenue, and churn move together because they all derive from the same underlying trajectory position rather than from independent random generators.
No real data required. No network calls. Fully deterministic.
pip install plotsim
Usage
Generate from a template
plotsim template saas -o config.yaml
plotsim run config.yaml -o ./output --validate
Generate from Python
from plotsim import load_config, generate_tables, write_tables
config = load_config("config.yaml")
tables = generate_tables(config)
write_tables(tables, config)
The same config and seed produce identical output on every run.
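One way to check this determinism yourself is to generate into two directories and compare a digest of the CSV files. The helper below is a minimal sketch and not part of plotsim: `digest_csv_dir` is a hypothetical name, and it assumes only that each run writes its CSVs into a single flat directory.

```python
import hashlib
from pathlib import Path


def digest_csv_dir(directory: str) -> str:
    """Return one SHA-256 digest covering every CSV file in a directory.

    Hypothetical helper, not a plotsim function.
    """
    h = hashlib.sha256()
    # Sort so the digest does not depend on filesystem listing order.
    for path in sorted(Path(directory).glob("*.csv")):
        h.update(path.name.encode("utf-8"))
        h.update(path.read_bytes())
    return h.hexdigest()
```

After running `plotsim run config.yaml -o ./run_a` and `plotsim run config.yaml -o ./run_b`, the two calls `digest_csv_dir("./run_a")` and `digest_csv_dir("./run_b")` should return the same string if the output is byte-for-byte identical.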
What it generates
A single config produces a complete relational schema:
output/
├── dim_date.csv # date spine
├── dim_company.csv # entity attributes
├── dim_user.csv # sub-entity attributes
├── dim_plan.csv # reference lookup
├── fct_engagement.csv # entity × period metrics
├── fct_revenue.csv # entity × period metrics
├── fct_support_tickets.csv # entity × period metrics
├── evt_login.csv # behavioral events
├── evt_churn.csv # threshold-triggered events
├── config.yaml # config that produced this output
└── validation_report.txt # integrity checks
All foreign keys resolve. Event tables derive from fact values, not from independent random generation. If an entity's engagement declines, its login events decrease and churn events fire — across separate CSV files.
Templates
Five domain configs ship with the package:
| Template | Domain | Entities | Tables |
|---|---|---|---|
| saas | B2B SaaS | accounts with users | 10 |
| hr | HR department | employees in departments | 7 |
| ecommerce | E-commerce | customer segments | 8 |
| education | University | student cohorts | 7 |
| healthcare | Clinic | patient groups | 8 |
plotsim list-templates # see all available
plotsim template hr -o hr.yaml # export one to edit
Custom domains
The config file defines everything: entity types, metrics, behavioral archetypes, table schemas, correlations, and noise levels. Copy any template and modify it by hand, or ask an LLM to adapt one with a prompt such as:
"Change this SaaS config to model a food delivery service with restaurants, orders, delivery times, and customer ratings."
Validate before generating:
plotsim validate my_config.yaml
plotsim run my_config.yaml -o ./output
How it works
Each entity is assigned an archetype — a trajectory curve composed of segments like sigmoid, exponential decay, step, plateau, or oscillation.
At each time step, the engine reads the entity's trajectory position (a value between 0 and 1) and derives every metric from it:
- Positive polarity metrics (engagement, revenue) rise when the trajectory rises.
- Negative polarity metrics (churn risk, support tickets) rise when the trajectory falls.
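The polarity rule can be illustrated with a small sketch. This is not plotsim's implementation: the function names, the sigmoid steepness, and the linear mapping onto a metric range are all assumptions chosen to show how one position drives two metrics in opposite directions.

```python
import math


def sigmoid_position(t: int, n_periods: int, steepness: float = 8.0) -> float:
    """Trajectory position in (0, 1) for period t of a rising sigmoid segment."""
    x = t / (n_periods - 1)  # progress through the segment, 0..1
    return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))


def metric_level(position: float, low: float, high: float, polarity: str) -> float:
    """Map a position onto [low, high], flipping it for negative polarity."""
    if polarity == "negative":
        position = 1.0 - position
    return low + (high - low) * position


# One shared trajectory drives both metrics (ranges are illustrative).
positions = [sigmoid_position(t, 12) for t in range(12)]
engagement = [metric_level(p, 10.0, 100.0, "positive") for p in positions]
churn_risk = [metric_level(p, 0.02, 0.40, "negative") for p in positions]
```

Because both lists come from the same `positions`, engagement climbs over the 12 periods while churn risk falls, with no separate random generator per metric.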
Distributions (lognormal, gamma, poisson, beta, normal, weibull) shape the raw values. Correlated noise is applied via Cholesky decomposition on the configured correlation matrix. Causal lag lets one metric trail another by N periods.
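As a rough illustration of the Cholesky step, the sketch below factors a correlation matrix and uses the factor to mix independent standard-normal draws into correlated ones. It is written in pure Python for clarity and is an assumption about the general technique, not a copy of plotsim's code; `cholesky` and `correlated_noise` are hypothetical names.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix.

    Requires a positive-definite matrix; math.sqrt raises otherwise.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def correlated_noise(corr, n_samples, seed):
    """Draw n_samples noise vectors whose components follow corr."""
    rng = random.Random(seed)  # seeded, so repeated calls give the same draws
    lower = cholesky(corr)
    n = len(corr)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent draws
        # Multiply by L: component i mixes z[0..i].
        samples.append(
            [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        )
    return samples


noise = correlated_noise([[1.0, 0.8], [0.8, 1.0]], n_samples=3, seed=42)
```

Each row of `noise` holds one correlated draw per metric; in a generator these would be scaled by the configured sigma and added to the trajectory-derived values.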
Dimension tables are generated first (dates, entities, reference lookups), then fact tables (trajectory-driven metrics per period), then event tables (derived from completed fact values, never from raw trajectories).
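The last stage, deriving events from completed fact values, might look roughly like the sketch below. The column names (`company_id`, `period`, `engagement`), the threshold of 20, and the one-event-per-entity rule are illustrative assumptions rather than plotsim's actual schema or logic.

```python
def churn_events(fact_rows, threshold=20.0):
    """Emit one churn event per entity, at the first period where
    engagement falls below the threshold.

    Assumes each entity's rows appear in period order.
    """
    events = []
    churned = set()
    for row in fact_rows:
        if row["company_id"] in churned:
            continue  # already churned; no further events
        if row["engagement"] < threshold:
            churned.add(row["company_id"])
            events.append(
                {"company_id": row["company_id"], "period": row["period"]}
            )
    return events


# Hypothetical fact rows: c1 declines, c2 grows.
fct_engagement = [
    {"company_id": "c1", "period": "2024-01", "engagement": 55.0},
    {"company_id": "c1", "period": "2024-02", "engagement": 31.0},
    {"company_id": "c1", "period": "2024-03", "engagement": 12.0},
    {"company_id": "c2", "period": "2024-01", "engagement": 60.0},
    {"company_id": "c2", "period": "2024-02", "engagement": 64.0},
    {"company_id": "c2", "period": "2024-03", "engagement": 70.0},
]
evt_churn = churn_events(fct_engagement)
```

Only the declining entity produces a churn row, and the event reads the same fact value that would be written to the fact CSV, which is why the two files stay consistent.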
Config overview
A plotsim config has these sections:
- domain — name and entity label
- time_window — start, end, granularity (monthly / weekly / daily)
- seed — integer controlling all randomness
- metrics — name, distribution, polarity, optional causal lag
- archetypes — named trajectory shapes built from curve segments
- entities — instances assigned to archetypes
- tables — dim / fact / event schemas with typed columns
- correlations — optional metric-pair coefficients
- noise — gaussian sigma, outlier rate, missing data rate, temporal jitter
- stages — optional lifecycle sequence with enforceable ordering
Full schema with type annotations: plotsim/config.py
CLI reference
plotsim run <config>             Generate dataset from config
  -o, --output-dir <path>        Output directory (default: from config)
  -s, --seed <int>               Override seed
  -v, --validate                 Run validation after generation
  --strict                       Fail on validation warnings
  -q, --quiet                    Suppress output
plotsim validate <config>        Check config without generating
plotsim info <config>            Preview tables, rows, entities
plotsim list-templates           Show bundled templates
plotsim template <name>          Print template YAML to stdout
  -o, --output <path>            Write to file instead
Validation
When run with --validate, the engine performs these checks after generation:
- FK integrity — every foreign key resolves to a parent row
- PK uniqueness — no duplicate primary keys
- Date spine — no gaps in the date dimension
- Causal coherence — lagged metrics inflect after their drivers
- Null policy — no unexpected nulls outside configured missing rates
- Correlation PSD — correlation matrix is positive semi-definite
plotsim run config.yaml --validate
The validation report is written alongside the CSVs as validation_report.txt.
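To make a few of these checks concrete, here is a minimal sketch of PK uniqueness, FK integrity, and a daily date-spine gap check over rows held as dictionaries. These helpers are hypothetical and independent of plotsim's own validator; the function names and the assumption of daily granularity are illustrative only.

```python
from datetime import date, timedelta


def duplicate_keys(rows, key):
    """PK uniqueness: return key values that occur more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


def orphan_keys(child_rows, fk, parent_rows, pk):
    """FK integrity: return child key values with no matching parent row."""
    parents = {row[pk] for row in parent_rows}
    return {row[fk] for row in child_rows if row[fk] not in parents}


def daily_gaps(dates):
    """Date spine: return days missing between the earliest and latest date."""
    present = set(dates)
    day, end = min(present), max(present)
    missing = []
    while day <= end:
        if day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

An empty result from each helper corresponds to a passing check; for example, `orphan_keys(fct_rows, "company_id", dim_rows, "company_id")` returning an empty set means every fact row resolves to a dimension row.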
Limitations
- Output format is CSV only.
Contributing
See CONTRIBUTING.md for dev setup, test commands, and how to add templates or curve types.
License