Generate realistic multi-table datasets from behavioral archetypes. No real data needed.

plotsim

Python 3.10+ | License: Apache 2.0 | Status: Alpha

plotsim generates multi-table CSV datasets in which entity behavior follows configurable trajectory curves. Metrics such as engagement, revenue, and churn move together because they are all derived from the same underlying trajectory position, not from independent random generators.

No real data required. No network calls. Fully deterministic.

pip install plotsim

Usage

Generate from a template

plotsim template saas -o config.yaml
plotsim run config.yaml -o ./output --validate

Generate from Python

from plotsim import load_config, generate_tables, write_tables

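config = load_config("config.yaml")   # parse the YAML config
tables = generate_tables(config)      # build the dim, fact, and event tables
write_tables(tables, config)          # write the CSVs to the output directory set in the config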

The same config and seed produce identical output on every run.
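Determinism of this kind typically comes from deriving every random stream from the single config seed. The sketch below is an assumption about how such a guarantee can be built, not plotsim's implementation: the helpers `entity_rng` and `sample_series` are hypothetical, and each entity gets its own stream seeded from a SHA-256 hash of the global seed and entity ID, so its values do not depend on the order in which entities are generated.

```python
import hashlib
import random


def entity_rng(global_seed: int, entity_id: str) -> random.Random:
    """Hypothetical helper: a reproducible RNG for one entity, derived from the config seed."""
    digest = hashlib.sha256(f"{global_seed}:{entity_id}".encode("utf-8")).digest()
    # The first 8 bytes of the hash become the per-entity seed.
    return random.Random(int.from_bytes(digest[:8], "big"))


def sample_series(global_seed: int, entity_id: str, periods: int) -> list[float]:
    """Draw one noise value per period from the entity's own stream."""
    rng = entity_rng(global_seed, entity_id)
    return [rng.gauss(0.0, 1.0) for _ in range(periods)]


# Same seed and entity: identical values, regardless of what else was generated first.
assert sample_series(42, "company_001", 5) == sample_series(42, "company_001", 5)
```

A cryptographic hash is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would break run-to-run reproducibility.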

What it generates

A single config produces a complete relational schema:

output/
├── dim_date.csv                # date spine
├── dim_company.csv             # entity attributes
├── dim_user.csv                # sub-entity attributes
├── dim_plan.csv                # reference lookup
├── fct_engagement.csv          # entity × period metrics
├── fct_revenue.csv             # entity × period metrics
├── fct_support_tickets.csv     # entity × period metrics
├── evt_login.csv               # behavioral events
├── evt_churn.csv               # threshold-triggered events
├── config.yaml                 # config that produced this output
└── validation_report.txt       # integrity checks

All foreign keys resolve. Event tables are derived from fact values rather than generated independently. If an entity's engagement declines, its login events thin out and churn events fire, consistently across separate CSV files.
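To illustrate what "derived from fact values" can mean, here is a minimal sketch under stated assumptions; it is not plotsim's code. `FactRow`, `derive_login_events`, `derive_churn_events`, the 0..1 engagement scale, the per-period login cap, and the 0.2 churn threshold are all hypothetical. Because every event copies its `entity_id` from a fact row, the foreign key resolves by construction.

```python
import random
from dataclasses import dataclass


@dataclass
class FactRow:
    entity_id: str
    period: int
    engagement: float  # assumed to be on a 0..1 scale


def derive_login_events(facts, rng, max_logins=20):
    """Hypothetical: login count per period is a binomial draw with mean engagement * max_logins."""
    events = []
    for row in facts:
        count = sum(rng.random() < row.engagement for _ in range(max_logins))
        events.extend({"entity_id": row.entity_id, "period": row.period} for _ in range(count))
    return events


def derive_churn_events(facts, threshold=0.2):
    """Hypothetical: one churn event, in the first period engagement falls below the threshold."""
    churned = set()
    events = []
    for row in sorted(facts, key=lambda r: (r.entity_id, r.period)):
        if row.entity_id not in churned and row.engagement < threshold:
            churned.add(row.entity_id)
            events.append({"entity_id": row.entity_id, "period": row.period})
    return events


facts = [FactRow("c1", p, e) for p, e in enumerate([0.9, 0.6, 0.3, 0.1, 0.05])]
logins = derive_login_events(facts, random.Random(7))
churn = derive_churn_events(facts)  # [{'entity_id': 'c1', 'period': 3}]
```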

Templates

Five domain configs ship with the package:

Template     Domain          Entities                   Tables
saas         B2B SaaS        accounts with users        10
hr           HR department   employees in departments   7
ecommerce    E-commerce      customer segments          8
education    University      student cohorts            7
healthcare   Clinic          patient groups             8

plotsim list-templates          # see all available
plotsim template hr -o hr.yaml  # export one to edit

Custom domains

The config file defines everything: entity types, metrics, behavioral archetypes, table schemas, correlations, and noise levels. Copy any template and modify it, or ask an LLM to adapt one with a prompt such as:

"Change this SaaS config to model a food delivery service with restaurants, orders, delivery times, and customer ratings."

Validate before generating:

plotsim validate my_config.yaml
plotsim run my_config.yaml -o ./output

How it works

Each entity is assigned an archetype: a trajectory curve composed of segments such as sigmoid, exponential decay, step, plateau, or oscillation.
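One way to picture an archetype is as a list of segments, each moving the position from its current level toward a target with a particular shape. This is a sketch under assumptions, not plotsim's config format: the shape keys (`exp_decay`, etc.), the `Segment` fields, the steepness constants, and the `trajectory` function are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable


def _sigmoid(t: float, k: float = 10.0) -> float:
    """Logistic curve rescaled so it is exactly 0 at t=0 and 1 at t=1."""
    lo = 1.0 / (1.0 + math.exp(k / 2))
    hi = 1.0 / (1.0 + math.exp(-k / 2))
    return (1.0 / (1.0 + math.exp(-k * (t - 0.5))) - lo) / (hi - lo)


# Each shape maps local time t in [0, 1] to the fraction of the move toward the target.
SHAPES: dict[str, Callable[[float], float]] = {
    "sigmoid": _sigmoid,
    "exp_decay": lambda t: (1.0 - math.exp(-5.0 * t)) / (1.0 - math.exp(-5.0)),
    "step": lambda t: 0.0 if t < 0.5 else 1.0,
    "plateau": lambda t: 0.0,  # holds the current level; target is ignored
    "oscillation": lambda t: math.sin(2.0 * math.pi * t),  # swings around the current level
}


@dataclass
class Segment:
    shape: str
    periods: int
    target: float


def trajectory(segments: list[Segment], start: float = 0.5) -> list[float]:
    """Concatenate segments into one position per period, clamped to [0, 1]."""
    level, positions = start, []
    for seg in segments:
        shape = SHAPES[seg.shape]
        for i in range(seg.periods):
            t = (i + 1) / seg.periods  # each segment ends exactly at t = 1
            positions.append(min(1.0, max(0.0, level + (seg.target - level) * shape(t))))
        level = min(1.0, max(0.0, level + (seg.target - level) * shape(1.0)))
    return positions


# A hypothetical "adopt, then churn" archetype over 15 periods.
path = trajectory(
    [Segment("sigmoid", 6, 0.9), Segment("plateau", 3, 0.9), Segment("exp_decay", 6, 0.1)],
    start=0.2,
)
```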

At each time step, the engine reads the entity's trajectory position (a value between 0 and 1) and derives every metric from it:

  • Positive polarity metrics (engagement, revenue) rise when the trajectory rises.
  • Negative polarity metrics (churn risk, support tickets) rise when the trajectory falls.
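The polarity rule can be sketched as a mapping from the 0..1 position onto a metric range, using the complement of the position for negative-polarity metrics. The function `metric_value`, its linear `low`/`high` range, and the optional lognormal spread are assumptions for illustration, not plotsim's API.

```python
import random


def metric_value(position, polarity, low, high, rng=None, sigma=0.0):
    """Hypothetical: map a 0..1 trajectory position onto [low, high] according to polarity."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be in [0, 1]")
    if polarity not in ("positive", "negative"):
        raise ValueError("polarity must be 'positive' or 'negative'")
    # Negative-polarity metrics are driven by the complement of the position.
    driver = position if polarity == "positive" else 1.0 - position
    value = low + (high - low) * driver
    if rng is not None and sigma > 0.0:
        # Optional multiplicative spread; lognormvariate(0, sigma) has median 1.
        value *= rng.lognormvariate(0.0, sigma)
    return value


engagement = metric_value(0.75, "positive", 0, 100)  # 75.0
tickets = metric_value(0.75, "negative", 0, 20)      # 5.0
```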

Distributions (lognormal, gamma, poisson, beta, normal, weibull) shape the raw values. Correlated noise is generated from the Cholesky decomposition of the configured correlation matrix, so noise terms co-move according to the specified coefficients. Causal lag lets one metric trail its driver by N periods.

Dimension tables are generated first (dates, entities, reference lookups), then fact tables (trajectory-driven metrics per period), then event tables (derived from completed fact values, never from raw trajectories).

Config overview

A plotsim config has these sections:

  • domain — name and entity label
  • time_window — start, end, granularity (monthly / weekly / daily)
  • seed — integer controlling all randomness
  • metrics — name, distribution, polarity, optional causal lag
  • archetypes — named trajectory shapes built from curve segments
  • entities — instances assigned to archetypes
  • tables — dim / fact / event schemas with typed columns
  • correlations — optional metric-pair coefficients
  • noise — gaussian sigma, outlier rate, missing data rate, temporal jitter
  • stages — optional lifecycle sequence with enforceable ordering
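The time_window section (start, end, granularity) determines the date spine behind dim_date. The sketch below builds such a spine with the standard library; `date_spine` is a hypothetical helper, and labeling monthly periods by the first day of the month is an assumption, not documented plotsim behavior.

```python
from datetime import date, timedelta


def date_spine(start: date, end: date, granularity: str) -> list[date]:
    """Hypothetical: one row per period from start through end, inclusive."""
    if granularity not in ("daily", "weekly", "monthly"):
        raise ValueError(f"unsupported granularity: {granularity!r}")
    # Assumption: monthly periods are labeled by the first day of the month.
    current = start.replace(day=1) if granularity == "monthly" else start
    spine = []
    while current <= end:
        spine.append(current)
        if granularity == "daily":
            current += timedelta(days=1)
        elif granularity == "weekly":
            current += timedelta(weeks=1)
        else:
            # Roll December over into January of the next year.
            current = date(current.year + current.month // 12, current.month % 12 + 1, 1)
    return spine


months = date_spine(date(2024, 1, 1), date(2024, 4, 15), "monthly")
# [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1), date(2024, 4, 1)]
```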

Full schema with type annotations: plotsim/config.py

CLI reference

plotsim run <config>              Generate dataset from config
  -o, --output-dir <path>         Output directory (default: from config)
  -s, --seed <int>                Override seed
  -v, --validate                  Run validation after generation
  --strict                        Fail on validation warnings
  -q, --quiet                     Suppress output

plotsim validate <config>         Check config without generating
plotsim info <config>             Preview tables, rows, entities
plotsim list-templates            Show bundled templates
plotsim template <name>           Print template YAML to stdout
  -o, --output <path>             Write to file instead

Validation

With --validate, the engine runs these checks after generation:

  • FK integrity — every foreign key resolves to a parent row
  • PK uniqueness — no duplicate primary keys
  • Date spine — no gaps in the date dimension
  • Causal coherence — lagged metrics inflect after their drivers
  • Null policy — no unexpected nulls outside configured missing rates
  • Correlation PSD — correlation matrix is positive semi-definite

plotsim run config.yaml --validate

The validation report is written alongside the CSVs as validation_report.txt.
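Several of these checks are simple set and sort operations over table rows. The sketch below reimplements four of them under assumptions: the function names and the column names `company_id`, `login_id`, and `device` are hypothetical, rows are plain dicts shaped like `csv.DictReader` output, and the fixed-step gap check only suits daily or weekly spines.

```python
from datetime import date, timedelta


def duplicate_keys(rows, pk):
    """PK uniqueness: key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        if row[pk] in seen:
            dupes.add(row[pk])
        seen.add(row[pk])
    return dupes


def orphan_keys(child_rows, fk, parent_rows, pk):
    """FK integrity: child key values with no matching parent row."""
    parents = {row[pk] for row in parent_rows}
    return {row[fk] for row in child_rows if row[fk] not in parents}


def spine_gaps(dates, step=timedelta(days=1)):
    """Date spine: consecutive pairs not exactly one step apart (daily or weekly only)."""
    ordered = sorted(dates)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a != step]


def null_rate(rows, column):
    """Null policy: share of empty values, to compare against the configured missing rate."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[column] in ("", None)) / len(rows)


# Hypothetical rows, shaped like csv.DictReader output (values are strings).
companies = [{"company_id": "c1"}, {"company_id": "c2"}]
logins = [
    {"login_id": "l1", "company_id": "c1", "device": "web"},
    {"login_id": "l2", "company_id": "c3", "device": ""},
    {"login_id": "l2", "company_id": "c2", "device": "ios"},
]
gaps = spine_gaps([date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)])  # one gap: Jan 2 to Jan 4
```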

Limitations

  • Output format is CSV only.

Contributing

See CONTRIBUTING.md for dev setup, test commands, and how to add templates or curve types.

License

Apache-2.0 — see LICENSE and NOTICE.

Download files

Source Distribution

plotsim-0.1.0.tar.gz (99.9 kB)

Uploaded Source

Built Distribution

plotsim-0.1.0-py3-none-any.whl (64.5 kB)

Uploaded Python 3

File details

Details for the file plotsim-0.1.0.tar.gz.

File metadata

  • Download URL: plotsim-0.1.0.tar.gz
  • Upload date:
  • Size: 99.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.10

File hashes

Hashes for plotsim-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ee74fc67bbf9fb4b505adff1b6e4eaf2a4802af403382205dd25bc2c299d651c
MD5 b200a922af64a42c39d5d362c2a0f654
BLAKE2b-256 d955b5da9b55f0a6ab2721b6edb213c33d7f4fd62d7d1ce7c0722a9c189b6e99

File details

Details for the file plotsim-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: plotsim-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 64.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.10

File hashes

Hashes for plotsim-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e9e81115f336efca9ca14f6ea3869fa797d09eb70b7a384605f04da8abeedfe8
MD5 c4b6ac39c328225c6b4536866dc194ba
BLAKE2b-256 b3d0a761e9c35649a5964e5777e3e99ce949294fca653c25abd00322ed92eb21
