
plotsim — datasets that tell a story

Python 3.10+ · License: Apache 2.0 · Tests: 1673 passed · PyPI · Docs

Generate multi-table relational datasets where every metric tells the same story. Config-driven. No real data required.

pip install plotsim

plotsim generates multi-table relational datasets from a behavioral description. You define metrics, segments, and how entities behave over time — the engine produces a star schema where every value traces back to one trajectory: every entity follows a behavioral trajectory, and every metric across every table reads from the same trajectory position. When engagement rises, revenue follows. When it declines, churn fires.

Quick start

from plotsim import create, generate_tables

cfg = create(
    about="Subscription customers",
    unit="customer",
    window=("2024-01", "2024-12", "monthly"),
    metrics=[
        {"name": "engagement", "type": "score", "polarity": "positive"},
        {"name": "payments",   "type": "count", "polarity": "positive"},
    ],
    segments=[
        {"name": "active",   "count": 50, "archetype": "growth"},
        {"name": "inactive", "count": 30, "archetype": "decline"},
    ],
)
tables = generate_tables(cfg)
for name, df in tables.items():
    print(f"{name}: {len(df)} rows")
# dim_date: 12 rows
# dim_customer: 80 rows
# fct_customer: 960 rows

What you get

A complete set of CSV (or Parquet) files, ready to load into a warehouse, notebook, or BI tool. The same config with the same seed produces byte-identical output every time; see the output guide for details.

Who is this for

Educators and students who need realistic datasets for SQL courses, data modeling workshops, analytics training, or portfolio projects — five domain templates ready to go, and the same seed produces the same data every time.

Data engineers who need test fixtures that behave like production data — with FK integrity, realistic distributions, and configurable corruption — without copying production or hand-rolling three-row CSVs.

Data scientists who need labeled training data with known ground truth — archetype labels, trajectory positions, and temporal holdout splits — to validate models before touching real data.

Analytics engineers who need a star schema to build dbt models, test transformations, or demonstrate a pipeline end-to-end without waiting for upstream data.

BI and analytics teams who need a populated star schema to build dashboards, test reports, or demo a new tool to stakeholders — dims, facts, events, and SCD versioning out of the box.

Demo builders who need a convincing dataset for a conference talk, a product walkthrough, or a proof of concept — correlated metrics that tell a realistic story, not random noise.

How it works

Every entity in the dataset follows a behavioral trajectory — a curve shape like growth, decline, seasonal, or spike-then-crash. At each time period, the entity's position on that curve determines every metric value across every table. Revenue, engagement, churn risk, and support tickets all read from the same position, so they move together the way real business metrics do.
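The trajectory idea can be sketched in plain numpy. This is a toy illustration, not plotsim's internals: the curve shapes and scaling factors below are assumptions made up for the example.

```python
import numpy as np

def trajectory(archetype: str, periods: int) -> np.ndarray:
    """Toy curve shapes mapping each period to a position in [0, 1].

    These shapes are illustrative guesses, not plotsim's actual curves.
    """
    t = np.linspace(0.0, 1.0, periods)
    if archetype == "growth":
        return t                                       # steady rise
    if archetype == "decline":
        return 1.0 - t                                 # steady fall
    if archetype == "seasonal":
        return 0.5 + 0.5 * np.sin(2 * np.pi * 2 * t)   # two cycles
    raise ValueError(f"unknown archetype: {archetype}")

pos = trajectory("decline", 12)       # one entity's positions for 12 months

# Every metric reads from the same position, so they move together:
engagement = 100 * pos                # score scales with position
payments   = np.round(5 * pos)        # counts scale with the same position
churn_risk = 1.0 - pos                # negative-polarity metric moves opposite
```

Because all three metrics derive from the same `pos` array, a declining entity shows falling engagement, falling payments, and rising churn risk in lockstep, rather than three unrelated random series.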

Metric relationships are enforced through a Gaussian copula — declare engagement opposes churn_risk and the engine delivers the configured correlation coefficient regardless of whether one metric is beta-distributed and the other is Poisson. Causal lags compose: if A → B (lag 2) → C (lag 3), then C reflects A from 5 periods ago.
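The general Gaussian-copula technique can be illustrated in a few lines of scipy. This is a sketch of the technique itself, not plotsim's implementation; the marginal distributions and target correlation here are invented for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
target_rho = -0.8                                   # e.g. engagement opposes churn

# 1. Draw correlated standard normals with the target correlation.
cov = np.array([[1.0, target_rho], [target_rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

# 2. Push through the normal CDF to get correlated uniforms.
u = stats.norm.cdf(z)

# 3. Invert each marginal's CDF: one metric beta-distributed, one Poisson.
engagement = stats.beta.ppf(u[:, 0], a=2, b=5)      # continuous score in [0, 1]
churn_events = stats.poisson.ppf(u[:, 1], mu=3)     # non-negative counts

# The (rank) correlation survives the marginal transforms:
rho, _ = stats.spearmanr(engagement, churn_events)
```

The key property is that step 3 changes each column's marginal distribution without destroying the dependence structure set up in step 1, which is what lets a beta-distributed score and a Poisson count stay negatively correlated.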

Output is deterministic. Every random draw flows through a single seeded numpy.Generator. Same config + same seed = byte-identical tables across machines, OS, and Python versions. The manifest records every generation decision — archetype assignments, trajectory positions, correlation adjustments, quality injections — so any cell value can be traced back to its origin.
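The single-Generator pattern described above looks like this in plain numpy (an illustration of the pattern, not plotsim's code; the draw shapes are made up):

```python
import numpy as np

def generate(seed: int):
    # All randomness flows through one seeded Generator; no global state.
    rng = np.random.default_rng(seed)
    archetypes = rng.choice(["growth", "decline"], size=80)  # assignment draws
    noise = rng.normal(size=(80, 12))                        # metric noise draws
    return archetypes, noise

# Same seed -> identical draws on any machine; different seed -> different data.
a_arch, a_noise = generate(seed=42)
b_arch, b_noise = generate(seed=42)
c_arch, c_noise = generate(seed=7)
```

Because every draw consumes state from the same `rng` in a fixed order, reproducibility holds as long as the sequence of draws is itself deterministic, which is why a single shared Generator matters more than the seed alone.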

Config-time validation catches problems before generation starts: circular causal chains, non-positive-definite correlation matrices, broken foreign key references, duplicate metric names, and SQL-unsafe identifiers all surface as parse errors with fix suggestions.
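Two of these checks are straightforward to sketch: cycle detection over the causal graph, and a positive-definiteness test via Cholesky factorization. This is a toy illustration of the checks, not plotsim's validator.

```python
import numpy as np

def find_cycle(edges: dict[str, list[str]]) -> bool:
    """Detect a circular causal chain with depth-first search."""
    state: dict[str, int] = {}          # 0 = visiting, 1 = done

    def visit(node: str) -> bool:
        if state.get(node) == 0:
            return True                 # back edge -> cycle
        if state.get(node) == 1:
            return False
        state[node] = 0
        if any(visit(nxt) for nxt in edges.get(node, [])):
            return True
        state[node] = 1
        return False

    return any(visit(n) for n in list(edges))

def is_positive_definite(corr: np.ndarray) -> bool:
    """A correlation matrix is usable by a Gaussian copula iff Cholesky succeeds."""
    try:
        np.linalg.cholesky(corr)
        return True
    except np.linalg.LinAlgError:
        return False

ok_chain  = find_cycle({"A": ["B"], "B": ["C"]})   # A -> B -> C: no cycle
bad_chain = find_cycle({"A": ["B"], "B": ["A"]})   # A -> B -> A: cycle
bad_corr  = np.array([[ 1.0, 0.9, -0.9],
                      [ 0.9, 1.0,  0.9],
                      [-0.9, 0.9,  1.0]])          # mutually inconsistent correlations
```

The `bad_corr` matrix demands that the third metric correlate strongly negatively with the first but strongly positively with the second, while the first two are themselves strongly positive; no joint distribution can satisfy all three, and the Cholesky check rejects it.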

See the docs site for the full pipeline.

Docs

mohossam01.github.io/plotsim — quickstart, user guide, tutorials, API reference, cookbooks.

Contributing

See CONTRIBUTING.md for dev setup, test commands, and how to add templates.

License

Apache-2.0 — see LICENSE and NOTICE.
