Generate analytics-ready datasets from DBML models

model2data

model2data turns data models into analytics-ready datasets in seconds.

Given a DBML file, it generates synthetic but realistic data and a complete dbt project scaffold, so you can start analyzing data or testing pipelines right away.

What problem does it solve?

Building analytics or testing dbt pipelines requires realistic data, but real data raises privacy concerns and hand-written mock data is slow to produce. model2data automates this by generating synthetic datasets from your data model definitions. The output is privacy-safe, deterministic for a given seed, and preserves relationships between tables, which makes it suitable for development and testing.
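To make "deterministic" and "relationship-preserving" concrete, here is a minimal standard-library sketch. It is not model2data's implementation: the function name, the two-table users/posts shape, and the column names are all hypothetical and chosen only for illustration.

```python
import random


def generate_rows(n_users, n_posts, seed):
    """Return (users, posts) where every post.user_id points at a real user.

    Hypothetical helper for illustration only; not part of model2data.
    """
    rng = random.Random(seed)  # local RNG: same seed gives the same output
    users = [
        {"id": i, "name": f"user_{rng.randrange(10_000):04d}"}
        for i in range(1, n_users + 1)
    ]
    user_ids = [u["id"] for u in users]
    posts = [
        # Foreign keys are sampled from existing parent keys only.
        {"id": i, "user_id": rng.choice(user_ids), "score": rng.randint(0, 500)}
        for i in range(1, n_posts + 1)
    ]
    return users, posts


users, posts = generate_rows(n_users=5, n_posts=20, seed=42)

# Deterministic: a second run with the same seed reproduces the data exactly.
assert (users, posts) == generate_rows(5, 20, seed=42)

# Relationship-preserving: no post references a missing user.
assert {p["user_id"] for p in posts} <= {u["id"] for u in users}
```

Using a local `random.Random(seed)` instead of the module-level functions keeps the result independent of any other code that touches the global random state.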

How it works (high level)

Parse DBML: Reads your database schema from a DBML file, extracting tables, columns, types, and relationships.
Generate Data: Uses Faker and custom logic to create realistic synthetic data, respecting foreign keys and constraints.
Scaffold dbt Project: Creates a dbt project with seeds (CSV files), staging models, profiles, and tests, ready to run with DuckDB.
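The parsing step can be pictured with a small sketch. This is a simplified illustration, not the parser model2data uses: it handles only `Table` blocks with `name type [settings]` columns and inline `ref: >` settings, and the sample schema, regexes, and function names are assumptions made for the example.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical two-table schema, used only for this example.
DBML = """
Table users {
  id integer [pk]
  name varchar
}

Table posts {
  id integer [pk]
  user_id integer [ref: > users.id]
  title varchar
}
"""

TABLE_RE = re.compile(r"Table\s+(\w+)\s*\{(.*?)\}", re.DOTALL)
COLUMN_RE = re.compile(r"^\s*(\w+)\s+(\w+)(?:\s*\[(.*)\])?\s*$")
REF_RE = re.compile(r"ref:\s*>\s*(\w+)\.(\w+)")


def parse_dbml(text):
    """Extract {table: {column: type}} and a list of foreign-key references."""
    tables, refs = {}, []
    for table, body in TABLE_RE.findall(text):
        columns = {}
        for line in body.strip().splitlines():
            match = COLUMN_RE.match(line)
            if not match:
                continue  # skip lines this simplified parser does not understand
            name, col_type, settings = match.groups()
            columns[name] = col_type
            ref = REF_RE.search(settings or "")
            if ref:
                # (child table, child column, parent table, parent column)
                refs.append((table, name, ref.group(1), ref.group(2)))
        tables[table] = columns
    return tables, refs


def generation_order(tables, refs):
    """Order tables so that referenced (parent) tables come first."""
    graph = {table: set() for table in tables}
    for child, _, parent, _ in refs:
        if child != parent:  # ignore self-references to avoid a cycle
            graph[child].add(parent)
    return list(TopologicalSorter(graph).static_order())


tables, refs = parse_dbml(DBML)
print(refs)                            # [('posts', 'user_id', 'users', 'id')]
print(generation_order(tables, refs))  # ['users', 'posts']
```

Ordering tables by their references is one way to respect foreign keys during generation: parent keys exist before any child row needs to point at them.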

Installation

pip install model2data

Quick start

We provide an example Hacker News dataset in examples/hackernews.dbml.

Generate a project with synthetic data:

model2data generate --file examples/hackernews.dbml --rows 200 --seed 42

This creates a dbt_hackernews/ folder with your data and dbt setup.

Run dbt to load and transform the data:

cd dbt_hackernews
dbt deps
dbt seed
dbt run

Your analytics-ready dataset is now in DuckDB!
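If you want to script the quick start (for example in CI), the commands above can be wrapped in a small Python helper. This is a sketch that assumes `model2data` and `dbt` are both on your PATH. The function names are hypothetical, and only the flags shown above (`--file`, `--rows`, `--seed`) are used.

```python
import subprocess


def quickstart_commands(dbml_file, rows, seed):
    """Build the argv lists for the quick-start steps shown above."""
    generate = [
        "model2data", "generate",
        "--file", str(dbml_file),
        "--rows", str(rows),
        "--seed", str(seed),
    ]
    dbt_steps = [["dbt", "deps"], ["dbt", "seed"], ["dbt", "run"]]
    return generate, dbt_steps


def run_quickstart(dbml_file, project_dir, rows=200, seed=42):
    """Generate the project, then run the dbt steps inside it."""
    generate, dbt_steps = quickstart_commands(dbml_file, rows, seed)
    subprocess.run(generate, check=True)  # raises if generation fails
    for step in dbt_steps:
        # cwd replaces the manual `cd` into the generated project folder
        subprocess.run(step, cwd=project_dir, check=True)
```

Calling `run_quickstart("examples/hackernews.dbml", "dbt_hackernews")` would then reproduce the steps above. With `check=True`, the first failing command stops the sequence.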

Generated dbt project structure

The generated dbt project includes:

dbt_{project_name}/
├── seeds/
│   └── {project_name}/
│       ├── table1.csv
│       └── table2.csv
├── models/
│   └── {project_name}/
│       └── staging/
│           ├── __sources.yml
│           ├── stg_table1.sql
│           ├── stg_table1.yml
│           └── ...
├── macros/
│   └── generate_schema_name.sql
├── dbt_project.yml
├── profiles.yml  # DuckDB config
└── {project_name}.duckdb

Seeds: CSV files with generated synthetic data.
Staging Models: Basic dbt models that load from seeds.
Sources & Tests: YAML configs defining sources and basic tests (not_null, unique).
Profiles: Pre-configured for DuckDB with schema handling.
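As a sanity check after generation, you could compare the folder on disk with the layout above. The sketch below derives the expected paths purely from the tree shown in this section. The helper names and the `items` table are hypothetical, and the `.duckdb` file may only appear once dbt has run, so treat a missing database file as informational.

```python
from pathlib import Path, PurePosixPath


def expected_layout(project_name, tables):
    """List the relative paths implied by the project tree above."""
    root = PurePosixPath(f"dbt_{project_name}")
    staging = root / "models" / project_name / "staging"
    paths = [
        root / "dbt_project.yml",
        root / "profiles.yml",
        root / "macros" / "generate_schema_name.sql",
        root / f"{project_name}.duckdb",
        staging / "__sources.yml",
    ]
    for table in tables:
        paths.append(root / "seeds" / project_name / f"{table}.csv")
        paths.append(staging / f"stg_{table}.sql")
        paths.append(staging / f"stg_{table}.yml")
    return [str(p) for p in paths]


def missing_paths(base_dir, project_name, tables):
    """Return the expected paths that do not exist under base_dir."""
    return [
        p for p in expected_layout(project_name, tables)
        if not (Path(base_dir) / p).exists()
    ]


for path in expected_layout("hackernews", ["items"]):
    print(path)
```

An empty list from `missing_paths(".", "hackernews", ["items"])` would indicate that every file in the layout is present for that table.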

Design decisions / non-goals

DuckDB Default: Chosen for its zero-config, file-based nature, making it easy to get started without database setup. Other adapters can be configured manually.
dbt Integration: Leverages dbt's transformation capabilities for a familiar workflow in analytics engineering.
Synthetic Data: Uses deterministic generation for reproducibility; not intended for production use or as a replacement for real data.
Non-goals: This is not a data migration tool, ETL pipeline, or real-time data generator. It focuses on static, synthetic datasets for testing and prototyping.

Limitations

Supports basic DBML features; complex constraints or advanced SQL types may not be fully handled.
Synthetic data generation is heuristic-based and may not perfectly mimic real-world distributions or edge cases.
Currently optimized for DuckDB; other databases require manual profile adjustments.
No support for incremental models or advanced dbt features in generated projects.

Roadmap

Support for additional database adapters (e.g., Snowflake, BigQuery).
Enhanced data type handling and custom generators.
Integration with more dbt features like incremental models.
Web-based DBML editor and data preview.

Contributing

We welcome contributions!

Open issues for bugs or feature requests.
Submit PRs to add new DBML examples, custom data generators, or improvements.
Include tests with new features where possible.

See CONTRIBUTING.md for detailed guidelines.

Code of Conduct

Please read our Code of Conduct to understand our community standards.

License

MIT License. See LICENSE for details.

Project details

Release history

0.2.3 (Jan 7, 2026): this version
0.2.2 (Jan 7, 2026)
0.2.1 (Jan 5, 2026)
0.1.1 (Dec 19, 2025)
0.1.0 (Dec 19, 2025)

Download files

Source Distribution

model2data-0.2.2.tar.gz (26.8 kB)

Uploaded Jan 7, 2026 Source

Built Distribution

model2data-0.2.2-py3-none-any.whl (15.3 kB)

Uploaded Jan 7, 2026 Python 3

File details

Details for the file model2data-0.2.2.tar.gz.

File metadata

Download URL: model2data-0.2.2.tar.gz
Upload date: Jan 7, 2026
Size: 26.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for model2data-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`0f18c1557991d1520d569509977e9fec7ceb354d62b298f927d3448b4e9a7700`
MD5	`e4aaa5797c481865938faee7f6fab841`
BLAKE2b-256	`58122e4abca864cce02ee93f4eff9db37a8c3179c20642eb80e47f01ff72671e`

File details

Details for the file model2data-0.2.2-py3-none-any.whl.

File metadata

Download URL: model2data-0.2.2-py3-none-any.whl
Upload date: Jan 7, 2026
Size: 15.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for model2data-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7a6de7fe7ef86118be39d38adb2765d8c99b0cf045607dc6e65cf0660665f3de`
MD5	`59e5a3236a9dcbe4076967448ab8f7a0`
BLAKE2b-256	`0a35e9356c3c6856584e0712b5923d1bf905a1e71a55570ccdc78b2145651f28`
