A declarative data engineering framework - Explicit over implicit, Stories over magic
Odibi
Declarative data pipelines. YAML in, star schemas out.
Note: Personal open-source project. See IP_NOTICE.md for details.
Odibi is a framework for building data pipelines. You describe what you want in YAML; Odibi handles how. Every run generates a "Data Story" — an audit report showing exactly what happened to your data.
🤖 AI/LLM Users: For comprehensive context, see docs/ODIBI_DEEP_CONTEXT.md — 2,200+ lines covering all patterns, transformers, validation, connections, and runtime behavior.
🎯 Try Odibi in 5 Minutes (No Install Needed)
Click the badge above → run 3 cells → see your first simulation. No Python install, no cloning, no setup.
The notebook walks you through:
- pip install odibi (runs in the cloud)
- Define a simulation in YAML (sensors, sales data, or industrial equipment)
- Run the pipeline → see the output → chart it with Altair
When you're ready for more: 38 simulation configs covering buildings, compressors, reactors, cooling towers, wastewater, production lines, and sales pipelines.
⚡ Quick Start (Local)

    pip install odibi

Option 1: Simulate data from YAML
Create sim.yaml:

    project: my_first_sim
    engine: pandas
    connections:
      output:
        type: local
        base_path: ./data
    story:
      connection: output
      path: stories/
    system:
      connection: output
    pipelines:
      - pipeline: demo
        nodes:
          - name: sensors
            read:
              connection: null
              format: simulation
              options:
                simulation:
                  scope:
                    start_time: "2026-01-01T00:00:00Z"
                    timestep: "5m"
                    row_count: 100
                    seed: 42
                  entities:
                    count: 3
                    id_prefix: "sensor_"
                  columns:
                    - name: sensor_id
                      data_type: string
                      generator: {type: constant, value: "{entity_id}"}
                    - name: timestamp
                      data_type: timestamp
                      generator: {type: timestamp}
                    - name: temperature
                      data_type: float
                      generator:
                        type: random_walk
                        start: 22.0
                        min: 16.0
                        max: 30.0
                        volatility: 0.3
                        mean_reversion: 0.15
            write:
              connection: output
              format: parquet
              path: bronze/sensors.parquet
              mode: overwrite

Run it:

    python -c "from odibi.pipeline import PipelineManager; PipelineManager.from_yaml('sim.yaml').run()"

Output: data/bronze/sensors.parquet, containing 300 rows (3 sensors × 100 timesteps at 5-minute intervals) of realistic sensor data with memory, drift, and mean reversion. No database needed.
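To make the random_walk parameters in sim.yaml more concrete, here is a minimal standalone sketch of a bounded, mean-reverting random walk using only the Python standard library. It is not Odibi's implementation: the update rule, the Gaussian noise, the clamping to min/max, and the sensor ID format (sensor_01, sensor_02, ...) are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta, timezone


def random_walk(n, start, lo, hi, volatility, mean_reversion, rng):
    """Bounded random walk that is pulled back toward `start` at every step."""
    values = []
    current = start
    for _ in range(n):
        values.append(current)
        pull = mean_reversion * (start - current)  # drift back toward the start value
        noise = rng.gauss(0.0, volatility)         # random step (assumed Gaussian)
        current = min(hi, max(lo, current + pull + noise))  # clamp to [lo, hi]
    return values


def simulate(entity_count=3, row_count=100, seed=42):
    """Build one row per entity per timestep, mirroring the shape of sim.yaml."""
    rng = random.Random(seed)
    t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
    rows = []
    for i in range(1, entity_count + 1):
        sensor_id = f"sensor_{i:02d}"  # hypothetical ID format, not confirmed by Odibi
        temps = random_walk(row_count, 22.0, 16.0, 30.0, 0.3, 0.15, rng)
        for step, temp in enumerate(temps):
            rows.append({
                "sensor_id": sensor_id,
                "timestamp": t0 + timedelta(minutes=5 * step),
                "temperature": temp,
            })
    return rows


rows = simulate()
print(len(rows))  # 300
```

Because each value depends on the previous one, the series has memory; the mean_reversion term keeps it from wandering off to the bounds, which is roughly what distinguishes this kind of generator from independent random draws.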
Option 2: Build a star schema from CSV

    odibi init my_project --template star-schema
    cd my_project
    odibi run odibi.yaml
    odibi story last  # View the audit report

Option 3: Clone the reference example

    git clone https://github.com/henryodibi11/Odibi.git
    cd Odibi/docs/examples/canonical/runnable
    odibi run 04_fact_table.yaml

This builds a complete star schema in seconds:
- 3 dimension tables (customer, product, date)
- 1 fact table with FK lookups and orphan handling
- HTML audit report
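The idea behind "FK lookups and orphan handling" can be sketched in plain Python: each fact row's natural key is swapped for the dimension's surrogate key, and rows with no match are pointed at a reserved "unknown" member instead of being dropped. This is an illustration of the general technique, not Odibi's code; the helper names and the value 0 for the unknown member are assumptions.

```python
UNKNOWN_SK = 0  # hypothetical surrogate key reserved for the "unknown" member


def build_lookup(dimension_rows, natural_key, surrogate_key):
    """Map each natural key in the dimension to its surrogate key."""
    return {row[natural_key]: row[surrogate_key] for row in dimension_rows}


def resolve_keys(fact_rows, lookup, source_column, surrogate_key):
    """Attach the surrogate key to every fact row; orphans get UNKNOWN_SK."""
    resolved, orphans = [], 0
    for row in fact_rows:
        sk = lookup.get(row[source_column])
        if sk is None:  # no matching dimension row: an orphan
            sk = UNKNOWN_SK
            orphans += 1
        resolved.append({**row, surrogate_key: sk})
    return resolved, orphans


dim_customer = [
    {"customer_id": "C1", "customer_sk": 1},
    {"customer_id": "C2", "customer_sk": 2},
]
orders = [
    {"order_id": 100, "customer_id": "C1", "amount": 25.0},
    {"order_id": 101, "customer_id": "C9", "amount": 40.0},  # no matching customer
]

lookup = build_lookup(dim_customer, "customer_id", "customer_sk")
fact_sales, orphan_count = resolve_keys(orders, lookup, "customer_id", "customer_sk")
print([r["customer_sk"] for r in fact_sales], orphan_count)  # [1, 0] 1
```

Keeping orphans under an unknown member preserves fact totals, at the cost of needing a matching placeholder row in the dimension.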
📖 The Canonical Example

    pipelines:
      - pipeline: build_dimensions
        nodes:
          - name: dim_customer
            read:
              connection: source
              format: csv
              path: customers.csv
            pattern:
              type: dimension
              params:
                natural_key: customer_id
                surrogate_key: customer_sk
                scd_type: 1
            write:
              connection: gold
              format: parquet
              path: dim_customer
          - name: dim_date
            pattern:
              type: date_dimension
              params:
                start_date: "2025-01-01"
                end_date: "2025-12-31"
            write:
              connection: gold
              format: parquet
              path: dim_date
      - pipeline: build_facts
        nodes:
          - name: fact_sales
            depends_on: [dim_customer, dim_date]
            read:
              connection: source
              format: csv
              path: orders.csv
            pattern:
              type: fact
              params:
                grain: [order_id, line_item_id]
                dimensions:
                  - source_column: customer_id
                    dimension_table: dim_customer
                    dimension_key: customer_id
                    surrogate_key: customer_sk
                orphan_handling: unknown
            write:
              connection: gold
              format: parquet
              path: fact_sales

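The dim_date node needs no source file because a date dimension can be generated from its start and end dates alone. A rough standard-library sketch of that idea follows; the column names and the YYYYMMDD integer key are assumptions chosen for illustration and may not match what Odibi's date_dimension pattern actually emits.

```python
from datetime import date, timedelta


def build_date_dimension(start_date, end_date):
    """Return one row per calendar day between the two ISO dates (inclusive)."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_sk": int(current.strftime("%Y%m%d")),  # assumed key format
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "is_weekend": current.isoweekday() >= 6,
        })
        current += timedelta(days=1)
    return rows


dim_date = build_date_dimension("2025-01-01", "2025-12-31")
print(len(dim_date))  # 365
```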
🚀 Key Features
| Feature | Description |
|---|---|
| Data Stories | Every run generates an HTML audit report |
| Dimensional Patterns | 6 built-in patterns: SCD1/SCD2, date dimension, fact tables, merge, aggregation |
| 56 Transformers | Comprehensive library for data manipulation and quality |
| Validation & Contracts | Fail-fast checks, quarantine bad rows |
| Multi-Engine | Pandas, Polars, and Spark — same config across all engines |
| Production Ready | Retry, alerting, secrets, Delta Lake support |
| Battle-Tested | 5500+ tests ensure reliability and correctness |
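As a rough illustration of the "fail-fast checks, quarantine bad rows" idea from the table, the sketch below separates the two behaviors: a structural check that aborts the run immediately, and row-level rules that divert failing rows to a quarantine set while the rest continue. The function names, rule names, and the _failed_rules column are hypothetical and are not Odibi's validation API.

```python
def fail_fast(rows, required_columns):
    """Abort immediately if any row lacks a required column."""
    for i, row in enumerate(rows):
        missing = [c for c in required_columns if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing required columns: {missing}")


def validate(rows, rules):
    """Split rows into (valid, quarantined); quarantined rows record the failed rules."""
    valid, quarantined = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            quarantined.append({**row, "_failed_rules": failed})
        else:
            valid.append(row)
    return valid, quarantined


rules = {
    "customer_id_not_null": lambda r: r["customer_id"] is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}
rows = [
    {"order_id": 1, "customer_id": "C1", "amount": 10.0},
    {"order_id": 2, "customer_id": None, "amount": 5.0},
    {"order_id": 3, "customer_id": "C2", "amount": -1.0},
]

fail_fast(rows, ["order_id", "customer_id", "amount"])  # passes: all columns present
valid, quarantined = validate(rows, rules)
print(len(valid), len(quarantined))  # 1 2
```

Recording which rule failed alongside each quarantined row makes it possible to fix and replay the bad rows later without rerunning the whole pipeline.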
📚 Documentation
| Goal | Link |
|---|---|
| Get running in 10 minutes | Golden Path |
| Copy THE working example | THE_REFERENCE.md |
| Solve a specific problem | Playbook |
| Understand when to use what | Decision Guide |
| See all config options | YAML Schema |
📦 Installation

    # Standard (Pandas engine)
    pip install odibi
    # With Polars engine
    pip install "odibi[polars]"
    # With Spark + Azure support
    pip install "odibi[spark,azure]"
    # All engines and features
    pip install "odibi[all]"

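After installing, it can be handy to confirm which engine libraries are importable in the current environment. The sketch below checks for the pandas, polars, and pyspark modules using only the standard library. It assumes each extra pulls in the correspondingly named package, which this README does not state explicitly, and it does not call any Odibi API.

```python
from importlib.util import find_spec

# Engine name -> importable module name (pyspark is the module behind Spark).
ENGINE_MODULES = {"pandas": "pandas", "polars": "polars", "spark": "pyspark"}


def available_engines():
    """Report which engine libraries can be imported in this environment."""
    return {
        engine: find_spec(module) is not None
        for engine, module in ENGINE_MODULES.items()
    }


for engine, installed in available_engines().items():
    print(f"{engine}: {'installed' if installed else 'missing'}")
```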
🎯 Who is this for?
- Solo data engineers building pipelines without a team
- Analytics engineers moving from dbt to Python-based pipelines
- Anyone tired of writing the same boilerplate for every project
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md.
Maintainer: Henry Odibi (@henryodibi11)
License: Apache 2.0