
A declarative data engineering framework - Explicit over implicit, Stories over magic

Odibi

Declarative data pipelines. YAML in, star schemas out.

Note: Personal open-source project. See IP_NOTICE.md for details.

Badges: CI, PyPI, Python 3.9+, License, Docs

Odibi is a framework for building data pipelines. You describe what you want in YAML; Odibi handles how. Every run generates a "Data Story" — an audit report showing exactly what happened to your data.

🤖 AI/LLM Users: For comprehensive context, see docs/ODIBI_DEEP_CONTEXT.md — 2,200+ lines covering all patterns, transformers, validation, connections, and runtime behavior.


⚡ Quick Start

pip install odibi

Option 1: Start from a template

odibi init my_project --template star-schema
cd my_project
odibi run odibi.yaml
odibi story last          # View the audit report

Option 2: Clone the reference example

git clone https://github.com/henryodibi11/Odibi.git
cd Odibi/docs/examples/canonical/runnable
odibi run 04_fact_table.yaml

This builds a complete star schema in seconds:

  • 3 dimension tables (customer, product, date)
  • 1 fact table with FK lookups and orphan handling
  • HTML audit report

See the full breakdown →
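To make the date dimension mentioned above more concrete, here is a minimal Python sketch of what such a table usually contains: one row per calendar day with a few derived attributes. It is an illustration only, not Odibi's implementation. The function name `build_date_dimension`, the column names, and the `YYYYMMDD` integer key are all assumptions made for this example.

```python
from datetime import date, timedelta


def build_date_dimension(start_date: date, end_date: date) -> list:
    """Return one row per calendar day between start_date and end_date (inclusive)."""
    rows = []
    current = start_date
    while current <= end_date:
        rows.append(
            {
                # Hypothetical key format: the date written as an integer, e.g. 20250101.
                "date_sk": int(current.strftime("%Y%m%d")),
                "full_date": current.isoformat(),
                "year": current.year,
                "month": current.month,
                "day": current.day,
                # ISO weekday: Monday is 1, Sunday is 7.
                "day_of_week": current.isoweekday(),
            }
        )
        current += timedelta(days=1)
    return rows


# Example range chosen for illustration: the calendar year 2025.
dim_date = build_date_dimension(date(2025, 1, 1), date(2025, 12, 31))
print(len(dim_date), dim_date[0])
```

In an Odibi project the equivalent table comes from YAML configuration rather than hand-written code; the sketch only shows the shape of the result.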


📖 The Canonical Example

pipelines:
  - pipeline: build_dimensions
    nodes:
      - name: dim_customer
        read:
          connection: source
          format: csv
          path: customers.csv
        pattern:
          type: dimension
          params:
            natural_key: customer_id
            surrogate_key: customer_sk
            scd_type: 1
        write:
          connection: gold
          format: parquet
          path: dim_customer

      - name: dim_date
        pattern:
          type: date_dimension
          params:
            start_date: "2025-01-01"
            end_date: "2025-12-31"
        write:
          connection: gold
          format: parquet
          path: dim_date

  - pipeline: build_facts
    nodes:
      - name: fact_sales
        depends_on: [dim_customer, dim_date]
        read:
          connection: source
          format: csv
          path: orders.csv
        pattern:
          type: fact
          params:
            grain: [order_id, line_item_id]
            dimensions:
              - source_column: customer_id
                dimension_table: dim_customer
                dimension_key: customer_id
                surrogate_key: customer_sk
            orphan_handling: unknown
        write:
          connection: gold
          format: parquet
          path: fact_sales

Full runnable example →
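The fact node above maps each order's `customer_id` to a surrogate key from `dim_customer`, with `orphan_handling: unknown` for rows that have no match. The plain-Python sketch below shows the general idea of such a lookup. It is not Odibi's code: the function `resolve_surrogate_keys` is hypothetical, and the choice of `0` as the key for the "unknown" member is an assumption for this example.

```python
UNKNOWN_SK = 0  # Assumption: surrogate key reserved for the "unknown" dimension member.


def resolve_surrogate_keys(
    fact_rows,
    dimension_rows,
    source_column,
    dimension_key,
    surrogate_key,
    unknown_sk=UNKNOWN_SK,
):
    """Attach a surrogate key to each fact row; orphans get the unknown key."""
    # Build a natural-key -> surrogate-key lookup from the dimension table.
    lookup = {row[dimension_key]: row[surrogate_key] for row in dimension_rows}
    resolved = []
    for row in fact_rows:
        out = dict(row)  # Copy so the input rows are not mutated.
        out[surrogate_key] = lookup.get(row[source_column], unknown_sk)
        resolved.append(out)
    return resolved


dim_customer = [
    {"customer_id": "C001", "customer_sk": 1},
    {"customer_id": "C002", "customer_sk": 2},
]
orders = [
    {"order_id": 10, "line_item_id": 1, "customer_id": "C001"},
    {"order_id": 11, "line_item_id": 1, "customer_id": "C999"},  # No matching customer.
]

fact_sales = resolve_surrogate_keys(
    orders,
    dim_customer,
    source_column="customer_id",
    dimension_key="customer_id",
    surrogate_key="customer_sk",
)
print(fact_sales)
```

The keyword arguments mirror the `source_column`, `dimension_key`, and `surrogate_key` parameters in the YAML so the correspondence is easy to follow. Keeping orphan rows under an unknown key, rather than dropping them, preserves fact row counts while still flagging the missing reference.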


🚀 Key Features

| Feature | Description |
|---|---|
| Data Stories | Every run generates an HTML audit report |
| Dimensional Patterns | 6 built-in patterns: SCD1/SCD2, date dimension, fact tables, merge, aggregation |
| 56 Transformers | Comprehensive library for data manipulation and quality |
| Validation & Contracts | Fail-fast checks, quarantine bad rows |
| Multi-Engine | Pandas, Polars, and Spark: same config across all engines |
| Production Ready | Retry, alerting, secrets, Delta Lake support |
| Battle-Tested | 5,500+ tests ensure reliability and correctness |
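As a rough illustration of what "quarantine bad rows" means in general, the sketch below splits rows into a valid set and a quarantined set, recording which checks each quarantined row failed. This is a generic pattern, not Odibi's validation API. The function name, the `_failed_checks` column, and the two example checks are hypothetical.

```python
def split_valid_and_quarantined(rows, checks):
    """Return (valid_rows, quarantined_rows) based on named row-level checks."""
    valid, quarantined = [], []
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            # Keep the bad row, annotated with the reasons, instead of dropping it.
            quarantined.append({**row, "_failed_checks": failed})
        else:
            valid.append(row)
    return valid, quarantined


# Hypothetical checks for an orders feed.
checks = {
    "customer_id_not_null": lambda r: r["customer_id"] is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}
rows = [
    {"order_id": 1, "customer_id": "C001", "amount": 25.0},
    {"order_id": 2, "customer_id": None, "amount": 10.0},
    {"order_id": 3, "customer_id": "C002", "amount": -5.0},
]

valid_rows, quarantined_rows = split_valid_and_quarantined(rows, checks)
print(valid_rows)
print(quarantined_rows)
```

A fail-fast check would instead raise an error on the first violation; quarantining lets the pipeline continue while keeping the rejected rows available for inspection.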

📚 Documentation

| Goal | Link |
|---|---|
| Get running in 10 minutes | Golden Path |
| Copy THE working example | THE_REFERENCE.md |
| Solve a specific problem | Playbook |
| Understand when to use what | Decision Guide |
| See all config options | YAML Schema |

📦 Installation

# Standard (Pandas engine)
pip install odibi

# With Polars engine
pip install "odibi[polars]"

# With Spark + Azure support
pip install "odibi[spark,azure]"

# All engines and features
pip install "odibi[all]"
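After installing, it can be useful to check which engine libraries are importable in the current environment. The sketch below uses only the standard library. It assumes the engines correspond to the importable modules `pandas`, `polars`, and `pyspark`; that mapping is an assumption about what the extras pull in, not something stated in this README.

```python
from importlib.util import find_spec

# Assumed mapping from engine name to the module it needs at import time.
ENGINE_MODULES = {
    "pandas": "pandas",
    "polars": "polars",
    "spark": "pyspark",
}


def available_engines():
    """Report, per engine, whether its module can be found without importing it."""
    return {
        engine: find_spec(module) is not None
        for engine, module in ENGINE_MODULES.items()
    }


engines = available_engines()
for engine, installed in engines.items():
    print(f"{engine}: {'installed' if installed else 'missing'}")
```

`find_spec` only locates the module and does not import it, so the check stays fast even when heavy packages such as Spark are present.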

🎯 Who is this for?

  • Solo data engineers building pipelines without a team
  • Analytics engineers moving from dbt to Python-based pipelines
  • Anyone tired of writing the same boilerplate for every project

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md.


Maintainer: Henry Odibi (@henryodibi11)
License: Apache 2.0
