Skip to main content

Code-first, traceable BI and analytics framework for Python

Project description

TraceBi

A code-first, traceable BI and analytics framework for Python.

Define your data model, transformations, and reports entirely in code. Every dataset, report, and pipeline run is traceable back to the exact connector, query, and transform steps that produced it.


Why TraceBi?

Feature Dash / Streamlit dbt Qlik / Tableau TraceBi
Code-defined reports
Relational data model
Full data lineage per report partial
Excel / HTML output
Medallion architecture
Scheduled pipelines
Live dashboard

What's built

  • Phase 1 — Connectors (CSV, SQL, BigQuery, Snowflake, Memory), DataModel, DataSet with immutable lineage chain
  • Phase 2 — Report engine (Excel + HTML renderers, lineage manifest per render)
  • Phase 2.5 — Medallion architecture (Bronze/Silver/Gold), StarSchema, LineageDiagram
  • Phase 3 — Live Dash dashboard with associative filters
  • Phase 4 — Pipeline runner with APScheduler, DB persistence, cross-layer lineage
  • Phase 5 — Web UI (FastAPI + Jinja2, Dash embedded, medallion-aware registry)

Installation

# Core
pip install pandas

# Reports (Excel + charts)
pip install -e ".[reports]"

# Dashboard
pip install -e ".[dashboard]"

# Pipelines (scheduling + DB write-back)
pip install -e ".[pipeline]"

# Lineage diagrams
pip install -e ".[lineage]"

# Everything
pip install -e ".[reports,dashboard,pipeline,lineage,sql]"

Quick Start

1. Connect to data

from tracebi import DataModel, SQLConnector, MemoryConnector

# SQLite / Postgres / MySQL / BigQuery / Snowflake
db = SQLConnector("sales_db", url="sqlite:///data/sales.db")

model = DataModel("SalesModel")
model.add_connector(db)
model.add_table("orders",    connector="sales_db", source="orders")
model.add_table("customers", connector="sales_db", source="customers")
model.add_relationship("orders_customers", "orders", "customers",
                        left_key="customer_id", how="left")

2. Load and transform (full lineage at every step)

orders = (
    model.load("orders")
    .filter("status == 'shipped'", description="Shipped orders only")
    .transform(
        lambda df: df.assign(margin=df["revenue"] - df["cost"]),
        description="margin = revenue - cost",
    )
    .sort("margin", ascending=False)
)

orders.print_lineage()
# Step 1: [LOAD]       Loaded 'orders' from connector 'sales_db'
# Step 2: [FILTER]     Shipped orders only  (250 → 198 rows)
# Step 3: [TRANSFORM]  margin = revenue - cost
# Step 4: [SORT]       Sorted by margin (desc)

3. Build a report

from tracebi.reports.report import Report, TextSection, TableSection, ChartSection
from tracebi.reports.excel_renderer import ExcelRenderer
from tracebi.reports.html_renderer import HTMLRenderer

report = (
    Report("Q2 Sales Report")
    .author("Data Team")
    .parameter("period", "Q2 2024")
    .add(TextSection(title="Summary", content="Summary", style="heading1"))
    .add(TextSection(content="Revenue up 12% vs Q1.", style="normal"))
    .add(ChartSection(title="Revenue Trend", dataset=trend_ds,
                      chart_type="line", x="month", y="revenue"))
    .add(TableSection(title="Top Orders", dataset=orders,
                      columns=["region", "product", "revenue"],
                      totals=["revenue"]))
)

ExcelRenderer().render(report, "output/q2_sales.xlsx")  # + saves manifest.json
HTMLRenderer().render(report, "output/q2_sales.html")
HTMLRenderer().serve(report, port=8080)   # open in browser
HTMLRenderer().preview(report)            # inline in Jupyter

4. Medallion architecture

from tracebi import BronzeLayer, SilverLayer, GoldLayer
from tracebi.model.star_schema import StarSchema

# Bronze — raw ingest, zero transforms
bronze = BronzeLayer(connector=db, source="orders_raw",
                     sink=db, sink_table="orders_bronze")
ds_bronze = bronze.execute()   # loads + writes to DB

# Silver — declarative cleaning pipeline
silver = (
    SilverLayer(source=db, source_table="orders_bronze",
                sink=db, sink_table="orders_silver")
    .cast({"qty": "int64", "order_date": "datetime64[ns]"})
    .drop_nulls(subset=["order_id"])
    .deduplicate(subset=["order_id"])
)
ds_silver = silver.execute()   # loads bronze → cleans → writes silver

# Star schema
schema = StarSchema("Sales", model=model)
schema.add_dimension("dim_customer", table_name="customers",
                     key_col="customer_id", attributes=["region", "segment"])
schema.add_fact("fact_orders", table_name="orders_silver",
                measures=["revenue", "qty"],
                foreign_keys={"dim_customer": "customer_id"})

# Gold — aggregated via StarSchema
gold = GoldLayer(schema=schema, fact="fact_orders",
                 measures={"revenue": "sum", "qty": "sum"},
                 dimensions=["dim_customer.region"],
                 sink=db, sink_table="revenue_by_region_gold")
ds_gold = gold.execute()   # queries → aggregates → writes gold

5. Schedule pipelines

from tracebi.pipeline.runner import PipelineRunner

runner = PipelineRunner(db_url="sqlite:///data/tracebi.db")

# Each layer has its own independent schedule
runner.register(bronze, name="orders_bronze",   schedule="0 * * * *")
runner.register(silver, name="orders_silver",   schedule="15 * * * *",
                depends_on="orders_bronze")
runner.register(gold,   name="revenue_by_region", schedule="30 6 * * *",
                depends_on="orders_silver")

# On-demand: run one layer
runner.run("orders_silver")

# On-demand: full refresh (bronze → silver → gold)
runner.run("revenue_by_region", refresh=True)

# View run history with cross-layer lineage
runner.lineage("revenue_by_region")

# Start the scheduler (blocking)
runner.start()

Every run is recorded in tracebi_runs with rows_in, rows_out, status, and an upstream_run_id linking back to the previous layer's run.

6. Live dashboard

from tracebi.dashboard import Dashboard, DashboardServer
from tracebi.dashboard import FilterPanel, MetricPanel, ChartPanel, TablePanel

dashboard = (
    Dashboard("Q2 Sales Dashboard")
    .columns(2)
    .add_filter(FilterPanel("region-filter", label="Region",
                            column="region", table_name="orders"))
    .add_panel(MetricPanel("total-revenue", title="Total Revenue",
                           table_name="orders", column="revenue",
                           aggregation="sum", prefix="$"))
    .add_panel(ChartPanel("by-region", title="Revenue by Region",
                          table_name="orders", chart_type="bar",
                          x="region", y="revenue"))
    .add_panel(TablePanel("orders-table", title="Orders",
                          table_name="orders",
                          columns=["order_id", "region", "revenue"]))
)

DashboardServer(dashboard, model=model).run(port=8050)
# Open http://localhost:8050/

Filters are associative — selecting a region automatically filters every panel that shares that column.

7. Lineage diagrams

from tracebi.lineage.diagram import LineageDiagram

diag = LineageDiagram(ds_gold)   # or LineageDiagram(report)
diag.show()                       # matplotlib / Jupyter inline
diag.to_html("lineage.html")      # standalone HTML with embedded SVG
print(diag.to_mermaid())          # paste into GitHub markdown

Web UI

A browser interface over your TraceBi registry — connectors, models, reports, pipelines, and live dashboards all in one place.

# Install web dependencies
pip install -r web/requirements.txt

# Start the server (hot-reload on by default)
python web/run.py
# Open http://localhost:8000

The web UI auto-detects data/tracebi.db. If the Silver pipeline layer has been run, reports and dashboards read from real medallion data. Otherwise it falls back to in-memory demo data.

To point the UI at your own data module instead of the built-in demo:

TRACEBI_APP=mypackage.tracebi_config python web/run.py

Your module just needs to import registry and call registry.add_connector(), registry.add_model(), @registry.report(...), and optionally registry.add_pipeline() / registry.add_dashboard().

8. Adding reports and dashboards to the web UI

from web.api.registry import registry
from tracebi.reports import Report, TableSection
from tracebi.dashboard import Dashboard, DashboardServer, ChartPanel

# Register a report
@registry.report("my_report", description="My custom report")
def my_report():
    ds = model.load("orders")
    return Report("My Report").add(TableSection(title="Orders", dataset=ds))

# Register a dashboard
dashboard = Dashboard("My Dashboard").add_panel(
    ChartPanel("rev", title="Revenue", dataset=ds, chart_type="bar", x="region", y="revenue")
)
registry.add_dashboard("my_dashboard", DashboardServer(dashboard, model=model))

Local database setup (example)

# Create data/tracebi.db, seed source tables, run initial Bronze load
python seeds/seed_db.py

# Run Silver
python -c "from seeds.seed_db import runner; runner.run('orders_silver')"

# Full Gold refresh
python -c "from seeds.seed_db import runner; runner.run('revenue_by_region', refresh=True)"

# Start scheduler
python -c "from seeds.seed_db import runner; runner.start()"

Running the examples

python examples/phase1_example.py    # connectors + DataModel + lineage
python examples/phase2_example.py    # report engine (opens browser)
python examples/phase25_example.py   # medallion + star schema + lineage diagram
python examples/phase3_example.py    # live Dash dashboard
python examples/phase4_example.py    # full pipeline (run seeds/seed_db.py first)

Running tests

pytest tests/
# 163 passed

Project structure

tracebi/
├── tracebi/
│   ├── connectors/       CSV, SQL, BigQuery, Snowflake, Memory
│   ├── model/            DataSet, DataModel, StarSchema
│   ├── etl/              BronzeLayer, SilverLayer, GoldLayer
│   ├── reports/          Report, ExcelRenderer, HTMLRenderer
│   ├── dashboard/        Dashboard, DashboardServer, panels
│   ├── pipeline/         PipelineRunner (APScheduler + DB)
│   └── lineage/          LineageDiagram
├── web/
│   ├── api/              FastAPI app, routers, registry
│   ├── templates/        Jinja2 HTML templates
│   ├── demo_app.py       Built-in demo (medallion + in-memory fallback)
│   ├── run.py            Dev server entrypoint
│   └── requirements.txt  Web-only dependencies
├── examples/             Runnable demos (phase1–4)
├── tests/                163 tests across all phases
├── seeds/                seed_db.py — one-command DB setup
├── requests/             _template.py — scaffold for ad hoc report scripts
├── data/                 SQLite DB lives here (gitignored)
└── NOTES.md              Design decisions and architecture reference

Ad hoc reports

Copy requests/_template.py, rename it, fill in the four sections (connect → build datasets → build report → render), and commit it to git. The script is the permanent, auditable record of how the numbers were produced.

requests/
├── _template.py
├── 2024_06_open_orders_by_region.py
└── 2024_07_customer_churn_analysis.py

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tracebi-0.5.1.tar.gz (112.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tracebi-0.5.1-py3-none-any.whl (63.5 kB view details)

Uploaded Python 3

File details

Details for the file tracebi-0.5.1.tar.gz.

File metadata

  • Download URL: tracebi-0.5.1.tar.gz
  • Upload date:
  • Size: 112.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for tracebi-0.5.1.tar.gz
Algorithm Hash digest
SHA256 0d25565e868ee7787d2dd86930b382b47e3009f29b5c39814d70d6b63af3caf3
MD5 be4a57313b5e47a5644a76c38f373d67
BLAKE2b-256 c3a29a1e08e2d1d09c0804c6ca347cde2931ddf763dd2a1908943daa346fbcbe

See more details on using hashes here.

File details

Details for the file tracebi-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: tracebi-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 63.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for tracebi-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8097f829959264ffa31aee07a1ea388bf843ea750d66e3ac0d08ca616120b11c
MD5 947e48868ec10044ee79bb1d164eac8e
BLAKE2b-256 e63a1fe50c18b11fff87a9de555205b6e73d7b8c82119eff7330c8ff36d66f5b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page