Local-first AI investigation CLI for OpenMetadata data pipelines.

OpenBlame

OpenBlame is a local-first CLI investigation agent for data pipelines cataloged in OpenMetadata. Point it at a table, and it traces lineage, inspects recent quality failures, parses schema-change events, surfaces governance gaps, and collects owner metadata to build a concrete incident narrative.

The reasoning layer runs on a local Ollama model, so investigations stay inside your environment. This makes OpenBlame useful for secure internal datasets and fast incident response workflows where you want reproducible, metadata-driven triage without external LLM APIs. The project behaves like "git blame for your data pipeline," but with lineage, observability, and governance context in one investigation loop.

Why It Stands Out

  • Autonomous investigation loop: plan, gather metadata, reason, and draft an incident report
  • Native OpenMetadata story: lineage, quality, schema history, owners, tags, domain, and tier
  • Governance-aware triage: missing owner, missing tier, missing tags, and missing description are surfaced as explicit operational risks
  • Local-first by default: Ollama only, no external LLM API required
  • Demo-ready outputs: Rich terminal UX, markdown incident report, and MCP server wrapper
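
The governance checks listed above can be illustrated with a small sketch. It assumes a table entity held as a plain dict with the hypothetical keys `owners`, `tier`, `tags`, and `description`; OpenBlame's actual data model and field names may differ.

```python
from typing import Any

# Hypothetical field names for illustration; OpenBlame's real model may differ.
GOVERNANCE_FIELDS = ("owners", "tier", "tags", "description")


def governance_gaps(table: dict[str, Any]) -> list[str]:
    """Return a label for every governance field that is absent or empty."""
    gaps = []
    for field in GOVERNANCE_FIELDS:
        value = table.get(field)
        # None, empty lists, and blank strings all count as missing.
        is_blank_text = isinstance(value, str) and not value.strip()
        if value is None or value == [] or is_blank_text:
            gaps.append(f"missing {field}")
    return gaps


orders = {
    "owners": [{"name": "data-platform"}],
    "tier": None,
    "tags": [],
    "description": " ",
}
print(governance_gaps(orders))
```

Returning labels rather than booleans keeps each gap visible as its own line item in a report, which matches the idea of treating gaps as explicit operational risks.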

Architecture

                         +--------------------+
                         |    openblame CLI   |
                         | Typer + Rich UX    |
                         +---------+----------+
                                   |
                                   v
                         +--------------------+
                         |  OpenBlame Agent   |
                         |  ReAct-style loop  |
                         +----+----------+----+
                              |          |
                +-------------+          +------------------+
                v                                         v
   +---------------------------+              +---------------------------+
   | OpenMetadata REST tools   |              | Local Ollama Reasoner     |
   | lineage/quality/diff/owner|              | plan() + reason()         |
   +-------------+-------------+              +-------------+-------------+
                 |                                            |
                 v                                            v
        +---------------------+                     +---------------------+
        | Structured evidence |-------------------->| Markdown report     |
        +---------------------+                     +---------------------+
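
The flow in the diagram can be sketched as one function with injected callables. The names `plan` and `reason` come from the diagram; the signatures, the tool registry, and the stub implementations below are assumptions made for illustration, not OpenBlame's actual code.

```python
from typing import Callable

Tool = Callable[[str], dict]


def investigate(
    table_fqn: str,
    plan: Callable[[str], list[str]],
    tools: dict[str, Tool],
    reason: Callable[[str, dict], str],
) -> str:
    """Plan, gather evidence with the planned tools, then reason over it."""
    evidence = {}
    for step in plan(table_fqn):
        tool = tools.get(step)
        if tool is None:
            continue  # the planner asked for a tool that is not registered
        evidence[step] = tool(table_fqn)
    return reason(table_fqn, evidence)


# Stubs standing in for the Ollama reasoner and the OpenMetadata tools.
def stub_plan(fqn: str) -> list[str]:
    return ["lineage", "quality", "unknown_tool"]


stub_tools = {
    "lineage": lambda fqn: {"downstream": 6},
    "quality": lambda fqn: {"failed_checks": 2},
}


def stub_reason(fqn: str, evidence: dict) -> str:
    gathered = ", ".join(sorted(evidence))
    return f"# Incident report: {fqn}\n\nEvidence gathered: {gathered}"


report = investigate("default.public.orders", stub_plan, stub_tools, stub_reason)
print(report)
```

Passing the planner and reasoner in as callables keeps the loop testable without a running Ollama instance.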

Prerequisites

  • Python 3.11+
  • OpenMetadata instance reachable from your machine
  • OpenMetadata JWT token with read access
  • Ollama installed locally and running (ollama serve)
  • An installed local model such as llama3

Installation

pip install openblame

For local development:

pip install -e ".[dev]"

Run tests with plugin autoload disabled so unrelated third-party pytest plugins from openmetadata-ingestion do not interfere:

PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python -m pytest tests/ -v -p pytest_asyncio.plugin

Quick Start

  1. Copy .env.example to .env and set credentials.
  2. Run:
openblame investigate default.public.orders --depth 3 --days 7

Example output:

[CRITICAL] default.public.orders
Root Cause: `order_total` type changed from DECIMAL to STRING without downstream migration.
Impact: 6 downstream tables plus BI dashboard refresh failures.
Owner: Data Platform (data-platform@company.com)
Suggested Fix: Restore compatible type, backfill, rerun failed checks.

Demo Flow

The strongest demo is a single broken metric traced end to end:

  1. Pick a table with a recent schema drift or quality failure.
  2. Run openblame investigate <table_fqn>.
  3. Show the agent plan, anomaly panels, governance risk briefing, and final incident report.
  4. Highlight the downstream blast radius and owner handoff.
  5. End by showing the MCP server or generated GitHub issue payload.

CLI Commands

Investigate

openblame investigate <table_fqn> --depth 3 --days 7 --output report.md --model llama3

Runs the full investigation loop, prints a Rich report, optionally writes markdown, and can suggest a GitHub issue payload.

Schema Diff

openblame diff default.public.orders --days 7

Prints table schema changes over the lookback window.
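
As a rough illustration of what a schema diff computes, the sketch below compares two column snapshots. It assumes each snapshot is a `{column_name: data_type}` dict; the function name and the snapshot shape are hypothetical, and the real command derives its changes from OpenMetadata's schema history.

```python
def diff_columns(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Compare two {column_name: data_type} snapshots and describe the changes."""
    changes = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old is None:
            changes.append(f"added {name} ({new})")
        elif new is None:
            changes.append(f"removed {name} ({old})")
        elif old != new:
            changes.append(f"changed {name}: {old} -> {new}")
    return changes


# Made-up snapshots for illustration.
before = {"order_id": "BIGINT", "order_total": "DECIMAL", "coupon": "STRING"}
after = {"order_id": "BIGINT", "order_total": "STRING", "created_at": "TIMESTAMP"}
for change in diff_columns(before, after):
    print(change)
```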

Lineage

openblame lineage default.public.orders --depth 3 --direction both

Renders upstream and downstream lineage as a Rich tree.
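
The effect of `--depth` can be pictured as a breadth-first walk that stops expanding after a fixed number of hops. The sketch below assumes lineage is available as a simple adjacency dict and uses made-up table names; it is not how OpenBlame stores or fetches lineage.

```python
from collections import deque


def downstream_within(edges: dict[str, list[str]], start: str, depth: int) -> dict[str, int]:
    """Breadth-first walk returning each reachable table and its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the requested depth
        for child in edges.get(node, []):
            if child not in seen:
                seen[child] = seen[node] + 1
                queue.append(child)
    del seen[start]  # the starting table is not part of its own blast radius
    return seen


# Hypothetical downstream edges for illustration.
edges = {
    "default.public.orders": ["default.public.order_items", "default.public.daily_revenue"],
    "default.public.daily_revenue": ["default.public.finance_dashboard"],
    "default.public.finance_dashboard": ["default.public.exec_summary"],
}
print(downstream_within(edges, "default.public.orders", depth=2))
```

Tracking visited nodes also keeps the walk finite if the lineage graph contains a cycle.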

MCP Server

openblame mcp-server

Starts OpenBlame as an MCP stdio server exposing investigation tools.

Configuration

OpenBlame reads .env and environment variables:

OPENMETADATA_HOST=http://localhost:8585
OPENMETADATA_JWT_TOKEN=<token>
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3
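
A minimal sketch of reading these four variables from the environment follows. The `Settings` class and `load_settings` function are hypothetical, and treating the values above as defaults and the token as required are assumptions. The sketch does not parse `.env` files; a loader such as python-dotenv would be needed for that.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openmetadata_host: str
    openmetadata_jwt_token: str
    ollama_host: str
    ollama_model: str


def load_settings(env=None) -> Settings:
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    token = env.get("OPENMETADATA_JWT_TOKEN", "")
    if not token:
        # Assumption: the token is mandatory because the API needs read access.
        raise ValueError("OPENMETADATA_JWT_TOKEN is required")
    return Settings(
        openmetadata_host=env.get("OPENMETADATA_HOST", "http://localhost:8585").rstrip("/"),
        openmetadata_jwt_token=token,
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3"),
    )


# Pass an explicit mapping here so the example runs without real credentials.
settings = load_settings({"OPENMETADATA_JWT_TOKEN": "example-token"})
print(settings.openmetadata_host, settings.ollama_model)
```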

MCP Server Setup

The server exposes:

  • investigate_table({ table_fqn, depth, days })
  • get_lineage({ table_fqn, depth, direction })
  • get_schema_diff({ table_fqn, days })

Use stdio transport with:

openblame mcp-server
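
For orientation, the sketch below builds the JSON-RPC message an MCP client would send to call one of these tools. It only shows the shape of a single `tools/call` request; a real client performs the MCP initialization handshake first and would normally use an MCP client library rather than hand-built messages. The helper name is hypothetical.

```python
import json


def tool_call_message(request_id: int, name: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, which suits the
    # newline-delimited framing used by the stdio transport.
    return json.dumps(message)


line = tool_call_message(
    1,
    "investigate_table",
    {"table_fqn": "default.public.orders", "depth": 3, "days": 7},
)
print(line)
```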

How It Works

  1. Fetch baseline metadata such as owners and schema snapshot.
  2. Ask Ollama to produce an investigation plan.
  3. Execute OpenMetadata tools in parallel across lineage, quality, schema history, and ownership.
  4. Convert raw metadata into evidence, anomalies, governance risks, and downstream blast radius summaries.
  5. Send gathered evidence back to Ollama for incident reasoning.
  6. Render and optionally persist a markdown report.

Tool failures are non-fatal. OpenBlame continues with partial data whenever possible.
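
One way to combine parallel tool execution with non-fatal failures is `asyncio.gather` with `return_exceptions=True`. The sketch below uses stub coroutines in place of real OpenMetadata calls; the function names and return shapes are assumptions, and OpenBlame's own implementation may be structured differently.

```python
import asyncio


# Stub tools standing in for OpenMetadata REST calls.
async def fetch_lineage(fqn: str) -> dict:
    return {"downstream": 6}


async def fetch_quality(fqn: str) -> dict:
    raise TimeoutError("quality API timed out")


async def fetch_owners(fqn: str) -> dict:
    return {"owner": "Data Platform"}


async def gather_evidence(fqn: str, tools: dict) -> tuple[dict, dict]:
    """Run every tool concurrently; return (evidence, errors) keyed by tool name."""
    names = list(tools)
    results = await asyncio.gather(
        *(tools[name](fqn) for name in names),
        return_exceptions=True,  # a failing tool yields its exception as a result
    )
    evidence, errors = {}, {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            errors[name] = str(result)
        else:
            evidence[name] = result
    return evidence, errors


tools = {"lineage": fetch_lineage, "quality": fetch_quality, "owners": fetch_owners}
evidence, errors = asyncio.run(gather_evidence("default.public.orders", tools))
print(evidence)
print(errors)
```

Keeping the errors alongside the evidence lets the report state which sources were unavailable instead of silently omitting them.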

Publishing

This repo is set up for GitHub Actions CI and can be wired for PyPI Trusted Publishing. Once the PyPI package-name issue is resolved, publishing can be automated from GitHub releases.

Hackathon Context

Built for the WeMakeDevs x OpenMetadata hackathon as an AI-powered metadata investigator focused on local-first reasoning, blast-radius analysis, and governance-aware incident response workflows.

Download files

Source Distribution

openblame-0.1.0.tar.gz (18.4 kB view details)

Uploaded Source

Built Distribution

openblame-0.1.0-py3-none-any.whl (21.4 kB view details)

Uploaded Python 3

File details

Details for the file openblame-0.1.0.tar.gz.

File metadata

  • Download URL: openblame-0.1.0.tar.gz
  • Upload date:
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for openblame-0.1.0.tar.gz
Algorithm Hash digest
SHA256 dfb3213b806f3fea78ebe80ee317a7f6be39d40ef8c1edf14d133a29cb13a161
MD5 b65f39a6edaa8645c9e71043540cf5f3
BLAKE2b-256 fb71d7381c91d2f300651cb4bc723c789783a9270a998a6cb733096b7158556c

Provenance

The following attestation bundles were made for openblame-0.1.0.tar.gz:

Publisher: publish.yml on manasdutta04/openblame

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file openblame-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: openblame-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for openblame-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 14d292f40ead65a4347fae1f646ab1ee1f2d0e1858d6ecc0c602fbcf946dfae3
MD5 f6078868253b892964bd855fe93a00dc
BLAKE2b-256 a9776b716faaa95c17ea10a6bcd9f6ce07f82138301e584a041066b665e596f9

Provenance

The following attestation bundles were made for openblame-0.1.0-py3-none-any.whl:

Publisher: publish.yml on manasdutta04/openblame

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
