DataBridge AI - An open-source, MCP-native data reconciliation engine with tools for hierarchy management, data quality, and analytics

DataBridge AI

Python 3.10+ | License: MIT

DataBridge AI is a headless, MCP-native data reconciliation engine with 292 tools for hierarchy management, data quality, and analytics.

Features

Module               Tools  Description
Data Reconciliation     38  Compare and validate data from CSV, SQL, PDF, and JSON sources
Hierarchy Builder       44  Create and manage multi-level hierarchy projects (up to 15 levels)
Cortex AI               25  Snowflake Cortex AI integration, including natural-language-to-SQL
Wright Module           18  Hierarchy-driven data mart generation with a 4-object pipeline
Data Catalog            15  Centralized metadata registry with a business glossary
Data Versioning         12  Semantic versioning, snapshots, rollback, and diff
Lineage Tracking        11  Column-level lineage and impact analysis
Git/CI-CD               12  Automated workflows and GitHub integration
dbt Integration          8  Generate dbt projects from hierarchies
Data Quality             7  Expectation suites and data contracts
Templates & Skills      16  Pre-built templates and AI expertise definitions
AI Orchestrator         27  Multi-agent coordination with task queues
And more...             54  Diff utilities, recommendations, console, etc.
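The reconciliation tools are not documented in detail here. As a rough illustration of what "compare and validate" means for two extracts, the stdlib-only sketch below matches CSV rows on a key column and reports rows missing from either side plus amount mismatches. The names `reconcile_csv` and `load_by_key` and their parameters are hypothetical, not part of the DataBridge API.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, List, TextIO


def load_by_key(handle: TextIO, key: str) -> Dict[str, Dict[str, str]]:
    """Read a CSV stream into {key_value: row}; a later duplicate key overwrites an earlier one."""
    return {row[key]: row for row in csv.DictReader(handle)}


def reconcile_csv(left: TextIO, right: TextIO, key: str, value: str,
                  tolerance: Decimal = Decimal("0")) -> Dict[str, List]:
    """Hypothetical reconciliation of one numeric column between two sources."""
    a = load_by_key(left, key)
    b = load_by_key(right, key)
    # Compare as Decimal so "250.0" and "250.00" count as equal.
    mismatches = [
        (k, a[k][value], b[k][value])
        for k in sorted(a.keys() & b.keys())
        if abs(Decimal(a[k][value]) - Decimal(b[k][value])) > tolerance
    ]
    return {
        "only_in_left": sorted(a.keys() - b.keys()),
        "only_in_right": sorted(b.keys() - a.keys()),
        "mismatches": mismatches,
    }


if __name__ == "__main__":
    gl = io.StringIO("account,amount\n4100,250.00\n4200,90.00\n5000,10.00\n")
    erp = io.StringIO("account,amount\n4100,250.0\n4200,95.00\n6000,5.00\n")
    print(reconcile_csv(gl, erp, key="account", value="amount"))
```

Keys are compared as strings, so a real pipeline would also normalize key formats (leading zeros, whitespace) before matching.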

Installation

# Basic installation
pip install databridge-ai

# With PDF support (the quotes stop shells such as zsh from expanding the brackets)
pip install "databridge-ai[pdf]"

# With Snowflake support
pip install "databridge-ai[snowflake]"

# Full installation
pip install "databridge-ai[all]"

Quick Start

As MCP Server (Claude Desktop)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "DataBridge_AI": {
      "command": "python",
      "args": ["-m", "src.server"]
    }
  }
}
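Pasting the JSON above over an existing config file would drop any other MCP servers already listed. The sketch below merges the DataBridge_AI entry into the file instead. The file location depends on the operating system (on macOS it is typically ~/Library/Application Support/Claude/claude_desktop_config.json), so the path is passed in rather than hard-coded; `add_databridge_server` is a hypothetical helper name.

```python
import copy
import json
import tempfile
from pathlib import Path

SERVER_NAME = "DataBridge_AI"
# Same entry as the JSON snippet above.
SERVER_ENTRY = {"command": "python", "args": ["-m", "src.server"]}


def add_databridge_server(config_path: Path) -> dict:
    """Insert or replace the DataBridge_AI entry, keeping any other configured servers."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        config = json.loads(text) if text else {}
    servers = config.setdefault("mcpServers", {})
    servers[SERVER_NAME] = copy.deepcopy(SERVER_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo against a throwaway file rather than the real Claude Desktop config.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "claude_desktop_config.json"
        path.write_text('{"mcpServers": {"other": {"command": "node", "args": []}}}')
        print(json.dumps(add_databridge_server(path), indent=2))
```

Claude Desktop generally needs a restart before it picks up config changes.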

Web UI Dashboard

# From the repository root (the web UI ships in the databridge-ce/ folder)
cd databridge-ce/ui
python server.py
# Open http://127.0.0.1:5050

Programmatic Usage

from src.server import mcp

# Run as an MCP server; run() blocks while the server is serving requests
if __name__ == "__main__":
    mcp.run()

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Claude / LLM Client                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      MCP Protocol                            │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                   DataBridge MCP Server                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │  Hierarchy  │  │   Cortex    │  │   Wright    │         │
│  │   Builder   │  │   Agent     │  │   Module    │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │    Data     │  │   Lineage   │  │    Data     │         │
│  │   Catalog   │  │   Tracker   │  │   Quality   │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              Snowflake / CSV / SQL / PDF                     │
└─────────────────────────────────────────────────────────────┘
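MCP messages are JSON-RPC 2.0: a client invokes a tool by sending a `tools/call` request whose params carry the tool `name` and an `arguments` object. The sketch below is a minimal, stdlib-only illustration of how a server layer like the one in the diagram could route such requests to registered Python functions. It is not DataBridge's server code. The `tool` decorator, `TOOLS` registry, `handle` function and the stand-in tool body are hypothetical, and production servers usually rely on an MCP SDK instead of hand-written routing.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> Python function.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable tool under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def create_hierarchy_project(name: str, description: str = "") -> dict:
    # Stand-in body: a real tool would persist the project somewhere.
    return {"project_id": name.lower().replace(" ", "-"), "description": description}


def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Route one JSON-RPC 2.0 request to a registered tool and wrap the result."""
    req = json.loads(raw)
    req_id = req.get("id")
    if req.get("method") != "tools/call":
        return _error(req_id, -32601, "Method not found")
    params = req.get("params", {})
    func = TOOLS.get(params.get("name"))
    if func is None:
        return _error(req_id, -32602, f"Unknown tool: {params.get('name')}")
    output = func(**params.get("arguments", {}))
    # MCP tool results carry a list of content items; here a single text item.
    result = {"content": [{"type": "text", "text": json.dumps(output)}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

Keeping a name-to-function registry is what lets one server expose hundreds of tools behind a single `tools/call` entry point.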

Core Modules

Hierarchy Builder

Create and manage multi-level hierarchy projects for financial reporting, organizational structures, and data classification.

# "..." marks placeholder IDs (for example, the project ID returned when the project is created)
# Create a hierarchy project
create_hierarchy_project(name="Revenue P&L", description="Revenue hierarchy")

# Add hierarchies with source mappings
create_hierarchy(project_id="...", name="Product Revenue", parent_id="...")
add_source_mapping(hierarchy_id="...", source_column="ACCOUNT_CODE", source_uid="41%")

# Export and deploy
export_hierarchy_csv(project_id="...")
generate_hierarchy_scripts(project_id="...")
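The example maps source rows to hierarchy nodes by pattern (`source_uid="41%"`). Assuming `%` behaves like the SQL `LIKE` wildcard (any run of characters), the sketch below shows how such mappings could assign account codes to nodes and roll amounts up a parent/child tree, while enforcing the 15-level limit from the feature table. `Node`, `like_to_regex` and `rollup` are hypothetical names, not DataBridge functions.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_LEVELS = 15  # maximum depth quoted in the feature table


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern (% = any run of characters, _ = one character)."""
    parts = [".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts), re.DOTALL)


@dataclass(eq=False)
class Node:
    """Hypothetical hierarchy node with LIKE-style mappings on ACCOUNT_CODE."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    patterns: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list, repr=False)

    def __post_init__(self) -> None:
        if self.parent is not None:
            if self.parent.depth() + 1 > MAX_LEVELS:
                raise ValueError(f"{self.name}: exceeds {MAX_LEVELS} levels")
            self.parent.children.append(self)

    def depth(self) -> int:
        return 1 if self.parent is None else self.parent.depth() + 1


def rollup(root: Node, amounts: Dict[str, float]) -> Dict[str, float]:
    """Total per node: accounts matched by its own mappings plus its children's totals."""
    totals: Dict[str, float] = {}

    def visit(node: Node) -> float:
        regexes = [like_to_regex(p) for p in node.patterns]
        own = sum(v for code, v in amounts.items() if any(r.fullmatch(code) for r in regexes))
        totals[node.name] = own + sum(visit(child) for child in node.children)
        return totals[node.name]

    visit(root)
    return totals
```

An account matched at more than one node would be counted at each of them, so a real implementation would need a precedence rule for overlapping mappings.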

Cortex AI Integration

Natural language to SQL and AI-powered data operations using Snowflake Cortex.

# Configure Cortex
configure_cortex_agent(connection_id="snowflake-prod", cortex_model="mistral-large")

# Natural language query
analyst_ask(question="What was total revenue by region last quarter?",
            semantic_model_file="@ANALYTICS.PUBLIC.MODELS/sales.yaml")

# AI reasoning loop
cortex_reason(goal="Analyze data quality in PRODUCTS table")
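Snowflake exposes Cortex LLM functions through SQL, for example `SNOWFLAKE.CORTEX.COMPLETE(model, prompt)`. The sketch below builds such a call with bind parameters rather than string concatenation, so quotes in a user's question cannot break the statement. Whether DataBridge's Cortex tools issue exactly this SQL is not documented here. `build_complete_query` is a hypothetical helper, and the `%s` placeholders assume snowflake-connector-python's default `pyformat` parameter style.

```python
from typing import Tuple

DEFAULT_MODEL = "mistral-large"  # same model as the configure_cortex_agent example


def build_complete_query(prompt: str, model: str = DEFAULT_MODEL) -> Tuple[str, Tuple[str, str]]:
    """Return (sql, params) for a parameterized Cortex COMPLETE call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    sql = "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s) AS response"
    return sql, (model, prompt)


# Usage with a live connection (requires snowflake-connector-python and credentials):
#   import snowflake.connector
#   conn = snowflake.connector.connect(account="...", user="...", password="...")
#   sql, params = build_complete_query("Summarize data quality issues in PRODUCTS")
#   cur = conn.cursor()
#   cur.execute(sql, params)
#   print(cur.fetchone()[0])
```

Passing the prompt as a parameter also keeps the statement text constant, which makes query history easier to read.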

Wright Module (Data Mart Factory)

Generate data marts using the four-object pipeline pattern: VW_1 → DT_2 → DT_3A → DT_3.

# Create mart configuration
create_mart_config(project_name="upstream_gross", report_type="GROSS",
                   hierarchy_table="TBL_0_GROSS_LOS_REPORT_HIERARCHY")

# Generate pipeline
generate_mart_pipeline(config_name="upstream_gross")
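The arrows VW_1 → DT_2 → DT_3A → DT_3 describe a dependency chain: each object reads from the one before it, so the objects must be created in that order. The sketch below emits placeholder Snowflake DDL in that order, assuming VW_ denotes a view and DT_ a dynamic table. The naming scheme, the `SELECT *` bodies, the warehouse name and the target lag are illustrative assumptions, not what generate_mart_pipeline produces; `pipeline_ddl` is a hypothetical helper.

```python
from typing import List

# Stage prefixes in dependency order, taken from the pipeline pattern above.
STAGES = ["VW_1", "DT_2", "DT_3A", "DT_3"]


def pipeline_ddl(project_name: str, hierarchy_table: str,
                 warehouse: str = "TRANSFORM_WH", target_lag: str = "60 minutes") -> List[str]:
    """Return one CREATE statement per stage; each stage selects from the previous one."""
    statements: List[str] = []
    source = hierarchy_table
    for stage in STAGES:
        name = f"{stage}_{project_name.upper()}"  # hypothetical naming scheme
        body = f"SELECT * FROM {source}"          # placeholder transformation
        if stage.startswith("VW_"):
            statements.append(f"CREATE OR REPLACE VIEW {name} AS {body};")
        else:
            statements.append(
                f"CREATE OR REPLACE DYNAMIC TABLE {name} "
                f"TARGET_LAG = '{target_lag}' WAREHOUSE = {warehouse} AS {body};"
            )
        source = name  # the next stage reads from this object
    return statements


if __name__ == "__main__":
    for stmt in pipeline_ddl("upstream_gross", "TBL_0_GROSS_LOS_REPORT_HIERARCHY"):
        print(stmt)
```

Generating the statements from an ordered stage list keeps creation order and dependencies in one place, so adding a stage cannot silently break the chain.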

Data Versioning

Track changes to hierarchies, catalog assets, and semantic models with semantic versioning.

# Create version snapshot
version_create(object_type="hierarchy", object_id="revenue-pl",
               description="Added new cost centers", bump="minor")

# Compare versions
version_diff(object_type="hierarchy", object_id="revenue-pl",
             from_version="1.0.0", to_version="1.1.0")

# Rollback
version_rollback(object_type="hierarchy", object_id="revenue-pl", to_version="1.0.0")
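The `bump="minor"` argument follows Semantic Versioning: a major bump resets minor and patch, and a minor bump resets patch, which is why the diff above compares 1.0.0 with 1.1.0. The sketch below pairs that rule with an in-memory snapshot store offering create, diff and rollback, mirroring the three tool calls. `VersionStore` and `bump_version` are hypothetical, and modeling rollback as a new patch version that restores old content is an assumption (the real tool may instead move a pointer).

```python
import copy
from typing import Any, Dict, List, Tuple


def bump_version(version: str, part: str) -> str:
    """Increment a MAJOR.MINOR.PATCH string per Semantic Versioning."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {part}")


class VersionStore:
    """Hypothetical snapshot store for one object, such as a hierarchy."""

    def __init__(self, initial: Dict[str, Any], version: str = "1.0.0") -> None:
        self.snapshots: Dict[str, Dict[str, Any]] = {version: copy.deepcopy(initial)}
        self.history: List[Tuple[str, str]] = [(version, "initial")]
        self.current = version  # always the newest version

    def create(self, state: Dict[str, Any], description: str, bump: str = "patch") -> str:
        new = bump_version(self.current, bump)
        self.snapshots[new] = copy.deepcopy(state)
        self.history.append((new, description))
        self.current = new
        return new

    def diff(self, from_version: str, to_version: str) -> Dict[str, List[str]]:
        a, b = self.snapshots[from_version], self.snapshots[to_version]
        return {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }

    def rollback(self, to_version: str) -> str:
        # Assumption: restore old content as a new patch version, so history is never rewritten.
        return self.create(self.snapshots[to_version], f"rollback to {to_version}", bump="patch")
```

Because rollback appends rather than deletes, the later 1.1.0 snapshot stays available for a future diff.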

Configuration

Create a .env file:

# Data directory
DATA_DIR=./data

# NestJS backend (optional)
NESTJS_BACKEND_URL=http://localhost:8001
NESTJS_API_KEY=your-api-key

# Snowflake (optional)
SNOWFLAKE_ACCOUNT=your-account
SNOWFLAKE_USER=your-user
SNOWFLAKE_PASSWORD=your-password

# Cortex AI
CORTEX_DEFAULT_MODEL=mistral-large
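How DataBridge itself loads .env is not stated here (many Python projects use the python-dotenv package). As a dependency-free illustration of the file format above, the sketch parses KEY=VALUE lines, skips blanks and # comments, and flags a partially filled Snowflake block. `parse_env` and `missing_snowflake_settings` are hypothetical helpers.

```python
from typing import Dict, Iterable, List

SNOWFLAKE_KEYS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_snowflake_settings(env: Dict[str, str]) -> List[str]:
    """Snowflake is optional, but setting only some of its keys is probably a mistake."""
    if not any(env.get(k) for k in SNOWFLAKE_KEYS):
        return []
    return [k for k in SNOWFLAKE_KEYS if not env.get(k)]


# Usage: parse_env(Path(".env").read_text().splitlines())  (needs: from pathlib import Path)
```

This parser deliberately ignores shell features such as `export` prefixes and variable interpolation; a full loader handles those.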

Documentation

  • CLAUDE.md - Complete tool reference and usage guide
  • docs/MANIFEST.md - Auto-generated tool manifest
  • Wiki - Architecture, getting started, and tutorials

Community Edition

The databridge-ce/ folder contains the open-source Community Edition with:

  • Plugin architecture for custom tools
  • Web UI dashboard
  • Starter templates

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

License

MIT License - See LICENSE for details.

Copyright (c) 2024-2026 DataBridge AI Team

Download files

Source Distribution

databridge_ai-0.34.0.tar.gz (414.1 kB)

Built Distribution

databridge_ai-0.34.0-py3-none-any.whl (404.7 kB)

File details

Details for the file databridge_ai-0.34.0.tar.gz.

File metadata

  • Download URL: databridge_ai-0.34.0.tar.gz
  • Size: 414.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databridge_ai-0.34.0.tar.gz
Algorithm    Hash digest
SHA256       916c3c54589a98bd05993abc9f2d8a2db1b2658914a5a904ce16dbe8cb9adacc
MD5          a4c66f422623870c025a523ef80eae17
BLAKE2b-256  379edce49a36062410c86dfeb8cb55297154fe48782e572dcc1988715c7fcc26
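To check a downloaded sdist against the published SHA256 digest, hash the file in chunks with Python's hashlib and compare. This is a generic sketch; `sha256_of` and `verify_download` are illustrative names. pip can enforce the same check in hash-checking mode, using `--hash=sha256:...` entries in a requirements file together with `--require-hashes`.

```python
import hashlib
import tempfile
from pathlib import Path

# Published SHA256 of databridge_ai-0.34.0.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "916c3c54589a98bd05993abc9f2d8a2db1b2658914a5a904ce16dbe8cb9adacc"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file incrementally so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Self-check with the standard SHA256 test vector for b"abc".
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"abc")
        assert sha256_of(sample) == "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
        print("hash helper OK")
```

Usage: `verify_download(Path("databridge_ai-0.34.0.tar.gz"))` returns `True` only for an unmodified download.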

File details

Details for the file databridge_ai-0.34.0-py3-none-any.whl.

File metadata

  • Download URL: databridge_ai-0.34.0-py3-none-any.whl
  • Size: 404.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databridge_ai-0.34.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       5092b5ae29f37bfa319f477a994d2dc75a444e8823d695b32339d6feec584963
MD5          b117b69060414ee90076cd9eede9a144
BLAKE2b-256  d91e6e9ad497869f4587b588c8a947dbb971fa3f178ca38766f36799f6435d47
