DataBridge AI - An open-source, MCP-native data reconciliation engine with tools for hierarchy management, data quality, and analytics

DataBridge AI

Python 3.10+ | License: MIT

DataBridge AI is a headless, MCP-native data reconciliation engine with 287 tools for hierarchy management, data quality, and analytics.

Features

Module               Tools  Description
Data Reconciliation     38  Compare and validate data from CSV, SQL, PDF, JSON sources
Hierarchy Builder       44  Create and manage multi-level hierarchy projects (up to 15 levels)
Cortex AI               25  Snowflake Cortex AI with natural language to SQL
Wright Module           18  Hierarchy-driven data mart generation with 4-object pipeline
Data Catalog            15  Centralized metadata registry with business glossary
Data Versioning         12  Semantic versioning, snapshots, rollback, and diff
Lineage Tracking        11  Column-level lineage and impact analysis
Git/CI-CD               12  Automated workflows and GitHub integration
dbt Integration          8  Generate dbt projects from hierarchies
Data Quality             7  Expectation suites and data contracts
Templates & Skills      16  Pre-built templates and AI expertise definitions
AI Orchestrator         27  Multi-agent coordination with task queues
And more...             54  Diff utilities, recommendations, console, etc.

Installation

# Basic installation
pip install databridge-ai

# With PDF support
pip install databridge-ai[pdf]

# With Snowflake support
pip install databridge-ai[snowflake]

# Full installation
pip install databridge-ai[all]

Quick Start

As MCP Server (Claude Desktop)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "DataBridge_AI": {
      "command": "python",
      "args": ["-m", "src.server"]
    }
  }
}
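
If you would rather script this step than edit the file by hand, the sketch below merges the entry above into an existing config without dropping other servers. The helper name `add_databridge_server` is an assumption for illustration, and the location of claude_desktop_config.json depends on your platform, so the path is left for you to supply.

```python
import json
from pathlib import Path


def add_databridge_server(config_path: Path) -> dict:
    """Merge the DataBridge_AI entry into a Claude Desktop config file.

    Hypothetical helper: not part of DataBridge AI itself.
    """
    # Start from the existing config so other MCP servers are preserved.
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Same entry as the JSON snippet above.
    servers = config.setdefault("mcpServers", {})
    servers["DataBridge_AI"] = {
        "command": "python",
        "args": ["-m", "src.server"],
    }

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Call it with the full path to your own config file, for example `add_databridge_server(Path("claude_desktop_config.json"))`, and restart Claude Desktop afterwards.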

Web UI Dashboard

cd databridge-ce/ui
python server.py
# Open http://127.0.0.1:5050

Programmatic Usage

from src.server import mcp

# Run as MCP server
mcp.run()

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Claude / LLM Client                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      MCP Protocol                            │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                   DataBridge MCP Server                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │  Hierarchy  │  │   Cortex    │  │   Wright    │         │
│  │   Builder   │  │   Agent     │  │   Module    │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │    Data     │  │   Lineage   │  │    Data     │         │
│  │   Catalog   │  │   Tracker   │  │   Quality   │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              Snowflake / CSV / SQL / PDF                     │
└─────────────────────────────────────────────────────────────┘

Core Modules

Hierarchy Builder

Create and manage multi-level hierarchy projects for financial reporting, organizational structures, and data classification.

# Create a hierarchy project
create_hierarchy_project(name="Revenue P&L", description="Revenue hierarchy")

# Add hierarchies with source mappings
create_hierarchy(project_id="...", name="Product Revenue", parent_id="...")
add_source_mapping(hierarchy_id="...", source_column="ACCOUNT_CODE", source_uid="41%")

# Export and deploy
export_hierarchy_csv(project_id="...")
generate_hierarchy_scripts(project_id="...")
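
The `source_uid="41%"` value above looks like a SQL LIKE-style wildcard, though the README does not spell out the matching rules. Assuming that reading, the sketch below shows which account codes such a pattern would select. The function names and sample codes are made up for illustration and are not DataBridge AI APIs.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern (% and _) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_codes(pattern: str, codes: list[str]) -> list[str]:
    """Return the codes that match the LIKE pattern in full."""
    regex = like_to_regex(pattern)
    return [code for code in codes if regex.fullmatch(code)]


# Made-up account codes for illustration.
codes = ["4100", "4150", "4200", "5410", "41"]
print(matching_codes("41%", codes))  # ['4100', '4150', '41']
```

Under this assumption, "41%" selects every code that starts with 41, so a single mapping can cover a whole account range.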

Cortex AI Integration

Natural language to SQL and AI-powered data operations using Snowflake Cortex.

# Configure Cortex
configure_cortex_agent(connection_id="snowflake-prod", cortex_model="mistral-large")

# Natural language query
analyst_ask(question="What was total revenue by region last quarter?",
            semantic_model_file="@ANALYTICS.PUBLIC.MODELS/sales.yaml")

# AI reasoning loop
cortex_reason(goal="Analyze data quality in PRODUCTS table")

Wright Module (Data Mart Factory)

Generate data marts using the 4-object pipeline pattern: VW_1 → DT_2 → DT_3A → DT_3

# Create mart configuration
create_mart_config(project_name="upstream_gross", report_type="GROSS",
                   hierarchy_table="TBL_0_GROSS_LOS_REPORT_HIERARCHY")

# Generate pipeline
generate_mart_pipeline(config_name="upstream_gross")
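
One way to picture the 4-object pattern is as a linear dependency chain, where each object is built from the one before it. The sketch below reads the arrows in VW_1 → DT_2 → DT_3A → DT_3 as "feeds into", which is an assumption, and derives a build order with the standard library. The function `build_order` is hypothetical and does not reflect how generate_mart_pipeline works internally.

```python
from graphlib import TopologicalSorter


def build_order(stages: list[str]) -> list[str]:
    """Return a build order in which every stage follows the stage it depends on."""
    # Map each stage to the set of stages it depends on (its predecessor).
    dependencies = {stage: {prev} for prev, stage in zip(stages, stages[1:])}
    return list(TopologicalSorter(dependencies).static_order())


# Stage names taken from the pipeline pattern above.
print(build_order(["VW_1", "DT_2", "DT_3A", "DT_3"]))
# ['VW_1', 'DT_2', 'DT_3A', 'DT_3']
```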

Data Versioning

Track changes to hierarchies, catalog assets, and semantic models with semantic versioning.

# Create version snapshot
version_create(object_type="hierarchy", object_id="revenue-pl",
               description="Added new cost centers", bump="minor")

# Compare versions
version_diff(object_type="hierarchy", object_id="revenue-pl",
             from_version="1.0.0", to_version="1.1.0")

# Rollback
version_rollback(object_type="hierarchy", object_id="revenue-pl", to_version="1.0.0")
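
The `bump="minor"` argument follows semantic versioning, which is why the example moves from 1.0.0 to 1.1.0. The sketch below shows the standard bump rules. Only "minor" appears in this README, so treating "major" and "patch" as accepted values is an assumption, and `bump_version` is a hypothetical helper rather than a DataBridge AI tool.

```python
def bump_version(version: str, bump: str) -> str:
    """Apply a semantic-versioning bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"            # breaking change: reset minor and patch
    if bump == "minor":
        return f"{major}.{minor + 1}.0"      # backward-compatible addition: reset patch
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backward-compatible fix
    raise ValueError(f"unknown bump type: {bump!r}")


print(bump_version("1.0.0", "minor"))  # 1.1.0
```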

Configuration

Create a .env file:

# Data directory
DATA_DIR=./data

# NestJS backend (optional)
NESTJS_BACKEND_URL=http://localhost:8001
NESTJS_API_KEY=your-api-key

# Snowflake (optional)
SNOWFLAKE_ACCOUNT=your-account
SNOWFLAKE_USER=your-user
SNOWFLAKE_PASSWORD=your-password

# Cortex AI
CORTEX_DEFAULT_MODEL=mistral-large
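
The README does not say how the server reads this file. If you need the same values in your own scripts, a minimal parser for the simple KEY=value format shown above might look like the sketch below. The helper names `load_env_file` and `apply_env` are assumptions for illustration, and the parser does not handle quoting or multi-line values.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        values[key.strip()] = value.strip()
    return values


def apply_env(values: dict[str, str], environ=os.environ) -> None:
    """Copy parsed values into the environment without overriding existing ones."""
    for key, value in values.items():
        environ.setdefault(key, value)
```

Using `setdefault` keeps variables that are already set in the shell, so real secrets exported in your environment take precedence over placeholder values in the file.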

Documentation

  • CLAUDE.md - Complete tool reference and usage guide
  • docs/MANIFEST.md - Auto-generated tool manifest
  • Wiki - Architecture, getting started, and tutorials

Community Edition

The databridge-ce/ folder contains the open-source Community Edition with:

  • Plugin architecture for custom tools
  • Web UI dashboard
  • Starter templates

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

License

MIT License - See LICENSE for details.

Copyright (c) 2024-2026 DataBridge AI Team
