
DataBridge AI Community Edition - An open-source, MCP-native data reconciliation engine with tools for hierarchy management, data quality, and analytics


DataBridge AI - Community Edition

Open-source, MCP-native data reconciliation engine for comparing, profiling, and managing data quality.

License: MIT. Requires Python 3.10+.


Overview

DataBridge AI Community Edition is a free, open-source data reconciliation toolkit built on the Model Context Protocol (MCP). It provides essential tools for:

  • Data Comparison - Hash-based row comparison, orphan detection, conflict identification
  • Fuzzy Matching - Find approximate matches between datasets using RapidFuzz
  • Data Profiling - Statistical analysis and quality metrics for your data
  • PDF/OCR Extraction - Extract text from PDFs and images
  • dbt Integration - Generate dbt projects from your data
  • Data Quality - Create and run data validation rules
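Hash-based comparison, orphan detection, and conflict identification fit together roughly as follows. This is a minimal, stdlib-only sketch and not DataBridge's own implementation. It assumes each row is a dict with a key column (the hypothetical `id`), fingerprints the remaining values with SHA-256, and then sorts keys into orphans (present on only one side), conflicts (same key, different fingerprint), and matches.

```python
import hashlib
import json

def row_hash(row: dict, key: str) -> str:
    """SHA-256 fingerprint of every non-key value, independent of column order."""
    payload = json.dumps({c: v for c, v in row.items() if c != key}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def compare_rows(source: list[dict], target: list[dict], key: str = "id") -> dict:
    """Classify keys as orphans on either side, conflicts, or matches."""
    src = {r[key]: row_hash(r, key) for r in source}
    tgt = {r[key]: row_hash(r, key) for r in target}
    shared = src.keys() & tgt.keys()
    return {
        "orphans_in_source": sorted(src.keys() - tgt.keys()),  # only in source
        "orphans_in_target": sorted(tgt.keys() - src.keys()),  # only in target
        "conflicts": sorted(k for k in shared if src[k] != tgt[k]),  # same key, different data
        "matches": sorted(k for k in shared if src[k] == tgt[k]),
    }

source = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}, {"id": 3, "name": "Initech"}]
target = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex Corp"}, {"id": 4, "name": "Umbrella"}]
print(compare_rows(source, target))
# {'orphans_in_source': [3], 'orphans_in_target': [4], 'conflicts': [2], 'matches': [1]}
```

Hashing the non-key columns means each pair of rows is compared through one fixed-length digest rather than column by column; serializing with `json.dumps(..., sort_keys=True)` keeps the digest independent of column order.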

Installation

# Basic installation
pip install databridge-ai

# With PDF support (quote extras so shells such as zsh don't expand the brackets)
pip install "databridge-ai[pdf]"

# With OCR support
pip install "databridge-ai[ocr]"

# With all optional dependencies
pip install "databridge-ai[all]"

Quick Start

As an MCP Server

DataBridge AI works as an MCP server, making its tools available to AI assistants like Claude:

# Run the MCP server
databridge-mcp

Using the Dashboard

# Start the web dashboard
databridge-ui

Then open http://localhost:5050 in your browser.

The dashboard includes a dedicated Hierarchy Builder page for visually creating and managing multi-level hierarchy projects, importing from CSV, and exporting deployment scripts.

Python API

from databridge_ai import load_csv, profile_data, fuzzy_match_columns

# Load and profile a CSV file
result = load_csv("data.csv")
profile = profile_data("data.csv")

# Find fuzzy matches between two files
matches = fuzzy_match_columns(
    source_file="source.csv",
    target_file="target.csv",
    source_column="name",
    target_column="customer_name",
    threshold=80
)
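The `threshold=80` argument is presumably a 0 to 100 similarity score, the scale RapidFuzz uses. The sketch below illustrates that idea with the standard library's `difflib.SequenceMatcher` as a stand-in for RapidFuzz, so scores may differ slightly from RapidFuzz's `fuzz.ratio`; the `best_matches` helper is hypothetical and is not part of the DataBridge API.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in for RapidFuzz)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100

def best_matches(sources: list[str], targets: list[str], threshold: float = 80) -> list[tuple[str, str, float]]:
    """For each source value, keep the best-scoring target only if it reaches the threshold."""
    results = []
    for s in sources:
        best = max(targets, key=lambda t: similarity(s, t), default=None)
        if best is None:
            continue
        score = similarity(s, best)
        if score >= threshold:
            results.append((s, best, round(score, 1)))
    return results

names = ["Acme Corp", "Globex", "Initech LLC"]
customer_names = ["ACME Corp.", "Globex Inc", "Hooli"]
for source, target, score in best_matches(names, customer_names):
    print(f"{source!r} -> {target!r} ({score})")
# 'Acme Corp' -> 'ACME Corp.' (94.7)
```

Here "Globex" vs "Globex Inc" scores 75, so it is dropped at the default threshold of 80 but would be kept at 70; raising the threshold trades recall for fewer false matches.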

Available Tools (~106 tools)

Category              Count  Tools
File Discovery        3      find_files, get_working_directory, stage_file
Data Loading          3      load_csv, load_json, query_database
Data Profiling        2      profile_data, profile_book_sources
Hashing & Comparison  3      compare_hashes, compare_table_data, get_data_comparison_summary
Fuzzy Matching        2      fuzzy_match_columns, fuzzy_deduplicate
PDF/OCR               3      extract_text_from_pdf, ocr_image, parse_table_from_text
Workflow              4      analyze_request, save_workflow_step, get_workflow, clear_workflow
Transform             2      transform_column, convert_sql_format
Documentation         1      get_application_documentation
Templates             10     list_financial_templates, get_template_details, get_skill_prompt, etc.
Diff Utilities        6      diff_text, diff_dicts, diff_lists, explain_diff, generate_patch, find_similar_strings
dbt Integration       8      create_dbt_project, generate_dbt_model, generate_dbt_schema, validate_dbt_project, etc.
Data Quality          7      generate_expectation_suite, run_validation, create_data_contract, add_column_expectation, etc.
License               1      get_license_status
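The Data Quality category builds expectation suites and runs them against data. As a rough illustration of that pattern only (the `Expectation` class, rule names, and report shape below are hypothetical and do not reflect what `generate_expectation_suite` or `run_validation` actually return), a suite can be modeled as a list of per-column checks evaluated row by row:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Expectation:
    column: str
    name: str
    check: Callable[[Any], bool]  # returns True when a single value passes

def not_null(column: str) -> Expectation:
    return Expectation(column, "not_null", lambda v: v is not None and v != "")

def between(column: str, low: float, high: float) -> Expectation:
    def check(v: Any) -> bool:
        try:
            return low <= float(v) <= high
        except (TypeError, ValueError):  # missing or non-numeric values fail
            return False
    return Expectation(column, f"between_{low}_{high}", check)

def run_suite(rows: list[dict], suite: list[Expectation]) -> dict[str, dict]:
    """Evaluate every expectation on every row; report failing row indexes and pass rate."""
    report = {}
    for exp in suite:
        failures = [i for i, row in enumerate(rows) if not exp.check(row.get(exp.column))]
        report[f"{exp.column}.{exp.name}"] = {
            "success": not failures,
            "failed_rows": failures,
            "pass_rate": 1 - len(failures) / len(rows) if rows else 1.0,
        }
    return report

rows = [{"id": "1", "amount": "120.5"}, {"id": "", "amount": "80"}, {"id": "3", "amount": "-5"}]
suite = [not_null("id"), between("amount", 0, 1000)]
for rule, outcome in run_suite(rows, suite).items():
    print(rule, outcome)
```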

Editions

DataBridge AI is available in four editions:

               Community (CE)   Pro               Pro Examples       Enterprise
Tools          ~106             ~297              Tests & Tutorials  361+
Price          Free             Licensed          Licensed Add-on    Custom
Distribution   Public PyPI      GitHub Packages   GitHub Packages    Private Deploy
Data Reconciliation
Fuzzy Matching
Data Profiling
PDF/OCR
dbt Basic
Data Quality
UI Dashboard
Diff Utilities
Templates (Basic)
Hierarchy Builder (49 tools)
Wright Pipeline (31 tools)
Cortex AI Agent (26 tools)
Data Catalog (19 tools)
Faux Objects (18 tools)
Connections (16 tools)
AI Orchestrator (16 tools)
Data Observability (15 tools)
Data Versioning (12 tools)
Git/CI-CD (12 tools)
Lineage Tracking (11 tools)
PlannerAgent (11 tools)
GraphRAG Engine (10 tools)
Unified AI Agent (10 tools)
Hierarchy-Graph Bridge (5 tools)
Console Dashboard (5 tools)
Schema Matcher (5 tools)
Data Matcher (4 tools)
47 Tests + 19 Tutorials
Custom Agents
White-label
SLA Support
On-premise Deploy

Upgrade to Pro

# Set your license key
export DATABRIDGE_LICENSE_KEY="DB-PRO-YOURKEY-20260101-signature"

# Install Pro (from GitHub Packages)
pip install databridge-ai-pro --extra-index-url https://ghp_TOKEN@raw.githubusercontent.com/datanexum/DATABRIDGE_AI/main/

# Install Pro Examples (tests & tutorials, requires Pro key)
pip install databridge-ai-examples                # CE tests + beginner tutorials
pip install "databridge-ai-examples[pro]"         # + Pro tests + advanced tutorials

License Key Format: DB-{TIER}-{CUSTOMER_ID}-{EXPIRY}-{SIGNATURE}

Contact sales@databridge.ai for pricing and license keys.

Configuration

Create a .env file in your project root:

# Database connection (optional)
DATABRIDGE_DATABASE_URL=postgresql://user:pass@localhost/db

# OCR settings (optional)
DATABRIDGE_TESSERACT_PATH=/usr/bin/tesseract

# Fuzzy matching threshold (default: 80)
DATABRIDGE_FUZZY_THRESHOLD=80

# Max rows to display (default: 10)
DATABRIDGE_MAX_ROWS_DISPLAY=10

Plugin System

Extend DataBridge AI with custom plugins:

plugins/
├── my_plugin/
│   ├── __init__.py
│   └── mcp_tools.py  # Must have register_tools(mcp)

# plugins/my_plugin/mcp_tools.py
def register_tools(mcp):
    @mcp.tool()
    def my_custom_tool(param: str) -> str:
        """My custom tool description."""
        return f"Processed: {param}"

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.


Download files

Download the file for your platform.

Source Distribution

databridge_ai-0.49.2.tar.gz (348.1 kB)

Built Distribution

databridge_ai-0.49.2-py3-none-any.whl (356.5 kB)

File details

Details for the file databridge_ai-0.49.2.tar.gz.

File metadata

  • Download URL: databridge_ai-0.49.2.tar.gz
  • Size: 348.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for databridge_ai-0.49.2.tar.gz
Algorithm Hash digest
SHA256 1aada17983e838c33ace0d40e9ba72018abd9bc9eb4d8c9a1dde820e234263c2
MD5 ea1d1dac962cd31c4c25f699824076ea
BLAKE2b-256 aa760c16dedcc4d887f3daa7f79687fe28f79a9d62b7f8205a29858c3752023b


File details

Details for the file databridge_ai-0.49.2-py3-none-any.whl.

File metadata

  • Download URL: databridge_ai-0.49.2-py3-none-any.whl
  • Size: 356.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for databridge_ai-0.49.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a73cecc4f4288d8695d1bb13874d297a509ab552ba498045c35e1de3c713eef9
MD5 0ef0044ee4e0e073fe6a265c182be582
BLAKE2b-256 534a87f51570a826f76a39ec3314b87464ab0c8f86a71af45db4512cc9a9c1ff

