DataBridge AI Community Edition - An open-source, MCP-native data reconciliation engine with tools for hierarchy management, data quality, and analytics

DataBridge AI - Community Edition

Open-source, MCP-native data reconciliation engine for comparing, profiling, and managing data quality.

License: MIT | Python 3.10+


Overview

DataBridge AI Community Edition is a free, open-source data reconciliation toolkit built on the Model Context Protocol (MCP). It provides essential tools for:

  • Data Comparison - Hash-based row comparison, orphan detection, conflict identification
  • Fuzzy Matching - Find approximate matches between datasets using RapidFuzz
  • Data Profiling - Statistical analysis and quality metrics for your data
  • PDF/OCR Extraction - Extract text from PDFs and images
  • dbt Integration - Generate dbt models from your data
  • Data Quality - Create and run data validation rules
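
To make the "hash-based row comparison" idea above concrete, here is a minimal sketch using only the Python standard library. It is not DataBridge AI's implementation: the function names `row_hash` and `compare_rows`, the choice of SHA-256, and the sample rows are all assumptions made for illustration. Each row is reduced to a digest of its compared columns; keys present on one side only are reported as orphans, and shared keys with different digests are reported as conflicts.

```python
import hashlib


def row_hash(row: dict, columns: list[str]) -> str:
    """Hash the chosen columns of a row into a stable hex digest."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(str(row.get(col, "")) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_rows(source, target, key, compare_columns):
    """Classify keys as orphans (one side only), conflicts, or matches."""
    src = {r[key]: row_hash(r, compare_columns) for r in source}
    tgt = {r[key]: row_hash(r, compare_columns) for r in target}
    shared = src.keys() & tgt.keys()
    return {
        "orphans_in_source": sorted(src.keys() - tgt.keys()),
        "orphans_in_target": sorted(tgt.keys() - src.keys()),
        "conflicts": sorted(k for k in shared if src[k] != tgt[k]),
        "matches": sorted(k for k in shared if src[k] == tgt[k]),
    }


# Illustrative data only.
source = [
    {"id": "1", "name": "Ada", "amount": "10.00"},
    {"id": "2", "name": "Grace", "amount": "20.00"},
    {"id": "3", "name": "Alan", "amount": "30.00"},
]
target = [
    {"id": "2", "name": "Grace", "amount": "25.00"},
    {"id": "3", "name": "Alan", "amount": "30.00"},
    {"id": "4", "name": "Edsger", "amount": "40.00"},
]

report = compare_rows(source, target, key="id", compare_columns=["name", "amount"])
print(report)
```

Comparing digests rather than full rows keeps the per-key comparison cheap when rows are wide, at the cost of not showing which column differs; a follow-up column-by-column check on the conflicting keys would be needed for that.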

Installation

# Basic installation
pip install databridge-ai

# With PDF support (quotes stop shells such as zsh from expanding the brackets)
pip install "databridge-ai[pdf]"

# With OCR support
pip install "databridge-ai[ocr]"

# With all optional dependencies
pip install "databridge-ai[all]"

Quick Start

As an MCP Server

DataBridge AI works as an MCP server, making its tools available to AI assistants like Claude:

# Run the MCP server
databridge-mcp

Using the Dashboard

# Start the web dashboard
databridge-ui

Then open http://localhost:5050 in your browser.

Python API

from databridge_ai import load_csv, profile_data, fuzzy_match_columns

# Load and profile a CSV file
result = load_csv("data.csv")
profile = profile_data("data.csv")

# Find fuzzy matches between two files
matches = fuzzy_match_columns(
    source_file="source.csv",
    target_file="target.csv",
    source_column="name",
    target_column="customer_name",
    threshold=80
)
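
For readers who want a feel for what a threshold of 80 means, the sketch below scores string pairs on a 0 to 100 scale and keeps the best target for each source value. It deliberately swaps in the standard library's `difflib.SequenceMatcher` so that it runs without extra dependencies; DataBridge AI uses RapidFuzz, whose scorers will generally produce different numbers. The helper names `similarity` and `best_matches` and the sample values are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case and outer whitespace."""
    return 100.0 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_matches(source_values, target_values, threshold=80):
    """For each source value, keep its best-scoring target at or above threshold."""
    results = []
    for s in source_values:
        if not target_values:
            break
        # max() compares on the score first, then on the string to break ties.
        score, t = max((similarity(s, t), t) for t in target_values)
        if score >= threshold:
            results.append((s, t, round(score, 1)))
    return results


# Illustrative data only.
source_names = ["Acme Corp", "Globex Inc", "Initech"]
target_names = ["ACME Corp.", "Globex Incorporated", "Umbrella"]

matches = best_matches(source_names, target_names, threshold=80)
print(matches)
```

With this scorer, only the Acme pair clears the threshold; "Globex Inc" against "Globex Incorporated" falls short because the length difference drags the ratio down. Lowering the threshold trades precision for recall.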

Available Tools (Community Edition)

| Category | Tools | Description |
| --- | --- | --- |
| File Discovery | find_files, get_working_directory | Search for files across common directories |
| Data Loading | load_csv, load_json, query_database | Load data from various sources |
| Data Profiling | profile_data | Generate comprehensive data statistics |
| Comparison | compare_hashes | Hash-based row comparison with orphan/conflict detection |
| Fuzzy Matching | fuzzy_match_columns | Find approximate matches using RapidFuzz |
| PDF/OCR | extract_text_from_pdf | Extract text from PDF files |
| Diff Utilities | diff_text | Compare text strings |
| License | get_license_status | Check license tier and available features |
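
As a rough illustration of the kind of statistics a profiling tool reports, the following sketch computes row, null, and distinct counts per column with the standard library. It does not reproduce the output of `profile_data`, whose format is not documented here; the function `profile_rows`, the metric names, and the inline CSV are assumptions.

```python
import csv
import io
from collections import Counter


def profile_rows(rows: list[dict]) -> dict:
    """Compute per-column counts of rows, nulls, and distinct values."""
    profile = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r[col] for r in rows]
        # Treat empty strings and None as missing values.
        non_null = [v for v in values if v not in ("", None)]
        counts = Counter(non_null)
        profile[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return profile


# Illustrative data only: one missing city value.
sample = io.StringIO("id,city\n1,Paris\n2,\n3,Paris\n4,Oslo\n")
rows = list(csv.DictReader(sample))
stats = profile_rows(rows)
print(stats)
```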

Upgrade to Pro

DataBridge AI Pro unlocks advanced features:

Features compared across Community and Pro:

  • Data Reconciliation
  • Fuzzy Matching
  • Data Profiling
  • PDF/OCR
  • dbt Basic
  • Cortex AI Agent
  • Wright Pipeline
  • GraphRAG Engine
  • Data Observability
  • Full Data Catalog
  • Column Lineage
  • AI Orchestrator

# Install Pro (requires license)
pip install databridge-ai-pro --extra-index-url https://pypi.yourcompany.com/simple/

# Set your license key
export DATABRIDGE_LICENSE_KEY="DB-PRO-YOURKEY-20260101-signature"

Visit databridge.ai/pro for pricing and features.

Configuration

Create a .env file in your project root:

# Database connection (optional)
DATABRIDGE_DATABASE_URL=postgresql://user:pass@localhost/db

# OCR settings (optional)
DATABRIDGE_TESSERACT_PATH=/usr/bin/tesseract

# Fuzzy matching threshold (default: 80)
DATABRIDGE_FUZZY_THRESHOLD=80

# Max rows to display (default: 10)
DATABRIDGE_MAX_ROWS_DISPLAY=10
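
The settings above are plain `KEY=VALUE` pairs, so they are easy to inspect from your own scripts. The sketch below parses `.env`-style text and applies the defaults stated in the comments (80 and 10). How DataBridge AI itself loads the file is not described here; `parse_env_text`, `load_settings`, and the returned key names are hypothetical helpers, and the parser handles only simple unquoted values.

```python
import os


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs


def load_settings(env=None) -> dict:
    """Read DATABRIDGE_* settings from a mapping, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "database_url": env.get("DATABRIDGE_DATABASE_URL"),  # optional
        "tesseract_path": env.get("DATABRIDGE_TESSERACT_PATH"),  # optional
        "fuzzy_threshold": int(env.get("DATABRIDGE_FUZZY_THRESHOLD", "80")),
        "max_rows_display": int(env.get("DATABRIDGE_MAX_ROWS_DISPLAY", "10")),
    }


env_text = """
# Fuzzy matching threshold (default: 80)
DATABRIDGE_FUZZY_THRESHOLD=90
"""

settings = load_settings(parse_env_text(env_text))
print(settings)
```

Calling `load_settings()` with no argument reads from the process environment instead, which is useful when the variables are exported by the shell or a process manager.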

Plugin System

Extend DataBridge AI with custom plugins:

plugins/
├── my_plugin/
│   ├── __init__.py
│   └── mcp_tools.py  # Must have register_tools(mcp)

# plugins/my_plugin/mcp_tools.py
def register_tools(mcp):
    @mcp.tool()
    def my_custom_tool(param: str) -> str:
        """My custom tool description."""
        return f"Processed: {param}"
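
To show how the `register_tools(mcp)` convention can be exercised outside the server, here is a small, self-contained sketch of a loader. It is an assumption about how discovery might work, not DataBridge AI's actual loader: `ToolRegistry` is a hypothetical stand-in that offers only a `tool()` decorator, and `load_plugins` is an illustrative helper that imports every `plugins/<name>/mcp_tools.py` it finds.

```python
import importlib.util
import tempfile
from pathlib import Path


class ToolRegistry:
    """Hypothetical stand-in for the MCP server object; only provides tool()."""

    def __init__(self):
        self.tools = {}

    def tool(self):
        def decorator(func):
            # Record the function under its own name and return it unchanged.
            self.tools[func.__name__] = func
            return func
        return decorator


def load_plugins(plugins_dir: Path, mcp) -> list[str]:
    """Import each <plugins_dir>/<name>/mcp_tools.py and call register_tools(mcp)."""
    loaded = []
    for tools_file in sorted(plugins_dir.glob("*/mcp_tools.py")):
        name = tools_file.parent.name
        spec = importlib.util.spec_from_file_location(f"plugin_{name}", tools_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        register = getattr(module, "register_tools", None)
        if callable(register):
            register(mcp)
            loaded.append(name)
    return loaded


# Demo: write the example plugin to a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp) / "plugins" / "my_plugin"
    plugin_dir.mkdir(parents=True)
    (plugin_dir / "__init__.py").write_text("")
    (plugin_dir / "mcp_tools.py").write_text(
        "def register_tools(mcp):\n"
        "    @mcp.tool()\n"
        "    def my_custom_tool(param: str) -> str:\n"
        '        """My custom tool description."""\n'
        '        return f"Processed: {param}"\n'
    )
    registry = ToolRegistry()
    loaded = load_plugins(Path(tmp) / "plugins", registry)
    output = registry.tools["my_custom_tool"]("abc")

print(loaded, output)
```

A stub registry like this is also a convenient way to unit-test a plugin's tools without starting the MCP server.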

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.
