DataBridge AI Community Edition - An open-source, MCP-native data reconciliation engine with tools for hierarchy management, data quality, and analytics
DataBridge AI - Community Edition
Open-source, MCP-native data reconciliation engine for comparing, profiling, and managing data quality.
Overview
DataBridge AI Community Edition is a free, open-source data reconciliation toolkit built on the Model Context Protocol (MCP). It provides essential tools for:
- Data Comparison - Hash-based row comparison, orphan detection, conflict identification
- Fuzzy Matching - Find approximate matches between datasets using RapidFuzz
- Data Profiling - Statistical analysis and quality metrics for your data
- PDF/OCR Extraction - Extract text from PDFs and images
- dbt Integration - Generate dbt models from your data
- Data Quality - Create and run data validation rules
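To make the "hash-based row comparison, orphan detection, conflict identification" item concrete, here is a minimal sketch of the general technique using only the standard library. It is not DataBridge AI's implementation: the function names `row_hash` and `compare_rows`, the report keys, and the sample rows are all hypothetical.

```python
import hashlib


def row_hash(row: dict, columns: list[str]) -> str:
    """Hash the chosen columns of one row into a stable hex digest."""
    # Join with a separator that is unlikely to appear in the data itself.
    joined = "\x1f".join(str(row.get(col, "")) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_rows(source, target, key, compare_columns):
    """Classify rows as matched, conflicting, or orphaned, joined on `key`."""
    src = {row[key]: row_hash(row, compare_columns) for row in source}
    tgt = {row[key]: row_hash(row, compare_columns) for row in target}
    shared = src.keys() & tgt.keys()
    return {
        # Same key, same hash: the rows agree on every compared column.
        "matched": sorted(k for k in shared if src[k] == tgt[k]),
        # Same key, different hash: at least one compared column differs.
        "conflicts": sorted(k for k in shared if src[k] != tgt[k]),
        # Key present on only one side.
        "orphans_in_source": sorted(src.keys() - tgt.keys()),
        "orphans_in_target": sorted(tgt.keys() - src.keys()),
    }


source = [
    {"id": 1, "name": "Acme", "amount": "100"},
    {"id": 2, "name": "Globex", "amount": "250"},
    {"id": 3, "name": "Initech", "amount": "75"},
]
target = [
    {"id": 1, "name": "Acme", "amount": "100"},
    {"id": 2, "name": "Globex", "amount": "260"},
    {"id": 4, "name": "Umbrella", "amount": "10"},
]
report = compare_rows(source, target, key="id", compare_columns=["name", "amount"])
print(report)
```

Comparing one digest per row avoids a column-by-column comparison for every key, at the cost of not knowing which column caused a conflict until the two rows are inspected.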
Installation
# Basic installation
pip install databridge-ai
# With PDF support
pip install databridge-ai[pdf]
# With OCR support
pip install databridge-ai[ocr]
# With all optional dependencies
pip install databridge-ai[all]
Quick Start
As an MCP Server
DataBridge AI runs as an MCP server, exposing its tools to AI assistants such as Claude:
# Run the MCP server
databridge-mcp
Using the Dashboard
# Start the web dashboard
databridge-ui
Then open http://localhost:5050 in your browser.
Python API
from databridge_ai import load_csv, profile_data, fuzzy_match_columns
# Load and profile a CSV file
result = load_csv("data.csv")
profile = profile_data("data.csv")
# Find fuzzy matches between two files
matches = fuzzy_match_columns(
    source_file="source.csv",
    target_file="target.csv",
    source_column="name",
    target_column="customer_name",
    threshold=80,
)
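To illustrate what a `threshold` on a 0-100 similarity scale does, the sketch below keeps the best-scoring target for each source value and drops pairs that score under the threshold. It deliberately swaps in the standard library's `difflib.SequenceMatcher` for RapidFuzz so that it runs without extra dependencies, so the scores will not match RapidFuzz exactly. The helper names `similarity` and `best_matches` are hypothetical and say nothing about how `fuzzy_match_columns` is implemented or what it returns.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score between 0 and 100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(source_values, target_values, threshold=80):
    """For each source value, keep its best target if the score meets the threshold."""
    results = []
    if not target_values:
        return results
    for src in source_values:
        scored = [(similarity(src, tgt), tgt) for tgt in target_values]
        score, tgt = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            results.append({"source": src, "target": tgt, "score": round(score, 1)})
    return results


names = ["Acme Corp", "Zzz"]
customers = ["ACME Corp", "Globex", "Initech"]
matches = best_matches(names, customers, threshold=80)
for match in matches:
    print(match)
```

Raising the threshold trades recall for precision: fewer pairs are reported, but those that remain are closer to exact matches.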
Available Tools (Community Edition)
| Category | Tools | Description |
|---|---|---|
| File Discovery | find_files, get_working_directory | Search for files across common directories |
| Data Loading | load_csv, load_json, query_database | Load data from various sources |
| Data Profiling | profile_data | Generate comprehensive data statistics |
| Comparison | compare_hashes | Hash-based row comparison with orphan/conflict detection |
| Fuzzy Matching | fuzzy_match_columns | Find approximate matches using RapidFuzz |
| PDF/OCR | extract_text_from_pdf | Extract text from PDF files |
| Diff Utilities | diff_text | Compare text strings |
| License | get_license_status | Check license tier and available features |
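The table describes Data Profiling only as "comprehensive data statistics". As a rough illustration of what per-column profiling can involve, the sketch below computes counts, missing values, distinct values, and basic numeric statistics with the standard library. The function `profile_rows`, its output keys, and the sample data are assumptions made for this example; the actual output of `profile_data` is not documented here and may differ.

```python
import csv
import io
import statistics


def profile_rows(rows):
    """Compute simple per-column quality metrics for a list of dict rows."""
    profile = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in ("", None)]
        stats = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Treat the column as numeric only if every present value parses as a float.
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []
        if numbers:
            stats["min"] = min(numbers)
            stats["max"] = max(numbers)
            stats["mean"] = statistics.fmean(numbers)
        profile[col] = stats
    return profile


sample = io.StringIO("id,name,amount\n1,Acme,100\n2,Globex,\n3,Acme,50\n")
rows = list(csv.DictReader(sample))
summary = profile_rows(rows)
for column, stats in summary.items():
    print(column, stats)
```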
Upgrade to Pro
DataBridge AI Pro unlocks advanced features:
| Feature | Community | Pro |
|---|---|---|
| Data Reconciliation | ✅ | ✅ |
| Fuzzy Matching | ✅ | ✅ |
| Data Profiling | ✅ | ✅ |
| PDF/OCR | ✅ | ✅ |
| dbt Basic | ✅ | ✅ |
| Cortex AI Agent | ❌ | ✅ |
| Wright Pipeline | ❌ | ✅ |
| GraphRAG Engine | ❌ | ✅ |
| Data Observability | ❌ | ✅ |
| Full Data Catalog | ❌ | ✅ |
| Column Lineage | ❌ | ✅ |
| AI Orchestrator | ❌ | ✅ |
# Install Pro (requires license)
pip install databridge-ai-pro --extra-index-url https://pypi.yourcompany.com/simple/
# Set your license key
export DATABRIDGE_LICENSE_KEY="DB-PRO-YOURKEY-20260101-signature"
Visit databridge.ai/pro for pricing and features.
Configuration
Create a .env file in your project root:
# Database connection (optional)
DATABRIDGE_DATABASE_URL=postgresql://user:pass@localhost/db
# OCR settings (optional)
DATABRIDGE_TESSERACT_PATH=/usr/bin/tesseract
# Fuzzy matching threshold (default: 80)
DATABRIDGE_FUZZY_THRESHOLD=80
# Max rows to display (default: 10)
DATABRIDGE_MAX_ROWS_DISPLAY=10
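As a sketch of how settings like these are typically resolved, the example below parses `KEY=VALUE` lines and falls back to the defaults stated in the comments above (80 and 10). How DataBridge AI actually loads its `.env` file is not described here; `parse_env_file`, `load_settings`, and the returned dictionary keys are hypothetical.

```python
def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as URLs may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env: dict) -> dict:
    """Read the documented variables, applying defaults when one is absent."""
    return {
        "database_url": env.get("DATABRIDGE_DATABASE_URL"),
        "tesseract_path": env.get("DATABRIDGE_TESSERACT_PATH"),
        "fuzzy_threshold": int(env.get("DATABRIDGE_FUZZY_THRESHOLD", "80")),
        "max_rows_display": int(env.get("DATABRIDGE_MAX_ROWS_DISPLAY", "10")),
    }


example = """
# Fuzzy matching threshold (default: 80)
DATABRIDGE_FUZZY_THRESHOLD=90
"""
settings = load_settings(parse_env_file(example))
print(settings)
```

Only the threshold is overridden in this example, so the row limit keeps its default and the two optional paths stay unset.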
Plugin System
Extend DataBridge AI with custom plugins:
plugins/
├── my_plugin/
│   ├── __init__.py
│   └── mcp_tools.py  # Must have register_tools(mcp)
# plugins/my_plugin/mcp_tools.py
def register_tools(mcp):
    @mcp.tool()
    def my_custom_tool(param: str) -> str:
        """My custom tool description."""
        return f"Processed: {param}"
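To show how a `register_tools(mcp)` convention like this can be wired up, the sketch below discovers `plugins/*/mcp_tools.py` files, imports each one, and calls its `register_tools`. It is an assumption about the general pattern, not DataBridge AI's actual loader: `StubMCP` is a hypothetical stand-in for the real MCP server object, and `load_plugins` is a hypothetical helper.

```python
import importlib.util
import tempfile
from pathlib import Path


class StubMCP:
    """Hypothetical stand-in for an MCP server; records tools registered via @mcp.tool()."""

    def __init__(self):
        self.tools = {}

    def tool(self):
        def decorator(func):
            self.tools[func.__name__] = func
            return func
        return decorator


def load_plugins(plugins_dir: Path, mcp) -> list[str]:
    """Import each <plugins_dir>/<name>/mcp_tools.py and call its register_tools(mcp)."""
    loaded = []
    for tools_file in sorted(plugins_dir.glob("*/mcp_tools.py")):
        name = tools_file.parent.name
        spec = importlib.util.spec_from_file_location(f"plugin_{name}", tools_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        register = getattr(module, "register_tools", None)
        if callable(register):
            register(mcp)
            loaded.append(name)
    return loaded


PLUGIN_SOURCE = '''
def register_tools(mcp):
    @mcp.tool()
    def my_custom_tool(param: str) -> str:
        """My custom tool description."""
        return f"Processed: {param}"
'''

# Build a throwaway plugins/ tree that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp) / "plugins" / "my_plugin"
    plugin_dir.mkdir(parents=True)
    (plugin_dir / "__init__.py").write_text("")
    (plugin_dir / "mcp_tools.py").write_text(PLUGIN_SOURCE)

    mcp = StubMCP()
    loaded = load_plugins(Path(tmp) / "plugins", mcp)
    output = mcp.tools["my_custom_tool"]("hello")

print(loaded, output)
```

Modules without a callable `register_tools` are skipped rather than treated as errors, which is one possible design choice; a stricter loader could raise instead.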
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Links
- Documentation: github.com/tghanchidnx/databridge-ai/wiki
- Issues: github.com/tghanchidnx/databridge-ai/issues
- Pro Features: databridge.ai/pro
Download files
File details
Details for the file databridge_ai-0.39.0.tar.gz.
File metadata
- Download URL: databridge_ai-0.39.0.tar.gz
- Upload date:
- Size: 35.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f71fc9c040fac2c5d5c4cd8a871b4f65b8862098b79ca08ba0ad36444c0dbe8 |
| MD5 | a3ec968709db0e38eecdf551298c6a56 |
| BLAKE2b-256 | 192d1bbf918cd600ccdbe79247d6385051f1cfd1cf0cdbd7c023720942bd0697 |
File details
Details for the file databridge_ai-0.39.0-py3-none-any.whl.
File metadata
- Download URL: databridge_ai-0.39.0-py3-none-any.whl
- Upload date:
- Size: 37.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7323d534ece2378de9f5e345f5c30080120b7a2fed2d15039e8674fd85dcb01 |
| MD5 | 2aae438ef9e8cbcb5f326818761a5532 |
| BLAKE2b-256 | aca63c4352a78f13c18ca48c160227ffa9a461a7bd5f500706a3d49fb5e13736 |