DataBridge AI Community Edition - An open-source, MCP-native data reconciliation engine with tools for hierarchy management, data quality, and analytics
DataBridge AI - Community Edition
Open-source, MCP-native data reconciliation engine for comparing, profiling, and managing data quality.
Overview
DataBridge AI Community Edition is a free, open-source data reconciliation toolkit built on the Model Context Protocol (MCP). It provides essential tools for:
- Data Comparison - Hash-based row comparison, orphan detection, conflict identification
- Fuzzy Matching - Find approximate matches between datasets using RapidFuzz
- Data Profiling - Statistical analysis and quality metrics for your data
- PDF/OCR Extraction - Extract text from PDFs and images
- dbt Integration - Generate dbt projects from your data
- Data Quality - Create and run data validation rules
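To make the Data Comparison bullet concrete, here is a minimal sketch of hash-based row comparison in plain Python. It is not DataBridge AI's implementation: the function names (`row_hash`, `compare_rows`), the use of SHA-256, and the result keys are all assumptions chosen for illustration.

```python
import hashlib


def row_hash(row: dict, columns: list) -> str:
    """Hash the selected column values of one row (illustrative helper)."""
    # Join with a unit-separator character so ("ab", "c") and ("a", "bc") differ.
    joined = "\x1f".join(str(row.get(col, "")) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_rows(source: list, target: list, key: str, compare_columns: list) -> dict:
    """Classify keys as orphans, conflicts, or matches by comparing row hashes."""
    src = {row[key]: row_hash(row, compare_columns) for row in source}
    tgt = {row[key]: row_hash(row, compare_columns) for row in target}
    shared = src.keys() & tgt.keys()
    return {
        # Keys present on one side only are orphans.
        "orphans_in_source": sorted(src.keys() - tgt.keys()),
        "orphans_in_target": sorted(tgt.keys() - src.keys()),
        # Shared keys whose hashes differ are conflicts.
        "conflicts": sorted(k for k in shared if src[k] != tgt[k]),
        "matches": sorted(k for k in shared if src[k] == tgt[k]),
    }


if __name__ == "__main__":
    source = [{"id": 1, "amount": "10"}, {"id": 2, "amount": "20"}]
    target = [{"id": 2, "amount": "25"}, {"id": 3, "amount": "30"}]
    print(compare_rows(source, target, key="id", compare_columns=["amount"]))
```

Hashing each row once keeps the comparison to a dictionary lookup per key, which is the usual reason for preferring hashes over column-by-column checks on wide tables.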
Installation
# Basic installation
pip install databridge-ai
# With PDF support (quotes keep shells such as zsh from expanding the brackets)
pip install "databridge-ai[pdf]"
# With OCR support
pip install "databridge-ai[ocr]"
# With all optional dependencies
pip install "databridge-ai[all]"
Quick Start
As an MCP Server
DataBridge AI works as an MCP server, making its tools available to AI assistants like Claude:
# Run the MCP server
databridge-mcp
Using the Dashboard
# Start the web dashboard
databridge-ui
Then open http://localhost:5050 in your browser.
The dashboard includes a dedicated Hierarchy Builder page for visually creating and managing multi-level hierarchy projects, importing from CSV, and exporting deployment scripts.
Python API
from databridge_ai import load_csv, profile_data, fuzzy_match_columns
# Load and profile a CSV file
result = load_csv("data.csv")
profile = profile_data("data.csv")
# Find fuzzy matches between two files
matches = fuzzy_match_columns(
    source_file="source.csv",
    target_file="target.csv",
    source_column="name",
    target_column="customer_name",
    threshold=80,
)
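To illustrate what a threshold of 80 means in practice, the sketch below scores string pairs on a 0 to 100 scale and keeps the best match at or above the threshold. It deliberately uses `difflib.SequenceMatcher` from the standard library rather than RapidFuzz, so the scores will not be identical to what `fuzzy_match_columns` returns; the helper names `similarity` and `best_matches` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (stdlib stand-in for RapidFuzz)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(source_values, target_values, threshold=80.0):
    """Return (source, best_target, score) for each source value that clears the threshold."""
    results = []
    for source in source_values:
        scored = [(similarity(source, target), target) for target in target_values]
        if not scored:
            continue
        score, target = max(scored)  # highest score wins
        if score >= threshold:
            results.append((source, target, round(score, 1)))
    return results


if __name__ == "__main__":
    names = ["Acme Corp", "Zzz"]
    customers = ["ACME Corp.", "Globex"]
    for source, target, score in best_matches(names, customers):
        print(f"{source!r} ~ {target!r} (score {score})")
```

Raising the threshold trades recall for precision: fewer near-misses are reported, but the ones that remain are closer matches.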
Available Tools (~106 tools)
| Category | Count | Tools |
|---|---|---|
| File Discovery | 3 | find_files, get_working_directory, stage_file |
| Data Loading | 3 | load_csv, load_json, query_database |
| Data Profiling | 2 | profile_data, profile_book_sources |
| Hashing & Comparison | 3 | compare_hashes, compare_table_data, get_data_comparison_summary |
| Fuzzy Matching | 2 | fuzzy_match_columns, fuzzy_deduplicate |
| PDF/OCR | 3 | extract_text_from_pdf, ocr_image, parse_table_from_text |
| Workflow | 4 | analyze_request, save_workflow_step, get_workflow, clear_workflow |
| Transform | 2 | transform_column, convert_sql_format |
| Documentation | 1 | get_application_documentation |
| Templates | 10 | list_financial_templates, get_template_details, get_skill_prompt, etc. |
| Diff Utilities | 6 | diff_text, diff_dicts, diff_lists, explain_diff, generate_patch, find_similar_strings |
| dbt Integration | 8 | create_dbt_project, generate_dbt_model, generate_dbt_schema, validate_dbt_project, etc. |
| Data Quality | 7 | generate_expectation_suite, run_validation, create_data_contract, add_column_expectation, etc. |
| License | 1 | get_license_status |
Editions
DataBridge AI is available in four editions:
| | Community (CE) | Pro | Pro Examples | Enterprise |
|---|---|---|---|---|
| Tools | ~106 | ~297 | Tests & Tutorials | 361+ |
| Price | Free | Licensed | Licensed Add-on | Custom |
| Distribution | Public PyPI | GitHub Packages | GitHub Packages | Private Deploy |
| Data Reconciliation | ✅ | ✅ | | ✅ |
| Fuzzy Matching | ✅ | ✅ | | ✅ |
| Data Profiling | ✅ | ✅ | | ✅ |
| PDF/OCR | ✅ | ✅ | | ✅ |
| dbt Basic | ✅ | ✅ | | ✅ |
| Data Quality | ✅ | ✅ | | ✅ |
| UI Dashboard | ✅ | ✅ | | ✅ |
| Diff Utilities | ✅ | ✅ | | ✅ |
| Templates (Basic) | ✅ | ✅ | | ✅ |
| Hierarchy Builder (49 tools) | | ✅ | | ✅ |
| Wright Pipeline (31 tools) | | ✅ | | ✅ |
| Cortex AI Agent (26 tools) | | ✅ | | ✅ |
| Data Catalog (19 tools) | | ✅ | | ✅ |
| Faux Objects (18 tools) | | ✅ | | ✅ |
| Connections (16 tools) | | ✅ | | ✅ |
| AI Orchestrator (16 tools) | | ✅ | | ✅ |
| Data Observability (15 tools) | | ✅ | | ✅ |
| Data Versioning (12 tools) | | ✅ | | ✅ |
| Git/CI-CD (12 tools) | | ✅ | | ✅ |
| Lineage Tracking (11 tools) | | ✅ | | ✅ |
| PlannerAgent (11 tools) | | ✅ | | ✅ |
| GraphRAG Engine (10 tools) | | ✅ | | ✅ |
| Unified AI Agent (10 tools) | | ✅ | | ✅ |
| Hierarchy-Graph Bridge (5 tools) | | ✅ | | ✅ |
| Console Dashboard (5 tools) | | ✅ | | ✅ |
| Schema Matcher (5 tools) | | ✅ | | ✅ |
| Data Matcher (4 tools) | | ✅ | | ✅ |
| 47 Tests + 19 Tutorials | | | ✅ | |
| Custom Agents | | | | ✅ |
| White-label | | | | ✅ |
| SLA Support | | | | ✅ |
| On-premise Deploy | | | | ✅ |
Upgrade to Pro
# Set your license key
export DATABRIDGE_LICENSE_KEY="DB-PRO-YOURKEY-20260101-signature"
# Install Pro (from GitHub Packages)
pip install databridge-ai-pro --extra-index-url https://ghp_TOKEN@raw.githubusercontent.com/datanexum/DATABRIDGE_AI/main/
# Install Pro Examples (tests & tutorials, requires Pro key)
pip install databridge-ai-examples # CE tests + beginner tutorials
pip install "databridge-ai-examples[pro]" # + Pro tests + advanced tutorials
License Key Format: DB-{TIER}-{CUSTOMER_ID}-{EXPIRY}-{SIGNATURE}
Contact sales@databridge.ai for pricing and license keys.
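The key format above can be split into its fields with a few lines of Python. This is a sketch under stated assumptions, not the product's validation logic: it assumes the expiry is written as YYYYMMDD (inferred from the `20260101` in the example key), that the customer ID contains no hyphen, and it does not verify the signature. The names `LicenseKey` and `parse_license_key` are hypothetical.

```python
from datetime import date, datetime
from typing import NamedTuple


class LicenseKey(NamedTuple):
    tier: str
    customer_id: str
    expiry: date
    signature: str


def parse_license_key(key: str) -> LicenseKey:
    """Split DB-{TIER}-{CUSTOMER_ID}-{EXPIRY}-{SIGNATURE} into typed fields."""
    # maxsplit=4 keeps any hyphens inside the signature intact.
    parts = key.split("-", 4)
    if len(parts) != 5 or parts[0] != "DB":
        raise ValueError("expected DB-{TIER}-{CUSTOMER_ID}-{EXPIRY}-{SIGNATURE}")
    _, tier, customer_id, expiry_raw, signature = parts
    # Assumption: expiry is encoded as YYYYMMDD.
    expiry = datetime.strptime(expiry_raw, "%Y%m%d").date()
    return LicenseKey(tier, customer_id, expiry, signature)


if __name__ == "__main__":
    key = parse_license_key("DB-PRO-YOURKEY-20260101-signature")
    print(key.tier, key.customer_id, key.expiry.isoformat())
    print("expired:", key.expiry < date.today())
```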
Configuration
Create a .env file in your project root:
# Database connection (optional)
DATABRIDGE_DATABASE_URL=postgresql://user:pass@localhost/db
# OCR settings (optional)
DATABRIDGE_TESSERACT_PATH=/usr/bin/tesseract
# Fuzzy matching threshold (default: 80)
DATABRIDGE_FUZZY_THRESHOLD=80
# Max rows to display (default: 10)
DATABRIDGE_MAX_ROWS_DISPLAY=10
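If your own scripts need the same settings, one way to read them is sketched below. It assumes the values are already present in the process environment (for example, exported in the shell or loaded from `.env` by a tool of your choice); how DataBridge AI itself loads the file is not covered here. The `Settings` class and `load_settings` function are hypothetical names; the defaults of 80 and 10 come from the comments above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: Optional[str]
    tesseract_path: Optional[str]
    fuzzy_threshold: int
    max_rows_display: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read DATABRIDGE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get("DATABRIDGE_DATABASE_URL"),      # optional
        tesseract_path=env.get("DATABRIDGE_TESSERACT_PATH"),  # optional
        fuzzy_threshold=int(env.get("DATABRIDGE_FUZZY_THRESHOLD", "80")),
        max_rows_display=int(env.get("DATABRIDGE_MAX_ROWS_DISPLAY", "10")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary as `env` makes the function easy to exercise in tests without touching the real environment.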
Plugin System
Extend DataBridge AI with custom plugins:
plugins/
└── my_plugin/
    ├── __init__.py
    └── mcp_tools.py  # Must have register_tools(mcp)
# plugins/my_plugin/mcp_tools.py
def register_tools(mcp):
    @mcp.tool()
    def my_custom_tool(param: str) -> str:
        """My custom tool description."""
        return f"Processed: {param}"
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Links
- Documentation: github.com/datanexum/DATABRIDGE_AI/wiki
- Commercialization Guide: docs/COMMERCIALIZATION.md
- Issues: github.com/datanexum/DATABRIDGE_AI/issues
- Pro Features: Pro Features Wiki
File details
Details for the file databridge_ai-0.49.2.tar.gz.
File metadata
- Download URL: databridge_ai-0.49.2.tar.gz
- Upload date:
- Size: 348.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1aada17983e838c33ace0d40e9ba72018abd9bc9eb4d8c9a1dde820e234263c2 |
| MD5 | ea1d1dac962cd31c4c25f699824076ea |
| BLAKE2b-256 | aa760c16dedcc4d887f3daa7f79687fe28f79a9d62b7f8205a29858c3752023b |
File details
Details for the file databridge_ai-0.49.2-py3-none-any.whl.
File metadata
- Download URL: databridge_ai-0.49.2-py3-none-any.whl
- Upload date:
- Size: 356.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a73cecc4f4288d8695d1bb13874d297a509ab552ba498045c35e1de3c713eef9 |
| MD5 | 0ef0044ee4e0e073fe6a265c182be582 |
| BLAKE2b-256 | 534a87f51570a826f76a39ec3314b87464ab0c8f86a71af45db4512cc9a9c1ff |