Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config.

DataBridge Core

Your finance team just spent 4 hours on VLOOKUP. This takes 5 seconds.

DataBridge Core is a Python toolkit for data reconciliation, profiling, ingestion, and Excel triage. Compare CSV files, find fuzzy matches, detect schema drift, scan Excel workbooks, and send results to Slack -- from the command line or Python.

pip install databridge-core

5-Second Demo

# Profile a file
databridge profile sales.csv

# Compare two sources -- find orphans, conflicts, match rate
databridge compare source.csv target.csv --keys id

# Fuzzy match names across systems
databridge fuzzy erp_accounts.csv gl_accounts.csv --column name --threshold 80

# Scan Excel files and classify by archetype
pip install 'databridge-core[triage]'
databridge triage ./excel_files/
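
To make the --threshold 80 flag on the fuzzy command above more concrete, here is a minimal sketch of threshold-based fuzzy matching. It uses Python's standard-library difflib as a stand-in for rapidfuzz (the library the [fuzzy] extra installs), so its scores will not match what databridge fuzzy reports. The function names and sample account names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case and surrounding whitespace."""
    return 100.0 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_matches(left, right, threshold=80.0):
    """For each name in `left`, keep its best match in `right` if it meets `threshold`."""
    if not right:
        return []
    matches = []
    for name in left:
        # Score every candidate and keep the highest-scoring one.
        score, candidate = max((similarity(name, c), c) for c in right)
        if score >= threshold:
            matches.append((name, candidate, round(score, 1)))
    return matches


# Hypothetical account names from two systems.
erp = ["Accounts Payable", "Cash and Equivalents", "Office Supplies"]
gl = ["Accounts Payable Trade", "Cash & Equivalents", "Travel Expense"]

for name, candidate, score in best_matches(erp, gl):
    print(f"{name!r} -> {candidate!r} ({score})")
```

Raising the threshold trades recall for precision: fewer pairs are reported, but the ones that remain are closer to exact matches.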

Python API

from databridge_core import compare_hashes, profile_data, load_csv

# Profile your data
profile = profile_data("chart_of_accounts.csv")
print(f"{profile['rows']} rows, {profile['columns']} columns")
print(f"Potential keys: {profile['potential_key_columns']}")

# Compare two sources
result = compare_hashes("source.csv", "target.csv", key_columns="account_id")
stats = result["statistics"]
print(f"Match rate: {stats['match_rate_percent']}%")
print(f"Conflicts: {stats['conflicts']}, Orphans: {stats['total_orphans']}")
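
For intuition about what a hash comparison reports, the sketch below hashes each row by its key column using only the standard library. Keys found on one side only are orphans, and keys found on both sides with different hashes are conflicts. This illustrates the general technique and is not DataBridge's implementation; the function names, the result keys, and the match-rate formula (identical rows over all distinct keys) are assumptions.

```python
import csv
import hashlib
import io


def row_hashes(csv_text: str, key: str) -> dict:
    """Map each key value to a SHA-256 digest of the row's remaining columns."""
    hashes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Sort columns so the digest does not depend on column order.
        payload = "|".join(f"{col}={row[col]}" for col in sorted(row) if col != key)
        hashes[row[key]] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return hashes


def compare(source_text: str, target_text: str, key: str) -> dict:
    """Classify keys as orphans, conflicts, or matches across two CSV documents."""
    source, target = row_hashes(source_text, key), row_hashes(target_text, key)
    shared = source.keys() & target.keys()
    conflicts = sorted(k for k in shared if source[k] != target[k])
    matches = len(shared) - len(conflicts)
    all_keys = source.keys() | target.keys()
    return {
        "orphans_in_source": sorted(source.keys() - target.keys()),
        "orphans_in_target": sorted(target.keys() - source.keys()),
        "conflicts": conflicts,
        # Assumed definition: identical rows as a share of all distinct keys.
        "match_rate_percent": round(100 * matches / len(all_keys), 1) if all_keys else 0.0,
    }


# Hypothetical sample data.
source_csv = "account_id,name,balance\n1000,Cash,500\n2000,Payables,-300\n3000,Revenue,-900\n"
target_csv = "account_id,name,balance\n1000,Cash,500\n2000,Payables,-350\n4000,Equity,-100\n"

print(compare(source_csv, target_csv, key="account_id"))
```

Hashing whole rows keeps the comparison cheap: only one digest per key is held in memory, and a conflict is found without comparing columns one by one.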

Templates

from databridge_core.templates import TemplateService

svc = TemplateService(templates_dir="templates")
templates = svc.list_templates(domain="accounting")
rec = svc.get_template_recommendations(industry="manufacturing", statement_type="pl")

Slack Integration

from databridge_core.integrations import SlackClient

slack = SlackClient(bot_token="xoxb-...")
slack.send_message("#data-ops", "Reconciliation complete: 99.5% match rate")
# `result` is the dictionary returned by compare_hashes() in the Python API example
slack.post_reconciliation_report("#data-ops", result)
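
If you prefer to compose the message text yourself and pass it to send_message, a small formatter over the statistics dictionary may be enough. The sketch below relies only on the three statistics keys shown in the Python API section; the function name format_summary and the sample numbers are hypothetical.

```python
def format_summary(stats: dict) -> str:
    """Build a one-line reconciliation summary from a statistics dictionary."""
    return (
        f"Reconciliation complete: {stats['match_rate_percent']}% match rate, "
        f"{stats['conflicts']} conflicts, {stats['total_orphans']} orphans"
    )


# Hypothetical numbers standing in for result["statistics"].
sample_stats = {"match_rate_percent": 99.5, "conflicts": 3, "total_orphans": 2}
print(format_summary(sample_stats))
```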

Grounded Anomaly Detection

from databridge_core.detection import detect_grounded, record_feedback

# Detect anomalies grounded in Knowledge Base rules
result = detect_grounded("trial_balance.csv", knowledge_dir="data/knowledge/")
print(f"{result['total_findings']} findings from {result['summary']['rules_applied']} KB rules")

# User feedback loop — improve detection over time
record_feedback("finding_abc123", confirmed=True, notes="Real sign reversal")

Excel Triage

from databridge_core.triage import scan_and_classify

result = scan_and_classify("./excel_files/", output_dir="./reports/")
print(f"Scanned {result['summary']['total_files']} files")
print(f"Archetypes: {result['summary']['archetype_counts']}")

Commands

Command                                           Description
databridge profile <file>                         Profile data: structure, quality, cardinality
databridge compare <a> <b> --keys <col>           Hash comparison: orphans, conflicts, match rate
databridge fuzzy <a> <b> -c <col>                 Fuzzy match columns across two files
databridge diff <a> <b>                           Text diff between two files
databridge drift <old> <new>                      Detect schema drift between CSVs
databridge transform <file> -c <col> --op upper   Clean a column (upper/lower/strip/trim/remove_special)
databridge merge <a> <b> --keys <col>             Merge two CSVs on key columns
databridge find "*.csv"                           Find files matching a pattern
databridge parse <text>                           Parse tabular data from messy text
databridge triage <dir>                           Scan Excel files and classify by archetype
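
As a rough picture of what schema drift between two CSVs means, the sketch below compares only the header rows and reports added, removed, and reordered columns. It is an assumption-laden illustration: the source does not say which checks databridge drift performs (type changes, for example, are not covered here), and the function names and sample data are hypothetical.

```python
import csv
import io


def header_of(csv_text: str) -> list:
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def schema_drift(old_text: str, new_text: str) -> dict:
    """Report columns added, removed, or reordered between two CSV headers."""
    old_cols, new_cols = header_of(old_text), header_of(new_text)
    kept_old = [c for c in old_cols if c in new_cols]
    kept_new = [c for c in new_cols if c in old_cols]
    return {
        "added": [c for c in new_cols if c not in old_cols],
        "removed": [c for c in old_cols if c not in new_cols],
        # True when the surviving columns appear in a different order.
        "reordered": kept_old != kept_new,
    }


# Hypothetical sample data.
old_csv = "account_id,name,balance\n1000,Cash,500\n"
new_csv = "account_id,balance,name,currency\n1000,500,Cash,USD\n"

print(schema_drift(old_csv, new_csv))
```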

Optional Extras

pip install 'databridge-core[fuzzy]'    # Fuzzy matching (rapidfuzz)
pip install 'databridge-core[pdf]'      # PDF text extraction (pypdf)
pip install 'databridge-core[ocr]'      # OCR image extraction (pytesseract)
pip install 'databridge-core[sql]'      # Database queries (sqlalchemy)
pip install 'databridge-core[triage]'   # Excel triage scanning (openpyxl)
pip install 'databridge-core[detection]' # AI verification pipeline (langgraph, langchain)
pip install 'databridge-core[all]'      # Everything
pip install 'databridge-core[dev]'      # Development tools (pytest, ruff, build)

Modules

Module         Description                                            Extra Required
reconciler     Hash comparison, fuzzy matching, diffing, merging      -
profiler       Data profiling, schema drift detection                 -
ingestion      CSV, JSON, PDF, OCR loading                            [pdf], [ocr]
templates      Industry hierarchy templates, skills, knowledge base   -
integrations   Slack client (BaseClient + SlackClient)                -
triage         Batch Excel scanning and archetype classification      [triage]
detection      KB-grounded anomaly detection with AI verification     [detection]

Built for Finance

DataBridge Core is the open-source foundation of DataBridge AI -- a full platform for financial hierarchy management, dbt model generation, and enterprise data reconciliation with 287 MCP tools.

How it works: Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config.

What's Next?

DataBridge Core provides the SDK foundation. For the full platform experience:

  • MCP Server (268 tools): pip install databridge-ai -- headless AI-native data engine
  • Docker: docker run -p 786:786 ghcr.io/datanexum/databridge-mcp:latest
  • Claude Code Plugin: claude plugin install datanexum/databridge-plugin

See the full documentation for details.

Changelog

See CHANGELOG.md for full version history.

License

MIT

Download files

Source distribution: databridge_core-1.5.0.tar.gz (334.7 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing: No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

Algorithm     Hash digest
SHA256        cc0e3dcdf5234d8e7c2c021118b9bf9c30810e1ef2be57b5df56836122b94b91
MD5           973561996e60ae968ecb0e42083396dc
BLAKE2b-256   11c38a384b43cb5a5ec803edfefa4721d41378acde06d09b34295a949ef851f2

Built distribution: databridge_core-1.5.0-py3-none-any.whl (239.3 kB), Python 3

Algorithm     Hash digest
SHA256        a9af7071d7d512e446fd757b7d84d2fdd484cae6275b5cfe878d33c3a9298657
MD5           4e580a619fefd08a8acbf50053a181e5
BLAKE2b-256   5bae1dc70f5f70e82473b94f7769dc68934ac44a2f6dd186a18b379d2c67c31b