DataBridge AI Model Discovery Engine - Automated SQL parsing, CASE extraction, and hierarchy generation

DataBridge Discovery Engine

Automated SQL parsing, CASE statement extraction, and hierarchy generation for data warehouse modeling.

Features

  • SQL Parsing: Multi-dialect SQL parsing using sqlglot (Snowflake, PostgreSQL, T-SQL, MySQL, BigQuery)
  • CASE Extraction: Automatic extraction of CASE WHEN statements with hierarchy detection
  • Semantic Graph: Graph-based semantic modeling with NetworkX
  • Entity Detection: Detects 12 standard entity types (account, cost_center, department, etc.)
  • Librarian Integration: Direct export to Librarian hierarchy project format
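
To make the CASE extraction feature concrete, here is a minimal sketch of turning a CASE WHEN expression into condition-to-label pairs, which is the raw material for a hierarchy. It is not the library's implementation: the library parses with sqlglot, while this stand-in uses standard-library regular expressions and only handles flat CASE expressions with string-literal results. The query, the function name extract_case_mappings, and the result layout are all assumptions made for illustration.

```python
import re

# Hypothetical sample query; the account prefixes and labels are made up.
SQL = """
SELECT
  CASE
    WHEN account_code LIKE '4%' THEN 'Revenue'
    WHEN account_code LIKE '5%' THEN 'Cost of Goods Sold'
    WHEN account_code LIKE '6%' THEN 'Operating Expenses'
    ELSE 'Other'
  END AS account_group
FROM gl_entries
"""

# One flat (non-nested) CASE ... END [AS alias] expression.
CASE_RE = re.compile(
    r"CASE\b(?P<body>.*?)\bEND\b(?:\s+AS\s+(?P<alias>\w+))?",
    re.IGNORECASE | re.DOTALL,
)
# Each WHEN <condition> THEN '<label>' branch inside the CASE body.
WHEN_RE = re.compile(
    r"WHEN\s+(?P<cond>.*?)\s+THEN\s+'(?P<label>[^']*)'",
    re.IGNORECASE | re.DOTALL,
)
# Optional ELSE '<label>' default.
ELSE_RE = re.compile(r"ELSE\s+'(?P<label>[^']*)'", re.IGNORECASE)


def extract_case_mappings(sql):
    """Return one dict per CASE expression: alias, branches, default label."""
    results = []
    for case in CASE_RE.finditer(sql):
        body = case.group("body")
        branches = [
            (m.group("cond").strip(), m.group("label"))
            for m in WHEN_RE.finditer(body)
        ]
        default = ELSE_RE.search(body)
        results.append({
            "alias": case.group("alias"),
            "branches": branches,
            "default": default.group("label") if default else None,
        })
    return results


mappings = extract_case_mappings(SQL)
for condition, label in mappings[0]["branches"]:
    print(f"{label}: {condition}")
```

Each label can be read as a parent node and each condition as the rule that assigns rows to it. A real parser is needed for nested CASE expressions, non-literal results, and dialect-specific syntax.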

Installation

# Basic installation
pip install databridge-discovery

# With embeddings support (quotes keep shells such as zsh from expanding the brackets)
pip install "databridge-discovery[embeddings]"

# With MCP tools
pip install "databridge-discovery[mcp]"

# Full installation
pip install "databridge-discovery[all]"

Quick Start

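from databridge_discovery import SQLParser, CaseExtractor, DiscoverySession

# Example query (illustrative only; substitute your own SQL)
sql_query = """
SELECT CASE WHEN account_code LIKE '4%' THEN 'Revenue' ELSE 'Other' END AS account_group
FROM gl_entries
"""

# Parse SQL into an AST using the chosen dialect
parser = SQLParser(dialect="snowflake")
ast = parser.parse(sql_query)

# Extract CASE statements from the parsed AST
extractor = CaseExtractor()
cases = extractor.extract(ast)

# Start a discovery session and analyze the same query
session = DiscoverySession()
session.add_sql_source(sql_query)
session.analyze()

# Get proposed hierarchies
hierarchies = session.get_proposed_hierarchies()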

MCP Tools

The library provides 50 MCP tools across 7 phases:

Phase 1: SQL Parser & Session (6 tools)

  • parse_sql - Parse SQL and return AST
  • extract_case_statements - Extract CASE WHEN logic
  • analyze_sql_complexity - Query complexity metrics
  • start_discovery_session - Initialize discovery session
  • get_discovery_session - Get session state
  • export_discovery_evidence - Export evidence

Phase 2: Semantic Graph (8 tools)

  • build_semantic_graph - Build from schema
  • add_graph_relationship - Add edge
  • find_join_paths - Find join candidates
  • And more...
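
As an illustration of what finding join paths over a semantic graph involves, the sketch below runs a breadth-first search over table relationships. It is a standard-library stand-in, not the find_join_paths tool or the library's NetworkX-based graph. The table names, join keys, and the helpers build_graph and find_join_path are hypothetical.

```python
from collections import deque

# Hypothetical schema: (left table, right table, join key).
RELATIONSHIPS = [
    ("gl_entries", "accounts", "account_id"),
    ("gl_entries", "cost_centers", "cost_center_id"),
    ("cost_centers", "departments", "department_id"),
]


def build_graph(relationships):
    """Build an undirected adjacency list keyed by table name."""
    graph = {}
    for left, right, key in relationships:
        graph.setdefault(left, []).append((right, key))
        graph.setdefault(right, []).append((left, key))
    return graph


def find_join_path(graph, start, goal):
    """Breadth-first search; returns a list of (from, to, key) hops or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbor, key in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(table, neighbor, key)]))
    return None


graph = build_graph(RELATIONSHIPS)
path = find_join_path(graph, "accounts", "departments")
for left, right, key in path:
    print(f"{left} JOIN {right} ON {key}")
```

Breadth-first search returns a path with the fewest joins, which is a reasonable default when ranking join candidates. A weighted graph would be needed to prefer some relationships over others.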

Phases 3-7: See the full documentation.

Entity Types

The discovery engine detects 12 standard entity types:

Entity        Description
account       GL accounts, chart of accounts
cost_center   Cost centers, profit centers
department    Organizational departments
entity        Legal entities, companies
project       Projects, work orders
product       Products, SKUs
customer      Customers, clients
vendor        Vendors, suppliers
employee      Employees, workers
location      Geographic locations
time_period   Time periods, fiscal periods
currency      Currencies
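
This README does not describe how entity types are detected. As a rough illustration only, a naive name-based heuristic could map column names to the entity types listed above. The keyword hints and the function detect_entity_type are assumptions, not part of the library.

```python
# Hypothetical keyword hints; only the entity type names come from this README.
ENTITY_HINTS = {
    "account": ("account", "acct"),
    "cost_center": ("cost_center", "profit_center"),
    "department": ("department", "dept"),
    "entity": ("legal_entity", "company"),
    "project": ("project", "work_order"),
    "product": ("product", "sku"),
    "customer": ("customer", "client"),
    "vendor": ("vendor", "supplier"),
    "employee": ("employee", "worker"),
    "location": ("location",),
    "time_period": ("fiscal_period", "period"),
    "currency": ("currency",),
}


def detect_entity_type(column_name):
    """Return the first entity type whose hint appears in the column name."""
    name = column_name.lower()
    for entity_type, hints in ENTITY_HINTS.items():
        if any(hint in name for hint in hints):
            return entity_type
    return None


for column in ("GL_ACCOUNT_CODE", "dept_name", "invoice_amount"):
    print(column, "->", detect_entity_type(column))
```

Substring matching is ambiguous for names that mention two entities (for example, a column that contains both a customer and an account keyword), so a real detector would need scoring or additional evidence such as column values.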

License

MIT
