DataBridge AI Model Discovery Engine - Automated SQL parsing, CASE extraction, and hierarchy generation

DataBridge Discovery Engine

Automated SQL parsing, CASE statement extraction, and hierarchy generation for data warehouse modeling.

Features

  • SQL Parsing: Multi-dialect SQL parsing using sqlglot (Snowflake, PostgreSQL, T-SQL, MySQL, BigQuery)
  • CASE Extraction: Automatic extraction of CASE WHEN statements with hierarchy detection
  • Semantic Graph: Graph-based semantic modeling with NetworkX
  • Entity Detection: Detects 12 standard entity types (account, cost_center, department, etc.)
  • Librarian Integration: Direct export to Librarian hierarchy project format
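
To make the "CASE Extraction ... with hierarchy detection" feature more concrete, the following is a minimal sketch of how the branches of a CASE WHEN statement can be read as a two-level hierarchy (label as parent, matched values as children). It does not use the library: the `CaseBranch` type, the `branches_to_hierarchy` helper, and the account codes are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseBranch:
    """One WHEN ... THEN ... arm: the source values matched and the label produced."""
    source_values: tuple[str, ...]
    label: str


def branches_to_hierarchy(branches):
    """Group source values under the label their CASE branch assigns (parent -> children)."""
    hierarchy = defaultdict(list)
    for branch in branches:
        for value in branch.source_values:
            # Keep first-seen order and skip duplicates within a label.
            if value not in hierarchy[branch.label]:
                hierarchy[branch.label].append(value)
    return dict(hierarchy)


# Hypothetical input, equivalent to:
#   CASE WHEN account_code IN ('4000', '4100') THEN 'Revenue'
#        WHEN account_code IN ('5000') THEN 'COGS' END
branches = [
    CaseBranch(("4000", "4100"), "Revenue"),
    CaseBranch(("5000",), "COGS"),
]
hierarchy = branches_to_hierarchy(branches)
print(hierarchy)
```

The engine's own detection presumably works on the parsed AST rather than on hand-built tuples; this sketch only shows the mapping from branches to parent/child groups.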

Installation

# Basic installation
pip install databridge-discovery

# With embeddings support (quotes keep shells such as zsh from expanding the brackets)
pip install "databridge-discovery[embeddings]"

# With MCP tools
pip install "databridge-discovery[mcp]"

# Full installation
pip install "databridge-discovery[all]"

Quick Start

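from databridge_discovery import SQLParser, CaseExtractor, DiscoverySession

# Example input; replace with your own query
sql_query = """
SELECT
    CASE WHEN account_code LIKE '4%' THEN 'Revenue' ELSE 'Other' END AS account_group
FROM gl_entries
"""

# Parse SQL
parser = SQLParser(dialect="snowflake")
ast = parser.parse(sql_query)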

# Extract CASE statements
extractor = CaseExtractor()
cases = extractor.extract(ast)

# Start discovery session
session = DiscoverySession()
session.add_sql_source(sql_query)
session.analyze()

# Get proposed hierarchies
hierarchies = session.get_proposed_hierarchies()

MCP Tools

The library provides 50 MCP tools across 7 phases:

Phase 1: SQL Parser & Session (6 tools)

  • parse_sql - Parse SQL and return AST
  • extract_case_statements - Extract CASE WHEN logic
  • analyze_sql_complexity - Query complexity metrics
  • start_discovery_session - Initialize discovery session
  • get_discovery_session - Get session state
  • export_discovery_evidence - Export evidence
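
As a rough illustration of what "query complexity metrics" can mean, the sketch below counts a few SQL keywords with regular expressions. This is an assumption-laden stand-in, not the behavior of `analyze_sql_complexity`: the function name `naive_sql_complexity` and the chosen metrics are hypothetical, and a keyword count is far cruder than metrics derived from a parsed AST.

```python
import re


def naive_sql_complexity(sql: str) -> dict[str, int]:
    """Count a few keywords as rough complexity signals (not a real parser)."""
    # Blank out single-quoted string literals so keywords inside them are ignored.
    stripped = re.sub(r"'(?:[^']|'')*'", "''", sql)

    def count(pattern: str) -> int:
        return len(re.findall(pattern, stripped, flags=re.IGNORECASE))

    return {
        "joins": count(r"\bJOIN\b"),
        "case_statements": count(r"\bCASE\b"),
        "when_branches": count(r"\bWHEN\b"),
        "subqueries": count(r"\(\s*SELECT\b"),
    }


# Hypothetical query; the literal 'CASE' must not be counted as a keyword.
sample = """
SELECT a.id,
       CASE WHEN a.code = 'CASE' THEN 'x' WHEN a.code = 'B' THEN 'y' END AS grp
FROM accounts a
JOIN (SELECT id FROM ledger) l ON l.id = a.id
"""
metrics = naive_sql_complexity(sample)
print(metrics)
```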

Phase 2: Semantic Graph (8 tools)

  • build_semantic_graph - Build from schema
  • add_graph_relationship - Add edge
  • find_join_paths - Find join candidates
  • And more...
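
Finding join paths amounts to a path search over a graph whose nodes are tables and whose edges are join relationships. The library builds its graph with NetworkX; to stay dependency-free, the sketch below swaps in a plain breadth-first search over a dictionary. The `find_join_path` helper and the table names are hypothetical and are not the signature of the `find_join_paths` tool.

```python
from collections import deque


def find_join_path(edges, start, goal):
    """Return the shortest list of tables linking start to goal, or None."""
    # Build an undirected adjacency map: a join can be followed in either direction.
    graph = {}
    for left, right in edges:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    if start not in graph or goal not in graph:
        return None

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical star-schema relationships.
edges = [
    ("fact_gl", "dim_account"),
    ("fact_gl", "dim_cost_center"),
    ("dim_cost_center", "dim_department"),
]
path = find_join_path(edges, "dim_account", "dim_department")
print(path)
```

Breadth-first search returns a path with the fewest joins, which is a reasonable default when edges carry no weights.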

Phases 3-7: See the full documentation

Entity Types

The discovery engine detects 12 standard entity types:

Entity        Description
account       GL accounts, chart of accounts
cost_center   Cost centers, profit centers
department    Organizational departments
entity        Legal entities, companies
project       Projects, work orders
product       Products, SKUs
customer      Customers, clients
vendor        Vendors, suppliers
employee      Employees, workers
location      Geographic locations
time_period   Time periods, fiscal periods
currency      Currencies
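
One simple way to picture entity detection is keyword matching on column names. The sketch below is a hedged illustration only: the 12 entity type names come from the table above, but the keyword lists and the `guess_entity_type` helper are assumptions made for this example, and the engine's actual detection logic may differ substantially.

```python
# Entity names are taken from the table above; the keywords are illustrative guesses.
ENTITY_KEYWORDS = {
    "account": ("account", "acct", "gl_"),
    "cost_center": ("cost_center", "profit_center"),
    "department": ("department", "dept"),
    "entity": ("entity", "company"),
    "project": ("project", "work_order"),
    "product": ("product", "sku"),
    "customer": ("customer", "client"),
    "vendor": ("vendor", "supplier"),
    "employee": ("employee", "emp_"),
    "location": ("location", "region", "country"),
    "time_period": ("period", "fiscal"),
    "currency": ("currency", "ccy"),
}


def guess_entity_type(column_name):
    """Return the first entity type whose keyword appears in the column name, else None."""
    name = column_name.lower()
    for entity, keywords in ENTITY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return entity
    return None


# Hypothetical column names.
for column in ("GL_ACCOUNT_CODE", "supplier_name", "fiscal_period", "row_hash"):
    print(column, "->", guess_entity_type(column))
```

Because matching stops at the first hit, dictionary order acts as a priority list; a production detector would more likely score several signals (names, data types, sample values) than rely on substrings alone.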

License

MIT
