DataBridge AI Model Discovery Engine - Automated SQL parsing, CASE extraction, and hierarchy generation
DataBridge Discovery Engine
Automated SQL parsing, CASE statement extraction, and hierarchy generation for data warehouse modeling.
Features
- SQL Parsing: Multi-dialect SQL parsing using sqlglot (Snowflake, PostgreSQL, T-SQL, MySQL, BigQuery)
- CASE Extraction: Automatic extraction of CASE WHEN statements with hierarchy detection
- Semantic Graph: Graph-based semantic modeling with NetworkX
- Entity Detection: Detects 12 standard entity types (account, cost_center, department, etc.)
- Librarian Integration: Direct export to Librarian hierarchy project format
Installation
# Basic installation
pip install databridge-discovery
# With embeddings support (quotes keep shells such as zsh from expanding the brackets)
pip install "databridge-discovery[embeddings]"
# With MCP tools
pip install "databridge-discovery[mcp]"
# Full installation
pip install "databridge-discovery[all]"
Quick Start
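from databridge_discovery import SQLParser, CaseExtractor, DiscoverySession
# Sample input (illustrative); any query containing CASE WHEN logic works
sql_query = """
SELECT CASE WHEN account_code LIKE '4%' THEN 'Revenue' ELSE 'Other' END AS account_group
FROM gl_entries
"""
# Parse SQL into an AST using the Snowflake dialect
parser = SQLParser(dialect="snowflake")
ast = parser.parse(sql_query)
# Extract CASE statements from the parsed AST
extractor = CaseExtractor()
cases = extractor.extract(ast)
# Start a discovery session and analyze the same query
session = DiscoverySession()
session.add_sql_source(sql_query)
session.analyze()
# Get proposed hierarchies
hierarchies = session.get_proposed_hierarchies()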
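To make the idea of "CASE extraction with hierarchy detection" concrete, here is a minimal standard-library sketch of how literal WHEN branches can be grouped into parent and child values. It is not the library's implementation (which parses SQL with sqlglot); the helper `case_to_hierarchy`, the regular expression, and the sample table and column names are all assumptions made for illustration, and the pattern only handles simple `column = 'value'` branches.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of databridge_discovery. It only recognizes
# simple branches of the form: WHEN <column> = '<value>' THEN '<label>'
WHEN_PATTERN = re.compile(
    r"WHEN\s+(?P<column>\w+)\s*=\s*'(?P<value>[^']*)'\s+THEN\s+'(?P<label>[^']*)'",
    re.IGNORECASE,
)


def case_to_hierarchy(sql: str) -> dict:
    """Group source values (children) under the label each WHEN branch assigns (parent)."""
    hierarchy = defaultdict(list)
    for match in WHEN_PATTERN.finditer(sql):
        hierarchy[match["label"]].append(match["value"])
    return dict(hierarchy)


# Illustrative query; table and column names are made up.
sample_sql = """
SELECT CASE
    WHEN account_code = '4000' THEN 'Revenue'
    WHEN account_code = '4100' THEN 'Revenue'
    WHEN account_code = '5000' THEN 'Expenses'
    ELSE 'Other'
END AS account_group
FROM gl_entries
"""

hierarchy = case_to_hierarchy(sample_sql)
print(hierarchy)
```

Each THEN label becomes a parent node and the compared literals become its children. A regex is used here only to keep the sketch dependency-free; a real parser is needed for nested CASE expressions, LIKE or IN conditions, and dialect differences.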
MCP Tools
The library provides 50 MCP tools across 7 phases:
Phase 1: SQL Parser & Session (6 tools)
- parse_sql: Parse SQL and return AST
- extract_case_statements: Extract CASE WHEN logic
- analyze_sql_complexity: Query complexity metrics
- start_discovery_session: Initialize discovery session
- get_discovery_session: Get session state
- export_discovery_evidence: Export evidence
Phase 2: Semantic Graph (8 tools)
- build_semantic_graph: Build from schema
- add_graph_relationship: Add edge
- find_join_paths: Find join candidates
- And more...
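Finding join candidates can be pictured as a path search over a graph whose nodes are tables and whose edges are joinable relationships. The sketch below uses a plain breadth-first search from the standard library rather than NetworkX, and it is not the behavior of the `find_join_paths` tool itself; the table names and the `build_graph` and `find_join_path` helpers are hypothetical.

```python
from collections import deque

# Hypothetical schema: each pair is a possible join between two tables.
JOIN_EDGES = [
    ("gl_entries", "accounts"),
    ("gl_entries", "cost_centers"),
    ("cost_centers", "departments"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (table, table) pairs."""
    graph = {}
    for left, right in edges:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def find_join_path(graph, start, goal):
    """Return the shortest list of tables linking start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(JOIN_EDGES)
path = find_join_path(graph, "accounts", "departments")
print(path)
```

Breadth-first search returns the path with the fewest joins, which is a reasonable default when every relationship is treated as equally good. A weighted search would be the natural extension if some joins are preferred over others.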
Phases 3-7: See the full documentation
Entity Types
The discovery engine detects 12 standard entity types:
| Entity | Description |
|---|---|
| account | GL accounts, chart of accounts |
| cost_center | Cost centers, profit centers |
| department | Organizational departments |
| entity | Legal entities, companies |
| project | Projects, work orders |
| product | Products, SKUs |
| customer | Customers, clients |
| vendor | Vendors, suppliers |
| employee | Employees, workers |
| location | Geographic locations |
| time_period | Time periods, fiscal periods |
| currency | Currencies |
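As a rough illustration of how a column name could be mapped to one of these entity types, the sketch below matches keywords against the name. This is an assumption about one possible approach, not the engine's actual detection logic; the `ENTITY_HINTS` table and the `detect_entity` function are hypothetical and cover only a few of the 12 types.

```python
from typing import Optional

# Hypothetical keyword hints for a subset of the entity types in the table above.
ENTITY_HINTS = {
    "account": ("account", "acct"),
    "cost_center": ("cost_center", "costcenter", "profit_center"),
    "department": ("department", "dept"),
    "customer": ("customer", "client"),
    "vendor": ("vendor", "supplier"),
    "currency": ("currency", "ccy"),
}


def detect_entity(column_name: str) -> Optional[str]:
    """Return the first entity type whose hint appears in the column name."""
    name = column_name.lower()
    for entity, hints in ENTITY_HINTS.items():
        if any(hint in name for hint in hints):
            return entity
    return None


for column in ("account_code", "DEPT_ID", "supplier_name", "order_total"):
    print(column, "->", detect_entity(column))
```

Substring matching is deliberately simple and will misclassify ambiguous names. Combining it with sample values or the optional embeddings support would be a more robust design.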
License
MIT
File details
Details for the file databridge_discovery-0.43.0.tar.gz.
File metadata
- Download URL: databridge_discovery-0.43.0.tar.gz
- Upload date:
- Size: 198.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a933daa303f4c1ca91488b833fb7e874c98f3bedd2219211b981e8f07654e089 |
| MD5 | f005e3c1c3367b07c13423f9b49ad0d9 |
| BLAKE2b-256 | 71697dd92c7060b8964952a2c00bd451574c717fbaff13a1713a57b45a1f640e |
File details
Details for the file databridge_discovery-0.43.0-py3-none-any.whl.
File metadata
- Download URL: databridge_discovery-0.43.0-py3-none-any.whl
- Upload date:
- Size: 198.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2793ed9c60daaf8c463bc51a507e8d907e0e7a52b4d4ba39b366fb832d9eb858 |
| MD5 | 5229a42532911ebe701e9bee5f8c92c9 |
| BLAKE2b-256 | 24b242e124712e32b5375f61e17366ca6d5ec5e7c60c748a6cdbe8711fd367ff |