Extract semantic intelligence from Power BI .pbix files and convert to formal ontologies
PowerBI Ontology Extractor
Transform 20 million Power BI dashboards into AI-ready ontologies
Installation • Quick Start • Features • Documentation • Contributing
🎯 The Problem
Enterprises have 20+ million Power BI semantic models that are actually informal ontologies trapped in proprietary .pbix files.
- The Challenge: Each Power BI model contains entities, relationships, and business logic—but AI agents can't access this semantic intelligence
- The Cost: Enterprises spend $50K-$200K per semantic definition to reconcile conflicts across dashboards
- The $4.6M Mistake: A logistics company lost $4.6M when an AI agent used a renamed column (Warehouse_Location→FacilityID) because there was no semantic binding validation
💡 The Solution
PowerBI Ontology Extractor unlocks the hidden ontologies in your Power BI dashboards and transforms them into formal, AI-ready ontologies.
┌─────────────────────┐     ┌──────────────────────┐     ┌─────────────────────────────┐
│   Power BI .pbix    │────▶│  Ontology Extractor  │────▶│          OntoGuard          │
│  (20M+ dashboards)  │     │    (this project)    │     │      Semantic Firewall      │
└─────────────────────┘     └──────────────────────┘     └─────────────────────────────┘
                                       │                                │
                                       │ OWL/Fabric IQ                  │ Semantic Validation
                                       ▼                                ▼
                            ┌──────────────────────┐     ┌─────────────────────────────┐
                            │  Semantic Contract   │────▶│  Universal Agent Connector  │
                            │    (permissions)     │     │   AI Agent Infrastructure   │
                            └──────────────────────┘     └─────────────────────────────┘
                                                                        │
                                                                        ▼
                                                         ┌─────────────────────────────┐
                                                         │          AI Agents          │
                                                         │     (Claude, GPT, etc.)     │
                                                         └─────────────────────────────┘
30-minute workflow:
Power BI (.pbix) → Ontology Extractor → OntoGuard → Universal Agent Connector → AI Agent
     10 min              10 min           5 min               3 min              2 min
🚀 Quick Start
Installation
# Install from PyPI (recommended)
pip install powerbi-ontology-extractor
Or install from source:
git clone https://github.com/vpakspace/powerbi-ontology-extractor.git
cd powerbi-ontology-extractor
pip install -e .
Basic Usage
from powerbi_ontology import PowerBIExtractor, OntologyGenerator
# Step 1: Extract semantic model from Power BI
extractor = PowerBIExtractor("path/to/dashboard.pbix")
semantic_model = extractor.extract()
# Step 2: Generate formal ontology
generator = OntologyGenerator(semantic_model)
ontology = generator.generate()
print(f"✅ Extracted {len(ontology.entities)} entities")
print(f"✅ Found {len(ontology.relationships)} relationships")
print(f"✅ Generated {len(ontology.business_rules)} business rules")
# Step 3: Export to OWL for OntoGuard
from powerbi_ontology.export import OWLExporter
exporter = OWLExporter(ontology)
exporter.save("ontology.owl")
Visual Ontology Editor (No-Code UI)
# Start Streamlit UI
streamlit run ontology_editor.py --server.port 8503
Features:
- 📂 Load from .pbix files or JSON
- 📦 Edit entities with properties and constraints
- 🔗 Manage relationships between entities
- 🔐 Configure permission matrix (RBAC)
- 📜 Add business rules with classification
- 🦉 Preview and export OWL
- 🔀 Diff & Merge ontology versions
- 💬 AI Chat - Ask questions about your ontology!
🔥 Key Features
1. Automatic Extraction (PBIXRay)
- ✅ Reads Power BI .pbix files (binary DataModel via PBIXRay)
- ✅ Extracts tables, columns, relationships, hierarchies
- ✅ Parses DAX measures and calculated columns
- ✅ Captures Row-Level Security (RLS) rules
- ✅ Fallback to JSON model.bim for legacy files
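A .pbix file is a ZIP archive, so its parts can be listed with the standard library before any model parsing happens. The sketch below is not how PBIXRay or this project reads the file: `list_pbix_parts` is a hypothetical helper, and the entry names in the in-memory demo archive are illustrative only.

```python
import io
import zipfile


def list_pbix_parts(source):
    """Return {entry name: uncompressed size} for a .pbix (ZIP) archive.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}


# Build a tiny stand-in archive in memory so the sketch runs without a real file.
# The entry names are illustrative, not a guaranteed .pbix layout.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("DataModel", b"\x00" * 16)
    archive.writestr("Report/Layout", b"{}")
demo.seek(0)

parts = list_pbix_parts(demo)
for name, size in sorted(parts.items()):
    print(f"{name}: {size} bytes")
```

Listing the archive first is a cheap way to check that a file is a readable ZIP before handing it to a full extractor.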
2. DAX to Business Rules
- ✅ Parses DAX formulas automatically
- ✅ Extracts conditional logic (IF, SWITCH, CALCULATE)
- ✅ Converts filters to business rules
- ✅ Classifies measure types (aggregation, conditional, time intelligence)
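To make the measure classification concrete, here is a minimal keyword-based sketch. It is an assumption about how such a classifier could look, not the project's `dax_parser.py`: the function name `classify_measure`, the keyword lists, and the precedence order are all hypothetical, and a real parser would need a proper DAX grammar.

```python
import re

# Keyword lists are illustrative and far from a complete DAX function catalogue.
TIME_INTELLIGENCE = ("TOTALYTD", "SAMEPERIODLASTYEAR", "DATEADD", "DATESYTD")
CONDITIONAL = ("IF", "SWITCH")
AGGREGATION = ("SUM", "SUMX", "AVERAGE", "COUNT", "COUNTROWS", "MIN", "MAX")


def _calls(formula, names):
    """True if the formula calls any of the given functions, e.g. `IF(`."""
    pattern = r"\b(?:" + "|".join(names) + r")\s*\("
    return re.search(pattern, formula, flags=re.IGNORECASE) is not None


def classify_measure(formula):
    """Classify a DAX formula; the most specific category wins."""
    if _calls(formula, TIME_INTELLIGENCE):
        return "time_intelligence"
    if _calls(formula, CONDITIONAL):
        return "conditional"
    if _calls(formula, AGGREGATION):
        return "aggregation"
    return "other"


for formula in ("SUM(Sales[Amount])", 'IF(SUM(Sales[Amount]) > 1000, "High", "Low")'):
    print(classify_measure(formula), "<-", formula)
```

The word-boundary check keeps `COUNTIF(` from being mistaken for a call to `IF(`.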
3. Ontology Generation (70% Automated)
- ✅ Entities from tables
- ✅ Properties from columns (with data types)
- ✅ Relationships from foreign keys (with cardinality)
- ✅ Business rules from DAX measures
- ✅ Constraints (required, unique, range, regex, enum)
- ✅ Pattern detection (date tables, dimensions, facts)
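One simple way to think about fact/dimension detection is to look at which side of each relationship a table sits on. The heuristic below is an assumption made for illustration (tables that are only on the "many" side are treated as facts, tables on the "one" side as dimensions); `Relationship` and `classify_tables` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    from_table: str  # "many" side (holds the foreign key)
    to_table: str    # "one" side (holds the referenced key)


def classify_tables(tables, relationships):
    """Label each table as 'fact', 'dimension' or 'unclassified'."""
    many_side = {r.from_table for r in relationships}
    one_side = {r.to_table for r in relationships}
    roles = {}
    for table in tables:
        if table in many_side and table not in one_side:
            roles[table] = "fact"
        elif table in one_side:
            roles[table] = "dimension"
        else:
            roles[table] = "unclassified"
    return roles


roles = classify_tables(
    ["Sales", "Customer", "Product", "Notes"],
    [Relationship("Sales", "Customer"), Relationship("Sales", "Product")],
)
print(roles)
```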
4. Multi-Format Export
| Format | Use Case |
|---|---|
| OWL/RDF | OntoGuard semantic validation |
| Fabric IQ | Microsoft Fabric deployment |
| JSON | Universal agent connector |
| Semantic Contract | Role-based AI agent permissions |
5. Schema Drift Detection (Prevents $4.6M Mistakes!)
- ✅ Validates schema bindings
- ✅ Detects column renames/deletions
- ✅ Type normalization (varchar→text, int→integer)
- ✅ Severity levels: CRITICAL, WARNING, INFO
- ✅ Auto-fix suggestions
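The checks listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's detector: `detect_drift`, `DriftIssue`, and the mapping from issue kind to severity level are hypothetical, and only the two type aliases named above are included.

```python
from dataclasses import dataclass

# Aliases taken from the examples in the list above; extend as needed.
TYPE_ALIASES = {"varchar": "text", "int": "integer"}


@dataclass(frozen=True)
class DriftIssue:
    severity: str  # "CRITICAL", "WARNING" or "INFO"
    column: str
    message: str


def normalize_type(type_name):
    lowered = type_name.strip().lower()
    return TYPE_ALIASES.get(lowered, lowered)


def detect_drift(expected, actual):
    """Compare expected {column: type} bindings with the live schema."""
    issues = []
    for column, expected_type in expected.items():
        if column not in actual:
            issues.append(DriftIssue("CRITICAL", column, "column is missing (renamed or deleted?)"))
        elif normalize_type(actual[column]) != normalize_type(expected_type):
            issues.append(DriftIssue("WARNING", column, f"type changed: {expected_type} -> {actual[column]}"))
    for column in actual:
        if column not in expected:
            issues.append(DriftIssue("INFO", column, "new column not bound in the ontology"))
    return issues


expected = {"Warehouse_Location": "varchar", "Quantity": "int", "ShippedAt": "date"}
actual = {"FacilityID": "text", "Quantity": "integer", "ShippedAt": "text"}
issues = detect_drift(expected, actual)
for issue in issues:
    print(f"[{issue.severity}] {issue.column}: {issue.message}")
```

In this sketch the renamed Warehouse_Location column surfaces as a missing binding, while Quantity passes because `int` and `integer` normalize to the same type.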
6. Multi-Dashboard Semantic Debt Analysis
- ✅ Analyzes multiple Power BI dashboards
- ✅ Detects conflicting definitions ("Revenue" defined differently)
- ✅ 5 conflict types: MEASURE, TYPE, ENTITY, RELATIONSHIP, RULE
- ✅ Generates consolidation reports
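As an illustration of the MEASURE conflict type, the sketch below groups measure definitions by name across dashboards and reports names with more than one distinct formula. It is a simplified assumption, not the project's `semantic_debt.py`: `find_measure_conflicts` is a hypothetical function, and the normalization (strip whitespace, uppercase) is deliberately crude and would also alter string literals inside a formula.

```python
from collections import defaultdict


def normalize(formula):
    """Ignore whitespace and case so cosmetic differences are not conflicts."""
    return "".join(formula.split()).upper()


def find_measure_conflicts(dashboards):
    """dashboards: {dashboard name: {measure name: DAX formula}}.

    Returns {measure name: {normalized formula: [dashboard names]}} for
    measures that have more than one distinct definition.
    """
    definitions = defaultdict(lambda: defaultdict(list))
    for dashboard, measures in dashboards.items():
        for name, formula in measures.items():
            definitions[name][normalize(formula)].append(dashboard)
    return {
        name: dict(variants)
        for name, variants in definitions.items()
        if len(variants) > 1
    }


dashboards = {
    "Finance": {"Revenue": "SUM(Sales[Amount])", "Orders": "COUNTROWS(Sales)"},
    "Marketing": {
        "Revenue": "SUM(Sales[Amount]) - SUM(Sales[Discount])",
        "Orders": "countrows( Sales )",
    },
}
conflicts = find_measure_conflicts(dashboards)
for name, variants in conflicts.items():
    print(name, "has", len(variants), "definitions")
```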
7. Ontology Diff & Merge
- ✅ Git-like diff between ontology versions
- ✅ Detect added/removed/modified elements
- ✅ Three-way merge (base, ours, theirs)
- ✅ Conflict detection and resolution strategies
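The three-way merge idea can be shown on flat dictionaries. This is a generic sketch of the technique, not the project's `ontology_diff.py`: `three_way_merge` is a hypothetical function, real ontologies are nested rather than flat, and the "ours wins" fallback is just one possible resolution strategy.

```python
def three_way_merge(base, ours, theirs):
    """Merge flat {key: value} mappings; returns (merged, conflicts)."""
    missing = object()  # sentinel: key absent on that side
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, missing)
        o = ours.get(key, missing)
        t = theirs.get(key, missing)
        if o == t:      # both sides agree (including both deleted)
            result = o
        elif o == b:    # only theirs changed
            result = t
        elif t == b:    # only ours changed
            result = o
        else:           # both changed differently
            conflicts.append(key)
            result = o  # "ours wins" strategy, purely illustrative
        if result is not missing:
            merged[key] = result
    return merged, conflicts


base = {"Customer": "v1", "Sales": "v1", "Product": "v1"}
ours = {"Customer": "v2", "Sales": "v1", "Product": "v2"}
theirs = {"Customer": "v1", "Product": "v3", "Region": "v1"}  # Sales deleted, Region added

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print("conflicts:", conflicts)
```

Comparing each side against the common base is what lets the merge tell "unchanged" apart from "changed back", which a plain two-way diff cannot do.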
8. Collaborative Review Workflow
- ✅ Comments on entities/properties/rules
- ✅ Reply and resolve threads
- ✅ Approval workflow: draft → review → approved → published
- ✅ Audit trail of all actions
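The approval workflow above is a small state machine, and an audit trail falls out naturally if every transition is recorded. The sketch below is hypothetical and not taken from `review.py`: the `Review` class is invented for illustration, and allowing a review to be sent back to draft is an added assumption.

```python
from datetime import datetime, timezone

# Forward transitions come from the workflow named above.
# Sending a review back to draft is an added assumption.
TRANSITIONS = {
    "draft": {"review"},
    "review": {"approved", "draft"},
    "approved": {"published"},
    "published": set(),
}


class Review:
    def __init__(self):
        self.status = "draft"
        self.audit_trail = []  # (timestamp, actor, old status, new status)

    def transition(self, new_status, actor):
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        self.audit_trail.append(
            (datetime.now(timezone.utc), actor, self.status, new_status)
        )
        self.status = new_status


review = Review()
review.transition("review", actor="alice")
review.transition("approved", actor="bob")
try:
    review.transition("draft", actor="carol")  # approved -> draft is not allowed
except ValueError as error:
    rejected = str(error)
    print("rejected:", rejected)
```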
9. CLI Tool for Automation
# Extract single .pbix file
pbix2owl extract -i dashboard.pbix -o ontology.owl
# Batch process directory (8 parallel workers)
pbix2owl batch -i ./dashboards/ -o ./ontologies/ -w 8 --recursive
# Analyze semantic debt
pbix2owl analyze -i ./ontologies/ -o report.md
# Compare versions (diff)
pbix2owl diff -s v1.json -t v2.json -o changelog.md
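For readers who want the batch behaviour from Python rather than the shell, the sketch below shows one way to fan work out over a worker pool. It does not reproduce the CLI's internals: `batch_convert` and `find_pbix` are hypothetical helpers, the use of threads rather than processes is an assumption, and `convert` is a placeholder for whatever per-file function you supply.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_pbix(root, recursive=True):
    """Collect .pbix files under root, optionally descending into subfolders."""
    pattern = "**/*.pbix" if recursive else "*.pbix"
    return sorted(Path(root).glob(pattern))


def batch_convert(root, convert, workers=8, recursive=True):
    """Apply convert(path) to every .pbix file using a thread pool."""
    files = find_pbix(root, recursive)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(convert, files))
    return dict(zip(files, results))


# Demo with empty placeholder files; the "conversion" only derives an output name.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.pbix").write_bytes(b"")
    (root / "sub").mkdir()
    (root / "sub" / "b.pbix").write_bytes(b"")
    (root / "notes.txt").write_text("not a dashboard")

    outputs = batch_convert(root, lambda p: p.with_suffix(".owl").name, workers=2)
    top_level = batch_convert(root, lambda p: p.name, recursive=False)

print(sorted(outputs.values()))
print(list(top_level.values()))
```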
10. AI-Powered Ontology Chat 🆕
- ✅ Ask questions about loaded ontology in natural language
- ✅ OpenAI API integration (gpt-4o-mini)
- ✅ Role-based context (Admin/Analyst/Viewer)
- ✅ Bilingual support (Russian/English)
- ✅ Suggested questions based on ontology content
Example questions:
- "What entities exist in the ontology?"
- "How are Customer and Sales related?"
- "Show all DAX measures"
- "What permissions does Analyst role have?"
📊 Real-World Example
Tested with official Microsoft samples:
| File | Size | Entities | Relationships | DAX Measures | OWL Triples |
|---|---|---|---|---|---|
| Sales_Returns_Sample.pbix | 6.3 MB | 15 | 9 | 58 | 1,734 |
| Adventure_Works_DW_2020.pbix | 7.8 MB | 11 | 13 | 0 | 1,083 |
from powerbi_ontology import PowerBIExtractor, OntologyGenerator
from powerbi_ontology.export import OWLExporter
# Extract from Power BI
extractor = PowerBIExtractor("Sales_Returns_Sample.pbix")
model = extractor.extract()
# Generate ontology
ontology = OntologyGenerator(model).generate()
# Export to OWL (for OntoGuard)
exporter = OWLExporter(ontology, default_roles=["Admin", "Analyst", "Viewer"])
exporter.save("sales_ontology.owl")
# Summary
summary = exporter.get_export_summary()
print(f"Classes: {summary['classes']}")
print(f"Properties: {summary['datatype_properties']}")
print(f"Action Rules: {summary['action_rules']}") # CRUD per entity × role
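The "CRUD per entity × role" comment describes a cross product. The sketch below only illustrates that enumeration; it makes no claim about what `get_export_summary()` actually counts, and `enumerate_action_rules` is a hypothetical helper.

```python
from itertools import product

CRUD = ("create", "read", "update", "delete")


def enumerate_action_rules(entities, roles, actions=CRUD):
    """One rule per (entity, role, action) combination."""
    return [
        {"entity": entity, "role": role, "action": action}
        for entity, role, action in product(entities, roles, actions)
    ]


rules = enumerate_action_rules(["Customer", "Sales"], ["Admin", "Analyst", "Viewer"])
print(len(rules))  # 2 entities × 3 roles × 4 actions = 24
```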
🔗 Integration Ecosystem
OntoGuard (Semantic Firewall)
from powerbi_ontology.export import OWLExporter
exporter = OWLExporter(ontology)
exporter.save("ontology.owl")
# Use with OntoGuard for AI agent validation
# github.com/vpakspace/ontoguard-ai
Universal Agent Connector (MCP)
from powerbi_ontology import ContractBuilder
from powerbi_ontology.export import ContractToOWLConverter
# Create semantic contract for AI agent
builder = ContractBuilder(ontology)
contract = builder.build_contract(
agent_name="SalesAnalyst",
permissions={
"read": ["Customer", "Sales", "Product"],
"write": {"Sales": ["Status"]},
"execute": ["GenerateReport"]
}
)
# Export for MCP
converter = ContractToOWLConverter(contract)
converter.save("sales_agent_contract.owl")
# Use with Universal Agent Connector
# github.com/vpakspace/universal-agent-connector
Microsoft Fabric IQ
from powerbi_ontology.export import FabricIQExporter
exporter = FabricIQExporter(ontology)
fabric_json = exporter.export()
# Deploy as Ontology Item to OneLake
🧪 Testing
# Run all tests (340 tests, 82% coverage)
pytest
# Run with coverage report
pytest --cov=powerbi_ontology --cov-report=html
# Run specific test module
pytest tests/test_owl_exporter.py -v
Test Statistics:
- 340 tests passing
- 82% code coverage
- E2E tests with real .pbix files
- OntoGuard integration tests
📁 Project Structure
powerbi-ontology-extractor/
├── powerbi_ontology/
│ ├── __init__.py
│ ├── extractor.py # PowerBIExtractor
│ ├── ontology_generator.py # OntologyGenerator
│ ├── pbix_reader.py # PBIXRay integration
│ ├── dax_parser.py # DAX formula parsing
│ ├── semantic_debt.py # Multi-dashboard analysis
│ ├── ontology_diff.py # Diff & Merge
│ ├── review.py # Collaborative review
│ ├── chat.py # AI Chat (OpenAI)
│ ├── cli.py # CLI commands
│ ├── mcp_server.py # MCP Server for Claude Code
│ ├── export/
│ │ ├── owl.py # OWL/RDF export
│ │ ├── fabric_iq.py # Fabric IQ export
│ │ ├── fabric_iq_to_owl.py
│ │ └── contract_to_owl.py
│ └── utils/
│ ├── visualizer.py
│ └── validators.py
├── ontology_editor.py # Streamlit UI (1300+ lines)
├── examples/
│ ├── sample_pbix/ # Microsoft official samples
│ └── sample_ontology.json
├── tests/ # 340 tests
├── requirements.txt
└── README.md
📊 Project Status
| Feature | Status | Coverage |
|---|---|---|
| PBIX Extraction (PBIXRay) | ✅ Complete | 51% |
| DAX Parser | ✅ Complete | 73% |
| Ontology Generator | ✅ Complete | 83% |
| OWL Exporter | ✅ Complete | 95% |
| Fabric IQ Exporter | ✅ Complete | 97% |
| Contract Builder | ✅ Complete | 98% |
| Schema Drift Detection | ✅ Complete | 84% |
| Semantic Debt Analysis | ✅ Complete | 84% |
| Ontology Diff & Merge | ✅ Complete | 84% |
| Review Workflow | ✅ Complete | 93% |
| CLI Tool | ✅ Complete | 60% |
| MCP Server (Claude Code) | ✅ Complete | 85% |
| Visual Editor (Streamlit) | ✅ Complete | - |
| AI Chat (OpenAI) | ✅ Complete | - |
Overall: 340 tests, 82% coverage
PyPI: https://pypi.org/project/powerbi-ontology-extractor/
🤖 MCP Server (Claude Code Integration)
Use PowerBI Ontology Extractor directly in Claude Code via the Model Context Protocol (MCP).
Setup
- Install the package:
pip install powerbi-ontology-extractor
- Add to ~/.claude.json:
{
"mcpServers": {
"powerbi-ontology": {
"command": "python",
"args": ["-m", "powerbi_ontology.mcp_server"]
}
}
}
Optional: Add "env": {"OPENAI_API_KEY": "..."} for the AI chat feature.
- Restart Claude Code
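If ~/.claude.json already lists other MCP servers, the new entry should be merged in rather than pasted over the file. The sketch below shows that merge on the JSON text only; `add_mcp_server` is a hypothetical helper, and reading or writing the real file is left to the reader.

```python
import json

# Entry copied from the setup step above.
ENTRY = {"command": "python", "args": ["-m", "powerbi_ontology.mcp_server"]}


def add_mcp_server(config_text, name="powerbi-ontology", entry=ENTRY):
    """Return config JSON with the server added, keeping existing servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    config.setdefault("mcpServers", {})[name] = dict(entry)
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"command": "node"}}}'
updated = add_mcp_server(existing)
print(updated)
```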
Available MCP Tools
| Tool | Description |
|---|---|
| pbix_extract | Extract semantic model from .pbix file |
| ontology_generate | Generate ontology from model data |
| export_owl | Export to OWL format (xml/turtle) |
| export_json | Export to JSON format |
| analyze_debt | Analyze semantic debt across ontologies |
| ontology_diff | Compare two ontology versions |
| ontology_merge | Merge ontologies (three-way) |
| ontology_chat_ask | AI Q&A about ontology |
Usage Examples in Claude Code
# Extract and generate ontology
"Extract ontology from sales.pbix and export to OWL"
# Ask questions about ontology
"What entities are in the Sales_Returns ontology?"
# Compare versions
"Compare v1 and v2 ontologies and show differences"
🛠️ Development Setup
# Clone repository
git clone https://github.com/vpakspace/powerbi-ontology-extractor.git
cd powerbi-ontology-extractor
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -e .
# Run tests
pytest
# Start Streamlit UI
streamlit run ontology_editor.py --server.port 8503
Environment Variables
Create a .env file for AI Chat:
# Required for Ontology Chat
OPENAI_API_KEY=your-openai-api-key
# Optional: Model selection (default: gpt-4o-mini)
# OPENAI_MODEL=gpt-4o-mini
# Optional: Local models via Ollama
# OLLAMA_BASE_URL=http://localhost:11434/v1
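For orientation, the variables above could be read as follows. This is a sketch using only `os.environ`, not the package's own settings code: `load_chat_settings` is a hypothetical helper, and it assumes the variables have already been exported or loaded from the .env file by some other means.

```python
import os


def load_chat_settings(env=None):
    """Read chat settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required for Ontology Chat")
    return {
        "api_key": api_key,
        "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),  # default named above
        "base_url": env.get("OLLAMA_BASE_URL"),           # None unless set
    }


# Explicit mapping so the demo does not depend on the real environment.
settings = load_chat_settings({"OPENAI_API_KEY": "your-openai-api-key"})
print(settings["model"])
```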
🤝 Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines.
Ways to contribute:
- 🐛 Report bugs via GitHub Issues
- 💡 Suggest features
- 📝 Improve documentation
- 🔧 Submit pull requests
- ⭐ Star the repository
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔗 Related Projects
| Project | Description |
|---|---|
| OntoGuard AI | Semantic Firewall for AI Agents |
| Universal Agent Connector | MCP Infrastructure + Streamlit UI |
📞 Contact
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
Ready to unlock the semantic intelligence in your Power BI dashboards? 🚀
git clone https://github.com/vpakspace/powerbi-ontology-extractor.git
cd powerbi-ontology-extractor
pip install -r requirements.txt
streamlit run ontology_editor.py
Star ⭐ this repo if you find it useful!