Extract semantic intelligence from Power BI .pbix files and convert to formal ontologies

PowerBI Ontology Extractor

Transform 20 million Power BI dashboards into AI-ready ontologies

Python 3.9+ • License: MIT

Installation • Quick Start • Features • Documentation • Contributing


🎯 The Problem

Enterprises have 20+ million Power BI semantic models that are actually informal ontologies trapped in proprietary .pbix files.

  • The Challenge: Each Power BI model contains entities, relationships, and business logic—but AI agents can't access this semantic intelligence
  • The Cost: Enterprises spend $50K-$200K per semantic definition to reconcile conflicts across dashboards
  • The $4.6M Mistake: A logistics company lost $4.6M when an AI agent used a renamed column (Warehouse_LocationFacilityID) because there was no semantic binding validation

💡 The Solution

PowerBI Ontology Extractor unlocks the hidden ontologies in your Power BI dashboards and transforms them into formal, AI-ready ontologies.

┌─────────────────────┐     ┌──────────────────────┐     ┌─────────────────────────────┐
│   Power BI .pbix    │────▶│  Ontology Extractor  │────▶│       OntoGuard             │
│  (20M+ dashboards)  │     │  (this project)      │     │  Semantic Firewall          │
└─────────────────────┘     └──────────────────────┘     └─────────────────────────────┘
                                     │                              │
                                     │ OWL/Fabric IQ                │ Semantic Validation
                                     ▼                              ▼
                            ┌──────────────────────┐     ┌─────────────────────────────┐
                            │   Semantic Contract  │────▶│  Universal Agent Connector  │
                            │   (permissions)      │     │  AI Agent Infrastructure    │
                            └──────────────────────┘     └─────────────────────────────┘
                                                                    │
                                                                    ▼
                                                         ┌─────────────────────────────┐
                                                         │       AI Agents             │
                                                         │  (Claude, GPT, etc.)        │
                                                         └─────────────────────────────┘

30-minute workflow:

Power BI (.pbix) → Ontology Extractor → OntoGuard → Universal Agent Connector → AI Agent
     10 min           10 min            5 min            3 min               2 min

🚀 Quick Start

Installation

# Install from PyPI (recommended)
pip install powerbi-ontology-extractor

Or install from source:

git clone https://github.com/vpakspace/powerbi-ontology-extractor.git
cd powerbi-ontology-extractor
pip install -e .

Basic Usage

from powerbi_ontology import PowerBIExtractor, OntologyGenerator

# Step 1: Extract semantic model from Power BI
extractor = PowerBIExtractor("path/to/dashboard.pbix")
semantic_model = extractor.extract()

# Step 2: Generate formal ontology
generator = OntologyGenerator(semantic_model)
ontology = generator.generate()

print(f"✅ Extracted {len(ontology.entities)} entities")
print(f"✅ Found {len(ontology.relationships)} relationships")
print(f"✅ Generated {len(ontology.business_rules)} business rules")

# Step 3: Export to OWL for OntoGuard
from powerbi_ontology.export import OWLExporter

exporter = OWLExporter(ontology)
exporter.save("ontology.owl")

Visual Ontology Editor (No-Code UI)

# Start Streamlit UI
streamlit run ontology_editor.py --server.port 8503

Features:

  • 📂 Load from .pbix files or JSON
  • 📦 Edit entities with properties and constraints
  • 🔗 Manage relationships between entities
  • 🔐 Configure permission matrix (RBAC)
  • 📜 Add business rules with classification
  • 🦉 Preview and export OWL
  • 🔀 Diff & Merge ontology versions
  • 💬 AI Chat - Ask questions about your ontology!

🔥 Key Features

1. Automatic Extraction (PBIXRay)

  • ✅ Reads Power BI .pbix files (binary DataModel via PBIXRay)
  • ✅ Extracts tables, columns, relationships, hierarchies
  • ✅ Parses DAX measures and calculated columns
  • ✅ Captures Row-Level Security (RLS) rules
  • ✅ Fallback to JSON model.bim for legacy files

2. DAX to Business Rules

  • ✅ Parses DAX formulas automatically
  • ✅ Extracts conditional logic (IF, SWITCH, CALCULATE)
  • ✅ Converts filters to business rules
  • ✅ Classifies measure types (aggregation, conditional, time intelligence)
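
As a rough illustration of the measure classification listed above, the sketch below tags a DAX expression by the functions it calls. The classify_measure helper and its keyword sets are hypothetical and are not taken from dax_parser.py; a real parser would tokenize DAX rather than pattern-match.

```python
import re

# Hypothetical keyword sets; the project's own classifier may use different ones.
TIME_INTELLIGENCE = {"TOTALYTD", "SAMEPERIODLASTYEAR", "DATEADD", "DATESYTD"}
CONDITIONAL = {"IF", "SWITCH"}
AGGREGATION = {"SUM", "AVERAGE", "COUNT", "COUNTROWS", "MIN", "MAX", "SUMX"}

# A function call is an identifier immediately followed by "(".
FUNCTION_CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\s*\(")


def classify_measure(dax: str) -> str:
    """Return a coarse measure type based on the DAX functions called."""
    functions = {name.upper() for name in FUNCTION_CALL.findall(dax)}
    # Check the most specific category first: a time-intelligence measure
    # usually wraps an aggregation, and a conditional often does too.
    if functions & TIME_INTELLIGENCE:
        return "time_intelligence"
    if functions & CONDITIONAL:
        return "conditional"
    if functions & AGGREGATION:
        return "aggregation"
    return "other"


if __name__ == "__main__":
    print(classify_measure("SUM(Sales[Amount])"))
    print(classify_measure('IF(SUM(Sales[Amount]) > 1000, "High", "Low")'))
```

The precedence order is a design choice for this sketch only: it reports the outermost intent of a measure rather than every function it contains.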

3. Ontology Generation (70% Automated)

  • ✅ Entities from tables
  • ✅ Properties from columns (with data types)
  • ✅ Relationships from foreign keys (with cardinality)
  • ✅ Business rules from DAX measures
  • ✅ Constraints (required, unique, range, regex, enum)
  • ✅ Pattern detection (date tables, dimensions, facts)
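
To make the pattern detection step more concrete, here is a minimal sketch that labels tables as date tables, facts, or dimensions from their columns and their position in relationships. The Table class, the classify_tables function, and the heuristics are assumptions for illustration; the rules used by ontology_generator.py may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict[str, str] = field(default_factory=dict)  # column name -> data type


def classify_tables(tables, relationships):
    """Label each table as date_table, fact, dimension, or unknown.

    `relationships` is a list of (many_side_table, one_side_table) pairs.
    """
    many_side = {src for src, _ in relationships}
    one_side = {dst for _, dst in relationships}
    labels = {}
    for table in tables:
        has_date_column = any(
            dtype.lower() in ("date", "datetime") for dtype in table.columns.values()
        )
        if has_date_column and "date" in table.name.lower():
            labels[table.name] = "date_table"
        elif table.name in many_side and table.name not in one_side:
            # Only ever on the "many" side: typical of a fact table.
            labels[table.name] = "fact"
        elif table.name in one_side:
            labels[table.name] = "dimension"
        else:
            labels[table.name] = "unknown"
    return labels


if __name__ == "__main__":
    tables = [
        Table("Sales", {"CustomerID": "int", "OrderDate": "date"}),
        Table("Customer", {"CustomerID": "int"}),
        Table("Date", {"Date": "date"}),
    ]
    print(classify_tables(tables, [("Sales", "Customer"), ("Sales", "Date")]))
```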

4. Multi-Format Export

Format             Use Case
-----------------  -------------------------------
OWL/RDF            OntoGuard semantic validation
Fabric IQ          Microsoft Fabric deployment
JSON               Universal agent connector
Semantic Contract  Role-based AI agent permissions

5. Schema Drift Detection (Prevents $4.6M Mistakes!)

  • ✅ Validates schema bindings
  • ✅ Detects column renames/deletions
  • ✅ Type normalization (varchar→text, int→integer)
  • ✅ Severity levels: CRITICAL, WARNING, INFO
  • ✅ Auto-fix suggestions
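
The checks above can be sketched in a few lines. This is an illustrative example only: detect_drift, normalize_type, the alias table, and the mapping of finding types to severity levels are assumptions, not the package's actual API. The column names are made up.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    WARNING = "WARNING"
    INFO = "INFO"


# Assumed alias table, seeded with the two normalizations listed above.
TYPE_ALIASES = {"varchar": "text", "int": "integer"}


def normalize_type(type_name: str) -> str:
    """Map a source type name to its canonical form."""
    key = type_name.strip().lower()
    return TYPE_ALIASES.get(key, key)


def detect_drift(expected: dict[str, str], actual: dict[str, str]) -> list[tuple[Severity, str]]:
    """Compare the column bindings an ontology expects against the current schema."""
    findings = []
    for column, expected_type in expected.items():
        if column not in actual:
            # A renamed or deleted column breaks the binding outright.
            findings.append((Severity.CRITICAL, f"missing column: {column}"))
        elif normalize_type(expected_type) != normalize_type(actual[column]):
            findings.append((Severity.WARNING, f"type changed: {column}"))
    for column in actual:
        if column not in expected:
            findings.append((Severity.INFO, f"new column: {column}"))
    return findings


if __name__ == "__main__":
    expected = {"Region": "varchar", "Quantity": "int"}
    actual = {"RegionCode": "text", "Quantity": "integer"}
    for severity, message in detect_drift(expected, actual):
        print(severity.value, message)
```

In this sketch a rename shows up as one CRITICAL finding (the old name is gone) plus one INFO finding (an unknown column appeared), which is the pair a reviewer would use to suggest a fix.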

6. Multi-Dashboard Semantic Debt Analysis

  • ✅ Analyzes multiple Power BI dashboards
  • ✅ Detects conflicting definitions ("Revenue" defined differently)
  • ✅ 5 conflict types: MEASURE, TYPE, ENTITY, RELATIONSHIP, RULE
  • ✅ Generates consolidation reports
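
As a simplified picture of the MEASURE conflict type, the sketch below groups dashboards by the DAX they use for each measure name and reports names with more than one distinct definition. The find_measure_conflicts function, the whitespace and case normalization, and the file names are hypothetical; semantic_debt.py may compare definitions differently.

```python
from collections import defaultdict


def find_measure_conflicts(dashboards: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return {measure: {normalized DAX: [dashboards]}} for conflicting measures."""
    by_measure = defaultdict(lambda: defaultdict(list))
    for dashboard, measures in dashboards.items():
        for name, dax in measures.items():
            # DAX is case-insensitive and whitespace is not significant here,
            # so strip both before comparing definitions.
            normalized = "".join(dax.split()).upper()
            by_measure[name][normalized].append(dashboard)
    return {
        name: dict(definitions)
        for name, definitions in by_measure.items()
        if len(definitions) > 1
    }


if __name__ == "__main__":
    dashboards = {
        "finance.pbix": {"Revenue": "SUM(Sales[Amount])"},
        "ops.pbix": {"Revenue": "SUM(Sales[Amount]) - SUM(Sales[Returns])"},
        "hr.pbix": {"Revenue": "sum( Sales[Amount] )"},
    }
    for measure, definitions in find_measure_conflicts(dashboards).items():
        print(measure, definitions)
```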

7. Ontology Diff & Merge

  • ✅ Git-like diff between ontology versions
  • ✅ Detect added/removed/modified elements
  • ✅ Three-way merge (base, ours, theirs)
  • ✅ Conflict detection and resolution strategies
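
The three-way merge idea can be shown on plain dictionaries. This is a minimal sketch under the assumption that an ontology version is a flat mapping from element name to definition; three_way_merge and the "keep ours on conflict" strategy are illustrative choices, not the behavior of ontology_diff.py.

```python
_MISSING = object()  # marks a key that is absent from one version


def three_way_merge(base: dict, ours: dict, theirs: dict) -> tuple[dict, list[str]]:
    """Merge two edited copies of `base`; return (merged, conflicting keys)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o  # both sides agree (including both deleting the key)
        elif o == b:
            value = t  # only theirs changed
        elif t == b:
            value = o  # only ours changed
        else:
            conflicts.append(key)
            value = o  # provisional choice; the caller resolves the conflict
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


if __name__ == "__main__":
    base = {"Customer": ["id", "name"], "Sales": ["id"]}
    ours = {"Customer": ["id", "name", "email"], "Sales": ["id"]}
    theirs = {"Customer": ["id", "name"], "Sales": ["id", "amount"], "Product": ["sku"]}
    print(three_way_merge(base, ours, theirs))
```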

8. Collaborative Review Workflow

  • ✅ Comments on entities/properties/rules
  • ✅ Reply and resolve threads
  • ✅ Approval workflow: draft → review → approved → published
  • ✅ Audit trail of all actions
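
The approval workflow above is a small state machine. The sketch below models it with an audit trail; the ReviewItem class and the "review back to draft" edge are assumptions added for illustration and are not taken from review.py.

```python
from datetime import datetime, timezone

# Forward transitions follow the workflow listed above.
# The "review" -> "draft" edge (send back for changes) is an assumption.
TRANSITIONS = {
    "draft": {"review"},
    "review": {"approved", "draft"},
    "approved": {"published"},
    "published": set(),
}


class ReviewItem:
    """An ontology element moving through the review workflow."""

    def __init__(self, name: str):
        self.name = name
        self.status = "draft"
        self.audit_trail = []  # (timestamp, actor, old_status, new_status)

    def transition(self, new_status: str, actor: str) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        self.audit_trail.append(
            (datetime.now(timezone.utc), actor, self.status, new_status)
        )
        self.status = new_status


if __name__ == "__main__":
    item = ReviewItem("Customer")
    item.transition("review", actor="alice")
    item.transition("approved", actor="bob")
    print(item.status, len(item.audit_trail))
```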

9. CLI Tool for Automation

# Extract single .pbix file
pbix2owl extract -i dashboard.pbix -o ontology.owl

# Batch process directory (8 parallel workers)
pbix2owl batch -i ./dashboards/ -o ./ontologies/ -w 8 --recursive

# Analyze semantic debt
pbix2owl analyze -i ./ontologies/ -o report.md

# Compare versions (diff)
pbix2owl diff -s v1.json -t v2.json -o changelog.md

10. AI-Powered Ontology Chat 🆕

  • ✅ Ask questions about the loaded ontology in natural language
  • ✅ OpenAI API integration (gpt-4o-mini)
  • ✅ Role-based context (Admin/Analyst/Viewer)
  • ✅ Bilingual support (Russian/English)
  • ✅ Suggested questions based on ontology content

Example questions:

  • "What entities exist in the ontology?"
  • "How are Customer and Sales related?"
  • "Show all DAX measures"
  • "What permissions does Analyst role have?"

📊 Real-World Example

Tested with Microsoft official samples:

File                          Size    Entities  Relationships  DAX Measures  OWL Triples
----------------------------  ------  --------  -------------  ------------  -----------
Sales_Returns_Sample.pbix     6.3 MB  15        9              58            1,734
Adventure_Works_DW_2020.pbix  7.8 MB  11        13             0             1,083

from powerbi_ontology import PowerBIExtractor, OntologyGenerator
from powerbi_ontology.export import OWLExporter

# Extract from Power BI
extractor = PowerBIExtractor("Sales_Returns_Sample.pbix")
model = extractor.extract()

# Generate ontology
ontology = OntologyGenerator(model).generate()

# Export to OWL (for OntoGuard)
exporter = OWLExporter(ontology, default_roles=["Admin", "Analyst", "Viewer"])
exporter.save("sales_ontology.owl")

# Summary
summary = exporter.get_export_summary()
print(f"Classes: {summary['classes']}")
print(f"Properties: {summary['datatype_properties']}")
print(f"Action Rules: {summary['action_rules']}")  # CRUD per entity × role
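
The "CRUD per entity × role" comment suggests one rule per combination of entity, role, and action. The sketch below shows that expansion on its own; expand_action_rules is a hypothetical helper, and whether OWLExporter counts action rules exactly this way is an assumption.

```python
from itertools import product

ACTIONS = ("create", "read", "update", "delete")


def expand_action_rules(entities, roles):
    """Return one (role, action, entity) tuple per entity x role x CRUD action."""
    return [
        (role, action, entity)
        for entity, role, action in product(entities, roles, ACTIONS)
    ]


if __name__ == "__main__":
    rules = expand_action_rules(["Customer", "Sales"], ["Admin", "Analyst", "Viewer"])
    print(len(rules))  # 2 entities x 3 roles x 4 actions
```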

🔗 Integration Ecosystem

OntoGuard (Semantic Firewall)

from powerbi_ontology.export import OWLExporter

exporter = OWLExporter(ontology)
exporter.save("ontology.owl")

# Use with OntoGuard for AI agent validation
# github.com/vpakspace/ontoguard-ai

Universal Agent Connector (MCP)

from powerbi_ontology import ContractBuilder
from powerbi_ontology.export import ContractToOWLConverter

# Create semantic contract for AI agent
builder = ContractBuilder(ontology)
contract = builder.build_contract(
    agent_name="SalesAnalyst",
    permissions={
        "read": ["Customer", "Sales", "Product"],
        "write": {"Sales": ["Status"]},
        "execute": ["GenerateReport"]
    }
)

# Export for MCP
converter = ContractToOWLConverter(contract)
converter.save("sales_agent_contract.owl")

# Use with Universal Agent Connector
# github.com/vpakspace/universal-agent-connector

Microsoft Fabric IQ

from powerbi_ontology.export import FabricIQExporter

exporter = FabricIQExporter(ontology)
fabric_json = exporter.export()

# Deploy as Ontology Item to OneLake

🧪 Testing

# Run all tests (340 tests, 82% coverage)
pytest

# Run with coverage report
pytest --cov=powerbi_ontology --cov-report=html

# Run specific test module
pytest tests/test_owl_exporter.py -v

Test Statistics:

  • 340 tests passing
  • 82% code coverage
  • E2E tests with real .pbix files
  • OntoGuard integration tests

📁 Project Structure

powerbi-ontology-extractor/
├── powerbi_ontology/
│   ├── __init__.py
│   ├── extractor.py           # PowerBIExtractor
│   ├── ontology_generator.py  # OntologyGenerator
│   ├── pbix_reader.py         # PBIXRay integration
│   ├── dax_parser.py          # DAX formula parsing
│   ├── semantic_debt.py       # Multi-dashboard analysis
│   ├── ontology_diff.py       # Diff & Merge
│   ├── review.py              # Collaborative review
│   ├── chat.py                # AI Chat (OpenAI)
│   ├── cli.py                 # CLI commands
│   ├── mcp_server.py          # MCP Server for Claude Code
│   ├── export/
│   │   ├── owl.py             # OWL/RDF export
│   │   ├── fabric_iq.py       # Fabric IQ export
│   │   ├── fabric_iq_to_owl.py
│   │   └── contract_to_owl.py
│   └── utils/
│       ├── visualizer.py
│       └── validators.py
├── ontology_editor.py         # Streamlit UI (1300+ lines)
├── examples/
│   ├── sample_pbix/           # Microsoft official samples
│   └── sample_ontology.json
├── tests/                     # 340 tests
├── requirements.txt
└── README.md

📊 Project Status

Feature                    Status       Coverage
-------------------------  -----------  --------
PBIX Extraction (PBIXRay)  ✅ Complete  51%
DAX Parser                 ✅ Complete  73%
Ontology Generator         ✅ Complete  83%
OWL Exporter               ✅ Complete  95%
Fabric IQ Exporter         ✅ Complete  97%
Contract Builder           ✅ Complete  98%
Schema Drift Detection     ✅ Complete  84%
Semantic Debt Analysis     ✅ Complete  84%
Ontology Diff & Merge      ✅ Complete  84%
Review Workflow            ✅ Complete  93%
CLI Tool                   ✅ Complete  60%
MCP Server (Claude Code)   ✅ Complete  85%
Visual Editor (Streamlit)  ✅ Complete  -
AI Chat (OpenAI)           ✅ Complete  -

Overall: 340 tests, 82% coverage

PyPI: https://pypi.org/project/powerbi-ontology-extractor/


🤖 MCP Server (Claude Code Integration)

Use PowerBI Ontology Extractor directly in Claude Code via MCP (Model Context Protocol).

Setup

  1. Install the package:

pip install powerbi-ontology-extractor

  2. Add to ~/.claude.json:

{
  "mcpServers": {
    "powerbi-ontology": {
      "command": "python",
      "args": ["-m", "powerbi_ontology.mcp_server"]
    }
  }
}

Optional: add "env": {"OPENAI_API_KEY": "..."} to the server entry to enable the AI chat feature.

  3. Restart Claude Code
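
If you prefer not to edit the JSON by hand, the config step can be scripted. The following is a minimal sketch using only the standard library; add_mcp_server is a hypothetical helper, and it assumes the config file is either absent or already valid JSON.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path) -> dict:
    """Add the powerbi-ontology entry to a Claude Code config, keeping other keys."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["powerbi-ontology"] = {
        "command": "python",
        "args": ["-m", "powerbi_ontology.mcp_server"],
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    # Point this at ~/.claude.json once you have a backup of the original.
    print(add_mcp_server(Path("claude.example.json")))
```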

Available MCP Tools

Tool               Description
-----------------  ---------------------------------------
pbix_extract       Extract semantic model from .pbix file
ontology_generate  Generate ontology from model data
export_owl         Export to OWL format (xml/turtle)
export_json        Export to JSON format
analyze_debt       Analyze semantic debt across ontologies
ontology_diff      Compare two ontology versions
ontology_merge     Merge ontologies (three-way)
ontology_chat_ask  AI Q&A about ontology

Usage Examples in Claude Code

# Extract and generate ontology
"Extract ontology from sales.pbix and export to OWL"

# Ask questions about ontology
"What entities are in the Sales_Returns ontology?"

# Compare versions
"Compare v1 and v2 ontologies and show differences"

🛠️ Development Setup

# Clone repository
git clone https://github.com/vpakspace/powerbi-ontology-extractor.git
cd powerbi-ontology-extractor

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Run tests
pytest

# Start Streamlit UI
streamlit run ontology_editor.py --server.port 8503

Environment Variables

Create a .env file for AI Chat:

# Required for Ontology Chat
OPENAI_API_KEY=your-openai-api-key

# Optional: Model selection (default: gpt-4o-mini)
# OPENAI_MODEL=gpt-4o-mini

# Optional: Local models via Ollama
# OLLAMA_BASE_URL=http://localhost:11434/v1
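
For reference, these variables could be read along the following lines. This is a sketch, not the code in chat.py: load_chat_settings is a hypothetical helper, and only the variable names and the gpt-4o-mini default come from the settings listed above.

```python
import os


def load_chat_settings(env=os.environ) -> dict:
    """Collect chat settings from environment variables, applying defaults."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required for Ontology Chat")
    return {
        "api_key": api_key,
        "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
        # None means no Ollama endpoint was configured.
        "base_url": env.get("OLLAMA_BASE_URL"),
    }


if __name__ == "__main__":
    print(load_chat_settings({"OPENAI_API_KEY": "your-openai-api-key"}))
```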

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Ways to contribute:

  • 🐛 Report bugs via GitHub Issues
  • 💡 Suggest features
  • 📝 Improve documentation
  • 🔧 Submit pull requests
  • ⭐ Star the repository

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🔗 Related Projects

Project                    Description
-------------------------  ---------------------------------
OntoGuard AI               Semantic Firewall for AI Agents
Universal Agent Connector  MCP Infrastructure + Streamlit UI


Ready to unlock the semantic intelligence in your Power BI dashboards? 🚀

git clone https://github.com/vpakspace/powerbi-ontology-extractor.git
cd powerbi-ontology-extractor
pip install -r requirements.txt
streamlit run ontology_editor.py

Star ⭐ this repo if you find it useful!
