Business Intelligence documentation tool for Power BI and Tableau
BI Documentation Tool
A powerful command-line tool for automatically generating comprehensive documentation from Business Intelligence files. Supports Power BI (.pbix) and Tableau (.twb/.twbx) workbooks, extracting detailed metadata to produce professional Markdown and JSON documentation.
Version 1.0.0 - Production Ready! Complete with enterprise integration hooks for tools like Ataccama, Confluence, SharePoint, and more.
Key Features
- Multi-Format Support: Parse Power BI (.pbix) and Tableau (.twb/.twbx) files
- Rich Metadata Extraction: Complete extraction of tables, fields, measures, calculations, data sources, and relationships
- Dual Output Formats: Generate both human-readable Markdown and machine-readable JSON
- Enterprise Integration: Built-in hooks for Ataccama, Confluence, SharePoint, Microsoft Purview, and more
- Docker Ready: Containerized for easy CI/CD integration
- Batch Processing: Process multiple files in a single run
- Cross-Platform: Works on Windows, macOS, and Linux
- Robust Testing: 48+ comprehensive tests ensuring reliability
- DAX Formatting: Professional formatting of DAX expressions in output
Quick Start
Prerequisites
- Python 3.8 or higher
- pip package manager
Installation
Option 1: Local Installation
# Clone the repository
git clone <repository-url>
cd bi-doc
# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install the package in development mode
pip install -e .
Option 2: Docker (Recommended for CI/CD)
# Build the Docker image
docker build -t bidoc-tool .
# Verify installation
docker run bidoc-tool --help
Enterprise Integration
The BI Documentation Tool provides comprehensive integration capabilities for enterprise data management platforms:
- Data Catalogs: Ataccama DGC, Microsoft Purview, Apache Atlas, DataHub
- Documentation Platforms: Confluence, SharePoint, GitBook, Notion
- CI/CD Pipelines: GitHub Actions, Azure DevOps, Jenkins
- Custom APIs: RESTful endpoints for internal systems
See INTEGRATION_HOOKS.md for detailed implementation examples and best practices.
Basic Usage
# Parse a single Power BI file
python -m bidoc -i report.pbix -o docs/ -f all
# Parse a Tableau workbook with verbose output
python -m bidoc -i dashboard.twbx -o docs/ -f markdown --verbose
# Batch processing multiple files
python -m bidoc -i *.pbix -i *.twbx -o docs/ -f all
# Generate AI-enhanced summaries (when configured)
python -m bidoc -i report.pbix -o docs/ --with-summary
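For scripted batch runs, the same invocation can be assembled from Python. The sketch below is a minimal example that relies only on the `-i`, `-o`, and `-f` options shown above. The helper name `build_bidoc_command` is an assumption made for illustration and is not part of the tool.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# File extensions this README lists as supported inputs.
SUPPORTED_SUFFIXES = {".pbix", ".twb", ".twbx"}


def build_bidoc_command(files: Iterable[Path], output_dir: Path, fmt: str = "all") -> List[str]:
    """Build a `python -m bidoc` argument list (hypothetical helper).

    Each supported file gets its own `-i` flag, mirroring the batch example
    above. Files with other extensions are skipped.
    """
    command = [sys.executable, "-m", "bidoc"]
    for path in sorted(files):
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            command += ["-i", str(path)]
    command += ["-o", str(output_dir), "-f", fmt]
    return command


# Example: two BI files and one unrelated file that is filtered out.
example = build_bidoc_command(
    [Path("b.twbx"), Path("a.pbix"), Path("notes.txt")], Path("docs")
)
print(example[1:])
```

With the package installed, the resulting list can be handed to `subprocess.run(command, check=True)`. Passing explicit paths this way also avoids depending on how the shell expands wildcards.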
Docker Usage
# Build the Docker image
docker build -t bidoc-tool .
# Run with mounted volumes
docker run -v "$(pwd):/data" bidoc-tool --input /data/report.pbix --output /data/docs
Documentation
- User Guide - Comprehensive usage guide with examples
- Integration Hooks - Enterprise integration patterns and examples
- Roadmap - Development roadmap and planned features
- Contributing - How to contribute to the project
- Changelog - Version history and changes
What Gets Extracted
Power BI (.pbix)
- Data Model: Tables, columns, data types, relationships
- DAX Measures: All measures with their formulas
- Calculated Columns: Custom calculations and their DAX expressions
- Data Sources: Connection details and source information
- Report Layout: Pages, visuals, and field mappings
- Power Query: M code and transformation steps
Tableau (.twb/.twbx)
- Data Sources: Connection details and database information
- Fields: Dimensions, measures, calculated fields with formulas
- Worksheets: Individual sheet layouts and field usage
- Dashboards: Dashboard structure and contained worksheets
- Parameters: User-defined parameters and default values
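As background on why this metadata is reachable: a `.twb` file is an XML document, and a `.twbx` file is a ZIP archive that packages a `.twb` together with its local data files. The sketch below reads worksheet names straight from that XML using only the standard library. It illustrates the file format and is not the tool's own parser, which according to the Dependencies section builds on tableaudocumentapi. The function name `list_worksheets` and the sample workbook are made up for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List


def list_worksheets(twbx_bytes: bytes) -> List[str]:
    """Return worksheet names from a packaged workbook (.twbx) given as bytes."""
    with zipfile.ZipFile(io.BytesIO(twbx_bytes)) as archive:
        # The packaged workbook holds a .twb XML file next to its data files.
        twb_name = next(n for n in archive.namelist() if n.lower().endswith(".twb"))
        root = ET.fromstring(archive.read(twb_name))
    # Worksheets appear as <worksheet name="..."> elements in the workbook XML.
    return [ws.get("name", "") for ws in root.iter("worksheet")]


# Build a tiny stand-in workbook in memory so the example runs without Tableau.
SAMPLE_TWB = """<workbook>
  <worksheets>
    <worksheet name="Sales by Region"/>
    <worksheet name="Total Revenue"/>
  </worksheets>
</workbook>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.twb", SAMPLE_TWB)

print(list_worksheets(buffer.getvalue()))
```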
Output Examples
Markdown Output
# Documentation for Sales Dashboard
## Data Sources
- **SQL Server**: server01.company.com/SalesDB
- **Excel File**: Q4_Targets.xlsx
## Tables and Fields
### Sales
| Field Name | Type | Description |
|------------|------|-------------|
| SalesAmount | Decimal | Total sales value |
| CustomerID | Integer | Customer identifier |
| TotalSalesYTD* | Decimal | Calculated: `SUM(Sales[SalesAmount])` |
## Visualizations
### Page: Overview
- **Bar Chart**: Sales by Region
  - Fields: [Geography.Region], [Sales.TotalSalesYTD]
- **Card**: Total Revenue
  - Field: [Sales.TotalSalesYTD]
JSON Output
{
  "file": "sales_dashboard.pbix",
  "type": "Power BI",
  "data_sources": [
    {
      "name": "SalesDB",
      "connection": "sqlserver://server01.company.com/SalesDB",
      "tables": [
        {
          "name": "Sales",
          "columns": [
            {"name": "SalesAmount", "data_type": "Decimal"},
            {"name": "CustomerID", "data_type": "Integer"}
          ],
          "measures": [
            {"name": "TotalSalesYTD", "expression": "SUM(Sales[SalesAmount])"}
          ]
        }
      ]
    }
  ]
}
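Because the JSON output is plain nested data, downstream scripts can consume it with the standard library. The sketch below assumes the structure of the sample above (`data_sources`, then `tables`, then `measures`); real output may carry additional keys, and the helper name `list_measures` is illustrative only.

```python
import json
from typing import Any, Dict, List

# Trimmed copy of the sample output shown above.
SAMPLE = """
{
  "file": "sales_dashboard.pbix",
  "type": "Power BI",
  "data_sources": [
    {
      "name": "SalesDB",
      "tables": [
        {
          "name": "Sales",
          "columns": [
            {"name": "SalesAmount", "data_type": "Decimal"},
            {"name": "CustomerID", "data_type": "Integer"}
          ],
          "measures": [
            {"name": "TotalSalesYTD", "expression": "SUM(Sales[SalesAmount])"}
          ]
        }
      ]
    }
  ]
}
"""


def list_measures(doc: Dict[str, Any]) -> List[str]:
    """Return 'Table[Measure] = expression' strings for every measure found."""
    found = []
    for source in doc.get("data_sources", []):
        for table in source.get("tables", []):
            for measure in table.get("measures", []):
                found.append(
                    f"{table['name']}[{measure['name']}] = {measure['expression']}"
                )
    return found


for line in list_measures(json.loads(SAMPLE)):
    print(line)
```

The same traversal pattern applies to `columns` when building a field inventory for a data catalog.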
Architecture
The tool follows a modular architecture with clear separation of concerns:
bidoc/
├── cli.py # Command-line interface
├── pbix_parser.py # Power BI parsing logic
├── tableau_parser.py # Tableau parsing logic
├── markdown_generator.py # Markdown output formatting
├── json_generator.py # JSON output formatting
├── ai_summary.py # AI integration hooks
└── utils.py # Common utilities
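One way to picture the separation between the CLI and the two parsers is a dispatch on file extension. The sketch below is hypothetical: `detect_bi_type` is not a function from the package. The extensions and the "Power BI" label come from this README, while the "Tableau" label is an assumption.

```python
from pathlib import Path

# Extension-to-format mapping based on the formats listed in this README.
FORMAT_BY_SUFFIX = {
    ".pbix": "Power BI",
    ".twb": "Tableau",
    ".twbx": "Tableau",
}


def detect_bi_type(path: str) -> str:
    """Return the BI format for a file, or raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or path!r}") from None


print(detect_bi_type("report.pbix"))
print(detect_bi_type("dashboard.TWBX"))
```

A CLI layer could use a lookup like this to decide which parser module to call, which keeps format-specific logic out of the output generators.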
Current Status & Roadmap
Completed Features
- Multi-format Support: Robust parsing of Power BI (.pbix) and Tableau (.twb/.twbx) files
- Rich Metadata Extraction: Complete extraction of datasets, fields, measures, calculations, and visuals
- Dual Output Formats: High-quality Markdown (markdownlint compliant) and structured JSON
- Production Ready: Robust CLI, Docker support, comprehensive error handling
- Batch Processing: Efficient processing of multiple files with detailed logging
- Modular Architecture: Extensible design ready for future enhancements
Next Priorities
For the detailed roadmap and quality-of-life improvements, see QOL_SUGGESTIONS.md:
- Enhanced User Experience: Progress indicators, better error messages, interactive mode
- Performance Optimizations: Parallel processing, incremental updates, caching
- Output Quality: Enhanced Markdown with TOC, collapsible sections, syntax highlighting
- Enterprise Integration: Git hooks, Confluence export, SharePoint integration
- Advanced Analytics: Usage patterns, similarity detection, AI-powered insights
Dependencies
- pbixray: Power BI file parsing (>=0.3.3)
- tableaudocumentapi: Tableau workbook parsing (>=0.11)
- click: CLI framework (>=8.0.0)
- jinja2: Template rendering (>=3.1.0)
- pandas: Data processing (>=1.5.0)
- lxml: XML processing (>=4.9.0)
- colorama: Cross-platform colored output (>=0.4.0)
Acknowledgments
We are grateful to the open-source community and the following projects that make this tool possible:
- PBIXRay by Arjen van Stam - Essential Power BI file parsing capabilities
- Tableau Document API by Tableau Software - Comprehensive Tableau workbook analysis
- Microsoft - Power BI sample files for testing and demonstration
- Python Community - The amazing ecosystem of libraries (pandas, click, jinja2, etc.)
For complete attribution and licensing information, see THIRD_PARTY_LICENSES.md.
Contributing
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Make your changes and add tests
- Ensure all tests pass: python -m pytest tests/ -v
- Submit a pull request
Development Setup
# Clone and setup development environment
git clone <repository-url>
cd bi-doc
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
pip install -e .
# Run tests
python -m pytest tests/ -v
# Run with coverage
pip install pytest-cov
python -m pytest --cov=bidoc tests/
License
This project is licensed under the Business Source License 1.1 (BSL) - see the LICENSE file for details.
Support
- Documentation: See USER_GUIDE.md for detailed usage instructions
- Integration Guide: See INTEGRATION_HOOKS.md for enterprise integration patterns
- Issues: Report bugs and feature requests on GitHub
- Discussions: Join community discussions for questions and ideas
Status
- Production Ready: All core features implemented and tested
- Docker Support: Containerized for easy deployment
- CI/CD Ready: Automated testing and deployment pipelines
- Enterprise Integration: Hooks for major data platforms
- Active Development: Regular updates and new features
Made with ❤️ for the BI community