Business Intelligence documentation tool for Power BI and Tableau

BI Documentation Tool

A powerful command-line tool for automatically generating comprehensive documentation from Business Intelligence files. Supports Power BI (.pbix) and Tableau (.twb/.twbx) workbooks, extracting detailed metadata to produce professional Markdown and JSON documentation.

🎉 Version 1.0.0 - Production Ready! Complete with enterprise integration hooks for tools like Ataccama, Confluence, SharePoint, and more.

🚀 Key Features

  • Multi-Format Support: Parse Power BI (.pbix) and Tableau (.twb/.twbx) files
  • Rich Metadata Extraction: Complete extraction of tables, fields, measures, calculations, data sources, and relationships
  • Dual Output Formats: Generate both human-readable Markdown and machine-readable JSON
  • Enterprise Integration: Built-in hooks for Ataccama, Confluence, SharePoint, Microsoft Purview, and more
  • Docker Ready: Containerized for easy CI/CD integration
  • Batch Processing: Process multiple files simultaneously
  • Cross-Platform: Works on Windows, macOS, and Linux
  • Robust Testing: 48+ comprehensive tests ensuring reliability
  • DAX Formatting: Professional formatting of DAX expressions in output

📋 Quick Start

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Installation

Option 1: Local Installation

# Clone the repository
git clone <repository-url>
cd bi-doc

# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install the package in development mode
pip install -e .

Option 2: Docker (Recommended for CI/CD)

# Build the Docker image
docker build -t bidoc-tool .

# Verify installation
docker run bidoc-tool --help

🔗 Enterprise Integration

The BI Documentation Tool provides comprehensive integration capabilities for enterprise data management platforms:

  • Data Catalogs: Ataccama DGC, Microsoft Purview, Apache Atlas, DataHub
  • Documentation Platforms: Confluence, SharePoint, GitBook, Notion
  • CI/CD Pipelines: GitHub Actions, Azure DevOps, Jenkins
  • Custom APIs: RESTful endpoints for internal systems

See INTEGRATION_HOOKS.md for detailed implementation examples and best practices.

Basic Usage

# Parse a single Power BI file
python -m bidoc -i report.pbix -o docs/ -f all

# Parse a Tableau workbook with verbose output
python -m bidoc -i dashboard.twbx -o docs/ -f markdown --verbose

# Batch processing multiple files
python -m bidoc -i *.pbix -i *.twbx -o docs/ -f all

# Generate AI-enhanced summaries (when configured)
python -m bidoc -i report.pbix -o docs/ --with-summary
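
For scripted batch runs, the same invocation can be assembled from Python instead of relying on shell globbing. The sketch below is only an illustration: `build_command` is a hypothetical helper that is not part of the tool, and it assumes `-i` may be repeated once per file, as in the batch example above. It only builds the argument list; running it is left to `subprocess.run`.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def build_command(inputs: Iterable[Path], output_dir: str, fmt: str = "all") -> List[str]:
    """Build a `python -m bidoc` argument list with one -i flag per input file.

    Hypothetical helper: only the -i, -o and -f flags shown in the usage
    examples are used here.
    """
    cmd = [sys.executable, "-m", "bidoc"]
    for path in sorted(inputs):  # sorted for a reproducible command line
        cmd += ["-i", str(path)]
    cmd += ["-o", output_dir, "-f", fmt]
    return cmd


# Typical use (commented out so that importing this sketch has no side effects):
#   import subprocess
#   files = [p for p in Path(".").iterdir() if p.suffix.lower() in {".pbix", ".twbx"}]
#   subprocess.run(build_command(files, "docs/"), check=True)
```

Passing each file as its own `-i` argument avoids depending on how the shell expands `*.pbix`.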

Docker Usage

# Build the Docker image
docker build -t bidoc-tool .

# Run with mounted volumes
docker run -v "$(pwd)":/data bidoc-tool --input /data/report.pbix --output /data/docs

📖 Documentation

๐Ÿ” What Gets Extracted

Power BI (.pbix)

  • Data Model: Tables, columns, data types, relationships
  • DAX Measures: All measures with their formulas
  • Calculated Columns: Custom calculations and their DAX expressions
  • Data Sources: Connection details and source information
  • Report Layout: Pages, visuals, and field mappings
  • Power Query: M code and transformation steps

Tableau (.twb/.twbx)

  • Data Sources: Connection details and database information
  • Fields: Dimensions, measures, calculated fields with formulas
  • Worksheets: Individual sheet layouts and field usage
  • Dashboards: Dashboard structure and contained worksheets
  • Parameters: User-defined parameters and default values
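
To give a feel for where the Tableau metadata comes from: a .twb file is XML, and a .twbx file is a zip archive that contains a .twb. The sketch below uses only the standard library to pull calculated fields out of a workbook. It is not how the tool works internally (the tool relies on tableaudocumentapi), the function names are hypothetical, and the element and attribute names reflect the commonly seen workbook layout rather than a published schema.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_twb_xml(path: str) -> bytes:
    """Return workbook XML from a .twb file or from the .twb inside a .twbx archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            twb_names = [n for n in archive.namelist() if n.lower().endswith(".twb")]
            if not twb_names:
                raise ValueError(f"no .twb workbook found inside {path}")
            return archive.read(twb_names[0])
    with open(path, "rb") as handle:
        return handle.read()


def list_calculated_fields(twb_xml: bytes) -> List[Dict[str, str]]:
    """List columns under the top-level data sources that carry a formula."""
    root = ET.fromstring(twb_xml)
    fields = []
    # Assumed layout: <workbook><datasources><datasource><column><calculation formula="..."/>
    for column in root.findall("datasources/datasource/column"):
        calc = column.find("calculation")
        if calc is not None and calc.get("formula"):
            fields.append({
                "name": column.get("caption") or column.get("name", ""),
                "formula": calc.get("formula"),
            })
    return fields
```

Calling `list_calculated_fields(read_twb_xml("dashboard.twbx"))` would return one dictionary per calculated field, using the caption when the workbook provides one.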

📄 Output Examples

Markdown Output

# Documentation for Sales Dashboard

## Data Sources
- **SQL Server**: server01.company.com/SalesDB
- **Excel File**: Q4_Targets.xlsx

## Tables and Fields
### Sales
| Field Name | Type | Description |
|------------|------|-------------|
| SalesAmount | Decimal | Total sales value |
| CustomerID | Integer | Customer identifier |
| TotalSalesYTD* | Decimal | Calculated: `SUM(Sales[SalesAmount])` |

## Visualizations
### Page: Overview
- **Bar Chart**: Sales by Region
  - Fields: [Geography.Region], [Sales.TotalSalesYTD]
- **Card**: Total Revenue
  - Field: [Sales.TotalSalesYTD]
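
As a rough illustration of how a field table like the one above can be produced, here is a minimal plain-Python sketch. The tool itself lists jinja2 as its template engine, so this is not its actual renderer; `render_field_table` and the dictionary keys are hypothetical.

```python
from typing import Dict, Iterable


def render_field_table(fields: Iterable[Dict[str, str]]) -> str:
    """Render field dictionaries as a Markdown table with a fixed header."""
    lines = [
        "| Field Name | Type | Description |",
        "|------------|------|-------------|",
    ]
    for field in fields:
        # Escape pipes so a description cannot break the table layout.
        description = field.get("description", "").replace("|", "\\|")
        lines.append(f"| {field['name']} | {field['type']} | {description} |")
    return "\n".join(lines)


print(render_field_table([
    {"name": "SalesAmount", "type": "Decimal", "description": "Total sales value"},
    {"name": "CustomerID", "type": "Integer", "description": "Customer identifier"},
]))
```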

JSON Output

{
  "file": "sales_dashboard.pbix",
  "type": "Power BI",
  "data_sources": [
    {
      "name": "SalesDB",
      "connection": "sqlserver://server01.company.com/SalesDB",
      "tables": [
        {
          "name": "Sales",
          "columns": [
            {"name": "SalesAmount", "data_type": "Decimal"},
            {"name": "CustomerID", "data_type": "Integer"}
          ],
          "measures": [
            {"name": "TotalSalesYTD", "expression": "SUM(Sales[SalesAmount])"}
          ]
        }
      ]
    }
  ]
}
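
Because the JSON output is machine-readable, downstream scripts can walk it directly. The sketch below assumes the structure shown in the example above (`data_sources`, then `tables`, then `measures`); real output may contain more keys, and `iter_measures` is a hypothetical helper rather than part of the tool.

```python
import json
from typing import Iterator, Tuple


def iter_measures(doc: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (table, measure, expression) for every measure in the document."""
    for source in doc.get("data_sources", []):
        for table in source.get("tables", []):
            for measure in table.get("measures", []):
                yield table["name"], measure["name"], measure["expression"]


# Inline sample mirroring the example output; in practice this would be
# loaded with json.load() from a file the tool generated.
sample = json.loads("""
{
  "file": "sales_dashboard.pbix",
  "type": "Power BI",
  "data_sources": [
    {
      "name": "SalesDB",
      "tables": [
        {
          "name": "Sales",
          "columns": [{"name": "SalesAmount", "data_type": "Decimal"}],
          "measures": [
            {"name": "TotalSalesYTD", "expression": "SUM(Sales[SalesAmount])"}
          ]
        }
      ]
    }
  ]
}
""")

for table, name, expression in iter_measures(sample):
    print(f"{table}.{name} = {expression}")
```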

๐Ÿ—๏ธ Architecture

The tool follows a modular architecture with clear separation of concerns:

bidoc/
├── cli.py                 # Command-line interface
├── pbix_parser.py         # Power BI parsing logic
├── tableau_parser.py      # Tableau parsing logic
├── markdown_generator.py  # Markdown output formatting
├── json_generator.py      # JSON output formatting
├── ai_summary.py          # AI integration hooks
└── utils.py               # Common utilities
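
One way a layout like this can route input files is by extension. The following sketch is an assumption about how such dispatch might look, not the tool's actual code: the module names come from the tree above, while `PARSER_MODULES` and `select_parser_module` are hypothetical.

```python
from pathlib import Path

# Extension-to-module mapping; module names are taken from the package tree.
PARSER_MODULES = {
    ".pbix": "pbix_parser",
    ".twb": "tableau_parser",
    ".twbx": "tableau_parser",
}


def select_parser_module(filename: str) -> str:
    """Return the name of the parser module responsible for a given file."""
    suffix = Path(filename).suffix.lower()
    try:
        return PARSER_MODULES[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or filename}") from None
```

Keeping the mapping in one dictionary means a new format needs only a new parser module and one extra entry.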

🚀 Current Status & Roadmap

✅ Completed Features

  • Multi-format Support: Robust parsing of Power BI (.pbix) and Tableau (.twb/.twbx) files
  • Rich Metadata Extraction: Complete extraction of datasets, fields, measures, calculations, and visuals
  • Dual Output Formats: High-quality Markdown (markdownlint compliant) and structured JSON
  • Production Ready: Robust CLI, Docker support, comprehensive error handling
  • Batch Processing: Efficient processing of multiple files with detailed logging
  • Modular Architecture: Extensible design ready for future enhancements

🔄 Next Priorities

For detailed roadmap and quality-of-life improvements, see QOL_SUGGESTIONS.md:

  • Enhanced User Experience: Progress indicators, better error messages, interactive mode
  • Performance Optimizations: Parallel processing, incremental updates, caching
  • Output Quality: Enhanced Markdown with TOC, collapsible sections, syntax highlighting
  • Enterprise Integration: Git hooks, Confluence export, SharePoint integration
  • Advanced Analytics: Usage patterns, similarity detection, AI-powered insights

🔧 Dependencies

  • pbixray: Power BI file parsing (>=0.3.3)
  • tableaudocumentapi: Tableau workbook parsing (>=0.11)
  • click: CLI framework (>=8.0.0)
  • jinja2: Template rendering (>=3.1.0)
  • pandas: Data processing (>=1.5.0)
  • lxml: XML processing (>=4.9.0)
  • colorama: Cross-platform colored output (>=0.4.0)

๐Ÿ™ Acknowledgments

We are grateful to the open-source community and the following projects that make this tool possible:

  • PBIXRay by Arjen van Stam - Essential Power BI file parsing capabilities
  • Tableau Document API by Tableau Software - Comprehensive Tableau workbook analysis
  • Microsoft - Power BI sample files for testing and demonstration
  • Python Community - The amazing ecosystem of libraries (pandas, click, jinja2, etc.)

For complete attribution and licensing information, see THIRD_PARTY_LICENSES.md.

๐Ÿค Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Make your changes and add tests
  4. Ensure all tests pass: python -m pytest tests/ -v
  5. Submit a pull request

Development Setup

# Clone and setup development environment
git clone <repository-url>
cd bi-doc
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
pip install -e .

# Run tests
python -m pytest tests/ -v

# Run with coverage
pip install pytest-cov
python -m pytest --cov=bidoc tests/

📄 License

This project is licensed under the Business Source License 1.1 (BSL) - see the LICENSE file for details.

🆘 Support

  • Documentation: See USER_GUIDE.md for detailed usage instructions
  • Integration Guide: See INTEGRATION_HOOKS.md for enterprise integration patterns
  • Issues: Report bugs and feature requests on GitHub
  • Discussions: Join community discussions for questions and ideas

📊 Status

  • ✅ Production Ready: All core features implemented and tested
  • ✅ Docker Support: Containerized for easy deployment
  • ✅ CI/CD Ready: Automated testing and deployment pipelines
  • ✅ Enterprise Integration: Hooks for major data platforms
  • 🔄 Active Development: Regular updates and new features

Made with ❤️ for the BI community
