A tool for compiling reports from various sources.
Report Compiler
A Python-based automated DOCX and PDF report compiler for engineering teams. This tool allows engineers to write reports in Word, use placeholders to insert external PDFs, and compile everything into a professional PDF with a single command.
Overview
The Report Compiler automates the creation of comprehensive PDF reports by:
- Finding PDF placeholders in Word documents using two types of tags:
  - [[OVERLAY: path/to/file.pdf, page=5]] for table-based overlays
  - [[INSERT: path/to/file.pdf]] for paragraph-based insertions
- Modifying the Word document to create markers and page breaks
- Converting to PDF using Word automation (win32com)
- Processing PDF insertions with overlays and merges using PyMuPDF
Features
- ✅ Two insertion types - Table-based overlays and paragraph-based merges
- ✅ Relative path support - PDF paths resolved relative to the input Word document
- ✅ Page selection support - Specify which pages to include from source PDFs using flexible syntax
- ✅ Multi-page PDF support - Automatic cell replication for multi-page table overlays
- ✅ Annotation preservation - PDF annotations automatically baked into content during processing
- ✅ Marker removal - Automatic removal of placement markers from final PDF
- ✅ Robust page breaks - Proper page breaks for paragraph-based insertions
- ✅ Error handling - Comprehensive error reporting and validation
- ✅ Debug support - --keep-temp flag to retain temporary files for debugging
- ✅ Table-based overlay - Precise PDF placement using table dimensions and marker positioning
- ✅ Cell replication - Multi-page PDFs create consecutive table cells automatically
- ✅ Intelligent positioning - Uses table properties for automatic overlay rectangle calculation
- ✅ Modular architecture - Clean separation of concerns with focused classes and modules
Architecture
The Report Compiler uses a modular architecture with clear separation of responsibilities:
Core Modules
- report_compiler.core - Main orchestration and configuration
  - ReportCompiler - Main orchestrator class
  - Config - Configuration management and constants
- report_compiler.document - Word document processing
  - PlaceholderParser - Detects and parses PDF placeholders
  - DocxProcessor - Modifies DOCX files (markers, page breaks, cell replication)
  - WordConverter - Converts DOCX to PDF using Word automation
- report_compiler.pdf - PDF processing and manipulation
  - ContentAnalyzer - Analyzes PDF content and structure
  - OverlayProcessor - Handles table-based PDF overlays
  - MergeProcessor - Handles paragraph-based PDF merges
  - MarkerRemover - Removes placement markers from final PDF
- report_compiler.utils - Utility classes and helpers
  - FileManager - Temporary file management and cleanup
  - Validators - Input validation and PDF verification
  - PageSelector - Page selection parsing and processing
Usage as a Library
from report_compiler.core.compiler import ReportCompiler
# Basic usage
compiler = ReportCompiler("input.docx", "output.pdf")
compiler.compile()
# With debug mode
compiler = ReportCompiler("input.docx", "output.pdf", keep_temp=True)
compiler.compile()
Quick Start
Installation
pip install -r requirements.txt
Basic Usage
report-compiler compile input_report.docx output_report.pdf
Debug Mode (with temp files)
report-compiler compile input_report.docx output_report.pdf --keep-temp
Placeholder Format
The Report Compiler supports two types of PDF insertion placeholders:
Table-based Overlays (OVERLAY tags)
For inserting PDFs as overlays onto existing pages, preserving the main document's content and layout. Place these in single-cell (1x1) tables:
[[OVERLAY: appendices/sketch.pdf]]
[[OVERLAY: calculations/diagram.pdf, page=2]]
[[OVERLAY: C:\Shared\drawing.pdf, page=1-3]]
[[OVERLAY: diagrams/full_page.pdf, crop=false]]
[[OVERLAY: sketches/detail.pdf, page=2, crop=false]]
OVERLAY Parameters:
- page= - Page selection (same format as INSERT)
- crop= - Content cropping control:
  - crop=true (default): Automatically crops to content bounding box, removing excess whitespace
  - crop=false: Uses the full page dimensions without cropping
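The tag syntax above can be parsed with a small amount of string handling. The following is a minimal sketch, not the project's actual PlaceholderParser: the names OVERLAY_RE, OverlayTag, and parse_overlay_tag are hypothetical, and it assumes that the path itself contains no commas and that any value other than "false" leaves cropping enabled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: matches a whole OVERLAY tag and captures its body.
OVERLAY_RE = re.compile(r"\[\[\s*OVERLAY:\s*(.+?)\s*\]\]")


@dataclass
class OverlayTag:
    path: str
    page: Optional[str] = None  # raw page spec, e.g. "1-3" or "1,3,5"
    crop: bool = True           # crop=true is the documented default


def parse_overlay_tag(text: str) -> Optional[OverlayTag]:
    """Parse the first [[OVERLAY: ...]] tag in text, or return None."""
    match = OVERLAY_RE.search(text)
    if match is None:
        return None
    # Assumption: the path is everything before the first comma.
    path, *rest = [part.strip() for part in match.group(1).split(",")]
    tag = OverlayTag(path=path)
    key = None
    for part in rest:
        if "=" in part:
            key, value = (s.strip() for s in part.split("=", 1))
            key = key.lower()
            if key == "page":
                tag.page = value
            elif key == "crop":
                tag.crop = value.lower() != "false"
        elif key == "page" and part:
            # A bare token after page= continues the list, as in page=1,3,5.
            tag.page += "," + part
    return tag
```

Splitting on commas first and then re-joining bare tokens onto the preceding page= value is one way to let page=1,3,5 coexist with comma-separated parameters.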
Paragraph-based Merges (INSERT tags)
For inserting entire PDF pages after a marker position. The original paragraph content is preserved, and PDF pages are inserted immediately after it. Place these in standalone paragraphs:
[[INSERT: appendices/structural_analysis.pdf]]
[[INSERT: calculations/load_analysis.pdf:1-5]]
[[INSERT: C:\Shared\external_report.pdf]]
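Because INSERT tags use a colon to separate the path from the page selection, a Windows drive letter such as C: must not be mistaken for that separator. The sketch below shows one possible way to handle this; INSERT_RE, PAGE_SPEC_RE, and parse_insert_tag are hypothetical names, and the project's real parser may work differently.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches a whole INSERT tag and captures its body.
INSERT_RE = re.compile(r"\[\[\s*INSERT:\s*(.+?)\s*\]\]")
# A page spec is a comma-separated list of N, N-M, or N- items.
PAGE_SPEC_RE = re.compile(r"\d+(?:-\d*)?(?:,\d+(?:-\d*)?)*")


def parse_insert_tag(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (path, page_spec) for the first INSERT tag, or None."""
    match = INSERT_RE.search(text)
    if match is None:
        return None
    body = match.group(1)
    # Split at the LAST colon only, and accept the split only when the
    # right-hand side looks like a page spec. A drive-letter colon fails
    # this check, so "C:\..." paths stay intact.
    head, sep, tail = body.rpartition(":")
    if sep and head and PAGE_SPEC_RE.fullmatch(tail.strip()):
        return head.strip(), tail.strip()
    return body, None
```

With this approach, the three example tags above yield a page spec of None, "1-5", and None respectively.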
Page Selection
Both OVERLAY and INSERT tags support page selection:
OVERLAY page selection (using page= parameter):
[[OVERLAY: appendices/report.pdf, page=5]] # Page 5 only
[[OVERLAY: appendices/report.pdf, page=1-3]] # Pages 1, 2, and 3
[[OVERLAY: appendices/report.pdf, page=1,3,5]] # Pages 1, 3, and 5
[[OVERLAY: appendices/report.pdf, page=2-]] # Pages 2 to end
INSERT page selection (using : separator):
[[INSERT: appendices/report.pdf:1-3]] # Pages 1, 2, and 3
[[INSERT: appendices/report.pdf:5]] # Page 5 only
[[INSERT: appendices/report.pdf:1,3,5]] # Pages 1, 3, and 5
[[INSERT: appendices/report.pdf:2-]] # Pages 2 to end
[[INSERT: appendices/report.pdf:1-3,7,9-]] # Mixed: pages 1-3, 7, and 9 to end
Page Selection Formats:
- 5 - Single page (page 5)
- 1-3 - Range of pages (pages 1, 2, 3)
- 2- - Open-ended range (pages 2 to end of document)
- 1,3,5 - Specific pages (pages 1, 3, and 5)
- 1-3,7,9-12 - Combined specifications
Note: Page numbers are 1-indexed (first page = 1). Invalid page numbers are automatically filtered out.
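The formats above can be expanded into a concrete page list with a short function. This is an illustrative sketch rather than the project's PageSelector: select_pages is a hypothetical name, it assumes out-of-range pages are dropped silently (as the note above describes), and it keeps pages in the order written without removing duplicates, which is an assumption.

```python
from typing import List


def select_pages(spec: str, page_count: int) -> List[int]:
    """Expand a spec such as "1-3,7,9-" into 1-indexed page numbers."""
    pages: List[int] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start_text, end_text = item.split("-", 1)
            start = int(start_text)
            # An empty end ("2-") means "through the last page".
            end = int(end_text) if end_text else page_count
        else:
            start = end = int(item)
        # Clamp to the valid range so invalid page numbers are filtered out.
        pages.extend(range(max(start, 1), min(end, page_count) + 1))
    return pages
```

For a 10-page source PDF, select_pages("1-3,7,9-", 10) yields pages 1, 2, 3, 7, 9, and 10. A malformed item such as "abc" raises ValueError in this sketch.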
Multi-page PDFs: Automatically handled via cell replication (table-based overlays) or sequential page insertion (paragraph-based merges)
Note: Relative paths are resolved relative to the Word document's location.
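Path resolution of this kind can be sketched with pathlib. The function name resolve_pdf_path is hypothetical, and PureWindowsPath is used here only because the tool targets Windows; it applies Windows path rules (drive letters, backslashes) without touching the file system.

```python
from pathlib import PureWindowsPath


def resolve_pdf_path(docx_path: str, pdf_ref: str) -> PureWindowsPath:
    """Resolve a placeholder's PDF reference against the DOCX location."""
    pdf = PureWindowsPath(pdf_ref)
    if pdf.is_absolute():
        # Absolute references such as C:\Shared\drawing.pdf are used as-is.
        return pdf
    # Relative references are joined to the folder containing the DOCX.
    return PureWindowsPath(docx_path).parent / pdf
```

For example, with a report at C:\reports\bridge.docx, the reference appendices/a.pdf resolves to C:\reports\appendices\a.pdf.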
How It Works
1. Placeholder Detection
- Table scanning - Identifies [[OVERLAY: ...]] tags in single-cell tables
- Paragraph scanning - Identifies [[INSERT: ...]] tags in standalone paragraphs
- Path resolution - Resolves relative paths relative to Word document location
- Page parsing - Parses page selection syntax (e.g., :1-3 or ,page=5)
- PDF validation - Validates that referenced PDF files exist and are readable
- Page counting - Counts effective pages after applying page selection filters
- Layout detection - Identifies single-cell tables vs standalone paragraphs
2. Document Modification
- Table placeholders - Replaces with visible red markers (%%OVERLAY_START_N%%)
- Cell replication - Creates additional table cells for multi-page selections
- Paragraph placeholders - Replaces with merge markers and page breaks (%%MERGE_START_N%%)
- Marker placement - Places markers first, then page breaks for correct timing
- Temporary document - Saves modified document for PDF conversion
3. PDF Conversion
- Converts modified Word document to PDF using Word automation
- Preserves formatting and creates base PDF with markers
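Word automation through win32com typically follows the pattern below. This is a generic sketch of that pattern, not the project's WordConverter: the function name convert_docx_to_pdf is hypothetical, and it requires Windows with Microsoft Word and pywin32 installed.

```python
import os

WD_FORMAT_PDF = 17  # value of Word's wdFormatPDF save-format constant


def convert_docx_to_pdf(docx_path: str, pdf_path: str) -> None:
    """Convert a DOCX file to PDF by driving Microsoft Word over COM."""
    # Imported lazily so this module still loads where pywin32 is absent.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word resolves relative paths against its own working directory,
        # so absolute paths are passed in both directions.
        doc = word.Documents.Open(os.path.abspath(docx_path))
        try:
            doc.SaveAs(os.path.abspath(pdf_path), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close(False)  # close without saving changes to the DOCX
    finally:
        word.Quit()  # always release the Word process
```

The nested try/finally blocks are a deliberate choice: a failed conversion should not leave a hidden Word process running.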
4. PDF Processing
Paragraph-based Merges (INSERT)
- Marker location - Finds merge markers in the base PDF
- Marker removal - Removes markers using redaction (white fill)
- Page insertion - Inserts PDF pages immediately after marker position
- Content preservation - Original document content remains intact
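One detail the steps above imply is index bookkeeping: each insertion shifts every later page of the base PDF. The sketch below shows that arithmetic under stated assumptions; plan_insertions is a hypothetical helper, page indices are 0-based, and the resulting positions are the kind of value one might pass as the start_at argument of PyMuPDF's Document.insert_pdf.

```python
from typing import List, Tuple


def plan_insertions(markers: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Compute where each appendix lands in the growing output PDF.

    markers holds (marker_page_index, pages_to_insert) pairs, using 0-based
    page indices of the base PDF. The result holds (insert_at, count) pairs
    in application order, where insert_at is the 0-based index that the
    first inserted page will occupy.
    """
    plan: List[Tuple[int, int]] = []
    offset = 0  # pages inserted so far, which shift all later markers
    for marker_page, count in sorted(markers, key=lambda m: m[0]):
        insert_at = marker_page + offset + 1  # immediately after the marker
        plan.append((insert_at, count))
        offset += count
    return plan
```

For markers on base pages 2 and 5 with 4 and 1 pages to insert, the plan is to insert at index 3 and then at index 10, since the first insertion pushes the second marker's page from index 5 to index 9.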
Table-based Overlays (OVERLAY)
- Page selection - Processes only the specified pages from source PDFs
- Annotation preservation - Automatically bakes PDF annotations into content using Document.bake()
- Multi-page support - Creates additional table cells for multi-page selections
- Precise positioning - Searches for overlay markers in the base PDF
- Rectangle calculation - Uses the marker position as the top-left corner of the overlay area
- Marker removal - Removes markers using redaction (white fill)
- Sequential overlay - Overlays each selected page onto calculated rectangles
- Final assembly - Saves completed PDF with all appendices integrated
Table-Based Overlay System
The Report Compiler uses a precise approach for PDF overlay placement with full support for multi-page PDFs and annotation preservation:
Single-Page PDF Overlay
- Table Detection - Identifies single-cell tables containing [[OVERLAY: path.pdf]] placeholders
- Page Selection - Parses page specifications like ,page=1-3 or ,page=5 if provided
- Dimension Extraction - Extracts exact table dimensions from Word document metadata
- Marker Placement - Places a red marker at the top-left of the table cell
- Rectangle Calculation - Adds the table dimensions to the marker position to obtain the overlay area
- Annotation Preservation - Bakes PDF annotations into content before overlay
- Precise Overlay - Places selected PDF pages exactly within the calculated rectangle
Multi-Page PDF Overlay
For multi-page PDFs or page selections, the system automatically replicates table cells:
- Page Detection - Identifies PDFs with multiple pages or page selections
- Cell Replication - Adds consecutive table rows for each selected page
- Marker Generation - Creates unique markers for each cell (%%OVERLAY_START_00_PAGE_02%%)
- Sequential Overlay - Overlays selected pages into consecutive table cells
- Unified Layout - All selected PDF pages appear together in the same table area
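Marker generation for replicated cells could look like the following sketch. It is inferred from the single example marker above and makes two assumptions that the source does not confirm: both numbers are zero-padded to two digits, and the PAGE number is the 1-based position of the cell rather than the source PDF page number. The function name overlay_markers is hypothetical.

```python
from typing import List


def overlay_markers(overlay_index: int, page_count: int) -> List[str]:
    """Build one unique marker string per replicated table cell."""
    return [
        # Assumed format: overlay index, then 1-based cell position.
        f"%%OVERLAY_START_{overlay_index:02d}_PAGE_{position:02d}%%"
        for position in range(1, page_count + 1)
    ]
```

Unique marker text per cell matters because each marker is later located by text search in the base PDF, so two cells sharing a marker would be indistinguishable.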
Page Selection Examples
[[OVERLAY: report.pdf, page=1-3]] → 3 table cells with pages 1, 2, 3
[[OVERLAY: report.pdf, page=2,5,7]] → 3 table cells with pages 2, 5, 7
[[OVERLAY: report.pdf, page=3-]] → Multiple cells with pages 3 to end
Example Output
Single Table → Page Selection:
┌─────────────────┐
│ PDF Page 2 │ ← Only page 2 (from [[OVERLAY: doc.pdf, page=2]])
└─────────────────┘
Single Table → Multi-Page Selection:
┌─────────────────┐
│ PDF Page 1 │ ← From [[OVERLAY: doc.pdf, page=1,3,5]]
├─────────────────┤
│ PDF Page 3 │ ← Replicated cell
├─────────────────┤
│ PDF Page 5 │ ← Replicated cell
└─────────────────┘
Example Debug Output
📋 Table found: 7.50 x 4.00 inches
📍 Marker at: (0.50, 1.59) inches
📐 Overlay: (0.50, 1.59) to (8.00, 5.59) inches
🔥 Baking annotations: 12 found
✅ PDF positioned perfectly
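The numbers in this debug output follow from simple arithmetic: the marker position is the top-left corner, and the table dimensions are added to reach the bottom-right corner. The sketch below reproduces that calculation and converts it to PDF points (72 per inch); the function names are hypothetical and the conversion step is an assumption about how the rectangle would be handed to a PDF library.

```python
from typing import Tuple

POINTS_PER_INCH = 72.0  # PDF coordinates are measured in points

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlay_rect_inches(marker_xy: Tuple[float, float],
                        table_size: Tuple[float, float]) -> Rect:
    """Top-left corner is the marker; add width and height for the rest."""
    x0, y0 = marker_xy
    width, height = table_size
    return (x0, y0, x0 + width, y0 + height)


def to_points(rect: Rect) -> Rect:
    """Convert a rectangle from inches to PDF points."""
    x0, y0, x1, y1 = rect
    return (x0 * POINTS_PER_INCH, y0 * POINTS_PER_INCH,
            x1 * POINTS_PER_INCH, y1 * POINTS_PER_INCH)
```

With the marker at (0.50, 1.59) and a 7.50 x 4.00 inch table, overlay_rect_inches gives a bottom-right corner of (8.00, 5.59) inches, matching the Overlay line above.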
Key Benefits
- Simple & Reliable - Single marker approach with cell replication
- Flexible Page Selection - Extract exactly the pages you need from large PDFs
- Multi-page Support - Automatic handling of PDFs with any number of pages
- Annotation Preservation - PDF annotations automatically preserved during overlay
- Accurate - Uses Word's own measurements
- Easy to Debug - Clear inch measurements and detailed logging with page selection info
- Consistent - Predictable placement and unified layout
Example Workflow
Input: bridge_report.docx containing [[OVERLAY: appendices/analysis.pdf, page=2-4,7]]
↓
Step 1: Find placeholder and validate analysis.pdf (10 pages)
Parse page spec "2-4,7" → pages 2, 3, 4, 7 (4 pages selected)
↓
Step 2: Replace placeholder with marker + replicate table cells for 4 pages
↓
Step 3: Convert modified DOCX to PDF (creates base PDF with 4 table cells)
↓
Step 4: Bake annotations, find markers, overlay pages 2,3,4,7 sequentially
↓
Output: bridge_report.pdf with selected pages integrated in consecutive cells
Requirements
- Windows (for Word automation via win32com)
- Microsoft Word installed and accessible
- Python 3.7+
- Dependencies: python-docx, pywin32, PyMuPDF
VS Code Debugging
The project includes comprehensive VS Code launch configurations:
- Debug Report Compiler - Example File - Basic debugging with example file
- Debug Report Compiler - Example File (Keep Temp) - Debug with temp files retained
- Debug Report Compiler - Custom Input - Interactive file input debugging
- Debug Report Compiler - Step Into All Code - Detailed debugging with all code
- Debug Report Compiler - Error Testing - Test error handling scenarios
License
This project is licensed under the MIT License - see the LICENSE file for details.