Excel to PDF converter optimized for Google NotebookLM
exc-to-pdf
Excel to PDF converter optimized for Google NotebookLM analysis.
Overview
exc-to-pdf is a Python tool that converts Excel files (.xlsx) into PDF documents specifically optimized for AI analysis with Google NotebookLM. The tool preserves all data, maintains structure, and creates navigation-friendly PDFs that AI systems can effectively analyze.
Key Features
- Multi-sheet Support: Processes all worksheets in Excel files
- Table Detection: Automatically identifies and preserves table structures
- PDF Navigation: Creates bookmarks and structured PDF for easy AI navigation
- NotebookLM Optimized: Text-based PDF output perfect for AI analysis
- High Quality: 100% data preservation with structured formatting
- Python Powered: Built with openpyxl, pandas, and reportlab
Quick Start
Installation
From PyPI (Recommended)
pip install exc-to-pdf
From Source
# Clone the repository
git clone https://github.com/fulvian/exc-to-pdf.git
cd exc-to-pdf
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode
pip install -e .
Basic Usage
# Convert Excel to PDF
exc-to-pdf input.xlsx output.pdf
# With options
exc-to-pdf input.xlsx output.pdf --bookmarks --preserve-formatting
# Python module alternative
python -m exc_to_pdf input.xlsx output.pdf
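To convert a whole folder of workbooks, the command shown above can be driven from a short script. The following is a minimal sketch, not part of exc-to-pdf itself: the helper names `build_commands` and `convert_all` are hypothetical, and it assumes only the `exc-to-pdf input.xlsx output.pdf --bookmarks` form documented above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(input_dir: Path, output_dir: Path, bookmarks: bool = True) -> List[List[str]]:
    """Build one exc-to-pdf command per .xlsx file found in input_dir.

    Hypothetical helper: it only assembles argument lists and runs nothing.
    """
    commands = []
    for xlsx in sorted(input_dir.glob("*.xlsx")):
        pdf = output_dir / (xlsx.stem + ".pdf")
        cmd = ["exc-to-pdf", str(xlsx), str(pdf)]
        if bookmarks:
            # --bookmarks is the flag shown in the usage examples above.
            cmd.append("--bookmarks")
        commands.append(cmd)
    return commands


def convert_all(input_dir: Path, output_dir: Path) -> None:
    """Run every command in order, stopping at the first failure."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(input_dir, output_dir):
        # check=True raises CalledProcessError if the converter exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling `convert_all(Path("reports"), Path("pdf"))` would write one PDF per workbook in `reports`; both directory names are placeholders. Building the argument lists separately from running them makes the batch easy to inspect before anything is executed.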
Python API
from exc_to_pdf.excel_processor import ExcelProcessor
from exc_to_pdf.pdf_generator import PDFGenerator
# Process Excel file
processor = ExcelProcessor("input.xlsx")
sheets_data = processor.extract_all_sheets()
# Generate PDF
generator = PDFGenerator()
generator.create_pdf(sheets_data, "output.pdf")
Requirements
- Python 3.9+
- Dependencies are installed automatically with pip install exc-to-pdf
Core Dependencies
- openpyxl (>=3.1.0) - Excel file parsing
- pandas (>=2.0.0) - Data processing
- reportlab (>=4.0.0) - PDF generation
- Pillow (>=10.0.0) - Image handling
- matplotlib (>=3.7.0) - Chart recreation
- babel (>=2.12.0) - Internationalization
- numpy (>=1.24.0) - Numerical computing
Project Structure
exc-to-pdf/
├── src/                      # Source code
│   ├── excel_processor.py    # Excel reading logic
│   ├── pdf_generator.py      # PDF generation
│   ├── table_detector.py     # Table identification
│   └── main.py               # CLI interface
├── tests/                    # Test suite
│   ├── unit/                 # Unit tests
│   ├── integration/          # Integration tests
│   └── fixtures/             # Test data
├── docs/                     # Documentation
│   ├── idee_fondanti/        # Foundational documents
│   └── api/                  # API documentation
├── scripts/                  # Utility scripts
└── requirements.txt          # Dependencies
Development Workflow
This project follows the DevStream 7-Step Workflow:
- DISCUSS - Requirements analysis and planning
- ANALYZE - Technical analysis and research
- RESEARCH - Context7 and best practices research
- PLAN - Implementation planning
- APPROVE - Architecture validation
- IMPLEMENT - Code development
- VERIFY - Testing and validation
Current Development Phase
Phase: P1 - Project Foundation → Next: P2 - Excel Processing Engine
See docs/idee_fondanti/piano_fondante_exc-to-pdf.md for complete development plan.
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/unit/test_excel_processor.py
Architecture
Data Flow
Excel File → openpyxl parsing → pandas processing → reportlab rendering → PDF Output
Key Components
- ExcelProcessor: Reads and parses Excel files
- TableDetector: Identifies table structures
- PDFGenerator: Creates structured PDF output
- BookmarkManager: Adds navigation elements
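To make the table-identification step concrete, here is one simple way such detection could work on a grid of cell values. This is an illustrative flood-fill heuristic, not the project's actual TableDetector: the function name `detect_tables`, the use of `None` for empty cells, and the box format are all assumptions made for this sketch.

```python
from collections import deque
from typing import List, Optional, Tuple

Grid = List[List[Optional[object]]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive, 0-based


def detect_tables(grid: Grid) -> List[Box]:
    """Return the bounding box of each connected block of non-empty cells.

    Cells count as connected when they touch horizontally or vertically.
    """
    seen = set()
    boxes: List[Box] = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from this unvisited cell.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (
                        0 <= nr < len(grid)
                        and 0 <= nc < len(grid[nr])
                        and grid[nr][nc] is not None
                        and (nr, nc) not in seen
                    ):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


# A small sheet with two tables separated by an empty row, plus a stray note.
sheet = [
    ["Name", "Qty", None, None],
    ["Apple", 3, None, "Note"],
    [None, None, None, None],
    ["Region", "Total", None, None],
]
tables = detect_tables(sheet)
```

Each box could then be handed to the PDF stage as a separate text table. A heuristic like this treats any blank row or column as a separator, so tables with intentionally empty rows would need extra handling.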
Google NotebookLM Optimization
The PDF output is specifically designed for AI analysis:
- Text-based tables (not images)
- Structured navigation with bookmarks
- Accessibility tags for better AI understanding
- Semantic structure preservation
- Metadata inclusion for context
Development Status
- Project Foundation (P1)
- Excel Processing Engine (P2)
- PDF Generation Engine (P3)
- Integration & Pipeline (P4)
- Quality Assurance (P5)
- Optimization (P6)
- Documentation & Release (P7)
Contributing
- Follow DevStream workflow
- Maintain 95%+ test coverage
- Use type hints and docstrings
- Pass code review validation
License
MIT License - see LICENSE file for details.
Related Projects
- Google NotebookLM - AI-powered notebook
- openpyxl - Excel file library
- reportlab - PDF generation library
- pandas - Data analysis library
Built with ❤️ using DevStream framework
File details
Details for the file exc_to_pdf-1.0.0.tar.gz.
File metadata
- Download URL: exc_to_pdf-1.0.0.tar.gz
- Upload date:
- Size: 170.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 571958411f3f266caf8dcf35d5acb72f55660ed86b9da4b1c2b38a968e412093 |
| MD5 | 109a5c8b2e24e2b7430025786f8453e8 |
| BLAKE2b-256 | 6b14962883997db5315b879fe72d030320896e167f045f8d2b9554871e10cc29 |
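A downloaded archive can be checked against the published SHA256 digest before installing it. The sketch below uses only Python's standard `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and the constant is the SHA256 digest listed for exc_to_pdf-1.0.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for exc_to_pdf-1.0.0.tar.gz in the file details.
EXPECTED_SHA256 = "571958411f3f266caf8dcf35d5acb72f55660ed86b9da4b1c2b38a968e412093"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify(Path("exc_to_pdf-1.0.0.tar.gz"))` should return `True` for an intact download, assuming the archive sits in the current directory.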
Provenance
The following attestation bundles were made for exc_to_pdf-1.0.0.tar.gz:
Publisher: release.yml on fulvian/exc-to-pdf
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: exc_to_pdf-1.0.0.tar.gz
- Subject digest: 571958411f3f266caf8dcf35d5acb72f55660ed86b9da4b1c2b38a968e412093
- Sigstore transparency entry: 633689000
- Sigstore integration time:
- Permalink: fulvian/exc-to-pdf@c3553424dc32b417126c40409a44fb577a756549
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/fulvian
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c3553424dc32b417126c40409a44fb577a756549
- Trigger Event: push
File details
Details for the file exc_to_pdf-1.0.0-py3-none-any.whl.
File metadata
- Download URL: exc_to_pdf-1.0.0-py3-none-any.whl
- Upload date:
- Size: 86.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 38e401ddc4185ee2076eb5be9f5f242382cfb84f4ef1154c2c9806ab9772ec23 |
| MD5 | aae9c5efc7ef5e19f926594021101ee2 |
| BLAKE2b-256 | 08644d1711f289860e96426a47d4f4e14894fceea3825ed20ad8b1d0e978ddaf |
Provenance
The following attestation bundles were made for exc_to_pdf-1.0.0-py3-none-any.whl:
Publisher: release.yml on fulvian/exc-to-pdf
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: exc_to_pdf-1.0.0-py3-none-any.whl
- Subject digest: 38e401ddc4185ee2076eb5be9f5f242382cfb84f4ef1154c2c9806ab9772ec23
- Sigstore transparency entry: 633689002
- Sigstore integration time:
- Permalink: fulvian/exc-to-pdf@c3553424dc32b417126c40409a44fb577a756549
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/fulvian
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c3553424dc32b417126c40409a44fb577a756549
- Trigger Event: push