exc-to-pdf

Excel to PDF converter optimized for Google NotebookLM analysis.

🎯 Overview

exc-to-pdf is a Python tool that converts Excel files (.xlsx) into PDF documents specifically optimized for AI analysis with Google NotebookLM. The tool preserves all data, maintains structure, and creates navigation-friendly PDFs that AI systems can effectively analyze.

Key Features

  • 📊 Multi-sheet Support: Processes all worksheets in Excel files
  • 🔍 Table Detection: Automatically identifies and preserves table structures
  • 📑 PDF Navigation: Creates bookmarks and a structured PDF for easy AI navigation
  • 🎯 NotebookLM Optimized: Text-based PDF output suited to AI analysis
  • ⚡ High Quality: 100% data preservation with structured formatting
  • 🐍 Python Powered: Built with openpyxl, pandas, and reportlab

🚀 Quick Start

Installation

From PyPI (Recommended)

pip install exc-to-pdf

From Source

# Clone the repository
git clone https://github.com/fulvian/exc-to-pdf.git
cd exc-to-pdf

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e .

Basic Usage

# Convert Excel to PDF
exc-to-pdf input.xlsx output.pdf

# With options
exc-to-pdf input.xlsx output.pdf --bookmarks --preserve-formatting

# Python module alternative
python -m exc_to_pdf input.xlsx output.pdf
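
To convert a whole folder of workbooks, the CLI call above can be wrapped in a short script. This is a minimal sketch, not part of the package: the helper names build_command and convert_folder are hypothetical, and it assumes the exc-to-pdf command is on PATH and exits with a non-zero status on failure.

```python
import subprocess
from pathlib import Path


def build_command(xlsx: Path, pdf: Path, bookmarks: bool = True) -> list[str]:
    """Build the argument list for one exc-to-pdf invocation."""
    cmd = ["exc-to-pdf", str(xlsx), str(pdf)]
    if bookmarks:
        cmd.append("--bookmarks")  # flag taken from the usage examples above
    return cmd


def convert_folder(src_dir: str, out_dir: str) -> list[Path]:
    """Convert every .xlsx file in src_dir and return the PDF paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for xlsx in sorted(Path(src_dir).glob("*.xlsx")):
        pdf = out / (xlsx.stem + ".pdf")
        # check=True raises CalledProcessError if the CLI exits non-zero
        # (assumed behaviour of the exc-to-pdf command).
        subprocess.run(build_command(xlsx, pdf), check=True)
        written.append(pdf)
    return written
```

Calling convert_folder("reports", "pdf_out") would then produce one PDF per workbook, stopping at the first conversion that fails.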

Python API

from exc_to_pdf.excel_processor import ExcelProcessor
from exc_to_pdf.pdf_generator import PDFGenerator

# Process Excel file
processor = ExcelProcessor("input.xlsx")
sheets_data = processor.extract_all_sheets()

# Generate PDF
generator = PDFGenerator()
generator.create_pdf(sheets_data, "output.pdf")

📋 Requirements

  • Python 3.9+
  • Dependencies automatically installed with pip install exc-to-pdf

Core Dependencies

  • openpyxl (>=3.1.0) - Excel file parsing
  • pandas (>=2.0.0) - Data processing
  • reportlab (>=4.0.0) - PDF generation
  • Pillow (>=10.0.0) - Image handling
  • matplotlib (>=3.7.0) - Chart recreation
  • babel (>=2.12.0) - Internationalization
  • numpy (>=1.24.0) - Numerical computing

🏗️ Project Structure

exc-to-pdf/
├── src/                    # Source code
│   ├── excel_processor.py  # Excel reading logic
│   ├── pdf_generator.py    # PDF generation
│   ├── table_detector.py   # Table identification
│   └── main.py             # CLI interface
├── tests/                  # Test suite
│   ├── unit/               # Unit tests
│   ├── integration/        # Integration tests
│   └── fixtures/           # Test data
├── docs/                   # Documentation
│   ├── idee_fondanti/      # Foundational documents
│   └── api/                # API documentation
├── scripts/                # Utility scripts
└── requirements.txt        # Dependencies

🔄 Development Workflow

This project follows the DevStream 7-Step Workflow:

  1. DISCUSS - Requirements analysis and planning
  2. ANALYZE - Technical analysis and research
  3. RESEARCH - Context7 and best practices research
  4. PLAN - Implementation planning
  5. APPROVE - Architecture validation
  6. IMPLEMENT - Code development
  7. VERIFY - Testing and validation

Current Development Phase

Phase: P1 - Project Foundation ✅
Next: P2 - Excel Processing Engine

See docs/idee_fondanti/piano_fondante_exc-to-pdf.md for complete development plan.

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test file
pytest tests/unit/test_excel_processor.py

📊 Architecture

Data Flow

Excel File → openpyxl parsing → pandas processing → reportlab rendering → PDF Output
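
The flow above can be sketched end to end in a few functions. This is an illustration under stated assumptions, not the package's implementation: read_workbook, normalize, and render_pdf are hypothetical names, and the pandas stage is replaced here by a plain-Python stand-in that only drops blank rows and converts cells to strings. The openpyxl and reportlab imports are done inside the functions so the pure normalization step can be used without them.

```python
from typing import Any, Dict, List
from xml.sax.saxutils import escape

Rows = List[List[Any]]


def read_workbook(path: str) -> Dict[str, Rows]:
    """Stage 1: read the cell values of every worksheet with openpyxl."""
    from openpyxl import load_workbook  # third-party

    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        return {
            ws.title: [list(row) for row in ws.iter_rows(values_only=True)]
            for ws in wb.worksheets
        }
    finally:
        wb.close()


def normalize(rows: Rows) -> List[List[str]]:
    """Stage 2 (stand-in for pandas): drop blank rows, stringify, pad rows."""
    cleaned = [["" if v is None else str(v) for v in row] for row in rows]
    cleaned = [r for r in cleaned if any(c.strip() for c in r)]
    width = max((len(r) for r in cleaned), default=0)
    return [r + [""] * (width - len(r)) for r in cleaned]


def render_pdf(sheets: Dict[str, Rows], out_path: str) -> None:
    """Stage 3: draw each sheet as a text-based table with reportlab."""
    from reportlab.lib import colors  # third-party
    from reportlab.lib.pagesizes import A4, landscape
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import (PageBreak, Paragraph, SimpleDocTemplate,
                                    Table, TableStyle)

    styles = getSampleStyleSheet()
    story = []
    for title, rows in sheets.items():
        # Paragraph parses XML-like markup, so escape the sheet title.
        story.append(Paragraph(escape(title), styles["Heading1"]))
        data = normalize(rows)
        if data:
            table = Table(data, repeatRows=1)  # repeat first row on each page
            table.setStyle(TableStyle([("GRID", (0, 0), (-1, -1), 0.5, colors.grey)]))
            story.append(table)
        story.append(PageBreak())
    # Very wide sheets are not handled in this sketch.
    SimpleDocTemplate(out_path, pagesize=landscape(A4)).build(story)
```

With both libraries installed, render_pdf(read_workbook("input.xlsx"), "output.pdf") would run the three stages in order.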

Key Components

  1. ExcelProcessor: Reads and parses Excel files
  2. TableDetector: Identifies table structures
  3. PDFGenerator: Creates structured PDF output
  4. BookmarkManager: Adds navigation elements

🎯 Google NotebookLM Optimization

The PDF output is specifically designed for AI analysis:

  • Text-based tables (not images)
  • Structured navigation with bookmarks
  • Accessibility tags for better AI understanding
  • Semantic structure preservation
  • Metadata inclusion for context

📝 Development Status

  • Project Foundation (P1)
  • Excel Processing Engine (P2)
  • PDF Generation Engine (P3)
  • Integration & Pipeline (P4)
  • Quality Assurance (P5)
  • Optimization (P6)
  • Documentation & Release (P7)

🤝 Contributing

  1. Follow DevStream workflow
  2. Maintain 95%+ test coverage
  3. Use type hints and docstrings
  4. Pass code review validation

📄 License

MIT License - see LICENSE file for details.

🔗 Related Projects


Built with ❤️ using DevStream framework

Download files

Source Distribution

exc_to_pdf-1.0.0.tar.gz (170.2 kB)

Uploaded Source

Built Distribution

exc_to_pdf-1.0.0-py3-none-any.whl (86.3 kB)

Uploaded Python 3

File details

Details for the file exc_to_pdf-1.0.0.tar.gz.

File metadata

  • Download URL: exc_to_pdf-1.0.0.tar.gz
  • Upload date:
  • Size: 170.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for exc_to_pdf-1.0.0.tar.gz

Algorithm    Hash digest
SHA256       571958411f3f266caf8dcf35d5acb72f55660ed86b9da4b1c2b38a968e412093
MD5          109a5c8b2e24e2b7430025786f8453e8
BLAKE2b-256  6b14962883997db5315b879fe72d030320896e167f045f8d2b9554871e10cc29
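
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard library. This is a small sketch: sha256_of and verify are hypothetical helper names, and the expected digest is copied from the hash listing for the source distribution.

```python
import hashlib

# SHA256 digest for exc_to_pdf-1.0.0.tar.gz, copied from the listing above.
EXPECTED_SHA256 = "571958411f3f266caf8dcf35d5acb72f55660ed86b9da4b1c2b38a968e412093"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

Running verify("exc_to_pdf-1.0.0.tar.gz") on the downloaded file should return True if the archive is intact.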

Provenance

The following attestation bundles were made for exc_to_pdf-1.0.0.tar.gz:

Publisher: release.yml on fulvian/exc-to-pdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file exc_to_pdf-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: exc_to_pdf-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 86.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for exc_to_pdf-1.0.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       38e401ddc4185ee2076eb5be9f5f242382cfb84f4ef1154c2c9806ab9772ec23
MD5          aae9c5efc7ef5e19f926594021101ee2
BLAKE2b-256  08644d1711f289860e96426a47d4f4e14894fceea3825ed20ad8b1d0e978ddaf

Provenance

The following attestation bundles were made for exc_to_pdf-1.0.0-py3-none-any.whl:

Publisher: release.yml on fulvian/exc-to-pdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
