Vexy PDF Werk

Transform PDFs into high-quality, accessible formats with AI-enhanced processing

Vexy PDF Werk (VPW) is a Python package that converts PDF documents into several high-quality formats using established open-source tools, with optional AI enhancement. It produces PDF/A archives, paginated Markdown, ePub books, and structured bibliographic metadata.

Features

🔧 Modern PDF Processing

PDF/A conversion for long-term archival
OCR enhancement using OCRmyPDF
Quality optimization with qpdf

📚 Multiple Output Formats

Paginated Markdown documents with smart naming
ePub generation from Markdown
Structured bibliographic YAML metadata
Preserves the original PDF alongside the enhanced versions

🤖 Optional AI Enhancement

Text correction using Claude or Gemini CLI
Content structure optimization
Falls back to conventional (non-AI) processing when AI is disabled or unavailable

⚙️ Flexible Architecture

Multiple conversion backends (Marker, MarkItDown, Docling, basic)
Platform-appropriate configuration storage
Robust error handling with graceful fallbacks
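The optional AI step and its fallback could look roughly like the sketch below. It is not VPW's implementation: the function name `correct_text`, the prompt, and the assumption that the `claude` and `gemini` executables accept `-p <prompt>` for a one-shot run and read extra input from stdin are all illustrative and should be checked against each CLI's documentation. Whenever AI is disabled, the CLI is missing, or the call fails, the uncorrected text is returned unchanged.

```python
import shutil
import subprocess

# Assumed executable names for each provider's CLI (hypothetical mapping).
PROVIDER_EXECUTABLES = {"claude": "claude", "gemini": "gemini"}

PROMPT = (
    "Correct OCR and spelling errors in the text supplied on stdin. "
    "Preserve the Markdown structure and reply with the corrected text only."
)


def correct_text(
    text: str, provider: str = "claude", enabled: bool = True, timeout: float = 300.0
) -> str:
    """Return AI-corrected text, or the input unchanged if anything goes wrong."""
    if not enabled or provider not in PROVIDER_EXECUTABLES:
        return text
    exe = shutil.which(PROVIDER_EXECUTABLES[provider])
    if exe is None:
        return text  # CLI not installed: keep the conventionally processed text
    try:
        # Assumption: "-p <prompt>" runs non-interactively and stdin is appended.
        result = subprocess.run(
            [exe, "-p", PROMPT],
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return text  # non-zero exit or timeout: fall back to the original
    return result.stdout.strip() or text
```

Passing the document on stdin rather than as a command-line argument avoids operating-system limits on argument length for long PDFs.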

Quick Start

Installation

# Install from PyPI
pip install vexy-pdf-werk

# Or install in development mode
git clone https://github.com/vexyart/vexy-pdf-werk
cd vexy-pdf-werk
pip install -e .

Basic Usage

import vexy_pdf_werk

# Process a PDF with default settings
config = vexy_pdf_werk.Config(name="default", value="process")
result = vexy_pdf_werk.process_data(["document.pdf"], config=config)

CLI Usage (Coming Soon)

# Process a PDF into all formats
vpw process document.pdf

# Process with specific formats only
vpw process document.pdf --formats pdfa,markdown

# Enable AI enhancement
vpw process document.pdf --ai-enabled --ai-provider claude

Output Structure

VPW creates organized output with consistent naming:

output/
├── document_enhanced.pdf    # PDF/A version
├── 000--introduction.md     # Paginated Markdown files
├── 001--chapter-one.md
├── 002--conclusions.md
├── document.epub            # Generated ePub
└── metadata.yaml            # Bibliographic data
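The Markdown file names above follow a visible pattern: a zero-padded position, a double hyphen, and a lower-case, hyphenated form of the section title. The sketch below reproduces that pattern. The helpers `slugify` and `markdown_filename` are hypothetical, and deriving the slug from each section's heading is an assumption, not documented VPW behavior.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lower-case ASCII slug: each run of non-alphanumerics becomes one hyphen."""
    # Decompose accented characters (e.g. "é" -> "e" + accent) and drop non-ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "untitled"  # titles with no usable characters


def markdown_filename(index: int, title: str) -> str:
    """Three-digit zero-padded index, "--", then the slugified title."""
    return f"{index:03d}--{slugify(title)}.md"


if __name__ == "__main__":
    for i, heading in enumerate(["Introduction", "Chapter One", "Conclusions"]):
        print(markdown_filename(i, heading))
```

Run as a script, this prints `000--introduction.md`, `001--chapter-one.md` and `002--conclusions.md`, matching the listing above. Zero-padding keeps the files in reading order when sorted alphabetically.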

System Requirements

Required Dependencies

Python 3.10+
tesseract-ocr
qpdf
ghostscript

Optional Dependencies

pandoc (for ePub generation)
marker-pdf (advanced PDF conversion)
markitdown (Microsoft's document converter)
docling (IBM's document understanding)

Installation Commands

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install tesseract-ocr tesseract-ocr-eng qpdf ghostscript pandoc

macOS:

brew install tesseract tesseract-lang qpdf ghostscript pandoc

Windows:

choco install tesseract qpdf ghostscript pandoc

Configuration

VPW stores configuration in platform-appropriate directories:

Linux/macOS: ~/.config/vexy-pdf-werk/config.toml
Windows: %APPDATA%\vexy-pdf-werk\config.toml

Example Configuration

[processing]
ocr_language = "eng"
pdf_quality = "high"
force_ocr = false

[conversion]
markdown_backend = "auto"  # auto, marker, markitdown, docling, basic
paginate_markdown = true
include_images = true

[ai]
enabled = false
provider = "claude"  # claude, gemini
correction_enabled = false

[output]
formats = ["pdfa", "markdown", "epub", "yaml"]
preserve_original = true
output_directory = "./output"

Development

This project uses modern Python tooling:

Package Management: uv + hatch
Code Quality: ruff + mypy
Testing: pytest
Versioning: semantic versions derived from git tags via hatch-vcs

Development Setup

# Install uv and hatch
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install hatch

# Clone and set up
git clone https://github.com/vexyart/vexy-pdf-werk
cd vexy-pdf-werk

# Run tests using hatch (automatically manages environment)
hatch run test

# Run linting and formatting
hatch run lint

# Type checking
hatch run type-check

# Or run any command inside the project environment
hatch run python -c "import vexy_pdf_werk; print(vexy_pdf_werk.__version__)"

Architecture

VPW follows a modular pipeline architecture:

PDF Input → Analysis → OCR Enhancement → Content Extraction → Format Generation → Multi-Format Output
                          ↓
                   Optional AI Enhancement
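The pipeline can be pictured as a list of stages that each take and return a document state. The sketch below is only a model of the diagram: every stage is a placeholder that records its name, and running the optional AI step after content extraction (where text correction would have text to work on) is an assumption, since the diagram only shows AI enhancement as a side branch.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical state object passed from stage to stage."""
    source: str
    text: str = ""
    formats: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


# Placeholder stages: each stands in for real work and records that it ran.
def analyze(doc: Document) -> Document:
    doc.log.append("analysis")
    return doc


def ocr_enhance(doc: Document) -> Document:
    doc.log.append("ocr")
    return doc


def extract_content(doc: Document) -> Document:
    doc.text = f"# Extracted from {doc.source}"
    doc.log.append("extraction")
    return doc


def ai_enhance(doc: Document) -> Document:
    doc.log.append("ai")
    return doc


def generate_formats(doc: Document) -> Document:
    doc.formats = ["pdfa", "markdown", "epub", "yaml"]
    doc.log.append("generation")
    return doc


def build_pipeline(ai_enabled: bool) -> list[Stage]:
    stages: list[Stage] = [analyze, ocr_enhance, extract_content]
    if ai_enabled:
        stages.append(ai_enhance)  # the optional branch in the diagram
    stages.append(generate_formats)
    return stages


def run(source: str, ai_enabled: bool = False) -> Document:
    doc = Document(source)
    for stage in build_pipeline(ai_enabled):
        doc = stage(doc)
    return doc
```

Keeping the AI step as an ordinary, optional list entry means that disabling it leaves the rest of the pipeline unchanged.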

Core Components

PDF Processor: Handles OCR and PDF/A conversion
Content Extractors: Multiple backends for PDF-to-Markdown
Format Generators: Creates ePub and metadata outputs
AI Integrations: Optional LLM enhancement services
Configuration System: Platform-aware settings management
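The `markdown_backend = "auto"` setting suggests that VPW picks a content extractor based on what is installed. One way this could work is sketched below. The preference order (marker, then markitdown, then docling) and falling back to `basic` when a requested backend is missing are assumptions. The module names are the import names installed by the marker-pdf, markitdown and docling packages, and `find_spec` checks for them without importing them.

```python
import importlib.util
from collections.abc import Callable

# Hypothetical preference order for "auto": backend name -> import name.
BACKEND_MODULES = {
    "marker": "marker",          # installed by the marker-pdf package
    "markitdown": "markitdown",
    "docling": "docling",
}


def is_installed(module: str) -> bool:
    """True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module) is not None


def resolve_backend(
    requested: str = "auto", installed: Callable[[str], bool] = is_installed
) -> str:
    """Map the configured markdown_backend to a backend that is actually usable."""
    if requested == "basic":
        return "basic"
    if requested == "auto":
        candidates = list(BACKEND_MODULES)
    elif requested in BACKEND_MODULES:
        candidates = [requested]
    else:
        raise ValueError(f"unknown markdown_backend {requested!r}")
    for name in candidates:
        if installed(BACKEND_MODULES[name]):
            return name
    return "basic"  # graceful fallback: the basic extractor needs no extras
```

The `installed` parameter makes the selection logic testable without installing any of the optional packages.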

Contributing

1. Fork the repository.
2. Create a feature branch (git checkout -b feature/amazing-feature).
3. Make your changes, following the code quality standards above (ruff, mypy).
4. Run the tests and linters (hatch run test, hatch run lint).
5. Commit your changes (git commit -m 'Add amazing feature').
6. Push to the branch (git push origin feature/amazing-feature).
7. Open a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Authors

Fontlab Ltd - Initial work - Vexy Art

Acknowledgments

Built on proven tools: qpdf, OCRmyPDF, tesseract
Integration with cutting-edge AI services
Inspired by the need for better PDF accessibility and archival

Project Status: Under active development

For detailed implementation specifications, see the spec/ directory.

Release History

1.1.9 (Sep 15, 2025)
1.1.8.dev0, pre-release (Sep 15, 2025)
1.1.7.dev0, pre-release (Sep 15, 2025)
1.1.5.dev0, pre-release (Sep 15, 2025)
1.1.4 (Sep 14, 2025)
1.1.4.dev0, pre-release (Sep 14, 2025)
1.1.2.dev0, pre-release (Sep 14, 2025), this version

Download Files

Source distribution: vexy_pdf_werk-1.1.2.dev0.tar.gz (10.1 kB, uploaded Sep 14, 2025)
Built distribution: vexy_pdf_werk-1.1.2.dev0-py3-none-any.whl (7.1 kB, Python 3, uploaded Sep 14, 2025)

Both files were uploaded via uv/0.8.15, without Trusted Publishing.

File Hashes

vexy_pdf_werk-1.1.2.dev0.tar.gz
SHA256	`ac2f6e3f9834006a88e5361630513d75008ded30568d47d9e4573ef160797ac2`
MD5	`36a9e7584942832e8d2352966ee08030`
BLAKE2b-256	`7eff67d82732918efc81e785c9f176a5d2e3f631bb429dfc46560c0bd50e0415`

vexy_pdf_werk-1.1.2.dev0-py3-none-any.whl
SHA256	`bd696d9115d3421c09b9f126a18d06a25c2c84414324ae20709c2dafdcec52d3`
MD5	`3079d082db06db262b256b31d648a4e5`
BLAKE2b-256	`a46068b6dc60c41f51c00e7eab0a492e93fbf5c952bf29141371d8257b72f1a3`
