A modular document processing pipeline for Markdown to PDF/SVG/PNG conversion with OCR

Enclose

A comprehensive document processing pipeline for Markdown to PDF/SVG/PNG conversion with OCR capabilities.

graph LR
    A[Markdown] -->|Parse| B[HTML]
    B -->|Convert| C[PDF]
    C -->|Embed| D[SVG]
    D -->|Extract| E[PNG]
    E -->|Process| F[OCR]
    F -->|Index| G[Search]
    G -->|Visualize| H[Dashboard]

🚀 Features

graph LR
    A[Input Formats] --> B[Markdown]
    A --> C[PDF]
    
    B --> D[Converters]
    C --> D
    
    D --> E[Output Formats]
    E --> F[PDF]
    E --> G[SVG]
    E --> H[PNG]
    E --> I[HTML]
    
    style A fill:#f9f,stroke:#333
    style E fill:#9f9,stroke:#333

- Multi-format conversion: Convert between Markdown, PDF, SVG, and PNG
- SVG embedding: Embed PDFs as base64 data URIs in SVG containers
- Image extraction: Extract high-quality images from PDFs
- OCR processing: Extract text with confidence scoring
- Metadata tracking: Preserve and enhance metadata throughout processing
- Interactive dashboard: View and search processed documents
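
The SVG embedding feature can be illustrated with a minimal sketch. It assumes the PDF travels as a `data:application/pdf;base64,...` URI inside an SVG link element; the function names and the choice of the `<a>` element are hypothetical, since this README does not show the SVG layout Enclose actually writes.

```python
import base64
import re
from typing import Optional

DATA_URI_PREFIX = "data:application/pdf;base64,"


def embed_pdf_in_svg(pdf_bytes: bytes, width: int = 595, height: int = 842) -> str:
    """Return an SVG document that carries the PDF as a base64 data URI."""
    payload = base64.b64encode(pdf_bytes).decode("ascii")
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f'  <a href="{DATA_URI_PREFIX}{payload}">\n'
        f'    <text x="20" y="40">Embedded PDF</text>\n'
        f"  </a>\n"
        f"</svg>\n"
    )


def extract_pdf_from_svg(svg_text: str) -> Optional[bytes]:
    """Recover the embedded PDF bytes, or None if no data URI is present."""
    match = re.search(re.escape(DATA_URI_PREFIX) + r"([A-Za-z0-9+/=]+)", svg_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1))
```

Because base64 is lossless, extracting the payload returns the original PDF byte for byte, which is what makes the SVG usable as a container.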

✅ File Format Validation

Enclose includes comprehensive file format validation to ensure the integrity and correctness of all converted files:

graph TD
    A[Input Validation] --> B[Conversion]
    B --> C[Output Validation]
    C --> D[Verification]
    
    style A fill:#d4f1f9,stroke:#333
    style C fill:#d4f1f9,stroke:#333
    style D fill:#d4f1f9,stroke:#333

Validation Checks

PDF Files

✅ Valid PDF signature (%PDF header)
✅ Correct MIME type (application/pdf)
✅ File integrity verification

SVG Files

✅ Valid XML structure
✅ Correct MIME type (image/svg+xml)
✅ Basic SVG tag validation

PNG Files

✅ Valid PNG signature (magic bytes)
✅ Correct MIME type (image/png)
✅ Image data integrity check
✅ PIL verification of image data
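
The signature and structure checks listed above can be sketched with the standard library alone. The function names are hypothetical and this is not Enclose's own validator; MIME-type detection and the PIL image verification are left out.

```python
import xml.etree.ElementTree as ET

# The eight-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_pdf(data: bytes) -> bool:
    """Check for the %PDF header at the start of the file."""
    return data.startswith(b"%PDF")


def looks_like_png(data: bytes) -> bool:
    """Check for the PNG magic bytes."""
    return data.startswith(PNG_SIGNATURE)


def looks_like_svg(data: bytes) -> bool:
    """Check that the data is well-formed XML whose root element is <svg>."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # Namespaced tags look like "{http://www.w3.org/2000/svg}svg".
    return root.tag.rsplit("}", 1)[-1] == "svg"
```

These checks only inspect headers and XML structure, so a truncated file can still pass; that gap is what the integrity checks in the list address.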

Example Validation Output

# When running tests, you'll see validation output like:
PASSED tests/test_file_formats.py::test_pdf_conversion
PASSED tests/test_file_formats.py::test_svg_conversion
PASSED tests/test_file_formats.py::test_png_conversion

📚 Documentation

For complete documentation, please visit our documentation site.

🛠️ Quick Start

Prerequisites

Python 3.8+
Poetry (for development)

Installation

Clone the repository:

git clone https://github.com/veridock/enclose.git
cd enclose

Install dependencies and the package in development mode:
```
poetry install
```

Basic Usage

List supported formats:
```
enclose --list
```

Convert a markdown file to another format:

# Basic conversion (outputs to current directory with default name)
enclose example.md pdf

# Specify output filename
enclose example.md pdf -o output.pdf

# Convert to SVG
enclose example.md svg -o output.svg

# Convert to PNG
enclose example.md png -o output.png

# Convert to HTML
enclose example.md html -o output.html

Example

First, create a test Markdown file or use the provided example.md.

Convert it to different formats:

# Convert to PDF
enclose example.md pdf -o example.pdf

# Convert to SVG
enclose example.md svg -o example.svg

Important Notes

The -o or --output flag takes a full file path including the extension (e.g., output.pdf, ./output.svg).
If no output is specified, the file is saved in the current directory under a default name derived from the input file.
The output directory must exist before you run the command.

With the commands above, the results are saved as example.pdf and example.svg in the current directory.
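
The output rules in these notes can be expressed as a small path-resolution helper. This is a sketch under stated assumptions: `resolve_output` is a hypothetical name, and the default file name (input stem plus the format as extension) is a guess at what "a default name based on the input file" means.

```python
from pathlib import Path
from typing import Optional


def resolve_output(input_path: str, fmt: str, output: Optional[str] = None) -> Path:
    """Pick the output path and fail early if its directory is missing."""
    if output is not None:
        target = Path(output)
    else:
        # Assumed default: same stem as the input, placed in the current directory.
        target = Path.cwd() / (Path(input_path).stem + "." + fmt)
    if not target.parent.is_dir():
        raise FileNotFoundError(f"output directory does not exist: {target.parent}")
    return target
```

Checking the directory before converting avoids doing the expensive work first and failing only at write time.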

Command Line Options

usage: enclose [-h] [--version] [--list] [-o OUTPUT] [input] [{pdf,png,svg,html}]

A document processing tool for format conversion.

positional arguments:
  input                 Input file path (markdown, pdf, etc.)
  {pdf,png,svg,html}    Output format

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --list                List supported formats and conversions
  -o OUTPUT, --output OUTPUT
                        Output directory (default: current directory)
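
For readers who want to see how a command line of this shape can be declared, here is a minimal `argparse` sketch that accepts the same arguments as the usage text above. It is an illustration, not Enclose's source; `build_parser` and the version string are placeholders.

```python
import argparse

FORMATS = ("pdf", "png", "svg", "html")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the usage line shown above."""
    parser = argparse.ArgumentParser(
        prog="enclose",
        description="A document processing tool for format conversion.",
    )
    # Placeholder version string; the real value would come from package metadata.
    parser.add_argument("--version", action="version", version="%(prog)s (placeholder)")
    parser.add_argument("--list", action="store_true",
                        help="List supported formats and conversions")
    parser.add_argument("-o", "--output",
                        help="Output directory (default: current directory)")
    # Both positionals are optional so that `enclose --list` works on its own.
    parser.add_argument("input", nargs="?",
                        help="Input file path (markdown, pdf, etc.)")
    parser.add_argument("format", nargs="?", choices=FORMATS, help="Output format")
    return parser
```

Calling `build_parser().parse_args(["example.md", "pdf", "-o", "output.pdf"])` yields a namespace with `input`, `format`, and `output` set accordingly.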

Development

To run tests:

make test

To run linting:

make lint

To run type checking:

make typecheck

Prerequisites

Python 3.8.1+
Poetry for dependency management
System dependencies (see Installation Guide)

Installation

# Clone the repository
git clone https://github.com/veridock/enclose.git
cd enclose

# Install the package
make install

Basic Usage

# Process a document
enclose process example.md -o output/

# View the results
open output/dashboard.html  # macOS
# or
xdg-open output/dashboard.html  # Linux

📖 Documentation Structure

Getting Started - Installation and setup
User Guide - Command reference and usage examples
Architecture - System design and components
Development - Contributing and development setup

🌟 Features in Detail

Document Conversion

Markdown to PDF with custom styling
PDF to SVG with embedded fonts
High-quality image extraction

Advanced Processing

OCR text extraction with confidence scoring
Metadata extraction and management
Batch processing support

Command Line Interface

Intuitive command structure
Configurable output formats
Progress tracking

📊 Example Workflow

sequenceDiagram
    participant User
    participant CLI
    participant Processor
    
    User->>CLI: enclose process doc.md
    CLI->>Processor: Process document
    Processor->>Processor: Convert Markdown to PDF
    Processor->>Processor: Generate SVG with embedded PDF
    Processor->>Processor: Extract images
    Processor->>Processor: Process OCR
    Processor-->>CLI: Processing complete
    CLI-->>User: Results in output/

📦 Project Structure

enclose/
├── docs/                   # Documentation
├── processor/              # Main package
│   ├── __init__.py
│   ├── __main__.py         # CLI entry point
│   ├── core/               # Core processing logic
│   ├── converters/         # Format converters
│   └── utils/              # Utility functions
├── scripts/                # Helper scripts
├── tests/                  # Test suite
└── pyproject.toml          # Project configuration

🤝 Contributing

Contributions are welcome! Please see our Contributing Guide for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🧪 Testing

To run the test suite:

make test
make lint

🔄 Development Workflow

Set up development environment
```
make install
```
Run tests
```
make test
```
Format and check code
```
make format
make lint
```
Run the development server
```
make dev
```
- Dashboard opens automatically in browser
- Access: output/dashboard.html

🛠️ CLI Commands

Basic Usage

# Convert a document
enclose input.md pdf -o output.pdf

# List available formats
enclose --list

# Show help
enclose --help

Command Structure

flowchart TD
    A[enclose] --> B[input_file output_format options]
    A --> C[--list]
    A --> D[--help]
    A --> E[--version]
    
    B --> F[input_file]
    B --> G[output_format]
    B --> H[options]
    
    H --> I[-o/--output]
    H --> J[--dpi]
    H --> K[--quality]
    
    style B fill:#9f9,stroke:#333
    style C fill:#99f,stroke:#333
    style D fill:#99f,stroke:#333

Common Examples

# Convert Markdown to PDF
enclose document.md pdf -o output.pdf

# Convert PDF to high-quality PNG
enclose document.pdf png --dpi 300 -o output.png


# List all supported formats
enclose --list

Advanced Options

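# Set output DPI for images
enclose input.pdf png --dpi 150

# Set image quality (1-100)
# Note: jpg is not among the formats in the usage text above; confirm with enclose --list
enclose input.pdf jpg --quality 90

# Process multiple files (the output/ directory must already exist)
for f in *.md; do enclose "$f" pdf -o "output/${f%.md}.pdf"; done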

📁 Project Structure

graph TD
    A[Project Root] --> B[Source Code]
    A --> C[Documentation]
    A --> D[Build System]
    A --> E[Tests]
    
    B --> F[enclose/]:::dir
    F --> G[__init__.py]:::file
    F --> H[__main__.py]:::file
    F --> I[core/]:::dir
    F --> J[converters/]:::dir
    F --> K[utils/]:::dir
    
    C --> L[docs/]:::dir
    C --> M[README.md]:::file
    
    D --> N[pyproject.toml]:::file
    D --> O[Makefile]:::file
    
    E --> P[tests/]:::dir
    
    classDef dir fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef file fill:#e8f5e9,stroke:#2e7d32,stroke-width:1.5px

Key Directories

enclose/ - Main Python package
- core/ - Core processing logic and document handling
- converters/ - File format conversion modules
- utils/ - Utility functions and helpers
- __main__.py - CLI entry point
docs/ - Comprehensive documentation
- getting-started/ - Installation and setup guides
- architecture/ - System design and components
- usage/ - User guides and examples
tests/ - Test suite
- Unit tests
- Integration tests
- Test fixtures

Key Files

pyproject.toml - Project configuration and dependencies
Makefile - Common development tasks
scripts/enclose - Global CLI wrapper script
.github/workflows/ - CI/CD pipelines

🔄 Workflow

flowchart LR
    A[Input] -->|Markdown/PDF| B(enclose)
    B --> C{Format?}
    
    C -->|Markdown| D[Parse Markdown]
    D --> E[Generate HTML]
    E --> F[Convert to PDF]
    
    C -->|PDF| G[Process PDF]
    G --> H[Extract Content]
    
    F & H --> I[Generate Outputs]
    I --> J[SVG/PNG/HTML]
    I --> K[Metadata]
    J --> L[Dashboard]
    
    style A fill:#e3f2fd,stroke:#1565c0
    style B fill:#e8f5e9,stroke:#2e7d32
    style L fill:#fff3e0,stroke:#e65100

Processing Steps

1. Input Handling
   - Accepts Markdown or PDF files
   - Validates input format and content
2. Conversion
   - Markdown → HTML → PDF
   - PDF → Images/Text
3. Output Generation
   - Generate SVG/PNG/HTML outputs
   - Extract and process metadata
4. Visualization
   - Create interactive dashboard
   - Enable search and filtering
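
The branching in these steps (Markdown goes through HTML to PDF, PDF is processed directly, then both share the output stages) can be sketched as a dispatch on the input extension. The step names and `plan_steps` are hypothetical labels taken from the flowchart, not functions in the package.

```python
from pathlib import Path
from typing import List

MARKDOWN_STEPS = ["parse_markdown", "generate_html", "convert_to_pdf"]
PDF_STEPS = ["process_pdf", "extract_content"]
SHARED_STEPS = ["generate_outputs", "extract_metadata", "create_dashboard"]


def plan_steps(input_path: str) -> List[str]:
    """Return the ordered step names for an input file, based on its extension."""
    suffix = Path(input_path).suffix.lower()
    if suffix in (".md", ".markdown"):
        return MARKDOWN_STEPS + SHARED_STEPS
    if suffix == ".pdf":
        return PDF_STEPS + SHARED_STEPS
    raise ValueError(f"unsupported input format: {suffix or input_path}")
```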

🔄 Pipeline Workflow

Step 1: CREATE
├── Generate example markdown file (invoice)
└── Output: invoice_example.md

Step 2: MARKDOWN → PDF
├── Convert markdown to styled HTML
├── Generate PDF with CSS styling
└── Output: invoice_example.pdf

Step 3: PDF → SVG
├── Embed PDF as base64 data URI
├── Add SVG metadata (RDF/Dublin Core)
└── Output: invoice_example.svg + metadata.json

Step 4: PDF → PNG
├── Extract PDF pages as PNG images
├── Convert PNG to base64 encoding
└── Output: page_*.png + updated metadata

Step 5: OCR PROCESSING
├── Extract text from PNG images
├── Calculate confidence scores
└── Output: updated metadata with OCR data

Step 6: FILESYSTEM SEARCH
├── Scan for all SVG files
├── Parse SVG metadata
└── Output: svg_search_results.json

Step 7: DASHBOARD CREATION
├── Generate HTML table with thumbnails
├── Embed SVG previews
└── Output: dashboard.html (opens in browser)
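
Step 6 (scanning for SVG files and reading their metadata) might look roughly like the sketch below. The record keys and the `scan_svgs` name are assumptions; only the `svg_search_results.json` file name comes from the pipeline description above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List


def scan_svgs(root: str) -> List[Dict]:
    """Walk a directory tree and describe every .svg file found."""
    results = []
    for path in sorted(Path(root).rglob("*.svg")):
        data = path.read_bytes()
        try:
            tree_root = ET.fromstring(data)
            # Tags may be namespaced, e.g. "{http://www.w3.org/2000/svg}metadata".
            has_metadata = any(
                el.tag.rsplit("}", 1)[-1] == "metadata" for el in tree_root.iter()
            )
        except ET.ParseError:
            has_metadata = False
        results.append({
            "path": str(path),
            "size": path.stat().st_size,
            "pdf_embedded": b"data:application/pdf;base64," in data,
            "has_metadata": has_metadata,
        })
    return results
```

The returned list is plain dictionaries, so it can be written out with `json.dump(results, fh, indent=2)`.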

📊 Output Files

Metadata Structure

{
  "file": "path/to/file.svg",
  "type": "svg_with_pdf",
  "created": "2025-06-25T10:30:00",
  "pdf_embedded": true,
  "total_pages": 1,
  "pages": [
    {
      "page": 1,
      "file": "page_1.png",
      "base64": "iVBORw0KGgoAAAANSU...",
      "ocr_text": "Invoice #INV-2025-001...",
      "ocr_confidence": 95.7,
      "word_count": 45
    }
  ]
}
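
A record with this shape can be assembled as in the following sketch. The helper names are hypothetical, and counting words by splitting the OCR text on whitespace is an assumption about how `word_count` is derived.

```python
import base64
import json
from typing import Dict, List


def page_entry(page: int, filename: str, png_bytes: bytes,
               ocr_text: str, ocr_confidence: float) -> Dict:
    """Build one entry of the "pages" list."""
    return {
        "page": page,
        "file": filename,
        "base64": base64.b64encode(png_bytes).decode("ascii"),
        "ocr_text": ocr_text,
        "ocr_confidence": ocr_confidence,
        # Assumed definition: whitespace-separated tokens in the OCR text.
        "word_count": len(ocr_text.split()),
    }


def build_metadata(svg_path: str, created: str, pages: List[Dict]) -> Dict:
    """Build the top-level metadata record for an SVG with an embedded PDF."""
    return {
        "file": svg_path,
        "type": "svg_with_pdf",
        "created": created,
        "pdf_embedded": True,
        "total_pages": len(pages),
        "pages": pages,
    }
```

Every value is a JSON-native type, so `json.dumps(build_metadata(...), indent=2)` produces output in the layout shown above.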

Dashboard Features

SVG Thumbnails: Direct embedding of SVG files
File Information: Path, size, modification date
PDF Detection: Indicates embedded PDF data
Metadata Status: Shows RDF metadata presence
Interactive Links: Click to open files
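
To make these dashboard features concrete, here is a minimal sketch that renders file records as an HTML table with SVG thumbnails and links. The record keys (`path`, `size`, `modified`, `pdf_embedded`, `has_metadata`) and the function name are assumptions, not the structure Enclose uses internally.

```python
import html
from typing import Dict, List


def render_dashboard(records: List[Dict]) -> str:
    """Render one table row per SVG file, with a clickable thumbnail."""
    rows = []
    for rec in records:
        path = html.escape(str(rec["path"]), quote=True)
        rows.append(
            "<tr>"
            f'<td><a href="{path}"><img src="{path}" width="120" alt="thumbnail"></a></td>'
            f"<td>{path}</td>"
            f"<td>{rec['size']} bytes</td>"
            f"<td>{html.escape(str(rec['modified']))}</td>"
            f"<td>{'yes' if rec['pdf_embedded'] else 'no'}</td>"
            f"<td>{'yes' if rec['has_metadata'] else 'no'}</td>"
            "</tr>"
        )
    header = (
        "<tr><th>Preview</th><th>Path</th><th>Size</th>"
        "<th>Modified</th><th>PDF</th><th>Metadata</th></tr>"
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>Dashboard</title></head><body>\n<table>\n"
        + header + "\n" + "\n".join(rows) + "\n</table>\n</body></html>\n"
    )
```

Paths are escaped before they are placed in attributes, so file names containing `&` or quotes do not break the markup.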

🛠️ Makefile Targets

| Target | Description |
|---|---|
| `install` | Install dependencies in virtual environment |
| `create` | Create example markdown file |
| `process` | Run conversion pipeline (steps 2-5) |
| `search` | Search filesystem for SVG files |
| `enclose` | Create HTML dashboard |
| `clean` | Remove generated files |
| `clean-all` | Remove everything including venv |
| `help` | Show available commands |

🔧 Configuration

OCR Language Support

# Install additional languages
sudo apt-get install tesseract-ocr-pol  # Polish
sudo apt-get install tesseract-ocr-deu  # German

# Configure in processor.py
pytesseract.image_to_string(image, lang='pol+eng')
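
The `lang` value above joins Tesseract language codes with `+`. The sketch below builds that string and shows one possible way to turn per-word confidences into a single score. It assumes the input dictionary has the shape returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`; averaging over recognized words is an assumption, since this README does not say how Enclose computes `ocr_confidence`.

```python
from typing import Dict, Iterable, List


def lang_argument(codes: Iterable[str]) -> str:
    """Join Tesseract language codes, e.g. ["pol", "eng"] -> "pol+eng"."""
    return "+".join(codes)


def mean_confidence(data: Dict[str, List]) -> float:
    """Average the confidence of recognized words.

    Entries with empty text or a confidence of -1 are layout boxes rather
    than words, so they are skipped.
    """
    scores = [
        float(conf)
        for conf, text in zip(data["conf"], data["text"])
        if str(text).strip() and float(conf) >= 0
    ]
    if not scores:
        return 0.0
    return round(sum(scores) / len(scores), 1)
```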

PDF Styling

Modify the CSS in the markdown_to_pdf() method:

styled_html = f"""
<style>
    body {{ font-family: 'Your Font', sans-serif; }}
    /* Add custom styles */
</style>
"""
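
As a self-contained illustration of the same pattern, the sketch below wraps already-rendered HTML in a page with an inline stylesheet. `style_html` is a hypothetical helper, not the package's `markdown_to_pdf()` method; note the doubled braces, which an f-string needs to emit literal `{` and `}` in CSS.

```python
def style_html(body_html: str, font_family: str = "sans-serif") -> str:
    """Wrap an HTML fragment in a full page with a small inline stylesheet."""
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
    body {{ font-family: {font_family}; }}
</style>
</head>
<body>
{body_html}
</body>
</html>"""

# With WeasyPrint installed, the result could then be rendered with
# weasyprint.HTML(string=style_html(fragment)).write_pdf("out.pdf").
```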

🐛 Troubleshooting

Common Issues

OCR Not Working:

# Check tesseract installation
tesseract --version

# Install language packs
sudo apt-get install tesseract-ocr-eng

PDF Conversion Fails:

# Check weasyprint dependencies
pip install --upgrade weasyprint

SVG Rendering Issues:

# Install cairo development libraries
sudo apt-get install libcairo2-dev

Debug Mode

# Enable verbose output
python processor.py --step process --verbose

📝 License

This project is open source under the MIT License. See the LICENSE file for details.

🤝 Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

📞 Support

Issues: GitHub Issues
Documentation: This README
Examples: Check output/ directory after running pipeline

🎉 Quick Demo

# Complete setup and demo
make install
make all

# View results
open output/dashboard.html  # macOS
xdg-open output/dashboard.html  # Linux

The dashboard will show your processed documents with interactive thumbnails and metadata!

Release history

1.0.9

Jun 27, 2025

This version

1.0.8

Jun 26, 2025

1.0.7

Jun 26, 2025

1.0.6

Jun 26, 2025

1.0.5

Jun 26, 2025

1.0.4

Jun 26, 2025

1.0.3

Jun 26, 2025

1.0.2

Jun 26, 2025

1.0.1

Jun 26, 2025

0.1.7

Jun 26, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

enclose-1.0.8.tar.gz (25.3 kB view details)

Uploaded Jun 26, 2025 Source

Built Distribution

enclose-1.0.8-py3-none-any.whl (25.6 kB view details)

Uploaded Jun 26, 2025 Python 3

File details

Details for the file enclose-1.0.8.tar.gz.

File metadata

Download URL: enclose-1.0.8.tar.gz
Upload date: Jun 26, 2025
Size: 25.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.15.3-200.fc42.x86_64

File hashes

Hashes for enclose-1.0.8.tar.gz
Algorithm	Hash digest
SHA256	`c70f20052bbba7fdee716fcb145a65068ee908cd85f498e56814142243a5caff`
MD5	`6023e7c1a1b2193ab61bfbe4537ff6e6`
BLAKE2b-256	`ceb9456b01506349b1cf83fe1a114233cf432ae8e423e3b957d0d3825a270bc0`

See more details on using hashes here.

File details

Details for the file enclose-1.0.8-py3-none-any.whl.

File metadata

Download URL: enclose-1.0.8-py3-none-any.whl
Upload date: Jun 26, 2025
Size: 25.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.15.3-200.fc42.x86_64

File hashes

Hashes for enclose-1.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`754064374435bbb49cd92769a592fb36bcc04c5e857704d0cfcac8eb281e6ad4`
MD5	`477792e65d11515fb8cb85212c53f435`
BLAKE2b-256	`9c96944340040c72b71f3674f687c65dc9a88826f95e9c6629f39e22f0bcb167`

See more details on using hashes here.
