Multi-level document converter: from PDF to XML, HTML, or JSON, and from JSON+HTML to XML, PDF, DOC, or EPUB, with OCR and a generator powered by Ollama Mistral:7b

Redoc - Universal Document Converter

Redoc is a modular document conversion framework that converts between PDF, HTML, XML, JSON, DOCX, and EPUB. It includes OCR for scanned input and AI-powered content generation using Ollama Mistral:7b.

🌟 Features

Multi-format Support: Convert between PDF, HTML, XML, JSON, DOCX, and EPUB
Template-based Processing: Use JSON+HTML templates for dynamic document generation
OCR Integration: Extract text from scanned documents and images
Modular Architecture: Easily extendable with custom converters and processors
AI-Powered: Leverage Ollama Mistral:7b for intelligent content generation
Batch Processing: Process multiple documents efficiently
CLI & API: Command-line interface and Python API for easy integration
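
The batch processing feature is not shown in detail on this page. As a minimal sketch, assuming only the `convert(source, target_format, output_file=...)` call shape used in the examples below, a batch loop could look like this. The `batch_convert` helper, its `convert` parameter, and the `out` directory are hypothetical names introduced for illustration and are not part of Redoc.

```python
from pathlib import Path
from typing import Callable, Iterable


def batch_convert(
    sources: Iterable[str],
    target_format: str,
    convert: Callable[..., object],
    output_dir: str = "out",
) -> dict:
    """Convert each source file, collecting output paths and per-file errors."""
    results = {"converted": [], "failed": []}
    for source in sources:
        # Derive the output name from the input name, e.g. a.pdf -> out/a.json
        output_file = str(Path(output_dir) / f"{Path(source).stem}.{target_format}")
        try:
            # `convert` is assumed to behave like Redoc().convert
            convert(source, target_format, output_file=output_file)
            results["converted"].append(output_file)
        except Exception as exc:
            # Record the failure and continue with the remaining documents
            results["failed"].append((source, str(exc)))
    return results
```

Passing the conversion callable in, for example `batch_convert(files, 'json', Redoc().convert)`, keeps the loop independent of the converter and lets one failing document be reported without stopping the rest.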

🚀 Quick Start

Installation

# Install with pip
pip install redoc

# Or install from source
git clone https://github.com/text2doc/redoc.git
cd redoc
pip install -e .

Basic Usage

from redoc import Redoc

# Initialize the converter
converter = Redoc()

# Convert PDF to JSON
result = converter.convert('document.pdf', 'json')

# Convert HTML+JSON template to PDF
template = {
    "template": "invoice.html",
    "data": {
        "invoice_number": "INV-2023-001",
        "date": "2023-11-15",
        "total": "$1,200.00"
    }
}
converter.convert(template, 'pdf', output_file='invoice.pdf')

📚 Supported Conversions

From \ To	PDF	HTML	XML	JSON	DOCX	EPUB
PDF	❌	✅	✅	✅	✅	✅
HTML	✅	❌	✅	✅	✅	✅
XML	✅	✅	❌	✅	✅	✅
JSON	✅	✅	✅	❌	✅	✅
DOCX	✅	✅	✅	✅	❌	✅
EPUB	✅	✅	✅	✅	✅	❌
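
The table above reduces to one rule: every pair of distinct formats is supported, and converting a format to itself is not. The following sketch encodes that rule so a conversion can be validated before it is attempted. `FORMATS`, `is_supported`, and `supported_targets` are illustrative names, not part of the Redoc API.

```python
# The six formats listed in the conversion table
FORMATS = ("pdf", "html", "xml", "json", "docx", "epub")


def is_supported(source_format: str, target_format: str) -> bool:
    """Return True when the table marks source -> target as supported."""
    src, dst = source_format.lower(), target_format.lower()
    # Supported means: both formats are known and they differ
    return src in FORMATS and dst in FORMATS and src != dst


def supported_targets(source_format: str) -> list:
    """List every target format available for the given source format."""
    return [fmt for fmt in FORMATS if is_supported(source_format, fmt)]
```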

🏗️ Project Structure

redoc/
├── src/
│   └── redoc/
│       ├── __init__.py          # Package initialization
│       ├── core.py             # Core conversion logic
│       ├── converters/         # Format-specific converters
│       │   ├── base.py         # Base converter class
│       │   ├── pdf_converter.py
│       │   ├── html_converter.py
│       │   ├── xml_converter.py
│       │   ├── json_converter.py
│       │   ├── docx_converter.py
│       │   └── epub_converter.py
│       ├── ocr/                # OCR functionality
│       ├── templates/          # Default templates
│       └── utils/              # Utility functions
├── tests/                      # Test suite
├── examples/                   # Usage examples
├── docs/                       # Documentation
├── pyproject.toml              # Project configuration
└── README.md                   # This file
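
The layout above suggests that each format-specific converter builds on a shared base class in `converters/base.py`. The contents of that file are not shown here, so the following is only a sketch of how such an extension point is commonly structured. `BaseConverter`, its attributes, and `JsonToXmlConverter` are hypothetical stand-ins, not the actual Redoc classes.

```python
import json
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class BaseConverter(ABC):
    """Hypothetical stand-in for the base class in converters/base.py."""

    source_format: str = ""
    target_format: str = ""

    @abstractmethod
    def convert(self, content: str) -> str:
        """Turn content in source_format into content in target_format."""


class JsonToXmlConverter(BaseConverter):
    """Illustrative converter: a flat JSON object becomes child XML elements."""

    source_format = "json"
    target_format = "xml"

    def convert(self, content: str) -> str:
        data = json.loads(content)
        root = ET.Element("document")
        for key, value in data.items():
            # One child element per top-level key; values are stringified
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
```

For example, `JsonToXmlConverter().convert('{"title": "Hi"}')` produces a `document` element with a single `title` child. Nested objects and keys that are not valid XML names are out of scope for this sketch.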

🔧 Advanced Usage

Using Templates

from redoc import Redoc

converter = Redoc()

# Convert JSON+HTML template to PDF
converter.convert(
    {
        "template": "invoice.html",
        "data": {
            "invoice_number": "INV-2023-001",
            "date": "2023-11-15",
            "items": [
                {"description": "Web Design", "quantity": 1, "price": 1200}
            ],
            "total": 1200
        }
    },
    'pdf',
    output_file='invoice.pdf'
)
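
The template engine Redoc uses for `invoice.html` is not documented on this page. To illustrate the JSON+HTML idea, the sketch below fills an HTML string from a data dictionary using the standard library's `string.Template` as a stand-in. The `render_template` helper and the `$name` placeholder syntax are assumptions for illustration only.

```python
from html import escape
from string import Template


def render_template(page: str, data: dict) -> str:
    """Substitute $placeholders in an HTML string with escaped data values."""
    # Escape values so that data cannot inject markup into the page
    safe = {key: escape(str(value)) for key, value in data.items()}
    # substitute() raises KeyError if the template needs a key that is missing
    return Template(page).substitute(safe)


page = "<h1>Invoice $invoice_number</h1><p>Date: $date</p><p>Total: $total</p>"
rendered = render_template(
    page,
    {"invoice_number": "INV-2023-001", "date": "2023-11-15", "total": 1200},
)
```

Repeating structures such as the `items` list need a template language with loops, which `string.Template` does not provide.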

OCR Processing

from redoc import Redoc

converter = Redoc()

# Extract text from scanned PDF with OCR
result = converter.ocr('scanned_document.pdf')
print(result['text'])

# Convert scanned document to searchable PDF
converter.ocr('scanned_document.pdf', output_file='searchable.pdf')

AI-Powered Content Generation

from redoc import Redoc

converter = Redoc()

# Generate document using AI
result = converter.generate(
    "Create a professional invoice for web design services",
    format='pdf',
    style='professional',
    output_file='ai_invoice.pdf'
)
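
How `generate` talks to the model is not described here. As a rough sketch of what a call to a local Ollama server looks like, the code below builds a request for Ollama's `/api/generate` endpoint with the `mistral:7b` model. The helper names are hypothetical, and the default host `http://localhost:11434` assumes a standard local Ollama install. This is not Redoc's internal implementation.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "mistral:7b",
    host: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        # With stream=False, Ollama returns one JSON object with a "response" field
        return json.loads(resp.read())["response"]
```

Separating request construction from the network call makes the payload easy to inspect and test without a running Ollama server.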

🤝 Contributing

Contributions are welcome! Please read our Contributing Guidelines for details on how to contribute to this project.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

📧 Contact

For any questions or suggestions, please contact info@softreck.dev.

Made with ❤️ by Text2Doc Team

Release history

0.2.3 (Jun 8, 2025)
0.2.2 (Jun 8, 2025)
0.2.1 (Jun 8, 2025)
0.1.7 (Jun 8, 2025, this version)

Download files

Source Distribution

redoc-0.1.7.tar.gz (10.5 kB)

Uploaded Jun 8, 2025 Source

Built Distribution

redoc-0.1.7-py3-none-any.whl (10.7 kB)

Uploaded Jun 8, 2025 Python 3

File details

Details for the file redoc-0.1.7.tar.gz.

File metadata

Download URL: redoc-0.1.7.tar.gz
Upload date: Jun 8, 2025
Size: 10.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.14.9-300.fc42.x86_64

File hashes

Hashes for redoc-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`40659a70741b4da6644184ef7dd2737c6c1ec653fc968bd9be252599fbf26a50`
MD5	`0de3a2a858ac17a902b2ec6cfb79207d`
BLAKE2b-256	`3177b2770f963fa1e524c3c145ced07ced4d59155a31daa8b2545ab48927a2d1`

File details

Details for the file redoc-0.1.7-py3-none-any.whl.

File metadata

Download URL: redoc-0.1.7-py3-none-any.whl
Upload date: Jun 8, 2025
Size: 10.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.14.9-300.fc42.x86_64

File hashes

Hashes for redoc-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca731bd130a4c009cf62c94226598f1ef6f6753d95a0ff2fae104227b954d158`
MD5	`368668831b4a3d577706ee8c102c8e58`
BLAKE2b-256	`5bb7c3a7a376495ee40513c92e9bd64bf9349033470e063a22898965d62fe4bc`
