Parse PDF documents into markdown formatted content using Vision LLMs

Vision Parse

License: MIT | Author: Arun Brahma | PyPI version: 0.1.1

🚀 Transform PDF documents into beautifully formatted markdown using state-of-the-art Vision Language Models - all with just a few lines of code!

🎯 Introduction

Vision Parse harnesses the power of Vision Language Models to revolutionize document processing:

  • 📝 Smart Content Extraction: Intelligently identifies and extracts text, tables, and images with high precision
  • Markdown Magic: Converts complex document layouts into clean, well-structured markdown
  • 🎨 Format Preservation: Maintains document hierarchy, styling, and visual elements
  • 🤖 AI-Powered Analysis: Leverages cutting-edge vision models for superior accuracy
  • 🔄 Batch Processing: Handles multi-page documents effortlessly

🚀 Getting Started

Prerequisites

  • 🐍 Python >= 3.8
  • 🖥️ Ollama (for local model hosting)

Installation

Install the package using pip:

pip install vision-parse
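
To confirm the prerequisites and the install from Python, a minimal sketch using only the standard library is shown below. The helper names `python_ok` and `installed_version` are hypothetical and not part of Vision Parse; the distribution name `vision-parse` is taken from the pip command above.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence, Tuple


def python_ok(version_info: Sequence[int] = sys.version_info,
              minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python >= 3.8:", python_ok())
print("vision-parse version:", installed_version("vision-parse"))
```

If the second line prints `None`, the package is not installed in the active environment.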

Setting up Ollama

  1. Install Ollama based on your operating system:

    Linux:

    curl -fsSL https://ollama.com/install.sh | sh
    

    macOS:

    brew install ollama
    

    Windows: Download and install from the Ollama website.

  2. Pull the vision model and start the Ollama server:

    ollama pull llama3.2-vision:11b
    ollama serve
    
  3. Verify server status:

    curl http://localhost:11434/api/version
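
The same status check can be scripted from Python. The sketch below uses only the standard library and assumes the endpoint returns a JSON object with a `version` field; the helper names `parse_version` and `ollama_version` are hypothetical and not part of Vision Parse or Ollama.

```python
import http.client
import json
import urllib.request
from typing import Optional


def parse_version(body: bytes) -> Optional[str]:
    """Extract the "version" string from a response body, or return None."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    if isinstance(payload, dict) and isinstance(payload.get("version"), str):
        return payload["version"]
    return None


def ollama_version(base_url: str = "http://localhost:11434",
                   timeout: float = 5.0) -> Optional[str]:
    """Query /api/version; return None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return parse_version(resp.read())
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or malformed response.
        return None


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama is running (version {version})" if version else "Ollama is not reachable")
```

Returning `None` instead of raising keeps the check easy to call before constructing a parser.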
    

⌛️ Usage

Basic Example

from vision_parse import VisionParser

# Initialize parser
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)

# Process results
for i, page_content in enumerate(markdown_pages, 1):
    print(f"\n--- Page {i} ---\n{page_content}")
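
To keep the result instead of printing it, the per-page strings can be written to one markdown file. This is a minimal sketch that assumes `convert_pdf` yields one markdown string per page, as the loop above suggests; `save_markdown` is a hypothetical helper, not part of Vision Parse.

```python
from pathlib import Path
from typing import Iterable


def save_markdown(pages: Iterable[str], out_path: str) -> Path:
    """Join per-page markdown strings with blank lines and write one file."""
    path = Path(out_path)
    # Create the output directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    text = "\n\n".join(page.strip() for page in pages) + "\n"
    path.write_text(text, encoding="utf-8")
    return path
```

With the basic example, this would be called as `save_markdown(markdown_pages, "output/document.md")`.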

Custom Configuration

from vision_parse import VisionParser, PDFPageConfig

# Configure PDF processing settings
page_config = PDFPageConfig(
    dpi=400,
    color_space="RGB",
    include_annotations=True,
    preserve_transparency=False
)

# Initialize parser with custom config
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7,
    page_config=page_config
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)
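
When choosing a `dpi` value, it helps to know how large the rendered page image becomes. The sketch below assumes `dpi` has its usual meaning (pixels per inch, with PDF page sizes given in points at 72 points per inch); how Vision Parse rasterizes pages internally is not confirmed here, and `rendered_size` is a hypothetical helper.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Return the (width, height) in pixels of a page rendered at the given dpi."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792, 400))  # (3400, 4400)
print(rendered_size(612, 792, 72))   # (612, 792)
```

Pixel count grows with the square of the dpi, so a lower value may be worth trying if conversion is slow or memory is limited.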

🛠️ Development

Setting Up Development Environment

  1. Clone and Setup:

    # Clone the repository
    git clone https://github.com/iamarunbrahma/vision-parse.git
    cd vision-parse
    
  2. Install Dependencies:

    # Install uv (Mac and Linux)
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Install uv (Windows)
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    
    # Install dependencies
    uv sync --all-extras && source .venv/bin/activate
    
  3. Quality Checks:

    # Run test suite
    make test
    
    # Code quality
    make lint    # Run code linting
    make format  # Format code
    

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
