
Parse PDF documents into markdown-formatted content using Vision LLMs


Vision Parse

License: MIT | Author: Arun Brahma

🚀 Transform PDF documents into beautifully formatted markdown using state-of-the-art Vision Language Models - all with just a few lines of code!

🎯 Introduction

Vision Parse harnesses the power of Vision Language Models to revolutionize document processing:

  • 📝 Smart Content Extraction: Intelligently identifies and extracts text, tables, and images with high precision
  • Markdown Magic: Converts complex document layouts into clean, well-structured markdown
  • 🎨 Format Preservation: Maintains document hierarchy, styling, and visual elements
  • 🤖 AI-Powered Analysis: Leverages cutting-edge vision models for superior accuracy
  • 🔄 Batch Processing: Handles multi-page documents effortlessly

🚀 Getting Started

Prerequisites

  • 🐍 Python >= 3.8
  • 🖥️ Ollama (for local model hosting)

Installation

Install the package using pip:

pip install vision-parse
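
To confirm that the install succeeded, the standard-library `importlib.metadata` module can report the installed version. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of Vision Parse.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("vision-parse")
    print(f"vision-parse: {found or 'not installed'}")
```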

Setting up Ollama

  1. Install Ollama based on your operating system:

    Linux:

    curl -fsSL https://ollama.com/install.sh | sh
    

    macOS:

    brew install ollama
    

    Windows: Download and install from the Ollama website (https://ollama.com)

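  2. Start the Ollama server, then pull the vision model:

    # Start the server and keep it running (skip if Ollama already runs as a service)
    ollama serve

    # In a second terminal, download the model
    ollama pull llama3.2-vision:11b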
    
  3. Verify server status:

    curl http://localhost:11434/api/version
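
The same status check can be done from Python before handing a PDF to the parser. The sketch below uses only the standard library and assumes the endpoint returns a JSON object with a `version` field; the function names `parse_version` and `server_version` are illustrative, not part of Vision Parse or Ollama.

```python
import json
import urllib.request
from typing import Optional


def parse_version(body: bytes) -> Optional[str]:
    """Extract the "version" field from an /api/version response body."""
    try:
        payload = json.loads(body)
    except ValueError:  # not valid JSON (or not decodable text)
        return None
    if not isinstance(payload, dict):
        return None
    value = payload.get("version")
    return value if isinstance(value, str) else None


def server_version(
    base_url: str = "http://localhost:11434", timeout: float = 3.0
) -> Optional[str]:
    """Return the Ollama server version, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return parse_version(resp.read())
    except OSError:  # covers URLError, refused connections, and timeouts
        return None


if __name__ == "__main__":
    found = server_version()
    print(f"Ollama server version: {found}" if found else "Ollama server not reachable")
```

Returning `None` instead of raising keeps the check easy to use as a guard, for example `if server_version() is None: ...` before starting a long conversion.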
    

⌛️ Usage

Basic Example

from vision_parse import VisionParser

# Initialize parser
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)

# Process results
for i, page_content in enumerate(markdown_pages, 1):
    print(f"\n--- Page {i} ---\n{page_content}")
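
Since `convert_pdf` yields one markdown string per page, the pages can be joined and written to a single file. A minimal sketch follows; `save_markdown` is a hypothetical helper, not part of the Vision Parse API, and the horizontal-rule page separator is an arbitrary choice.

```python
from pathlib import Path
from typing import Iterable


def save_markdown(pages: Iterable[str], out_path: str) -> Path:
    """Write pages to one markdown file, separated by horizontal rules."""
    target = Path(out_path)
    # Create the output directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    body = "\n\n---\n\n".join(page.strip() for page in pages)
    target.write_text(body + "\n", encoding="utf-8")
    return target
```

With the example above, this would be called as `save_markdown(markdown_pages, "output/document.md")`.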

Custom Configuration

from vision_parse import VisionParser, PDFPageConfig

# Configure PDF processing settings
page_config = PDFPageConfig(
    dpi=400,
    color_space="RGB",
    include_annotations=True,
    preserve_transparency=False
)

# Initialize parser with custom config
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7,
    page_config=page_config
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)
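
The `dpi` setting controls how large each rasterized page image is, which affects both memory use and how much detail reaches the model. The arithmetic can be sketched as below, assuming pages are measured in standard PDF points (72 per inch) and rendered to 8-bit RGB; the helper names are illustrative and do not reflect Vision Parse internals.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user-space units per inch


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Return the (width, height) in pixels of a page rasterized at `dpi`."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


def rgb_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an 8-bit RGB image: 3 bytes per pixel."""
    return width_px * height_px * 3


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    for dpi in (150, 300, 400):
        w, h = rendered_size(612, 792, dpi)
        print(f"{dpi} dpi -> {w} x {h} px, ~{rgb_bytes(w, h) / 1e6:.1f} MB uncompressed")
```

At `dpi=400`, a US Letter page comes out at 3400 x 4400 pixels, so lowering the value is a reasonable first step if conversion is slow or memory-bound.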

🛠️ Development

Setting Up Development Environment

  1. Clone and Setup:

    # Clone the repository
    git clone https://github.com/iamarunbrahma/vision-parse.git
    cd vision-parse
    
  2. Install Dependencies:

    # Install uv (Mac and Linux)
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Install uv (Windows)
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    
    # Install dependencies
    uv sync --all-extras && source .venv/bin/activate
    
  3. Quality Checks:

    # Run test suite
    make test
    
    # Code quality
    make lint    # Run code linting
    make format  # Format code
    

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
