Extract content from PDF documents using Vision LLMs in markdown format

Vision Parse

License: MIT | Author: Arun Brahma

🚀 Transform PDF documents into beautifully formatted markdown using state-of-the-art Vision Language Models - all with just a few lines of code!

🎯 Introduction

Vision Parse harnesses the power of Vision Language Models to revolutionize document processing:

  • 📝 Smart Content Extraction: Intelligently identifies and extracts text, tables, and images with high precision
  • Markdown Magic: Converts complex document layouts into clean, well-structured markdown
  • 🎨 Format Preservation: Maintains document hierarchy, styling, and visual elements
  • 🤖 AI-Powered Analysis: Leverages cutting-edge vision models for superior accuracy
  • 🔄 Batch Processing: Handles multi-page documents effortlessly

🚀 Getting Started

Prerequisites

  • 🐍 Python >= 3.8
  • 🖥️ Ollama (for local model hosting)

Installation

Install the package using pip:

pip install vision-parse
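
To confirm the install from Python, a minimal sketch along these lines may help. It uses only the standard library; `installed_version` is a hypothetical helper name chosen for illustration and is not part of Vision Parse.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The prerequisites ask for Python >= 3.8.
    if sys.version_info < (3, 8):
        raise SystemExit("Python 3.8 or newer is required")
    version = installed_version("vision-parse")
    print(f"vision-parse: {version or 'not installed'}")
```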

Setting up Ollama

  1. Install Ollama based on your operating system:

    Linux:

    curl -fsSL https://ollama.com/install.sh | sh
    

    macOS:

    brew install ollama
    

    Windows: Download the installer from the Ollama website (https://ollama.com) and run it

  2. Pull the vision model and start the Ollama server:

    ollama pull llama3.2-vision:11b
    ollama serve
    
  3. Verify server status:

    curl http://localhost:11434/api/version
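
The same check can be scripted, for example to fail fast before a long conversion. The sketch below assumes the endpoint answers with a JSON object containing a `version` field; `ollama_version` is a hypothetical helper, not part of Vision Parse or Ollama.

```python
import json
import urllib.error
import urllib.request


def ollama_version(base_url="http://localhost:11434", timeout=5.0):
    """Return the version string reported by the server, or None if unreachable."""
    url = f"{base_url.rstrip('/')}/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a body that is not JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("version")


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama server version: {version}" if version else "Ollama server not reachable")
```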
    

⌛️ Usage

Basic Example

from vision_parse import VisionParser

# Initialize parser
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)

# Process results
for i, page_content in enumerate(markdown_pages, 1):
    print(f"\n--- Page {i} ---\n{page_content}")
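
To keep the result instead of printing it, the pages can be joined and written to disk. This is a sketch that assumes `convert_pdf` returns a list of markdown strings, one per page, as the loop above suggests; `join_pages` and `save_markdown` are hypothetical helpers, not part of the library.

```python
from pathlib import Path


def join_pages(pages):
    """Join per-page markdown strings, marking each page with an HTML comment."""
    parts = [f"<!-- Page {i} -->\n\n{page.strip()}" for i, page in enumerate(pages, 1)]
    return "\n\n".join(parts) + "\n"


def save_markdown(pages, out_path):
    """Write the joined pages to out_path as UTF-8 and return the path."""
    path = Path(out_path)
    path.write_text(join_pages(pages), encoding="utf-8")
    return path
```

With the example above, `save_markdown(markdown_pages, "document.md")` would produce a single file. HTML comments serve as page markers here because most markdown renderers hide them.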

Custom Configuration

from vision_parse import VisionParser, PDFPageConfig

# Configure PDF processing settings
page_config = PDFPageConfig(
    dpi=400,
    color_space="RGB",
    include_annotations=True,
    preserve_transparency=False
)

# Initialize parser with custom config
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7,
    page_config=page_config
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)
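
When choosing a `dpi` value, it helps to estimate how large each rendered page becomes. The sketch below assumes `dpi` sets the rasterization resolution in the conventional sense (72 PDF points per inch) and that pages are held as 8-bit RGB bitmaps; how Vision Parse renders pages internally is not stated here, so treat the numbers as rough guidance. The helper names are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rasterized at the given DPI."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def raw_rgb_bytes(width_px, height_px):
    """Uncompressed size of an 8-bit RGB bitmap (3 bytes per pixel)."""
    return width_px * height_px * 3


# US Letter is 612 x 792 points (8.5 x 11 inches).
width_px, height_px = rendered_size(612, 792, dpi=400)
print(width_px, height_px)                 # 3400 4400
print(raw_rgb_bytes(width_px, height_px))  # 44880000
```

Under these assumptions a Letter page at 400 DPI is about 45 MB uncompressed, so a lower DPI may be worth trying for long documents or limited memory.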

🛠️ Development

Setting Up Development Environment

  1. Clone and Setup:

    # Clone the repository
    git clone https://github.com/iamarunbrahma/vision-parse.git
    cd vision-parse
    
  2. Install Dependencies:

    # Install uv (Mac and Linux)
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Install uv (Windows)
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    
    # Install dependencies
    uv sync --all-extras && source .venv/bin/activate
    
  3. Quality Checks:

    # Run test suite
    make test
    
    # Code quality
    make lint    # Run code linting
    make format  # Format code
    

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

vision_parse-0.1.0.tar.gz (41.8 kB)

Uploaded Source

Built Distribution

vision_parse-0.1.0-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file vision_parse-0.1.0.tar.gz.

File metadata

  • Download URL: vision_parse-0.1.0.tar.gz
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for vision_parse-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        911e209e15dd92b57515ba8c4c42ef395a4bab66b3a820391e3446a8b0a153f4
MD5           0f2e2c9eca4d78eec89e12493693dd24
BLAKE2b-256   3000601ed42d74c30a27268b535e41170dfbf2504f8418dba076a312f2a55ac2
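
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; `sha256_of` and `matches_sha256` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_sha256("vision_parse-0.1.0.tar.gz", "911e209e15dd92b57515ba8c4c42ef395a4bab66b3a820391e3446a8b0a153f4")`, assuming the file sits in the current directory.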

File details

Details for the file vision_parse-0.1.0-py3-none-any.whl.

File hashes

Hashes for vision_parse-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        b8c4949aa3086254b92da4635c92c95917985742a36c77d2688ed44adad5b513
MD5           4429ac5d381e01b43fe2fc8516b04227
BLAKE2b-256   756b61d2541c432650a2bb33f998dbf9742673de38889a0b92ecbd34d9335325
