
Parse PDF documents into markdown formatted content using Vision LLMs


Vision Parse

License: MIT | Author: Arun Brahma

🚀 Parse PDF documents into beautifully formatted markdown content using state-of-the-art Vision Language Models - all with just a few lines of code!

🎯 Introduction

Vision Parse uses Vision Language Models to convert PDF documents into markdown:

  • 📝 Smart Content Extraction: Intelligently identifies and extracts text and tables with high precision
  • 🎨 Content Formatting: Preserves document hierarchy, styling, and indentation for markdown formatted content
  • 🤖 Multi-LLM Support: Supports multiple Vision LLM providers, e.g. OpenAI, Llama, and Gemini, so you can trade off accuracy and speed
  • 🔄 PDF Document Support: Handles multi-page PDF documents by converting each page into a base64-encoded image
  • 📁 Local Model Hosting: Supports local model hosting using Ollama for secure document processing and for offline use
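
To make the base64 step in the list above concrete, here is a minimal sketch using only the Python standard library. It is not Vision Parse's own code: the function names `encode_image_bytes` and `to_data_uri` are hypothetical, and the data-URI form is just one common way such a string is passed to a vision model.

```python
import base64


def encode_image_bytes(image_bytes: bytes) -> str:
    """Return the base64 text form of raw image bytes (e.g. a rendered PDF page)."""
    return base64.b64encode(image_bytes).decode("ascii")


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Wrap the base64 text in a data URI (hypothetical helper, for illustration only)."""
    return f"data:{mime_type};base64,{encode_image_bytes(image_bytes)}"
```

Base64 output is about one third larger than the raw bytes, which is worth keeping in mind when pages are rendered at high resolution.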

🚀 Getting Started

Prerequisites

  • 🐍 Python >= 3.9
  • 🖥️ Ollama (if you want to use local models)
  • 🤖 API Key for OpenAI or Google Gemini (if you want to use OpenAI or Google Gemini)
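
The prerequisites above can be checked before installing. The sketch below is an illustration, not part of the package: `python_is_supported` and `ollama_on_path` are hypothetical helper names, and the second check only tells you whether an `ollama` executable is on your PATH, not whether the Ollama server is running.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the prerequisites


def python_is_supported(version_info=None) -> bool:
    """Return True if the given (or current) Python version is >= 3.9."""
    version = sys.version_info if version_info is None else version_info
    return tuple(version[:2]) >= MIN_PYTHON


def ollama_on_path() -> bool:
    """Return True if an `ollama` executable can be found on PATH."""
    return shutil.which("ollama") is not None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Ollama found:", ollama_on_path())
```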

Installation

Install the package using pip (Recommended):

pip install vision-parse

Install the optional dependencies for OpenAI or Gemini:

pip install 'vision-parse[openai]'
pip install 'vision-parse[gemini]'

Setting up Ollama (Optional)

See examples/ollama_setup.md for instructions on setting up Ollama locally.

⌛️ Usage

Basic Example Usage

from vision_parse import VisionParser

# Initialize parser
parser = VisionParser(
    model_name="llama3.2-vision:11b", # For local models, you don't need to provide the api key
    temperature=0.7,
    top_p=0.4,
    extraction_complexity=False # Set to True for more detailed extraction
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)

# Process results
for i, page_content in enumerate(markdown_pages):
    print(f"\n--- Page {i+1} ---\n{page_content}")
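
Since `convert_pdf` returns one markdown string per page, you may want to save the result as a single file. The helper below is a sketch under that assumption; `save_markdown_pages` and the `---` page separator are hypothetical choices, not part of the Vision Parse API.

```python
from pathlib import Path


def save_markdown_pages(pages, output_path, separator="\n\n---\n\n"):
    """Join per-page markdown strings and write them to one UTF-8 file."""
    output = Path(output_path)
    # Create the target directory if it does not exist yet.
    output.parent.mkdir(parents=True, exist_ok=True)
    text = separator.join(page.strip() for page in pages)
    output.write_text(text + "\n", encoding="utf-8")
    return output
```

With the example above, this would be called as `save_markdown_pages(markdown_pages, "output/document.md")`.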

PDF Page Configuration

from vision_parse import VisionParser, PDFPageConfig

# Configure PDF processing settings
page_config = PDFPageConfig(
    dpi=400,
    color_space="RGB",
    include_annotations=True,
    preserve_transparency=False
)

# Initialize parser with custom page config
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.4,
    extraction_complexity=True,
    page_config=page_config
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)
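
When choosing a `dpi` value, it helps to know roughly how large each page image becomes. The sketch below uses the general definition of DPI (pixels = inches × DPI); it assumes pages are rasterized at the requested resolution, which is not confirmed here for the library's internals, and `rendered_size_px` is a hypothetical helper name.

```python
def rendered_size_px(width_in: float, height_in: float, dpi: int):
    """Return the (width, height) in pixels of a page rasterized at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter is 8.5 x 11 inches.
letter_400 = rendered_size_px(8.5, 11, 400)   # (3400, 4400)
letter_150 = rendered_size_px(8.5, 11, 150)   # (1275, 1650)

# Pixel count grows with the square of the DPI.
megapixels_400 = letter_400[0] * letter_400[1] / 1_000_000  # 14.96
```

Higher DPI keeps small text legible, but larger images usually mean slower and costlier model calls.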

OpenAI or Gemini Model Usage

from vision_parse import VisionParser

# Initialize parser with OpenAI model
parser = VisionParser(
    model_name="gpt-4o",
    api_key="your-openai-api-key", # Get the OpenAI API key from https://platform.openai.com/api-keys
    temperature=0.9,
    top_p=0.4,
    extraction_complexity=False # Set to True for more detailed extraction
)

# Alternatively, initialize parser with Google Gemini model
parser = VisionParser(
    model_name="gemini-1.5-flash",
    api_key="your-gemini-api-key", # Get the Gemini API key from https://aistudio.google.com/app/apikey
    temperature=0.9,
    top_p=0.4,
    extraction_complexity=False # Set to True for more detailed extraction
)
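
Rather than hard-coding API keys in source files, you can read them from environment variables. This is a sketch, not something the package requires: `read_api_key` is a hypothetical helper, and the variable names `OPENAI_API_KEY` and `GEMINI_API_KEY` are conventional choices assumed here.

```python
import os


def read_api_key(env_var: str) -> str:
    """Return the API key stored in env_var, or raise if it is missing or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage with the parser shown above (environment variable names are assumptions):
#
#   parser = VisionParser(
#       model_name="gpt-4o",
#       api_key=read_api_key("OPENAI_API_KEY"),
#       temperature=0.9,
#       top_p=0.4,
#   )
```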

Supported Models

This package supports the following Vision LLM models:

  • OpenAI: gpt-4o, gpt-4o-mini
  • Google Gemini: gemini-1.5-flash, gemini-2.0-flash-exp, gemini-1.5-pro
  • Meta Llama and LLaVA via Ollama: llava:13b, llava:34b, llama3.2-vision:11b, llama3.2-vision:70b
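
The list above can be turned into a small lookup that tells you which provider a model name belongs to and whether it needs an API key. This is an illustrative sketch: the table copies the names listed above, while `SUPPORTED_MODELS`, `provider_for`, and `needs_api_key` are hypothetical and not part of the package.

```python
# Model names copied from the supported-models list; the dict itself is illustrative.
SUPPORTED_MODELS = {
    "openai": ["gpt-4o", "gpt-4o-mini"],
    "gemini": ["gemini-1.5-flash", "gemini-2.0-flash-exp", "gemini-1.5-pro"],
    "ollama": ["llava:13b", "llava:34b", "llama3.2-vision:11b", "llama3.2-vision:70b"],
}


def provider_for(model_name: str) -> str:
    """Return the provider key for a listed model name, or raise ValueError."""
    for provider, models in SUPPORTED_MODELS.items():
        if model_name in models:
            return provider
    raise ValueError(f"Model not in the supported list: {model_name!r}")


def needs_api_key(model_name: str) -> bool:
    """Local Ollama models need no API key; OpenAI and Gemini models do."""
    return provider_for(model_name) != "ollama"
```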

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

