Parse PDF documents into markdown-formatted content using Vision LLMs
Vision Parse
🚀 Transform PDF documents into beautifully formatted markdown using state-of-the-art Vision Language Models - all with just a few lines of code!
🎯 Introduction
Vision Parse harnesses the power of Vision Language Models to revolutionize document processing:
- 📝 Smart Content Extraction: Intelligently identifies and extracts text, tables, and images with high precision
- ✨ Markdown Magic: Converts complex document layouts into clean, well-structured markdown
- 🎨 Format Preservation: Maintains document hierarchy, styling, and visual elements
- 🤖 AI-Powered Analysis: Leverages cutting-edge vision models for superior accuracy
- 🔄 Batch Processing: Handles multi-page documents effortlessly
🚀 Getting Started
Prerequisites
- 🐍 Python >= 3.8
- 🖥️ Ollama (for local model hosting)
Installation
Install the package using pip:
pip install vision-parse
Setting up Ollama
- Install Ollama based on your operating system:
Linux:
curl -fsSL https://ollama.com/install.sh | sh
macOS:
brew install ollama
Windows: Download and install from the Ollama website
- Pull the model and start the Ollama server:
ollama pull llama3.2-vision:11b
ollama serve
- Verify server status:
curl http://localhost:11434/api/version
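The same check can be done from Python before constructing a parser. The sketch below is a minimal illustration using only the standard library; it assumes the server listens on the default address shown above and that `/api/version` returns a JSON object with a `version` field. The helper names `parse_version` and `ollama_version` are hypothetical and not part of Vision Parse.

```python
import json
import urllib.request


def parse_version(body: str) -> str:
    """Extract the version string from the JSON body of /api/version."""
    return json.loads(body)["version"]


def ollama_version(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> str:
    """Query a running Ollama server and return its version string.

    Raises OSError (urllib.error.URLError is a subclass) if the server
    cannot be reached.
    """
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
        return parse_version(resp.read().decode("utf-8"))


# Example usage (requires a running server):
#     try:
#         print("Ollama is running, version", ollama_version())
#     except OSError as exc:
#         print("Ollama server not reachable:", exc)
```

Failing fast here gives a clearer error than a connection failure partway through a PDF conversion.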
⌛️ Usage
Basic Example
from vision_parse import VisionParser

# Initialize parser
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)

# Process results
for i, page_content in enumerate(markdown_pages, 1):
    print(f"\n--- Page {i} ---\n{page_content}")
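To keep the result rather than print it, the pages can be joined into a single markdown file. This is a minimal sketch that assumes `convert_pdf` returns a list of strings, one per page, as the loop above suggests. `join_pages` and `save_markdown` are hypothetical helpers, not part of the Vision Parse API, and the HTML comment used as a page marker is just one possible convention.

```python
from pathlib import Path
from typing import List


def join_pages(markdown_pages: List[str]) -> str:
    """Join per-page markdown strings into one document with page markers."""
    parts = []
    for i, page in enumerate(markdown_pages, 1):
        # An HTML comment keeps the page boundary without affecting rendering.
        parts.append(f"<!-- Page {i} -->\n\n{page.strip()}")
    return "\n\n".join(parts) + "\n"


def save_markdown(markdown_pages: List[str], out_path: str) -> Path:
    """Write the joined document to out_path as UTF-8 and return the path."""
    path = Path(out_path)
    path.write_text(join_pages(markdown_pages), encoding="utf-8")
    return path
```

With the parser from the basic example, `save_markdown(markdown_pages, "document.md")` would write all pages to one file.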
Custom Configuration
from vision_parse import VisionParser, PDFPageConfig

# Configure PDF processing settings
page_config = PDFPageConfig(
    dpi=400,
    color_space="RGB",
    include_annotations=True,
    preserve_transparency=False
)

# Initialize parser with custom config
parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.7,
    page_config=page_config
)

# Convert PDF to markdown
pdf_path = "path/to/your/document.pdf"
markdown_pages = parser.convert_pdf(pdf_path)
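When choosing a `dpi` value, it can help to estimate how large each rendered page image will be. The sketch below assumes that `dpi` sets the resolution at which a page is rasterized before it is sent to the model, which this README does not state explicitly. It relies only on the fact that PDF page dimensions are measured in points (1/72 inch). The function names are hypothetical.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points (1/72 inch)


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Return the (width, height) in pixels of a page rasterized at `dpi`."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def rgb_bytes(width_pt: float, height_pt: float, dpi: int) -> int:
    """Approximate uncompressed size of the page as 8-bit RGB (3 bytes per pixel)."""
    width_px, height_px = rendered_size(width_pt, height_pt, dpi)
    return width_px * height_px * 3


# US Letter is 612 x 792 points.
print(rendered_size(612, 792, 400))  # (3400, 4400)
print(rgb_bytes(612, 792, 400))      # 44880000
```

Under that assumption, a US Letter page at 400 dpi is roughly 45 MB uncompressed, so a lower `dpi` may be worth trying if memory or inference time becomes a concern.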
🛠️ Development
Setting Up Development Environment
- Clone and Setup:
# Clone the repository
git clone https://github.com/iamarunbrahma/vision-parse.git
cd vision-parse
- Install Dependencies:
# Install uv (Mac and Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install uv (Windows)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Install dependencies
uv sync --all-extras && source .venv/bin/activate
- Quality Checks:
# Run test suite
make test
# Code quality
make lint    # Run code linting
make format  # Format code
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.