# OmniDocs
Unified Python toolkit for visual document understanding
Documentation • Installation • Quick Start • Tasks • Contributing
OmniDocs provides a single, consistent API for document AI tasks: layout detection, OCR, text extraction, table parsing, structured extraction, and reading order. Swap models and backends without changing your code.
```python
result = extractor.extract(image)
```
## Why OmniDocs?

- One API — `.extract()` for every task
- Multi-backend — PyTorch, VLLM, MLX, API
- VLM API — Use any cloud VLM (Gemini, OpenRouter, Azure, OpenAI) with zero GPU
- Type-safe — Pydantic configs and outputs
- Structured extraction — Extract data into Pydantic schemas
- Production-ready — Modal deployment, batch processing
## Installation

```bash
pip install omnidocs
```

Or with uv:

```bash
uv pip install omnidocs
```
Cloud API access (Gemini, OpenRouter, Azure, OpenAI, ANANNAS AI) works out of the box — LiteLLM is included as a core dependency.
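In practice, "out of the box" means authentication is handled through the standard provider environment variables that LiteLLM reads. For example, before running any of the cloud examples (the key values below are placeholders; set only the variable for your provider):

```bash
# Placeholders — substitute your real keys
export OPENROUTER_API_KEY="sk-or-..."   # OpenRouter
export GOOGLE_API_KEY="AIza..."         # Google Gemini
```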
### Install extras

The extras are quoted so shells like zsh don't interpret the brackets:

```bash
pip install "omnidocs[pytorch]"  # Local GPU inference
pip install "omnidocs[vllm]"     # High-throughput production
pip install "omnidocs[mlx]"      # Apple Silicon
pip install "omnidocs[ocr]"      # Tesseract, EasyOCR, PaddleOCR
pip install "omnidocs[all]"      # Everything
```
### From source

```bash
git clone https://github.com/adithya-s-k/Omnidocs.git
cd Omnidocs
uv sync
```
### Flash Attention (optional, for PyTorch VLMs)

Download a pre-built wheel from the Flash Attention Releases page that matches your Python, CUDA, and PyTorch versions:

```bash
# Example: Python 3.12, CUDA 12, PyTorch 2.5
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.5cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
```
## Quick Start

### VLM API (No GPU Required)
Use any cloud VLM through a single, provider-agnostic API:
```python
from omnidocs.vlm import VLMAPIConfig
from omnidocs.tasks.text_extraction import VLMTextExtractor

# Just set your env var: OPENROUTER_API_KEY, GOOGLE_API_KEY, etc.
config = VLMAPIConfig(model="openrouter/qwen/qwen3-vl-8b-instruct")
extractor = VLMTextExtractor(config=config)

result = extractor.extract("document.png", output_format="markdown")
print(result.content)
```
Works with any provider: OpenRouter, Gemini, Azure, OpenAI, ANANNAS AI, self-hosted VLLM — if it speaks the OpenAI API, it works.
### Structured Extraction
Extract typed data directly into Pydantic schemas:
```python
from pydantic import BaseModel

from omnidocs.vlm import VLMAPIConfig
from omnidocs.tasks.structured_extraction import VLMStructuredExtractor

class Invoice(BaseModel):
    vendor: str
    total: float
    items: list[str]

config = VLMAPIConfig(model="gemini/gemini-2.5-flash")
extractor = VLMStructuredExtractor(config=config)

result = extractor.extract(
    image="invoice.png",
    schema=Invoice,
    prompt="Extract invoice details from this document.",
)
print(result.data.vendor, result.data.total)
```
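The value of schema-based extraction is that whatever the VLM returns must parse into your model, so malformed output fails loudly instead of propagating bad data downstream. A minimal illustration of that guarantee using plain Pydantic, with no OmniDocs calls:

```python
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    vendor: str
    total: float
    items: list[str]

# Well-formed data parses, with type coercion (the string "12.50" becomes 12.5)
inv = Invoice(vendor="Acme Corp", total="12.50", items=["widget", "gadget"])
assert inv.total == 12.5

# A malformed extraction raises ValidationError rather than passing through
try:
    Invoice(vendor="Acme Corp", total="n/a", items=[])
except ValidationError:
    print("rejected malformed extraction")
```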
### Text Extraction (Local GPU)
```python
from omnidocs import Document
from omnidocs.tasks.text_extraction import QwenTextExtractor
from omnidocs.tasks.text_extraction.qwen import QwenTextVLLMConfig

doc = Document.from_pdf("report.pdf")
extractor = QwenTextExtractor(
    backend=QwenTextVLLMConfig(model="Qwen/Qwen3-VL-8B-Instruct")
)

result = extractor.extract(doc.get_page(0), output_format="markdown")
print(result.content)
```
### Layout Detection
```python
from omnidocs import Document
from omnidocs.tasks.layout_extraction import DocLayoutYOLO, DocLayoutYOLOConfig

doc = Document.from_pdf("paper.pdf")
detector = DocLayoutYOLO(config=DocLayoutYOLOConfig(device="cuda"))

result = detector.extract(doc.get_page(0))
for box in result.bboxes:
    print(f"{box.label.value}: {box.confidence:.2f}")
```
### Table Extraction
```python
from omnidocs.tasks.table_extraction import TableFormerExtractor, TableFormerConfig

extractor = TableFormerExtractor(config=TableFormerConfig(device="cuda"))

# table_image: a PIL image of a cropped table region
result = extractor.extract(table_image)
df = result.to_dataframe()
html = result.to_html()
```
## Supported Tasks
| Task | Description | Output |
|---|---|---|
| Text Extraction | Convert documents to Markdown/HTML | Formatted text |
| Layout Analysis | Detect titles, tables, figures, etc. | Bounding boxes + labels |
| OCR | Extract text with coordinates | Text blocks + positions |
| Table Extraction | Parse table structure | Cells, rows, columns |
| Structured Extraction | Extract typed data into Pydantic schemas | Validated model instances |
| Reading Order | Determine logical reading sequence | Ordered elements |
## Supported Models

### Text Extraction
| Model | Backends | Notes |
|---|---|---|
| VLM API | Any cloud API | Provider-agnostic via LiteLLM |
| Qwen3-VL | PyTorch, VLLM, MLX, API | Best quality |
| MinerU VL | PyTorch, VLLM, API | Layout-aware extraction |
| Nanonets OCR2 | PyTorch, VLLM, MLX | Fast, accurate |
| Granite Docling | PyTorch, VLLM, MLX, API | IBM research model |
| DotsOCR | PyTorch, VLLM, API | Layout-aware |
### Layout Analysis
| Model | Backends | Notes |
|---|---|---|
| VLM API | Any cloud API | Custom labels support |
| DocLayoutYOLO | PyTorch | Fast (0.1s/page) |
| RT-DETR | PyTorch | Transformer-based |
| Qwen Layout | PyTorch, VLLM, MLX, API | Custom labels |
| MinerU VL Layout | PyTorch, VLLM, API | High accuracy |
### Structured Extraction
| Model | Backends | Notes |
|---|---|---|
| VLM API | Any cloud API | Pydantic schema output |
### OCR
| Model | Backends | Notes |
|---|---|---|
| Tesseract | CPU | 100+ languages |
| EasyOCR | PyTorch | 80+ languages |
| PaddleOCR | PaddlePaddle | CJK optimized |
### Table Extraction
| Model | Backends | Notes |
|---|---|---|
| TableFormer | PyTorch | Structure + content |
### Reading Order
| Model | Backends | Notes |
|---|---|---|
| Rule-based | CPU | R-tree indexing |
## VLM API Providers
Use any VLM through a single config — just change the model string:
```python
from omnidocs.vlm import VLMAPIConfig

# OpenRouter (100+ vision models)
config = VLMAPIConfig(model="openrouter/qwen/qwen3-vl-8b-instruct")

# Google Gemini
config = VLMAPIConfig(model="gemini/gemini-2.5-flash")

# Azure OpenAI
config = VLMAPIConfig(model="azure/gpt-5-mini", api_version="2024-12-01-preview")

# OpenAI
config = VLMAPIConfig(model="openai/gpt-4o")

# Any OpenAI-compatible API (ANANNAS AI, self-hosted VLLM, etc.)
config = VLMAPIConfig(
    model="openai/model-name",
    api_base="https://your-provider.com/v1",
)
```
See the VLM API docs for full provider setup and model lists.
## Multi-Backend Support
All VLM models support multiple inference backends:
```python
# PyTorch (local GPU)
from omnidocs.tasks.text_extraction.qwen import QwenTextPyTorchConfig
config = QwenTextPyTorchConfig(model="Qwen/Qwen3-VL-8B-Instruct", device="cuda")

# VLLM (high-throughput)
from omnidocs.tasks.text_extraction.qwen import QwenTextVLLMConfig
config = QwenTextVLLMConfig(model="Qwen/Qwen3-VL-8B-Instruct", tensor_parallel_size=2)

# MLX (Apple Silicon)
from omnidocs.tasks.text_extraction.qwen import QwenTextMLXConfig
config = QwenTextMLXConfig(model="mlx-community/Qwen3-VL-8B-Instruct-4bit")

# API (provider-agnostic via LiteLLM)
from omnidocs.tasks.text_extraction.qwen import QwenTextAPIConfig
config = QwenTextAPIConfig(model="openrouter/qwen/qwen3-vl-8b-instruct")
```
## Document Loading
```python
from omnidocs import Document

# From file
doc = Document.from_pdf("file.pdf", page_range=(0, 9))

# From URL
doc = Document.from_url("https://arxiv.org/pdf/1706.03762")

# From images
doc = Document.from_images(["page1.png", "page2.png"])

# Access pages
image = doc.get_page(0)  # PIL Image
```
## Roadmap
See the full Roadmap for planned features.
Coming soon:
- Math Recognition (LaTeX extraction)
- Chart Understanding
- Surya OCR + Layout
## Contributing
Contributions are welcome! See our Contributing Guide to get started.
```bash
# Setup
git clone https://github.com/adithya-s-k/Omnidocs.git
cd Omnidocs && uv sync

# Test
uv run pytest tests/ -v

# Lint
uv run ruff check . && uv run ruff format .

# Docs
uv run mkdocs serve
```
## License
Apache 2.0 — See LICENSE for details.