256M-param document AI: OCR, VQA, form extraction, table parsing. Runs on CPU. Apache 2.0.

TinyDoc-VLM

256M-Parameter Document-Specialist Vision-Language Model

SigLIP-B/16 + PixelShuffle 3× + SmolLM2-135M · OCR, VQA, Form Extraction, Table Parsing

Apache 2.0 · Runs on CPU · <1GB VRAM · LoRA Fine-tuning

Built by eulogik — AI infrastructure for document intelligence.

What is TinyDoc-VLM?

TinyDoc-VLM is an open-source document-understanding model that reads invoices, receipts, forms, tables, and charts. At 256M parameters, it is small enough to run on a MacBook, a Raspberry Pi 5, or other CPU-only machines, with no GPU required.

Use cases: Invoice processing, receipt scanning, form data extraction, table parsing, document Q&A, OCR, visual question answering.

Highlights

256M params — SigLIP-B/16 vision encoder (93M) + PixelShuffle 3× compressor + SmolLM2-135M decoder
<1GB VRAM — Runs on MacBook Air, Raspberry Pi 5, or any CPU with ONNX
Structured output — JSON extraction, key-value pairs, table parsing, OCR, VQA
LoRA fine-tuning — Train on your own docs with 2.7M trainable params (0.93% of total)
Apache 2.0 — Fully open-source, free for commercial use
ONNX export — Deploy anywhere with ONNX Runtime

Quick Start

Install

pip install tinydoc

Python SDK

from PIL import Image
from tinydoc import TinyDocExtractor

extractor = TinyDocExtractor(device="cpu")

# Ask a question about a document
img = Image.open("invoice.png")
result = extractor.ask(img, "What is the total?")
print(result.answer)  # "$1,234.56"

# Extract structured JSON
result = extractor.extract(img, output_format="json")
print(result.fields)  # {"total": "$1,234.56", "date": "2024-01-15", ...}

# Extract tables
result = extractor.extract_table(img)
print(result.markdown)  # Markdown-formatted table
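
The example comments above suggest that `result.fields` holds string values such as "$1,234.56" and "2024-01-15". Assuming that is the case, a small post-processing step can turn them into typed values before they reach an accounting system. This is a minimal sketch: the `parse_amount` and `normalize` helpers are hypothetical and not part of the `tinydoc` SDK, and the `total` and `date` keys are taken from the sample output only.

```python
from datetime import date
from decimal import Decimal
import re


def parse_amount(text: str) -> Decimal:
    """Parse a US-style amount such as "$1,234.56" into a Decimal."""
    # Drop currency symbols and thousands separators; keep digits, dot, minus.
    cleaned = re.sub(r"[^\d.\-]", "", text)
    return Decimal(cleaned)


def normalize(fields: dict) -> dict:
    """Return a copy of `fields` with known keys converted to typed values."""
    out = dict(fields)
    if "total" in out:
        out["total"] = parse_amount(out["total"])
    if "date" in out:
        # Assumes ISO 8601 dates (YYYY-MM-DD), as in the sample output.
        out["date"] = date.fromisoformat(out["date"])
    return out


# Stand-in for `result.fields`, copied from the sample output above.
sample = {"total": "$1,234.56", "date": "2024-01-15"}
print(normalize(sample))
# {'total': Decimal('1234.56'), 'date': datetime.date(2024, 1, 15)}
```

Using `Decimal` rather than `float` avoids rounding surprises when totals are summed later. Amounts in other locales (for example "1.234,56") would need a different parser.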

Direct Model Access

from tinydoc_vlm import TinyDocVLMForConditionalGeneration, TinyDocVLMProcessor

model = TinyDocVLMForConditionalGeneration.from_pretrained("eulogik/TinyDoc-VLM-256M")
processor = TinyDocVLMProcessor()

Model Architecture

Image (384×384)
    ↓
SigLIP Vision Encoder (93M)          ← 576 patches × 768 dim
    ↓
Pixel-Shuffle Compressor (scale=3)   ← 9× compression → 64 tokens
    ↓
Visual Position Embeddings
    ↓
SmolLM2 Decoder (135M)               ← 30 layers, GQA (9:3 heads), 8192 ctx
    ↓
Multi-Task Output Heads
    ↓
JSON / KV Extraction / Table / OCR / QA

Total: 256M parameters | Vision: 93M | Compressor: 3M | Decoder: 135M | Heads: 25M
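
The token counts in the diagram follow from simple arithmetic: a 384×384 image cut into 16×16 patches gives a 24×24 grid (576 patches), and folding every 3×3 block of neighbouring patches into one token leaves an 8×8 grid (64 tokens). The sketch below shows that rearrangement with NumPy. It is an illustration of the general pixel-shuffle idea, not the project's compressor code: the function name is hypothetical, and any learned projection applied afterwards is omitted because it is not described here.

```python
import numpy as np


def pixel_shuffle_compress(tokens: np.ndarray, scale: int = 3) -> np.ndarray:
    """Fold each scale x scale block of neighbouring patches into one token.

    tokens: (num_patches, dim) array laid out row by row on a square grid.
    Returns an array of shape (num_patches / scale**2, dim * scale**2).
    """
    num_patches, dim = tokens.shape
    side = int(round(num_patches ** 0.5))
    if side * side != num_patches or side % scale != 0:
        raise ValueError("patch grid must be square and divisible by scale")
    out = side // scale
    grid = tokens.reshape(side, side, dim)
    # Split rows and columns into (block index, offset within block),
    # then bring the two offsets next to the channel axis.
    blocks = grid.reshape(out, scale, out, scale, dim).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(out * out, scale * scale * dim)


patches = (384 // 16) ** 2                      # 24 * 24 = 576
tokens = np.zeros((patches, 768), dtype=np.float32)
compressed = pixel_shuffle_compress(tokens, scale=3)
print(patches, compressed.shape)                # 576 (64, 6912)
```

The rearrangement itself discards nothing: the 9× reduction in sequence length is paid for by a 9× wider channel dimension, which a projection layer would then map to the decoder width.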

LoRA Fine-tuning

Train TinyDoc-VLM on your own documents using LoRA. Only 2.7M params (0.93%) are trained.

M4 Mac (overnight run)

# Generate 3K synthetic documents
python data/synthetic/generator.py --num-docs 3000 --output-dir data/synthetic/output

# Train for 17K steps (~15 hours on M4)
python training/fast_train.py \
    --manifest data/synthetic/output/manifest.jsonl \
    --data-root data/synthetic \
    --steps 17000 --batch-size 1 --grad-accum 4 --device mps

# Or use the one-liner
bash training/m4_train.sh 17000
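
As a rough budget check, the flags above imply the figures below. This is back-of-the-envelope arithmetic only. Whether `--steps` counts optimizer updates or individual micro-batches is not stated, so both readings are shown as assumptions.

```python
STEPS = 17_000      # --steps
BATCH_SIZE = 1      # --batch-size
GRAD_ACCUM = 4      # --grad-accum
HOURS = 15          # approximate wall-clock time quoted in the comment above

# Samples contributing to each optimizer update.
effective_batch = BATCH_SIZE * GRAD_ACCUM

# Average wall-clock time per step.
seconds_per_step = HOURS * 3600 / STEPS

# Assumption A: --steps counts optimizer updates.
samples_if_optimizer_steps = STEPS * effective_batch
# Assumption B: --steps counts micro-batches.
samples_if_micro_steps = STEPS * BATCH_SIZE

print(effective_batch)                                     # 4
print(round(seconds_per_step, 2))                          # 3.18
print(samples_if_optimizer_steps, samples_if_micro_steps)  # 68000 17000
```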

Colab Free T4

Open training/colab_train.ipynb — complete pipeline in one notebook (~1 hour for 5K steps).

Training Results

Metric	Value
Best checkpoint	Step 14,000 (loss: 15.0)
Training data	3,000 synthetic docs (6,815 QA pairs)
Training time	15.1 hours on M4
LoRA rank	16 (alpha: 32)
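
For readers new to LoRA, the rank and alpha values in the table can be read as follows: each adapted weight matrix W gets a low-rank update (alpha / rank) · B · A, so with rank 16 and alpha 32 the update is scaled by 2. The sketch below shows the standard LoRA formulation with NumPy and checks that merging the adapter into the base weight gives the same output. The 576×576 layer size is a hypothetical value chosen for illustration, and which layers TinyDoc-VLM actually adapts is not stated here.

```python
import numpy as np

RANK, ALPHA = 16, 32      # from the Training Results table
D_IN = D_OUT = 576        # hypothetical layer width, for illustration only
SCALE = ALPHA / RANK      # 2.0


def lora_forward(x, W, A, B):
    """Frozen base projection plus the scaled low-rank update."""
    return x @ W.T + SCALE * (x @ A.T) @ B.T


def merge(W, A, B):
    """Fold the adapter into the base weight for adapter-free inference."""
    return W + SCALE * (B @ A)


rng = np.random.default_rng(0)
W = rng.standard_normal((D_OUT, D_IN))          # frozen base weight
A = rng.standard_normal((RANK, D_IN)) * 0.01    # trainable down-projection
# Standard LoRA initializes B to zero; random values are used here so the
# equivalence check below is not trivial.
B = rng.standard_normal((D_OUT, RANK)) * 0.01   # trainable up-projection
x = rng.standard_normal((4, D_IN))

assert np.allclose(lora_forward(x, W, A, B), x @ merge(W, A, B).T)
print(SCALE, A.size + B.size)                   # 2.0 18432
```

Each adapted matrix adds rank × (d_in + d_out) trainable parameters, which is why the trainable share stays small relative to the frozen base model.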

Deployment

ONNX Export

python export/export_onnx.py --model-path eulogik/TinyDoc-VLM-256M --output model.onnx

ONNX models on HF Hub:

tinydoc-vlm-vision.onnx — Vision encoder (33KB)
tinydoc-vlm-compressor.onnx — Token compressor (31KB)
tinydoc-vlm-decoder.onnx — Language decoder (59MB)
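
The input names, shapes, and preprocessing expected by these ONNX files are not documented here, so a cautious first step is to inspect a downloaded file with ONNX Runtime before feeding it data. The sketch below is hedged accordingly: `describe_inputs` uses only standard `onnxruntime` calls, while `to_model_input` assumes an NCHW float32 layout scaled to [-1, 1], which is common for SigLIP-style encoders but should be confirmed against the inspected graph.

```python
import numpy as np


def to_model_input(image_hwc: np.ndarray) -> np.ndarray:
    """Convert a uint8 HxWx3 image to float32 1x3xHxW in [-1, 1].

    The layout and scaling are assumptions, not documented behaviour.
    """
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    scaled = image_hwc.astype(np.float32) / 127.5 - 1.0
    return scaled.transpose(2, 0, 1)[np.newaxis, ...]


def describe_inputs(model_path: str):
    """Return (name, shape, dtype) for every graph input of an ONNX file."""
    # Imported lazily so the helper above works without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]


dummy = np.zeros((384, 384, 3), dtype=np.uint8)
print(to_model_input(dummy).shape)  # (1, 3, 384, 384)
```

Calling `describe_inputs("tinydoc-vlm-vision.onnx")` on a locally downloaded file would list what the vision encoder actually expects. If the reported shape or dtype differs from the assumption above, adjust `to_model_input` to match.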

HuggingFace Spaces

Live demo: huggingface.co/spaces/eulogik/TinyDoc-VLM

Benchmarks

Benchmark	Status	Target
OCRBench	In progress	>75%
DocVQA	Pending	>85%
FUNSD	Pending	>95%
CORD	Pending	>95%

Full analysis in docs/BENCHMARKS.md.

Package Structure

Package	Location	Description
`tinydoc`	PyPI	Python SDK — `TinyDocExtractor.ask()`, `.extract()`, `.extract_table()`
`tinydoc-vlm`	GitHub	Full model code, training pipeline, synthetic data engine, evaluation suite
`TinyDoc-VLM-256M`	HF Hub	Pre-trained weights — 1.1GB, loads via `from_pretrained()`
`TinyDoc-VLM-LoRA`	HF Hub	LoRA adapter — 10MB, merge with base model

Links

Resource	URL
GitHub	github.com/eulogik/TinyDoc-VLM
PyPI	pypi.org/project/tinydoc
Model Hub	huggingface.co/eulogik/TinyDoc-VLM-256M
LoRA Checkpoint	huggingface.co/eulogik/TinyDoc-VLM-LoRA
Live Demo	huggingface.co/spaces/eulogik/TinyDoc-VLM
Documentation	eulogik.github.io/TinyDoc-VLM

Launch Assets

Document	Description
HN Post	Hacker News Show HN draft
Reddit Post	r/LocalLLaMA, r/MachineLearning
Twitter Thread	7-tweet launch thread
Pitch Deck	Enterprise one-pager

Citation

@software{eulogik_tinydoc_vlm_2026,
  author = {eulogik},
  title = {TinyDoc-VLM: 256M-Param Document-Specialist Vision-Language Model},
  year = {2026},
  url = {https://github.com/eulogik/TinyDoc-VLM}
}

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

Apache 2.0. See LICENSE for details.

Made by eulogik

This version: 0.2.0, released Jun 27, 2026

Download files

Source Distribution

tinydoc_vlm-0.2.0.tar.gz (30.1 kB)

Uploaded Jun 27, 2026 Source

Built Distribution

tinydoc_vlm-0.2.0-py3-none-any.whl (27.9 kB)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file tinydoc_vlm-0.2.0.tar.gz.

File metadata

Download URL: tinydoc_vlm-0.2.0.tar.gz
Upload date: Jun 27, 2026
Size: 30.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for tinydoc_vlm-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`26fc858ed53fd5488db63b4c9a63474f5042dc2c63b49bc6863a795a8d95eaac`
MD5	`954fa52e01e050e62071c6a3e2025bd7`
BLAKE2b-256	`81bbf4f667c44aba4dc3c9a883538e2f0ab1dd475c9535d3d8431c9cefb5a313`

File details

Details for the file tinydoc_vlm-0.2.0-py3-none-any.whl.

File metadata

Download URL: tinydoc_vlm-0.2.0-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 27.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for tinydoc_vlm-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dced6548cb7556acef6579f15d6b9424354215b3bd0ec1150cc345ceab8f7663`
MD5	`5664b1cb6eff608cd231db1a5f15089f`
BLAKE2b-256	`b0e6187d7ac4b5215bd5c57e7fbdfd652ce7c007336691a02724bac1800decf1`
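
The SHA256 digests listed for the two files can be checked locally after download. The sketch below uses only the standard library. The digests are copied from the tables above, while the helper names and the assumption that the files keep their original names on disk are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "tinydoc_vlm-0.2.0.tar.gz":
        "26fc858ed53fd5488db63b4c9a63474f5042dc2c63b49bc6863a795a8d95eaac",
    "tinydoc_vlm-0.2.0-py3-none-any.whl":
        "dced6548cb7556acef6579f15d6b9424354215b3bd0ec1150cc345ceab8f7663",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """True only if the file name is known and its digest matches."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify("tinydoc_vlm-0.2.0.tar.gz")` returns `True` only when the local file matches the published digest. Unknown file names return `False` rather than raising.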


tinydoc-vlm 0.2.0
