256M-param document AI: OCR, VQA, form extraction, table parsing. Runs on CPU. Apache 2.0.
TinyDoc-VLM
256M-Parameter Document-Specialist Vision-Language Model
SigLIP-B/16 + PixelShuffle 3× + SmolLM2-135M · OCR, VQA, Form Extraction, Table Parsing
Apache 2.0 · Runs on CPU · <1GB VRAM · LoRA Fine-tuning
Built by eulogik — AI infrastructure for document intelligence.
What is TinyDoc-VLM?
TinyDoc-VLM is an open-source document understanding AI that reads invoices, receipts, forms, tables, and charts. At just 256M parameters, it runs on a MacBook, Raspberry Pi 5, or any CPU — no GPU required.
Use cases: Invoice processing, receipt scanning, form data extraction, table parsing, document Q&A, OCR, visual question answering.
Highlights
- 256M params — SigLIP-B/16 vision encoder (93M) + PixelShuffle 3× compressor + SmolLM2-135M decoder
- <1GB VRAM — Runs on MacBook Air, Raspberry Pi 5, or any CPU with ONNX
- Structured output — JSON extraction, key-value pairs, table parsing, OCR, VQA
- LoRA fine-tuning — Train on your own docs with 2.7M trainable params (0.93% of total)
- Apache 2.0 — Fully open-source, free for commercial use
- ONNX export — Deploy anywhere with ONNX Runtime
Quick Start
Install
pip install tinydoc
Python SDK
from PIL import Image
from tinydoc import TinyDocExtractor
extractor = TinyDocExtractor(device="cpu")
# Ask a question about a document
img = Image.open("invoice.png")
result = extractor.ask(img, "What is the total?")
print(result.answer) # "$1,234.56"
# Extract structured JSON
result = extractor.extract(img, output_format="json")
print(result.fields) # {"total": "$1,234.56", "date": "2024-01-15", ...}
# Extract tables
result = extractor.extract_table(img)
print(result.markdown) # Markdown-formatted table
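Extracted values arrive as strings in the example above, so downstream code will usually want typed values. The following is a minimal sketch, assuming `result.fields` is a plain dict with string values shaped like the comment above. The helper names `parse_amount` and `normalize_fields` are hypothetical and not part of the `tinydoc` SDK, and only US-style dollar amounts and ISO dates are handled.

```python
from datetime import date
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a currency string such as "$1,234.56" to a Decimal."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned)


def normalize_fields(fields: dict) -> dict:
    """Return a copy of `fields` with known keys converted to typed values.

    The keys "total" and "date" are taken from the README example; any
    other keys are passed through unchanged.
    """
    out = dict(fields)
    if "total" in out:
        out["total"] = parse_amount(out["total"])
    if "date" in out:
        out["date"] = date.fromisoformat(out["date"])
    return out


# Stand-in for `result.fields` from the example above.
fields = {"total": "$1,234.56", "date": "2024-01-15"}
normalized = normalize_fields(fields)
print(normalized["total"], normalized["date"])
```

Using `Decimal` rather than `float` keeps monetary amounts exact, which matters when totals are summed or compared.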
Direct Model Access
from tinydoc_vlm import TinyDocVLMForConditionalGeneration, TinyDocVLMProcessor
model = TinyDocVLMForConditionalGeneration.from_pretrained("eulogik/TinyDoc-VLM-256M")
processor = TinyDocVLMProcessor()
Model Architecture
Image (384×384)
↓
SigLIP Vision Encoder (93M) ← 576 patches × 768 dim
↓
Pixel-Shuffle Compressor (scale=3) ← 9× compression → 64 tokens
↓
Visual Position Embeddings
↓
SmolLM2 Decoder (135M) ← 30 layers, GQA (9:3 heads), 8192 ctx
↓
Multi-Task Output Heads
↓
JSON / KV Extraction / Table / OCR / QA
Total: 256M parameters | Vision: 93M | Compressor: 3M | Decoder: 135M | Heads: 25M
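The numbers in the diagram can be checked with a few lines of arithmetic. This sketch assumes square images, a patch size of 16 (from the "B/16" in the encoder name), and that the pixel-shuffle step merges each 3×3 block of patches into one token. The function name `visual_token_count` is hypothetical and not part of the project code.

```python
def visual_token_count(image_size: int, patch_size: int, shuffle_scale: int) -> tuple[int, int]:
    """Return (patches before compression, tokens after pixel shuffle)."""
    grid = image_size // patch_size          # patches per side
    if grid % shuffle_scale != 0:
        raise ValueError("patch grid must be divisible by the shuffle scale")
    patches = grid * grid
    compressed_grid = grid // shuffle_scale  # tokens per side after merging
    return patches, compressed_grid * compressed_grid


patches, tokens = visual_token_count(image_size=384, patch_size=16, shuffle_scale=3)

# Assumption: pixel shuffle is a space-to-depth merge, so each output token
# concatenates scale**2 patch embeddings before any projection is applied.
merged_dim = 768 * 3 ** 2

# Parameter budget from the "Total" line, in millions.
components_m = {"vision": 93, "compressor": 3, "decoder": 135, "heads": 25}
total_m = sum(components_m.values())

print(patches, tokens, merged_dim, total_m)
```

The 9× reduction is what keeps the visual prefix at 64 tokens, a small share of the decoder's 8192-token context.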
LoRA Fine-tuning
Train TinyDoc-VLM on your own documents using LoRA. Only 2.7M params (0.93%) are trained.
M4 Mac (overnight run)
# Generate 3K synthetic documents
python data/synthetic/generator.py --num-docs 3000 --output-dir data/synthetic/output
# Train for 17K steps (~15 hours on M4)
python training/fast_train.py \
--manifest data/synthetic/output/manifest.jsonl \
--data-root data/synthetic \
--steps 17000 --batch-size 1 --grad-accum 4 --device mps
# Or use the one-liner
bash training/m4_train.sh 17000
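For planning a run, the flags above imply an effective batch size and a rough time per step. This is a back-of-the-envelope sketch using only the numbers quoted above. The source does not say whether `--steps` counts micro-batches or optimizer updates, so the per-step time is reported per counted step without assuming either.

```python
batch_size = 1      # --batch-size
grad_accum = 4      # --grad-accum
steps = 17_000      # --steps
hours = 15.0        # approximate wall-clock time quoted for an M4

# Gradients from `grad_accum` micro-batches are summed before each update.
effective_batch = batch_size * grad_accum

# Average wall-clock seconds per counted step.
seconds_per_step = hours * 3600 / steps

print(effective_batch, round(seconds_per_step, 2))
```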
Colab Free T4
Open training/colab_train.ipynb — complete pipeline in one notebook (~1 hour for 5K steps).
Training Results
| Metric | Value |
|---|---|
| Best checkpoint | Step 14,000 (loss: 15.0) |
| Training data | 3,000 synthetic docs (6,815 QA pairs) |
| Training time | 15.1 hours on M4 |
| LoRA rank | 16 (alpha: 32) |
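To make the rank and alpha values in the table concrete, the sketch below shows how LoRA's added parameter count and scaling factor follow from them, and how an adapter is folded back into a base weight. It is a generic illustration of the LoRA formulation, not the project's training code. The 576-wide layer is an illustrative shape, and which layers TinyDoc-VLM actually adapts is not stated in this README.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / rank


def merge_lora(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * B @ A for nested-list matrices."""
    scale = lora_scaling(alpha, rank)
    merged = [row[:] for row in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            delta = sum(B[i][k] * A[k][j] for k in range(len(A)))
            merged[i][j] += scale * delta
    return merged


rank, alpha = 16, 32                      # values from the table above
print(lora_scaling(alpha, rank))          # update is scaled by alpha / rank
print(lora_extra_params(576, 576, rank))  # illustrative 576 x 576 projection

# Tiny rank-1 example so the merge can be followed by hand.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # rank x d_in
B = [[0.5], [0.0]]      # d_out x rank
merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

Because the merged weight has the same shape as the original, a merged model runs with no extra inference cost compared to the base model.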
Deployment
ONNX Export
python export/export_onnx.py --model-path eulogik/TinyDoc-VLM-256M --output model.onnx
ONNX models on HF Hub:
- tinydoc-vlm-vision.onnx: Vision encoder (33KB)
- tinydoc-vlm-compressor.onnx: Token compressor (31KB)
- tinydoc-vlm-decoder.onnx: Language decoder (59MB)
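The input names and shapes of the exported graphs are not documented here, so a reasonable first step is to inspect them with ONNX Runtime before wiring up inference. This is a minimal sketch, assuming the `onnxruntime` package is installed and the file has been downloaded locally. The helper `describe_onnx_inputs` is hypothetical and not part of the project.

```python
from pathlib import Path


def describe_onnx_inputs(model_path: str) -> list:
    """Return (name, shape, dtype) for each input of an ONNX model on CPU."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"ONNX file not found: {path}")

    # Imported lazily so the path check above works without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(str(path), providers=["CPUExecutionProvider"])
    return [(inp.name, inp.shape, inp.type) for inp in session.get_inputs()]


# Example usage once a file has been downloaded:
# for name, shape, dtype in describe_onnx_inputs("tinydoc-vlm-vision.onnx"):
#     print(name, shape, dtype)
```

The reported names can then be used as keys in the input dict passed to `session.run`.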
HuggingFace Spaces
Live demo: huggingface.co/spaces/eulogik/TinyDoc-VLM
Benchmarks
| Benchmark | Status | Target |
|---|---|---|
| OCRBench | In progress | >75% |
| DocVQA | Pending | >85% |
| FUNSD | Pending | >95% |
| CORD | Pending | >95% |
Full analysis in docs/BENCHMARKS.md.
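The table does not say which metric each target refers to. DocVQA results are conventionally reported as ANLS (Average Normalized Levenshtein Similarity), so the sketch below shows how that score is computed. It is a generic reference implementation with a hypothetical `anls` helper, not the project's evaluation suite, and the lowercase and strip normalization is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions, references, threshold: float = 0.5) -> float:
    """Average over questions of the best similarity against any reference.

    A prediction scores 1 - NL when the normalized distance NL is below
    `threshold`, and 0 otherwise.
    """
    scores = []
    for pred, golds in zip(predictions, references):
        best = 0.0
        p = pred.strip().lower()
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            best = max(best, 1.0 - nl if nl < threshold else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


score = anls(["$1,234.56", "acme corp"], [["$1,234.56"], ["ACME Corp.", "ACME"]])
print(round(score, 3))
```

The threshold means near-miss OCR errors earn partial credit, while answers that are mostly wrong score zero.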
Package Structure
| Package | Location | Description |
|---|---|---|
| tinydoc | PyPI | Python SDK: TinyDocExtractor.ask(), .extract(), .extract_table() |
| tinydoc-vlm | GitHub | Full model code, training pipeline, synthetic data engine, evaluation suite |
| TinyDoc-VLM-256M | HF Hub | Pre-trained weights: 1.1GB, loads via from_pretrained() |
| TinyDoc-VLM-LoRA | HF Hub | LoRA adapter: 10MB, merge with base model |
Links
| Resource | URL |
|---|---|
| GitHub | github.com/eulogik/TinyDoc-VLM |
| PyPI | pypi.org/project/tinydoc |
| Model Hub | huggingface.co/eulogik/TinyDoc-VLM-256M |
| LoRA Checkpoint | huggingface.co/eulogik/TinyDoc-VLM-LoRA |
| Live Demo | huggingface.co/spaces/eulogik/TinyDoc-VLM |
| Documentation | eulogik.github.io/TinyDoc-VLM |
Launch Assets
| Document | Description |
|---|---|
| HN Post | Hacker News Show HN draft |
| Reddit Post | r/LocalLLaMA, r/MachineLearning |
| Twitter Thread | 7-tweet launch thread |
| Pitch Deck | Enterprise one-pager |
Citation
@software{eulogik_tinydoc_vlm_2026,
author = {eulogik},
title = {TinyDoc-VLM: 256M-Param Document-Specialist Vision-Language Model},
year = {2026},
url = {https://github.com/eulogik/TinyDoc-VLM}
}
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
License
Apache 2.0. See LICENSE for details.