
256M-param document AI: OCR, VQA, form extraction, table parsing. Runs on CPU. Apache 2.0.


TinyDoc-VLM

256M-Parameter Document-Specialist Vision-Language Model

SigLIP-B/16 + PixelShuffle 3× + SmolLM2-135M · OCR, VQA, Form Extraction, Table Parsing

Apache 2.0 · Runs on CPU · <1GB VRAM · LoRA Fine-tuning



Built by eulogik — AI infrastructure for document intelligence.


What is TinyDoc-VLM?

TinyDoc-VLM is an open-source document understanding AI that reads invoices, receipts, forms, tables, and charts. At just 256M parameters, it runs on a MacBook, Raspberry Pi 5, or any CPU — no GPU required.

Use cases: Invoice processing, receipt scanning, form data extraction, table parsing, document Q&A, OCR, visual question answering.

Highlights

  • 256M params — SigLIP-B/16 vision encoder (93M) + PixelShuffle 3× compressor + SmolLM2-135M decoder
  • <1GB VRAM — Runs on MacBook Air, Raspberry Pi 5, or any CPU with ONNX
  • Structured output — JSON extraction, key-value pairs, table parsing, OCR, VQA
  • LoRA fine-tuning — Train on your own docs with 2.7M trainable params (0.93% of total)
  • Apache 2.0 — Fully open-source, free for commercial use
  • ONNX export — Deploy anywhere with ONNX Runtime

Quick Start

Install

pip install tinydoc

Python SDK

from PIL import Image
from tinydoc import TinyDocExtractor

extractor = TinyDocExtractor(device="cpu")

# Ask a question about a document
img = Image.open("invoice.png")
result = extractor.ask(img, "What is the total?")
print(result.answer)  # "$1,234.56"

# Extract structured JSON
result = extractor.extract(img, output_format="json")
print(result.fields)  # {"total": "$1,234.56", "date": "2024-01-15", ...}

# Extract tables
result = extractor.extract_table(img)
print(result.markdown)  # Markdown-formatted table
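
The sample output above shows extracted field values as strings such as "$1,234.56" and "2024-01-15". Downstream code often wants typed values instead. The following is a minimal sketch of one way to post-process such a dict, assuming every value is a string and that amounts and dates follow the formats shown in the comments above. The helpers `parse_amount` and `normalize_fields` are hypothetical and are not part of the tinydoc SDK.

```python
import re
from datetime import date
from decimal import Decimal

# Optional currency symbol, digits with optional thousands separators, optional decimals.
AMOUNT_RE = re.compile(r"[$€£]?\s*(\d{1,3}(?:,\d{3})*|\d+)(\.\d+)?")
ISO_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_amount(text):
    """Return a Decimal for strings like "$1,234.56", or None if the text is not an amount."""
    match = AMOUNT_RE.fullmatch(text.strip())
    if match is None:
        return None
    whole = match.group(1).replace(",", "")
    fraction = match.group(2) or ""
    return Decimal(whole + fraction)


def normalize_fields(fields):
    """Return a copy of `fields` with ISO dates and amounts converted to typed values."""
    typed = {}
    for key, value in fields.items():
        if ISO_DATE_RE.fullmatch(value.strip()):
            try:
                typed[key] = date.fromisoformat(value.strip())
                continue
            except ValueError:
                pass  # Looks like a date but is not a valid one; fall through.
        amount = parse_amount(value)
        # Anything that is neither a date nor an amount stays a string.
        typed[key] = amount if amount is not None else value
    return typed
```

With the extractor from the example above, this would be called as `normalize_fields(result.fields)`, assuming `result.fields` behaves like a plain dict of strings.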

Direct Model Access

from tinydoc_vlm import TinyDocVLMForConditionalGeneration, TinyDocVLMProcessor

model = TinyDocVLMForConditionalGeneration.from_pretrained("eulogik/TinyDoc-VLM-256M")
processor = TinyDocVLMProcessor()

Model Architecture

Image (384×384)
    ↓
SigLIP Vision Encoder (93M)          ← 576 patches × 768 dim
    ↓
Pixel-Shuffle Compressor (scale=3)   ← 9× compression → 64 tokens
    ↓
Visual Position Embeddings
    ↓
SmolLM2 Decoder (135M)               ← 30 layers, GQA (9:3 heads), 8192 ctx
    ↓
Multi-Task Output Heads
    ↓
JSON / KV Extraction / Table / OCR / QA

Total: 256M parameters | Vision: 93M | Compressor: 3M | Decoder: 135M | Heads: 25M
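
The numbers in the diagram can be checked with a little arithmetic. The sketch below traces the visual sequence length from the 384×384 input through 16×16 patching and the scale-3 pixel shuffle, then sums the per-component parameter counts. The function name `visual_token_budget` is hypothetical. The merged channel width assumes the compressor concatenates each 3×3 block of patch vectors before any projection, which is the usual pixel-shuffle layout but is not stated in this README.

```python
def visual_token_budget(image_size=384, patch_size=16, embed_dim=768, shuffle_scale=3):
    """Trace the visual sequence length through patching and pixel-shuffle compression."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size            # patches per side: 384 / 16 = 24
    if grid % shuffle_scale:
        raise ValueError("patch grid must be divisible by shuffle_scale")
    patches = grid * grid                      # 24 * 24 = 576
    merged_grid = grid // shuffle_scale        # 24 / 3 = 8
    tokens = merged_grid * merged_grid         # 8 * 8 = 64
    # Assumption: each output token stacks scale^2 patch vectors channel-wise.
    merged_dim = embed_dim * shuffle_scale ** 2
    return {
        "patches": patches,
        "tokens": tokens,
        "compression": patches // tokens,
        "merged_dim": merged_dim,
    }


# Component sizes in millions of parameters, copied from the totals line above.
PARAMS_M = {"vision": 93, "compressor": 3, "decoder": 135, "heads": 25}

budget = visual_token_budget()
total_m = sum(PARAMS_M.values())
print(budget)
print(f"{total_m}M parameters")
```

The result matches the diagram: 576 patches become 64 visual tokens (9× compression), and the four components sum to 256M.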

LoRA Fine-tuning

Train TinyDoc-VLM on your own documents using LoRA. Only 2.7M params (0.93%) are trained.

M4 Mac (overnight run)

# Generate 3K synthetic documents
python data/synthetic/generator.py --num-docs 3000 --output-dir data/synthetic/output

# Train for 17K steps (~15 hours on M4)
python training/fast_train.py \
    --manifest data/synthetic/output/manifest.jsonl \
    --data-root data/synthetic \
    --steps 17000 --batch-size 1 --grad-accum 4 --device mps

# Or use the one-liner
bash training/m4_train.sh 17000
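
The flags above combine a batch size of 1 with 4 gradient-accumulation steps. The sketch below works out what that implies, using only the figures quoted in the commands (17,000 steps, roughly 15 hours). All function names are hypothetical. Whether `--steps` counts optimizer updates or micro-batches is not stated here, so the sample count is computed both ways.

```python
def effective_batch_size(batch_size, grad_accum):
    """Number of samples contributing to each optimizer update."""
    return batch_size * grad_accum


def seconds_per_step(total_hours, steps):
    """Average wall-clock time per step."""
    return total_hours * 3600 / steps


def samples_seen(steps, batch_size, grad_accum, steps_are_optimizer_steps):
    """Total samples processed, under either interpretation of a 'step'."""
    per_step = batch_size * grad_accum if steps_are_optimizer_steps else batch_size
    return steps * per_step


eff = effective_batch_size(batch_size=1, grad_accum=4)
sps = seconds_per_step(total_hours=15.0, steps=17000)
upper = samples_seen(17000, 1, 4, steps_are_optimizer_steps=True)
lower = samples_seen(17000, 1, 4, steps_are_optimizer_steps=False)

print(f"effective batch size: {eff}")
print(f"seconds per step: {sps:.2f}")
print(f"samples seen: between {lower} and {upper}")
```

Gradient accumulation keeps peak memory at the batch-size-1 level while giving each update the gradient of 4 samples, which is a common choice on memory-limited devices.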

Colab Free T4

Open training/colab_train.ipynb — complete pipeline in one notebook (~1 hour for 5K steps).

Training Results

| Metric | Value |
|---|---|
| Best checkpoint | Step 14,000 (loss: 15.0) |
| Training data | 3,000 synthetic docs (6,815 QA pairs) |
| Training time | 15.1 hours on M4 |
| LoRA rank | 16 (alpha: 32) |
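
To see why a LoRA adapter stays small, note that adapting a linear layer of shape d_in → d_out with rank r adds two matrices, A (r × d_in) and B (d_out × r), for r · (d_in + d_out) trainable parameters. The sketch below applies that formula with the rank and alpha from the table. The layer shapes and the choice of adapted projections are assumptions for illustration only; this README does not list which modules are adapted, so the result is not expected to reproduce the 2.7M figure.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return rank * (d_in + d_out)


def lora_scale(alpha, rank):
    """Factor applied to the low-rank update B @ A."""
    return alpha / rank


RANK, ALPHA = 16, 32          # from the training results table
# Assumed shapes for illustration: hidden width 576, key/value width 192, 30 layers.
HIDDEN, KV_WIDTH, LAYERS = 576, 192, 30

# Hypothetical setup: adapt only the query and value projections in every layer.
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_WIDTH, RANK)
total = per_layer * LAYERS

print(f"per layer: {per_layer}")
print(f"all layers: {total}")
print(f"update scale: {lora_scale(ALPHA, RANK)}")
```

Adapting more projections (for example the MLP layers) raises the count proportionally, so the reported total depends on the adapter configuration used in training.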

Deployment

ONNX Export

python export/export_onnx.py --model-path eulogik/TinyDoc-VLM-256M --output model.onnx

ONNX models on HF Hub:

  • tinydoc-vlm-vision.onnx — Vision encoder (33KB)
  • tinydoc-vlm-compressor.onnx — Token compressor (31KB)
  • tinydoc-vlm-decoder.onnx — Language decoder (59MB)

HuggingFace Spaces

Live demo: huggingface.co/spaces/eulogik/TinyDoc-VLM

Benchmarks

| Benchmark | Status | Target |
|---|---|---|
| OCRBench | In progress | >75% |
| DocVQA | Pending | >85% |
| FUNSD | Pending | >95% |
| CORD | Pending | >95% |

Full analysis in docs/BENCHMARKS.md.

Package Structure

| Package | Location | Description |
|---|---|---|
| tinydoc | PyPI | Python SDK: TinyDocExtractor.ask(), .extract(), .extract_table() |
| tinydoc-vlm | GitHub | Full model code, training pipeline, synthetic data engine, evaluation suite |
| TinyDoc-VLM-256M | HF Hub | Pre-trained weights, 1.1GB, loads via from_pretrained() |
| TinyDoc-VLM-LoRA | HF Hub | LoRA adapter, 10MB, merge with base model |
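
Merging a LoRA adapter into a base model means folding the low-rank update into each adapted weight: W' = W + (alpha / r) · B · A. The sketch below shows that operation on tiny nested-list matrices so it runs without any dependencies. It illustrates the general LoRA merge rule, not the project's own merge script, and the function names `matmul` and `merge_lora` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight is d_out x d_in, lora_a is rank x d_in, lora_b is d_out x rank.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: 2x2 base weight with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, 0.0]]        # rank x d_in
b = [[1.0], [2.0]]      # d_out x rank
merged = merge_lora(base, a, b, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, so the merged model runs at the same cost as the base model.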

Links

| Resource | URL |
|---|---|
| GitHub | github.com/eulogik/TinyDoc-VLM |
| PyPI | pypi.org/project/tinydoc |
| Model Hub | huggingface.co/eulogik/TinyDoc-VLM-256M |
| LoRA Checkpoint | huggingface.co/eulogik/TinyDoc-VLM-LoRA |
| Live Demo | huggingface.co/spaces/eulogik/TinyDoc-VLM |
| Documentation | eulogik.github.io/TinyDoc-VLM |

Launch Assets

| Document | Description |
|---|---|
| HN Post | Hacker News Show HN draft |
| Reddit Post | r/LocalLLaMA, r/MachineLearning |
| Twitter Thread | 7-tweet launch thread |
| Pitch Deck | Enterprise one-pager |

Citation

@software{eulogik_tinydoc_vlm_2026,
  author = {eulogik},
  title = {TinyDoc-VLM: 256M-Param Document-Specialist Vision-Language Model},
  year = {2026},
  url = {https://github.com/eulogik/TinyDoc-VLM}
}

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

Apache 2.0. See LICENSE for details.


Made by eulogik

