Local-first CLI that converts academic PDFs into LLM-friendly text-only Markdown using llama.cpp (DeepSeek-OCR + Gemma 4).

inscriber

Convert academic PDFs into clean, LLM-friendly text-only Markdown — entirely on your own machine.

inscriber is a cross-platform command-line tool that runs OCR and figure description locally with llama.cpp: DeepSeek-OCR reads the text and locates the figures, and a Gemma 4 vision model turns each figure into a textual description and restructures tables. It is the local, offline-first reimagining of the cloud web app paper2llm.

What it does

Given a PDF (local file, or a URL from a supported paper repository), inscriber writes:

  • A full Markdown file — the paper's text, equations, and tables, with each figure replaced by a generated textual description (or kept alongside it with --figure-mode describe-and-keep).
  • Clean tables. DeepSeek-OCR emits tables as degenerate HTML; by default the VLM restructures each one into a Markdown pipe table, reading the true layout from a cropped image of the table (or the full page image when no table box was detected) while preserving the OCR values. On any failure the raw OCR table is kept. Disable with --no-table-refine.
  • Split files — the document divided into main, appendix, and backmatter parts (disable with --no-split).
  • A BibTeX entry for the paper, when the document is judged citable (default auto mode; for arXiv inputs it prefers the published version of the preprint when one exists). Online lookups send only the extracted title or arXiv ID — never the document — and under --offline it degrades to a clearly-marked, fully-local best-effort entry. --bibtex-mode off disables it; --bibtex forces the classic always-look-up mode.
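To make the target format of table refinement concrete, here is a minimal sketch that turns a simple HTML table into a Markdown pipe table. It is not how inscriber does it (inscriber asks the VLM to read the layout from an image); this is a naive standard-library conversion that assumes no rowspan/colspan, and the name html_table_to_pipe is hypothetical.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect cell text row by row from a simple <table> (no row/col spans)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each cell is a single line.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_pipe(html):
    """Return a Markdown pipe table; the first row is used as the header."""
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    rows = parser.rows
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]  # pad ragged rows

    def fmt(cells):
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    return "\n".join([fmt(rows[0]), fmt(["---"] * width)] + [fmt(r) for r in rows[1:]])
```

A converter like this only works when the HTML is already well structured, which is exactly what degenerate OCR tables are not; that is why the real pipeline consults the table image instead.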

⚠️ Accuracy. The output is a best-effort machine transcription, not a faithful copy, and will contain errors. Body text and table values are generally reliable; errors take the form of missing/repeated sentences near non-standard page formatting, typos, missing pieces and structural issues in complex equations or tables, or image descriptions missing or misrepresenting parts of the figure. An LLM consuming the Markdown tolerates this noise well; for critical use, you must verify against the PDF — the /inscribe skill automates that step. Every generated file ends with a short notice saying it was machine-transcribed.

Results are cached (content-addressed, per page / figure / table), so re-running with different output options takes seconds. Cache keys cover the model files, prompts, and the llama.cpp build, so swapping or upgrading any of them recomputes instead of serving stale results. A two-step mode runs OCR once and lets you compare different VLMs on the identical OCR text and figure crops (see Usage).
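The idea of a content-addressed cache key can be sketched in a few lines. This is an illustration only, not inscriber's actual key layout: the function name, the field names, and the choice of SHA-256 over a JSON payload are all assumptions.

```python
import hashlib
import json


def cache_key(input_bytes, model_digests, prompt, llama_build):
    """Derive a cache key from everything that can change the result.

    input_bytes   -- the page / figure / table image being processed
    model_digests -- hex digests of the model and mmproj files (hypothetical)
    prompt        -- the prompt text sent to the model
    llama_build   -- the llama.cpp build number
    """
    payload = {
        "input": hashlib.sha256(input_bytes).hexdigest(),
        "models": list(model_digests),
        "prompt": prompt,
        "llama_build": llama_build,
    }
    # Canonical JSON so the same payload always hashes to the same key.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Because every ingredient is part of the hashed payload, changing the model file, the prompt, or the build number yields a different key, so the old entry is simply never looked up again.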

Requirements

  • Python 3.10+ on Windows, Linux, or macOS
  • llama.cpp (the llama-server binary)
  • Two multimodal GGUF model pairs, ~9–12 GB total depending on quant (download links below)
  • A GPU helps a lot but is not required. Reference setup: a laptop RTX 4060 with 8 GB VRAM.

Speed. On the reference setup (laptop RTX 4060, 8 GB VRAM), OCR takes ~20–25 seconds per page (a 39-page paper ≈ 15 minutes), and the VLM passes then take ~20–40 seconds per table and per figure — so a long paper can run 30–40 minutes end to end. Runtime has not been thoroughly optimized; the Q8_0 DeepSeek-OCR quant might speed OCR up at the risk of some quality loss, and --ocr-resolution large is a faster OCR setting for simple documents (see Options). Cached re-runs take seconds.
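The arithmetic behind these figures is simple enough to sketch. The helper below is hypothetical (not part of inscriber) and just multiplies the per-page and per-item ranges quoted above; real timings depend on hardware and document complexity.

```python
def estimate_minutes(pages, tables, figures,
                     ocr_s_per_page=(20, 25), vlm_s_per_item=(20, 40)):
    """Return a (low, high) runtime estimate in minutes.

    Defaults are the reference-setup ranges quoted in this README:
    20-25 s of OCR per page, 20-40 s of VLM time per table or figure.
    """
    items = tables + figures
    low = pages * ocr_s_per_page[0] + items * vlm_s_per_item[0]
    high = pages * ocr_s_per_page[1] + items * vlm_s_per_item[1]
    return low / 60, high / 60


# OCR only, for the 39-page example: 13.0 to 16.25 minutes (about 15).
print(estimate_minutes(39, 0, 0))
```

Adding the tables and figures of a long paper on top of the OCR time is what pushes a run toward the 30–40 minute range.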

Install

pip install inscriber

Or install the latest development version from source:

pip install git+https://github.com/lacerbi/inscriber.git

Setup

Quick path: install llama.cpp (step 1 below), then let inscriber setup do steps 2–3 for you —

inscriber setup --llama-bin-dir /path/to/llama.cpp/bin

downloads the recommended models below (~12 GB; --deepseek-quant q8_0 picks the smaller DeepSeek pair, ~9 GB) into the platform data dir, verifies each file against pinned checksums, and writes a ready-to-run config to the platform config dir. Interrupted downloads resume on re-run; already-complete files are verified and skipped. Prefer manual control? Steps 2–3 below do the same by hand.
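If you download the models by hand and want the same kind of integrity check, a minimal sketch looks like this. The function names are hypothetical, the use of SHA-256 is an assumption (this README does not say which algorithm the pinned checksums use), and the expected digest must come from a source you trust.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB models never sit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True when the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Streaming the file matters here: the model files are several gigabytes each, so reading them whole would be wasteful.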

1. llama.cpp

Download a prebuilt release from github.com/ggml-org/llama.cpp/releases/latest — pick the variant matching your GPU backend (CUDA / Vulkan / Metal / CPU) — or build from source. inscriber only needs the directory containing llama-server (llama-server.exe on Windows).

⚠️ Use build 9587 or newer — older builds handle DeepSeek-OCR images differently and would misplace figure crops, so inscriber refuses to run OCR against them.
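To check a build yourself before pointing inscriber at it, something like the sketch below works. It assumes that llama-server --version prints a line of the form "version: 9587 (commit)"; that output format is an assumption about llama.cpp, and the helper names are hypothetical rather than inscriber's own check.

```python
import re

MIN_BUILD = 9587  # minimum build stated above


def parse_build(version_output):
    """Extract the integer build number, or None if no version line is found."""
    match = re.search(r"version:\s*(\d+)", version_output)
    return int(match.group(1)) if match else None


def build_is_supported(version_output, minimum=MIN_BUILD):
    """True when a build number was found and it is at least `minimum`."""
    build = parse_build(version_output)
    return build is not None and build >= minimum
```

The input text can be captured with subprocess.run([path_to_llama_server, "--version"], capture_output=True, text=True); check both stdout and stderr, since the version line may go to either stream.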

2. Models

A multimodal model in llama.cpp is two GGUF files: the text/decoder model and a multimodal projector (mmproj). inscriber uses two such pairs — DeepSeek-OCR for OCR + figure grounding, Gemma 4 for figure descriptions and table restructuring:

| model | role | download |
| --- | --- | --- |
| DeepSeek-OCR BF16 (recommended) | OCR + figure grounding | model (5.9 GB) · mmproj (0.8 GB) |
| DeepSeek-OCR Q8_0 (smaller, also verified) | OCR + figure grounding | model (3.1 GB) · mmproj (0.4 GB) |
| Gemma 4 E4B QAT Q4_K_XL | figure description + tables | model (4.2 GB) · mmproj (1.0 GB) |

(Sizes are decimal GB, matching what inscriber setup prints while downloading.)

⚠️ Keep DeepSeek-OCR at BF16 or Q8_0 — Q4_K_M causes runaway repetition loops. The Gemma side has no such restriction: any reasonable quant works, and larger Gemma 4 variants are fine if you have the hardware.

Sources: sabafallah/DeepSeek-OCR-GGUF, ggml-org/DeepSeek-OCR-GGUF, unsloth/gemma-4-E4B-it-qat-GGUF. unsloth's projector file is literally named mmproj-BF16.gguf — consider renaming it to something Gemma-specific if you keep models from several families in one folder (inscriber setup does this automatically, saving it as mmproj-gemma-4-E4B-it-qat-BF16.gguf).

3. Configuration

Copy config.example.toml to config.toml in the directory you run from (or the platform config dir, e.g. %APPDATA%\inscriber\config.toml on Windows, ~/.config/inscriber/config.toml on Linux) and fill in the llama.cpp bin_dir and the four model paths (inscriber setup writes these for you). Every config field is also overridable from the CLI (precedence: CLI flag > config file > built-in default).
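The precedence rule can be pictured as a layered lookup. The sketch below is illustrative only: effective_settings is a hypothetical name, the dictionary keys are assumptions (ctx and n_gpu_layers are modelled on the options described in this README), and None stands for "flag not given on the command line".

```python
from collections import ChainMap


def effective_settings(cli_args, config_file, defaults):
    """Merge settings with precedence CLI flag > config file > built-in default."""
    # Flags the user did not pass must not shadow the lower layers.
    cli_set = {k: v for k, v in cli_args.items() if v is not None}
    # ChainMap searches its maps left to right, so earlier maps win.
    return dict(ChainMap(cli_set, config_file, defaults))


settings = effective_settings(
    cli_args={"ctx": None, "n_gpu_layers": "0"},    # only --vlm-ngl style override given
    config_file={"ctx": 8192},                       # config.toml sets ctx
    defaults={"ctx": 16384, "n_gpu_layers": "auto"},
)
print(settings["ctx"], settings["n_gpu_layers"])     # 8192 0
```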

Usage

inscriber paper.pdf -o out/                          # end-to-end
inscriber https://arxiv.org/abs/2510.18234 -o out/   # paper-repository URL

URL input supports arXiv, OpenReview, ACL Anthology, bioRxiv, medRxiv, NeurIPS, and PMLR; for anything else, download the PDF and pass the local path.
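A quick way to tell in advance whether a link is likely to be accepted is to match its host against the repositories listed above. This sketch is not inscriber's resolver: the domain list is an assumption about where those repositories are hosted, and repository_for is a hypothetical helper.

```python
from urllib.parse import urlparse

# Assumed host domains for the repositories named above.
SUPPORTED_HOSTS = {
    "arxiv.org": "arXiv",
    "openreview.net": "OpenReview",
    "aclanthology.org": "ACL Anthology",
    "biorxiv.org": "bioRxiv",
    "medrxiv.org": "medRxiv",
    "neurips.cc": "NeurIPS",
    "mlr.press": "PMLR",
}


def repository_for(url):
    """Return the repository name for a URL, or None if the host is unknown."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in SUPPORTED_HOSTS.items():
        # Match the domain itself or any subdomain (www., proceedings., ...).
        if host == domain or host.endswith("." + domain):
            return name
    return None
```

When this returns None, the fallback is the one already described: download the PDF yourself and pass the local path.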

Outputs in out/ (here for a paper whose BibTeX citation key is chang2025amortized):

chang2025amortized_full.md        # full document
chang2025amortized_main.md        # split parts (as detected)
chang2025amortized_appendix.md
chang2025amortized_backmatter.md
chang2025amortized.bib            # when judged citable (--bibtex-mode off disables)
figures/                          # with --figure-mode describe-and-keep

Naming: when a BibTeX entry is produced (the default auto mode does this for any citable paper), its citation key — authorYEARfirstword, e.g. chang2025amortized — becomes the base name, giving you library-style filenames for free. Otherwise the name derives from the source (PDF stem or repository filename). --name NAME pins an explicit base name; --no-bibtex-name disables key-derived naming; --no-full-suffix writes the full document as chang2025amortized.md instead of …_full.md (handy with --no-split or a one-file-per-paper library). Every run logs which name was chosen.
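The authorYEARfirstword pattern can be sketched as below. This is an approximation rather than inscriber's exact rule: the stopword list, the accent folding, and the function name are assumptions, and the example title is made up for illustration.

```python
import re
import unicodedata

# Assumed list of leading words to skip when picking the "first word".
STOPWORDS = {"a", "an", "the", "on", "of", "for", "in", "to", "and"}


def _ascii_lower(text):
    """Fold accents (e.g. ü -> u) and lowercase."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii").lower()


def citation_key(first_author_surname, year, title):
    """Build a key such as chang2025amortized from surname, year, and title."""
    author = re.sub(r"[^a-z]", "", _ascii_lower(first_author_surname))
    words = re.findall(r"[a-z0-9]+", _ascii_lower(title))
    first = next((w for w in words if w not in STOPWORDS), "")
    return f"{author}{year}{first}"


# Hypothetical title, used only to show the shape of the key.
print(citation_key("Chang", 2025, "Amortized inference for a toy model"))
```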

Two-step: compare VLMs on identical OCR

inscriber ocr paper.pdf -o out/                # OCR once → portable bundle
inscriber describe out/paper.inscriber-ocr     # VLM passes + assembly + write
inscriber describe out/paper.inscriber-ocr --vlm-model other.gguf --vlm-mmproj other-mmproj.gguf

ocr writes an inspectable bundle (manifest.json, cropped figures/, and pages/ rasters for pages with tables); describe runs the VLM stages from it with no OCR model loaded, so each VLM sees the identical input. The bundle's per-page markdown is hand-editable — fix an OCR glitch once, then try N VLMs.

Fix the splits, regenerate the full document

All outputs are plain Markdown. To correct an OCR/VLM slip after the fact, edit the split files and rebuild the full document from them — no models needed:

inscriber join out/paper      # paper_main/_appendix/_backmatter.md → paper_full.md

join strips each split's footer notice (and the BibTeX block, if injected), concatenates main → appendix → backmatter, and re-applies the framing, so each fix is made once, not once per file. Note that the regenerated {base}_full.md uses the combined ordering (appendix before backmatter, under "# Title - Appendix"-style headings), which may differ from the original document order.

Convert + verify with agent skills

The repo ships assistant skills for the convert-and-verify workflow, including the /inscribe skill.

When run inside this repository, the skill takes a PDF path or URL (plus any options in plain words), runs inscriber, then checks the transcription against the source PDF in ≤10-page chunks with parallel subagents briefed on the known failure modes (table cells, subscripts, equations, truncated pages, figure descriptions), applies the fixes that matter to the split files, and rejoins them with inscriber join. Say "no verification" to stop after the conversion.

Options

inscriber --help shows the full surface; every config.example.toml field has a matching flag. Highlights:

| flag | meaning |
| --- | --- |
| --ocr-resolution {tiny,small,base,large,gundam} | OCR render quality (default gundam, 2048px; large is faster) |
| --figure-mode {describe-only,describe-and-keep,placeholder} | how figures render |
| --no-figures | skip figure detection and description entirely |
| --no-table-refine | keep raw OCR tables (skip VLM restructuring) |
| --name NAME / --no-bibtex-name | explicit output base name / never name by BibTeX citation key |
| --no-full-suffix | full document as {base}.md instead of {base}_full.md |
| --no-split / --page-numbers / --page-separators | output options |
| --pages RANGE | page selection, e.g. "1-10", "3", "5-" |
| --bibtex-mode {off,on,auto} / --bibtex-in-doc | BibTeX mode (default auto; --bibtex is equivalent to on) |
| --offline | no network: URL input + online BibTeX sources disabled |
| --mode {sequential,concurrent} | one model resident at a time (default) vs. both (needs VRAM) |
| --no-cache / --refresh | cache control |
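The three --pages forms shown above ("1-10", "3", "5-") can be read as 1-based inclusive ranges. The sketch below is one plausible interpretation, not inscriber's parser: parse_pages is a hypothetical name, and clamping to the document length is an assumption.

```python
def parse_pages(spec, n_pages):
    """Expand a page spec into a list of 1-based page numbers.

    "1-10" -> pages 1..10, "3" -> page 3, "5-" -> page 5 to the last page.
    Ranges running past the end of the document are clamped (an assumption).
    """
    spec = spec.strip()
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start = int(start_text) if start_text else 1
        end = int(end_text) if end_text else n_pages  # open-ended: to last page
    else:
        start = end = int(spec)
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    return list(range(start, min(end, n_pages) + 1))
```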

GPU offload is automatic by default (n_gpu_layers = "auto" lets llama.cpp fit as many layers into VRAM as it can); override per server with --ocr-ngl / --vlm-ngl (all, 0 for CPU, or a layer count). --ocr-resolution is the main speed/quality lever: the gundam default renders pages at 2048px, which measurably reduces OCR misreads of small subscripts and digits; --ocr-resolution large is ~20% faster and fine for simple documents. --ctx (default 16384) sizes the context window that prompt and generation share — complex tables need headroom for the VLM's reasoning, so don't shrink it without reason.

Privacy / offline

Your documents and figures never leave your machine — they go only to your own llama.cpp server on 127.0.0.1, never to any cloud model. The only features that touch the network are (1) downloading a PDF when the input is a URL and (2) the BibTeX citation lookups (on by default), which send only the extracted title or arXiv ID to citation APIs — never the document. Both are hard-disabled by --offline (BibTeX then degrades to a clearly-marked, fully-local best-effort entry). No telemetry, no persisted secrets.
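If you want to audit this claim in your own wrapper scripts, a loopback check on the server URL is easy to write. The helper below is hypothetical and not part of inscriber; the port in the usage example is an assumption, since this README only states the address 127.0.0.1.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url):
    """True when the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a public hostname), so not loopback.
        return False


print(is_loopback_url("http://127.0.0.1:8080/"))   # True
print(is_loopback_url("https://api.example.com"))  # False
```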

Development

git clone https://github.com/lacerbi/inscriber && cd inscriber
python -m venv .venv
.venv/Scripts/activate          # Windows;  source .venv/bin/activate elsewhere
pip install -e ".[dev]"
pytest                          # mocked inference — no GPU or models needed
ruff check

Contributor guidance lives in AGENTS.md; the full technical specification is in DESIGN.md.

License

MIT.
