Local-first CLI that converts academic PDFs into LLM-friendly text-only Markdown using llama.cpp (DeepSeek-OCR + Gemma 4).
inscriber
Convert academic PDFs into clean, LLM-friendly text-only Markdown — entirely on your own machine.
inscriber is a cross-platform command-line tool that runs OCR and figure
description locally with llama.cpp:
DeepSeek-OCR reads the text and locates the figures, and a Gemma 4 vision model
turns each figure into a textual description and restructures tables. It is the
local, offline-first reimagining of the cloud web app
paper2llm.
What it does
Given a PDF (local file, or a URL from a supported paper repository),
inscriber writes:
- A full Markdown file: the paper's text, equations, and tables, with each
  figure replaced by a generated textual description (or kept alongside it with
  --figure-mode describe-and-keep).
- Clean tables. DeepSeek-OCR emits tables as degenerate HTML; by default the
  VLM restructures each one into a Markdown pipe table, reading the true layout
  from a cropped image of the table (or the full page image when no table box
  was detected) while preserving the OCR values. On any failure the raw OCR
  table is kept. Disable with --no-table-refine.
- Split files: the document divided into main, appendix, and backmatter parts
  (disable with --no-split).
- A BibTeX entry for the paper, when the document is judged citable (default
  auto mode; for arXiv inputs it prefers the published version of the preprint
  when one exists). Online lookups send only the extracted title or arXiv ID,
  never the document, and under --offline it degrades to a clearly-marked,
  fully-local best-effort entry. --bibtex-mode off disables it; --bibtex forces
  the classic always-look-up mode.
⚠️ Accuracy. The output is a best-effort machine transcription, not a faithful copy, and will contain errors. Body text and table values are generally reliable; errors take the form of missing/repeated sentences near non-standard page formatting, typos, missing pieces and structural issues in complex equations or tables, or image descriptions missing or misrepresenting parts of the figure. An LLM consuming the Markdown tolerates this noise well; for critical use, you must verify against the PDF (the /inscribe skill automates that step). Every generated file ends with a short notice saying it was machine-transcribed.
Results are cached (content-addressed, per page / figure / table), so re-running with different output options takes seconds. Cache keys cover the model files, prompts, and the llama.cpp build, so swapping or upgrading any of them recomputes instead of serving stale results. A two-step mode runs OCR once and lets you compare different VLMs on the identical OCR text and figure crops (see Usage).
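A minimal sketch of how a content-addressed cache key of this kind can be built. This is an illustration, not inscriber's actual implementation: the function name, the fields, and the choice of SHA-256 are assumptions.

```python
import hashlib
import json


def cache_key(page_bytes: bytes, model_sha256: str, prompt: str, llama_build: int) -> str:
    """Derive a key from every input that can change the result (hypothetical layout)."""
    h = hashlib.sha256()
    # Serialise the metadata deterministically so dict ordering never changes the key.
    meta = {"model": model_sha256, "prompt": prompt, "build": llama_build}
    h.update(json.dumps(meta, sort_keys=True).encode("utf-8"))
    h.update(b"\0")  # separator between metadata and payload
    h.update(page_bytes)
    return h.hexdigest()


page = b"rendered page pixels"
k1 = cache_key(page, "abc123", "Transcribe this page.", 9587)
k2 = cache_key(page, "abc123", "Transcribe this page.", 9587)
k3 = cache_key(page, "abc123", "Transcribe this page.", 9600)  # different build

print(k1 == k2)  # identical inputs hit the same cache entry
print(k1 == k3)  # a new llama.cpp build forces a recompute
```

Because the model, prompt, and build all feed the hash, changing any one of them produces a different key, which is the behaviour described above.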
Requirements
- Python 3.10+ on Windows, Linux, or macOS
- llama.cpp (the llama-server binary)
- Two multimodal GGUF model pairs, ~9–12 GB total depending on quant (download links below)
- A GPU helps a lot but is not required. Reference setup: a laptop RTX 4060 with 8 GB VRAM.
Speed. On the reference setup (laptop RTX 4060, 8 GB VRAM), OCR takes ~20–25 seconds per page (a 39-page paper ≈ 15 minutes), and the VLM passes then take ~20–40 seconds per table and per figure, so a long paper can run 30–40 minutes end to end. Runtime has not been thoroughly optimized; the Q8_0 DeepSeek-OCR quant might speed OCR up at the risk of some quality loss, and --ocr-resolution large is a faster OCR setting for simple documents (see Options). Cached re-runs take seconds.
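For a rough feel of these numbers, here is a back-of-the-envelope estimate using only the per-page and per-item ranges quoted above. The function name is hypothetical, and the figures apply to the reference setup only; other hardware will differ.

```python
def estimate_minutes(pages: int, tables: int, figures: int) -> tuple[float, float]:
    """Return a (low, high) runtime estimate in minutes for an uncached run."""
    ocr_low, ocr_high = 20, 25  # seconds per page (reference setup)
    vlm_low, vlm_high = 20, 40  # seconds per table and per figure
    items = tables + figures
    low = pages * ocr_low + items * vlm_low
    high = pages * ocr_high + items * vlm_high
    return low / 60, high / 60


print(estimate_minutes(39, 0, 0))    # OCR only: (13.0, 16.25)
print(estimate_minutes(39, 10, 20))  # with 10 tables and 20 figures: (23.0, 36.25)
```

The table and figure counts in the second call are made-up inputs chosen to show how the VLM passes can dominate on a figure-heavy paper.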
Install
pip install inscriber
Or install the latest development version from source:
pip install git+https://github.com/lacerbi/inscriber.git
Setup
Quick path: install llama.cpp (step 1 below), then let inscriber setup do
steps 2–3 for you:
inscriber setup --llama-bin-dir /path/to/llama.cpp/bin
This downloads the recommended models below (~12 GB; --deepseek-quant q8_0 picks
the smaller DeepSeek pair, ~9 GB) into the platform data dir, verifies each
file against pinned checksums, and writes a ready-to-run config to the
platform config dir. Interrupted downloads resume on re-run; already-complete
files are verified and skipped. Prefer manual control? Steps 2–3 below do the
same by hand.
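If you download the models by hand, you can run a similar integrity check yourself. The sketch below assumes SHA-256 digests (the README does not state which algorithm inscriber setup pins) and uses a throwaway file in place of a real model; the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB models never sit in memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_verified(path: Path, pinned_sha256: str) -> bool:
    """True only if the file exists and matches the expected digest."""
    return path.is_file() and sha256_of(path) == pinned_sha256.lower()


# Demo with a stand-in file; a real check would point at a downloaded .gguf.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "model.gguf"
    sample.write_bytes(b"not a real model")
    pinned = hashlib.sha256(b"not a real model").hexdigest()
    ok = is_verified(sample, pinned)
    bad = is_verified(sample, "0" * 64)

print(ok, bad)
```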
1. llama.cpp
Download a prebuilt release from
github.com/ggml-org/llama.cpp/releases/latest
— pick the variant matching your GPU backend (CUDA / Vulkan / Metal / CPU) — or
build from source. inscriber only needs the directory containing
llama-server (llama-server.exe on Windows).
⚠️ Use build 9587 or newer: older builds handle DeepSeek-OCR images differently and would misplace figure crops, so inscriber refuses to run OCR against them.
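To check a build number yourself before running, something like the following may help. It assumes the output of llama-server --version contains a line of the form "version: 9587 (abc1234)"; that format is an assumption here, so adjust the pattern if your build prints something different. This is not how inscriber itself performs the check.

```python
import re
from typing import Optional

MIN_BUILD = 9587  # minimum build stated above


def parse_build(version_output: str) -> Optional[int]:
    """Extract the numeric build from text like 'version: 9587 (abc1234)' (assumed format)."""
    match = re.search(r"version:\s*(\d+)", version_output)
    return int(match.group(1)) if match else None


def build_is_supported(version_output: str) -> bool:
    """Unparseable output is treated as unsupported rather than guessed at."""
    build = parse_build(version_output)
    return build is not None and build >= MIN_BUILD


print(build_is_supported("version: 9587 (abc1234)"))  # True
print(build_is_supported("version: 6000 (def5678)"))  # False
print(build_is_supported("unexpected output"))        # False
```

In practice the input string would come from running the binary, for example with subprocess.run([...], capture_output=True, text=True) and checking both stdout and stderr.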
2. Models
A multimodal model in llama.cpp is two GGUF files: the text/decoder model
and a multimodal projector (mmproj). inscriber uses two such pairs —
DeepSeek-OCR for OCR + figure grounding, Gemma 4 for figure descriptions and
table restructuring:
| model | role | download |
|---|---|---|
| DeepSeek-OCR BF16 (recommended) | OCR + figure grounding | model (5.9 GB) · mmproj (0.8 GB) |
| DeepSeek-OCR Q8_0 (smaller, also verified) | OCR + figure grounding | model (3.1 GB) · mmproj (0.4 GB) |
| Gemma 4 E4B QAT Q4_K_XL | figure description + tables | model (4.2 GB) · mmproj (1.0 GB) |
(Sizes are decimal GB, matching what inscriber setup prints while downloading.)
⚠️ Keep DeepSeek-OCR at BF16 or Q8_0 — Q4_K_M causes runaway repetition loops. The Gemma side has no such restriction: any reasonable quant works, and larger Gemma 4 variants are fine if you have the hardware.
Sources: sabafallah/DeepSeek-OCR-GGUF,
ggml-org/DeepSeek-OCR-GGUF,
unsloth/gemma-4-E4B-it-qat-GGUF.
unsloth's projector file is literally named mmproj-BF16.gguf — consider
renaming it to something Gemma-specific if you keep models from several
families in one folder (inscriber setup does this automatically, saving it
as mmproj-gemma-4-E4B-it-qat-BF16.gguf).
3. Configuration
Copy config.example.toml to config.toml in the
directory you run from (or the platform config dir, e.g.
%APPDATA%\inscriber\config.toml on Windows, ~/.config/inscriber/config.toml
on Linux) and fill in the llama.cpp bin_dir and the four model paths
(inscriber setup writes these for you). Every config field is also
overridable from the CLI (precedence: CLI flag > config file > built-in
default).
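The precedence rule can be pictured as a three-level lookup. The sketch below is illustrative only: the key names are hypothetical stand-ins for real config fields, while the two default values are the ones documented under Options.

```python
from typing import Any, Mapping

# Built-in defaults (values from the Options section; key names are hypothetical).
DEFAULTS: dict[str, Any] = {"ctx": 16384, "ocr_resolution": "gundam"}


def resolve(name: str, cli: Mapping[str, Any], config: Mapping[str, Any]) -> Any:
    """CLI flag > config file > built-in default."""
    if cli.get(name) is not None:  # a flag the user did not pass is None
        return cli[name]
    if name in config:
        return config[name]
    return DEFAULTS[name]


cli_args = {"ctx": None, "ocr_resolution": "large"}  # only --ocr-resolution was passed
config_file = {"ctx": 8192, "ocr_resolution": "base"}

print(resolve("ocr_resolution", cli_args, config_file))  # large (CLI wins)
print(resolve("ctx", cli_args, config_file))             # 8192 (config wins over default)
print(resolve("ctx", cli_args, {}))                      # 16384 (built-in default)
```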
Usage
inscriber paper.pdf -o out/ # end-to-end
inscriber https://arxiv.org/abs/2510.18234 -o out/ # paper-repository URL
URL input supports arXiv, OpenReview, ACL Anthology, bioRxiv, medRxiv, NeurIPS, and PMLR; for anything else, download the PDF and pass the local path.
Outputs in out/ (here for a paper whose BibTeX citation key is
chang2025amortized):
chang2025amortized_full.md # full document
chang2025amortized_main.md # split parts (as detected)
chang2025amortized_appendix.md
chang2025amortized_backmatter.md
chang2025amortized.bib # when judged citable (--bibtex-mode off disables)
figures/ # with --figure-mode describe-and-keep
Naming: when a BibTeX entry is produced (the default auto mode does this
for any citable paper), its citation key — authorYEARfirstword, e.g.
chang2025amortized — becomes the base name, giving you library-style
filenames for free. Otherwise the name derives from the source (PDF stem or
repository filename). --name NAME pins an explicit base name;
--no-bibtex-name disables key-derived naming; --no-full-suffix writes the
full document as chang2025amortized.md instead of …_full.md (handy with
--no-split or a one-file-per-paper library). Every run logs which name was
chosen.
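As a quick illustration of how the naming flags combine, the sketch below reproduces the filename pattern shown in the output listing. It is a simplified model, not inscriber's code; the function and parameter names are hypothetical, and in a real run only the parts actually detected are written.

```python
def output_names(base: str, split: bool = True, full_suffix: bool = True) -> list[str]:
    """List the Markdown files expected for a given base name and flag combination."""
    # --no-full-suffix drops the _full suffix from the full document.
    names = [f"{base}_full.md" if full_suffix else f"{base}.md"]
    if split:  # --no-split suppresses the per-part files
        names += [f"{base}_{part}.md" for part in ("main", "appendix", "backmatter")]
    return names


print(output_names("chang2025amortized"))
print(output_names("chang2025amortized", split=False, full_suffix=False))
# The second call prints ['chang2025amortized.md']
```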
Two-step: compare VLMs on identical OCR
inscriber ocr paper.pdf -o out/ # OCR once → portable bundle
inscriber describe out/paper.inscriber-ocr # VLM passes + assembly + write
inscriber describe out/paper.inscriber-ocr --vlm-model other.gguf --vlm-mmproj other-mmproj.gguf
ocr writes an inspectable bundle (manifest.json, cropped figures/, and
pages/ rasters for pages with tables); describe runs the VLM stages from it
with no OCR model loaded, so each VLM sees the identical input. The bundle's
per-page markdown is hand-editable — fix an OCR glitch once, then try N VLMs.
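If you want to script the comparison, one option is to build the describe invocations in a loop. The sketch below only uses the flags shown above; the model filenames are placeholders, and it does not address where each run writes its output, so check inscriber describe --help before running several VLMs against the same bundle.

```python
def describe_commands(bundle: str, vlm_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Build one `inscriber describe` argument list per (model, mmproj) pair."""
    return [
        ["inscriber", "describe", bundle, "--vlm-model", model, "--vlm-mmproj", mmproj]
        for model, mmproj in vlm_pairs
    ]


cmds = describe_commands(
    "out/paper.inscriber-ocr",
    [("gemma.gguf", "gemma-mmproj.gguf"), ("other.gguf", "other-mmproj.gguf")],
)
for cmd in cmds:
    print(" ".join(cmd))  # pass each list to subprocess.run(cmd, check=True) to execute
```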
Fix the splits, regenerate the full document
All outputs are plain Markdown. To correct an OCR/VLM slip after the fact, edit the split files and rebuild the full document from them — no models needed:
inscriber join out/paper # paper_main/_appendix/_backmatter.md → paper_full.md
join strips each split's footer notice (and the BibTeX block, if injected),
concatenates main → appendix → backmatter, and re-applies the framing — so each
fix is made once, not once per file. Note the regenerated {base}_full.md uses
the combined ordering (appendix before backmatter, under # Title - Appendix-
style headings), which may differ from the original document order.
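Conceptually, the join step works roughly like the sketch below. This is a simplified model under stated assumptions, not the actual implementation: the footer marker is a hypothetical placeholder (the real notice text differs), and BibTeX-block handling and heading rewriting are omitted.

```python
PARTS = ("main", "appendix", "backmatter")  # fixed output order
NOTICE = "<!-- machine-transcribed -->"      # hypothetical stand-in for the footer notice


def strip_footer(text: str) -> str:
    """Drop everything from the notice onward; text without a notice is kept whole."""
    head, _, _ = text.partition(NOTICE)
    return head.rstrip() + "\n"


def join_parts(texts: dict[str, str]) -> str:
    """Concatenate the available parts in PARTS order and add a single notice."""
    bodies = [strip_footer(texts[part]) for part in PARTS if part in texts]
    return "\n".join(bodies) + NOTICE + "\n"


splits = {
    "backmatter": "# Title - Backmatter\nReferences\n" + NOTICE + "\n",
    "main": "# Title\nBody text\n" + NOTICE + "\n",
    "appendix": "# Title - Appendix\nExtra proofs\n" + NOTICE + "\n",
}
full = join_parts(splits)
print(full)
```

Whatever order the parts are supplied in, the result is main, then appendix, then backmatter, with one notice at the end rather than one per part.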
Convert + verify with agent skills
The repo ships assistant skills for the convert-and-verify workflow:
- Claude Code: /inscribe
- Codex: $inscribe
When run inside this repository, the skill takes a PDF path or URL (plus any
options in plain words), runs inscriber, then checks the transcription against
the source PDF in ≤10-page chunks with parallel subagents briefed on the known
failure modes (table cells, subscripts, equations, truncated pages, figure
descriptions), applies the fixes that matter to the split files, and rejoins
them with inscriber join. Say "no verification" to stop after the conversion.
Options
inscriber --help shows the full surface; every config.example.toml field
has a matching flag. Highlights:
| flag | meaning |
|---|---|
| --ocr-resolution {tiny,small,base,large,gundam} | OCR render quality (default gundam, 2048px; large is faster) |
| --figure-mode {describe-only,describe-and-keep,placeholder} | how figures render |
| --no-figures | skip figure detection and description entirely |
| --no-table-refine | keep raw OCR tables (skip VLM restructuring) |
| --name NAME / --no-bibtex-name | explicit output base name / never name by BibTeX citation key |
| --no-full-suffix | full document as {base}.md instead of {base}_full.md |
| --no-split / --page-numbers / --page-separators | output options |
| --pages RANGE | page selection, e.g. "1-10", "3", "5-" |
| --bibtex-mode {off,on,auto} / --bibtex-in-doc | BibTeX mode (default auto; --bibtex ⇒ on) |
| --offline | no network: URL input + online BibTeX sources disabled |
| --mode {sequential,concurrent} | one model resident at a time (default) vs. both (needs VRAM) |
| --no-cache / --refresh | cache control |
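The three --pages forms in the table ("1-10", "3", "5-") can be read as follows. This is a hedged sketch of one plausible interpretation, not the tool's parser: it assumes 1-based page numbers, clamps the end to the document length, and also accepts an open start such as "-5", which the README does not mention.

```python
def parse_page_range(spec: str, total_pages: int) -> list[int]:
    """Expand a range spec into a list of 1-based page numbers."""
    spec = spec.strip()
    if "-" in spec:
        start_s, end_s = spec.split("-", 1)
        start = int(start_s) if start_s else 1          # "-5": open start (assumption)
        end = int(end_s) if end_s else total_pages      # "5-": through the last page
    else:
        start = end = int(spec)                          # "3": a single page
    if start < 1 or start > end:
        raise ValueError(f"invalid page range: {spec!r}")
    return list(range(start, min(end, total_pages) + 1))


print(parse_page_range("1-10", 39))  # pages 1 through 10
print(parse_page_range("3", 39))     # [3]
print(parse_page_range("5-", 8))     # [5, 6, 7, 8]
```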
GPU offload is automatic by default (n_gpu_layers = "auto" lets llama.cpp fit
as many layers into VRAM as it can); override per server with --ocr-ngl /
--vlm-ngl (all, 0 for CPU, or a layer count). --ocr-resolution is the
main speed/quality lever: the gundam default renders pages at 2048px, which
measurably reduces OCR misreads of small subscripts and digits; --ocr-resolution large is ~20% faster and fine for simple documents. --ctx (default 16384)
sizes the context window that prompt and generation share — complex tables need
headroom for the VLM's reasoning, so don't shrink it without reason.
Privacy / offline
Your documents and figures never leave your machine — they go only to your
own llama.cpp server on 127.0.0.1, never to any cloud model. The only
features that touch the network are (1) downloading a PDF when the input is a
URL and (2) the BibTeX citation lookups (on by default), which send only the
extracted title or arXiv ID to citation APIs — never the document. Both are
hard-disabled by --offline (BibTeX then degrades to a clearly-marked,
fully-local best-effort entry). No telemetry, no persisted secrets.
Development
git clone https://github.com/lacerbi/inscriber && cd inscriber
python -m venv .venv
.venv/Scripts/activate # Windows; source .venv/bin/activate elsewhere
pip install -e ".[dev]"
pytest # mocked inference — no GPU or models needed
ruff check
Contributor guidance lives in AGENTS.md; the full technical specification is
DESIGN.md.
License
MIT.