Local-first CLI that converts academic PDFs into LLM-friendly text-only Markdown using llama.cpp (DeepSeek-OCR + Gemma 4).

Maintainers

lacerbi

inscriber

Convert academic PDFs into clean, LLM-friendly text-only Markdown — entirely on your own machine.

inscriber is a cross-platform command-line tool that runs OCR and figure description locally with llama.cpp: DeepSeek-OCR reads the text and locates the figures, and a Gemma 4 vision model turns each figure into a textual description and restructures tables. It is the local, offline-first reimagining of the cloud web app paper2llm.

What it does

Given a PDF (local file, or a URL from a supported paper repository), inscriber writes:

- A full Markdown file: the paper's text, equations, and tables, with each figure replaced by a generated textual description (or kept alongside it with --figure-mode describe-and-keep).
- Clean tables. DeepSeek-OCR emits tables as degenerate HTML; by default the VLM restructures each one into a Markdown pipe table, reading the true layout from a cropped image of the table (or the full page image when no table box was detected) while preserving the OCR values. On any failure the raw OCR table is kept. Disable with --no-table-refine.
- Split files: the document divided into main, appendix, and backmatter parts (disable with --no-split).
- A BibTeX entry for the paper, when the document is judged citable (default auto mode; for arXiv inputs it prefers the published version of the preprint when one exists). Online lookups send only the extracted title or arXiv ID, never the document, and under --offline it degrades to a clearly-marked, fully-local best-effort entry. --bibtex-mode off disables it; --bibtex forces the classic always-look-up mode.

⚠️ Accuracy. The output is a best-effort machine transcription, not a faithful copy, and will contain errors. Body text and table values are generally reliable; errors take the form of missing/repeated sentences near non-standard page formatting, typos, missing pieces and structural issues in complex equations or tables, or image descriptions missing or misrepresenting parts of the figure. An LLM consuming the Markdown tolerates this noise well; for critical use, you must verify against the PDF — the /inscribe skill automates that step. Every generated file ends with a short notice saying it was machine-transcribed.

Results are cached (content-addressed, per page / figure / table), so re-running with different output options takes seconds. Cache keys cover the model files, prompts, and the llama.cpp build, so swapping or upgrading any of them recomputes instead of serving stale results. A two-step mode runs OCR once and lets you compare different VLMs on the identical OCR text and figure crops (see Usage).
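As an illustration of why upgrading a model, prompt, or llama.cpp build invalidates cached results, here is a minimal sketch of a content-addressed key. It is not inscriber's actual cache code: the function name, the field order, and the sample values are assumptions made for the example.

```python
import hashlib


def cache_key(unit_bytes: bytes, model_digest: str, prompt: str, llama_build: str) -> str:
    """Return a content-addressed key for one cached unit (hypothetical layout)."""
    h = hashlib.sha256()
    for part in (unit_bytes, model_digest.encode(), prompt.encode(), llama_build.encode()):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


page = b"rendered page bytes (placeholder)"
k1 = cache_key(page, "digest-of-ocr-model", "ocr prompt v1", "9587")
k2 = cache_key(page, "digest-of-ocr-model", "ocr prompt v1", "9600")  # upgraded build
k3 = cache_key(page, "digest-of-ocr-model", "ocr prompt v1", "9587")  # identical inputs
print(k1 == k3, k1 == k2)
```

Identical inputs map to the same key, so a re-run is a lookup; changing any single ingredient yields a different key, so the stale entry is simply never found.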

Requirements

- Python 3.10+ on Windows, Linux, or macOS
- llama.cpp (the llama-server binary)
- Two multimodal GGUF model pairs, ~9–12 GB total depending on quant (download links below)
- A GPU helps a lot but is not required. Reference setup: a laptop RTX 4060 with 8 GB VRAM.

Speed. On the reference setup (laptop RTX 4060, 8 GB VRAM), OCR takes ~20–25 seconds per page (a 39-page paper ≈ 15 minutes), and the VLM passes then take ~20–40 seconds per table and per figure — so a long paper can run 30–40 minutes end to end. Runtime has not been thoroughly optimized; the Q8_0 DeepSeek-OCR quant might speed OCR up at the risk of some quality loss, and --ocr-resolution large is a faster OCR setting for simple documents (see Options). Cached re-runs take seconds.
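The arithmetic behind these figures can be sketched as a rough estimator. The per-unit timings are the reference-setup ranges quoted above; the function name is hypothetical, and real runs will vary with hardware and document complexity.

```python
def estimate_minutes(pages: int, tables: int, figures: int) -> tuple:
    """Return (low, high) end-to-end minutes on the reference setup."""
    ocr_low, ocr_high = 20 * pages, 25 * pages      # ~20-25 s per page
    vlm_units = tables + figures                    # one VLM pass per table and per figure
    vlm_low, vlm_high = 20 * vlm_units, 40 * vlm_units  # ~20-40 s per unit
    return ((ocr_low + vlm_low) / 60, (ocr_high + vlm_high) / 60)


# OCR alone for the 39-page example: 13 to 16.25 minutes, i.e. roughly 15.
print(estimate_minutes(39, 0, 0))
```

Adding a couple of dozen tables and figures to the same paper pushes the upper bound past 30 minutes, which matches the end-to-end range given above.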

Install

pip install inscriber

Or install the latest development version from source:

pip install git+https://github.com/lacerbi/inscriber.git

Setup

Quick path: install llama.cpp (step 1 below), then let inscriber setup do steps 2–3 for you:

inscriber setup --llama-bin-dir /path/to/llama.cpp/bin

This downloads the recommended models below (~12 GB; --deepseek-quant q8_0 picks the smaller DeepSeek pair, ~9 GB) into the platform data dir, verifies each file against pinned checksums, and writes a ready-to-run config to the platform config dir. Interrupted downloads resume on re-run; already-complete files are verified and skipped. Prefer manual control? Steps 2–3 below do the same by hand.
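If you download the models by hand and want a similar integrity check, the sketch below shows one way to verify a file against a known SHA-256 digest. It is an illustration, not inscriber's own verification code: the function names are hypothetical and the demo uses a throwaway file in place of a real GGUF.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB model files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def is_complete(path: Path, pinned_sha256: str) -> bool:
    """True if the file exists and matches the pinned checksum."""
    return path.is_file() and sha256_of(path) == pinned_sha256.lower()


# Demo with a stand-in file; a real check would use the published digest.
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "model.gguf"
    model.write_bytes(b"stand-in model bytes")
    pinned = hashlib.sha256(b"stand-in model bytes").hexdigest()
    ok = is_complete(model, pinned)
    mismatched = is_complete(model, hashlib.sha256(b"other").hexdigest())
    missing = is_complete(Path(tmp) / "absent.gguf", pinned)
print(ok, mismatched, missing)
```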

1. llama.cpp

Download a prebuilt release from github.com/ggml-org/llama.cpp/releases/latest — pick the variant matching your GPU backend (CUDA / Vulkan / Metal / CPU) — or build from source. inscriber only needs the directory containing llama-server (llama-server.exe on Windows).

⚠️ Use build 9587 or newer — older builds handle DeepSeek-OCR images differently and would misplace figure crops, so inscriber refuses to run OCR against them.
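To check a build yourself before pointing inscriber at it, a small parser like the one below may help. It assumes the version text contains a line of the form `version: <build> (<commit>)`, as llama.cpp binaries typically print for `--version`; that format, the helper names, and the sample strings are assumptions, so adjust the pattern if your build prints something different.

```python
import re
from typing import Optional

MIN_BUILD = 9587  # minimum build stated in the warning above


def parse_build(version_output: str) -> Optional[int]:
    """Extract the numeric build from version text, or None if absent."""
    match = re.search(r"version:\s*(\d+)", version_output)
    return int(match.group(1)) if match else None


def build_is_supported(version_output: str) -> bool:
    """True only when a build number is found and is at least MIN_BUILD."""
    build = parse_build(version_output)
    return build is not None and build >= MIN_BUILD


# Made-up sample strings in the assumed format.
sample_old = "version: 9000 (0000000)"
sample_new = "version: 9587 (0000000)"
print(build_is_supported(sample_old), build_is_supported(sample_new))
```

In practice the text could come from running the binary with `subprocess.run([...], capture_output=True, text=True)` and concatenating stdout and stderr, since the version line may land on either stream.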

2. Models

A multimodal model in llama.cpp is two GGUF files: the text/decoder model and a multimodal projector (mmproj). inscriber uses two such pairs — DeepSeek-OCR for OCR + figure grounding, Gemma 4 for figure descriptions and table restructuring:

| Model | Role | Download |
|---|---|---|
| DeepSeek-OCR BF16 (recommended) | OCR + figure grounding | model (5.9 GB) · mmproj (0.8 GB) |
| DeepSeek-OCR Q8_0 (smaller, also verified) | OCR + figure grounding | model (3.1 GB) · mmproj (0.4 GB) |
| Gemma 4 E4B QAT Q4_K_XL | figure description + tables | model (4.2 GB) · mmproj (1.0 GB) |

(Sizes are decimal GB, matching what inscriber setup prints while downloading.)

⚠️ Keep DeepSeek-OCR at BF16 or Q8_0 — Q4_K_M causes runaway repetition loops. The Gemma side has no such restriction: any reasonable quant works, and larger Gemma 4 variants are fine if you have the hardware.

Sources: sabafallah/DeepSeek-OCR-GGUF, ggml-org/DeepSeek-OCR-GGUF, unsloth/gemma-4-E4B-it-qat-GGUF. unsloth's projector file is literally named mmproj-BF16.gguf — consider renaming it to something Gemma-specific if you keep models from several families in one folder (inscriber setup does this automatically, saving it as mmproj-gemma-4-E4B-it-qat-BF16.gguf).

3. Configuration

Copy config.example.toml to config.toml in the directory you run from (or the platform config dir, e.g. %APPDATA%\inscriber\config.toml on Windows, ~/.config/inscriber/config.toml on Linux) and fill in the llama.cpp bin_dir and the four model paths (inscriber setup writes these for you). Every config field is also overridable from the CLI (precedence: CLI flag > config file > built-in default).
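The precedence rule (CLI flag > config file > built-in default) can be pictured with a layered lookup. This is only a sketch: the key names `ocr_resolution`, `ctx`, and `mode` are illustrative stand-ins rather than confirmed config fields, though the default values shown match those documented under Options.

```python
from collections import ChainMap

# Built-in defaults (values from the Options section; key names are assumed).
DEFAULTS = {"ocr_resolution": "gundam", "ctx": 16384, "mode": "sequential"}


def resolve(cli: dict, config_file: dict) -> ChainMap:
    """Layer settings so the first mapping that defines a key wins."""
    # Flags the user did not pass arrive as None and must not mask lower layers.
    cli_set = {k: v for k, v in cli.items() if v is not None}
    return ChainMap(cli_set, config_file, DEFAULTS)


settings = resolve(
    cli={"ocr_resolution": "large", "ctx": None},  # only one flag given
    config_file={"ctx": 8192},                      # file overrides the default
)
print(settings["ocr_resolution"], settings["ctx"], settings["mode"])
```

Here the CLI supplies the resolution, the config file supplies the context size, and the mode falls through to the default.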

Usage

inscriber paper.pdf -o out/                          # end-to-end
inscriber https://arxiv.org/abs/2510.18234 -o out/   # paper-repository URL

URL input supports arXiv, OpenReview, ACL Anthology, bioRxiv, medRxiv, NeurIPS, and PMLR; for anything else, download the PDF and pass the local path.
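To decide up front whether a link is likely to be accepted or whether you should download the PDF first, a host lookup along these lines may be useful. The hostnames are the repositories' public domains as far as I know them; which exact URL forms inscriber recognizes is not specified here, so treat the table and the function name as assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Public hostnames of the repositories listed above (assumed mapping).
SUPPORTED = {
    "arxiv.org": "arXiv",
    "openreview.net": "OpenReview",
    "aclanthology.org": "ACL Anthology",
    "biorxiv.org": "bioRxiv",
    "medrxiv.org": "medRxiv",
    "proceedings.neurips.cc": "NeurIPS",
    "papers.nips.cc": "NeurIPS",
    "proceedings.mlr.press": "PMLR",
}


def repository_for(source: str) -> Optional[str]:
    """Return the repository name for a URL, or None for anything else."""
    host = (urlparse(source).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return SUPPORTED.get(host)


print(repository_for("https://arxiv.org/abs/2510.18234"))
print(repository_for("paper.pdf"))
```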

Outputs in out/ (here for a paper whose BibTeX citation key is chang2025amortized):

chang2025amortized_full.md        # full document
chang2025amortized_main.md        # split parts (as detected)
chang2025amortized_appendix.md
chang2025amortized_backmatter.md
chang2025amortized.bib            # when judged citable (--bibtex-mode off disables)
figures/                          # with --figure-mode describe-and-keep

Naming: when a BibTeX entry is produced (the default auto mode does this for any citable paper), its citation key — authorYEARfirstword, e.g. chang2025amortized — becomes the base name, giving you library-style filenames for free. Otherwise the name derives from the source (PDF stem or repository filename). --name NAME pins an explicit base name; --no-bibtex-name disables key-derived naming; --no-full-suffix writes the full document as chang2025amortized.md instead of …_full.md (handy with --no-split or a one-file-per-paper library). Every run logs which name was chosen.
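The naming scheme can be sketched as follows. This illustrates the authorYEARfirstword pattern and the suffix flags only; inscriber's real key generation may treat stop words, accents, or multi-part surnames differently, and both function names and the sample title are placeholders.

```python
import re


def citation_key(surname: str, year: int, title: str) -> str:
    """Build an authorYEARfirstword key (simplified; no stop-word handling)."""
    author = re.sub(r"[^a-z]", "", surname.lower())
    words = re.findall(r"[A-Za-z]+", title)
    first_word = words[0].lower() if words else ""
    return f"{author}{year}{first_word}"


def output_names(base: str, split: bool = True, full_suffix: bool = True) -> list:
    """List the Markdown files a run would write for a given base name."""
    names = [f"{base}_full.md" if full_suffix else f"{base}.md"]
    if split:
        names += [f"{base}_{part}.md" for part in ("main", "appendix", "backmatter")]
    return names


key = citation_key("Chang", 2025, "Amortized placeholder title")
print(key)
print(output_names(key, split=False, full_suffix=False))
```

The last call mirrors --no-split combined with --no-full-suffix, which leaves a single file named after the key.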

Two-step: compare VLMs on identical OCR

inscriber ocr paper.pdf -o out/                # OCR once → portable bundle
inscriber describe out/paper.inscriber-ocr     # VLM passes + assembly + write
inscriber describe out/paper.inscriber-ocr --vlm-model other.gguf --vlm-mmproj other-mmproj.gguf

ocr writes an inspectable bundle (manifest.json, cropped figures/, and pages/ rasters for pages with tables); describe runs the VLM stages from it with no OCR model loaded, so each VLM sees the identical input. The bundle's per-page markdown is hand-editable — fix an OCR glitch once, then try N VLMs.

Fix the splits, regenerate the full document

All outputs are plain Markdown. To correct an OCR/VLM slip after the fact, edit the split files and rebuild the full document from them — no models needed:

inscriber join out/paper      # paper_main/_appendix/_backmatter.md → paper_full.md

join strips each split's footer notice (and the BibTeX block, if injected), concatenates main → appendix → backmatter, and re-applies the framing, so each fix is made once, not once per file. Note that the regenerated {base}_full.md uses the combined ordering (appendix before backmatter, under `# Title - Appendix`-style headings), which may differ from the original document order.
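The strip-then-concatenate idea can be sketched in a few lines. This is a deliberately simplified model, not the join implementation: the notice marker is a hypothetical stand-in for whatever footer inscriber actually writes, and the heading framing and BibTeX handling are left out.

```python
NOTICE_MARKER = "<!-- machine-transcribed notice -->"  # hypothetical marker
PART_ORDER = ("main", "appendix", "backmatter")


def strip_notice(text: str) -> str:
    """Drop the trailing notice (if present) from one split file."""
    body, found, _ = text.rpartition(NOTICE_MARKER)
    return (body if found else text).rstrip()


def join_parts(parts: dict, notice: str) -> str:
    """Concatenate the splits in fixed order and append the notice once."""
    bodies = [strip_notice(parts[name]) for name in PART_ORDER if name in parts]
    return "\n\n".join(bodies) + "\n\n" + NOTICE_MARKER + "\n" + notice + "\n"


footer = "\n\n" + NOTICE_MARKER + "\nThis file was machine-transcribed."
parts = {
    "backmatter": "References" + footer,
    "main": "Introduction" + footer,
    "appendix": "A. Proofs" + footer,
}
full = join_parts(parts, "This file was machine-transcribed.")
print(full)
```

Because the notice is removed from every part and added back once at the end, the joined file carries a single footer regardless of how many splits were edited.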

Convert + verify with agent skills

The repo ships assistant skills for the convert-and-verify workflow:

- Claude Code: /inscribe
- Codex: $inscribe

When run inside this repository, the skill takes a PDF path or URL (plus any options in plain words), runs inscriber, then checks the transcription against the source PDF in ≤10-page chunks with parallel subagents briefed on the known failure modes (table cells, subscripts, equations, truncated pages, figure descriptions), applies the fixes that matter to the split files, and rejoins them with inscriber join. Say "no verification" to stop after the conversion.

Options

inscriber --help shows the full surface; every config.example.toml field has a matching flag. Highlights:

| Flag | Meaning |
|---|---|
| `--ocr-resolution {tiny,small,base,large,gundam}` | OCR render quality (default `gundam`, 2048px; `large` is faster) |
| `--figure-mode {describe-only,describe-and-keep,placeholder}` | how figures render |
| `--no-figures` | skip figure detection and description entirely |
| `--no-table-refine` | keep raw OCR tables (skip VLM restructuring) |
| `--name NAME` / `--no-bibtex-name` | explicit output base name / never name by BibTeX citation key |
| `--no-full-suffix` | full document as `{base}.md` instead of `{base}_full.md` |
| `--no-split` / `--page-numbers` / `--page-separators` | output options |
| `--pages RANGE` | page selection, e.g. `"1-10"`, `"3"`, `"5-"` |
| `--bibtex-mode {off,on,auto}` / `--bibtex-in-doc` | BibTeX mode (default `auto`; `--bibtex` ⇒ `on`) |
| `--offline` | no network: URL input + online BibTeX sources disabled |
| `--mode {sequential,concurrent}` | one model resident at a time (default) vs. both (needs VRAM) |
| `--no-cache` / `--refresh` | cache control |

GPU offload is automatic by default (n_gpu_layers = "auto" lets llama.cpp fit as many layers into VRAM as it can); override per server with --ocr-ngl / --vlm-ngl (all, 0 for CPU, or a layer count). --ocr-resolution is the main speed/quality lever: the gundam default renders pages at 2048px, which measurably reduces OCR misreads of small subscripts and digits; --ocr-resolution large is ~20% faster and fine for simple documents. --ctx (default 16384) sizes the context window that prompt and generation share — complex tables need headroom for the VLM's reasoning, so don't shrink it without reason.
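For the `--pages RANGE` forms shown in the table (`"1-10"`, `"3"`, `"5-"`), the following sketch shows one plausible reading: 1-based, inclusive, with an open end meaning "through the last page". Those semantics and the function name are assumptions, and inscriber may accept forms not covered here.

```python
def parse_pages(spec: str, total_pages: int) -> list:
    """Expand a page spec into 1-based page numbers, clamped to the document."""
    spec = spec.strip()
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start = int(start_text)
        end = int(end_text) if end_text else total_pages  # "5-" runs to the end
    else:
        start = end = int(spec)  # a single page such as "3"
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    return list(range(start, min(end, total_pages) + 1))


print(parse_pages("1-10", 39))
print(parse_pages("3", 39))
print(parse_pages("5-", 8))
```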

Privacy / offline

Your documents and figures never leave your machine — they go only to your own llama.cpp server on 127.0.0.1, never to any cloud model. The only features that touch the network are (1) downloading a PDF when the input is a URL and (2) the BibTeX citation lookups (on by default), which send only the extracted title or arXiv ID to citation APIs — never the document. Both are hard-disabled by --offline (BibTeX then degrades to a clearly-marked, fully-local best-effort entry). No telemetry, no persisted secrets.

Development

git clone https://github.com/lacerbi/inscriber && cd inscriber
python -m venv .venv
.venv/Scripts/activate          # Windows;  source .venv/bin/activate elsewhere
pip install -e ".[dev]"
pytest                          # mocked inference — no GPU or models needed
ruff check

Contributor guidance lives in AGENTS.md; the full technical specification is in DESIGN.md.

License

MIT.

Project details

Release history

This version

0.1.0

Jun 11, 2026

Download files

Source Distribution

inscriber-0.1.0.tar.gz (249.4 kB)

Uploaded Jun 11, 2026 Source

Built Distribution

inscriber-0.1.0-py3-none-any.whl (121.0 kB)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file inscriber-0.1.0.tar.gz.

File metadata

Download URL: inscriber-0.1.0.tar.gz
Upload date: Jun 11, 2026
Size: 249.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for inscriber-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e3145fe88e83962d5cf553d55de914c6e459179a6aedd927bb24158b4ccd134a` |
| MD5 | `049f9d37cd1a51653562c5e29f45fd85` |
| BLAKE2b-256 | `4ee41421577683a9230aa58cd0497be0a95f867704424a0f1e554934e56ace0a` |

Provenance

The following attestation bundles were made for inscriber-0.1.0.tar.gz:

Publisher: release.yml on lacerbi/inscriber

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: inscriber-0.1.0.tar.gz
- Subject digest: e3145fe88e83962d5cf553d55de914c6e459179a6aedd927bb24158b4ccd134a
- Sigstore transparency entry: 1789590271
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: lacerbi/inscriber@e14ff26517bf427460c244eae80a2752532ec998
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/lacerbi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e14ff26517bf427460c244eae80a2752532ec998
- Trigger Event: push

File details

Details for the file inscriber-0.1.0-py3-none-any.whl.

File metadata

Download URL: inscriber-0.1.0-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 121.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for inscriber-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `739f4630e48fb3bc55327f1568f68e95353109452768424b6ea11b52c2d34e55` |
| MD5 | `7942bfb85893f0ab7063f732ae1b0600` |
| BLAKE2b-256 | `205898b7ff05c936d80ae6f0b1fa8464013eaa0dbc621388fc8bcfda690c8c15` |

Provenance

The following attestation bundles were made for inscriber-0.1.0-py3-none-any.whl:

Publisher: release.yml on lacerbi/inscriber

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: inscriber-0.1.0-py3-none-any.whl
- Subject digest: 739f4630e48fb3bc55327f1568f68e95353109452768424b6ea11b52c2d34e55
- Sigstore transparency entry: 1789590384
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: lacerbi/inscriber@e14ff26517bf427460c244eae80a2752532ec998
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/lacerbi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e14ff26517bf427460c244eae80a2752532ec998
- Trigger Event: push
