Local-first conversion of native scientific PDFs into editable Beamer decks.
pdf2beamer
The project is intentionally structured around inspectable intermediate representations:
PDF -> PaperIR -> ArgumentGraph -> DeckPlan -> SlideIR -> Beamer
The package defines strict Pydantic v2 data models and keeps real Docling, PyMuPDF, Nemotron generation, validation, rendering, and compilation behind local integration points.
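One way to picture an "integration point" is a small interface that the rest of the pipeline depends on, so a heavy backend and a lightweight stand-in are interchangeable. The sketch below is illustrative only: `TextGenerator`, `FakeGenerator`, and `Planner` are hypothetical names, not classes exported by pdf2beamer.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical integration point: maps a prompt to JSON text."""

    def generate(self, prompt: str) -> str: ...


class FakeGenerator:
    """Canned-output stand-in, usable without any model weights."""

    def __init__(self, canned: str) -> None:
        self.canned = canned

    def generate(self, prompt: str) -> str:
        return self.canned


class Planner:
    """Depends only on the protocol, so backends can be swapped freely."""

    def __init__(self, generator: TextGenerator) -> None:
        self.generator = generator

    def plan(self, paper_summary: str) -> str:
        return self.generator.generate(f"Plan slides for: {paper_summary}")
```

A real backend would implement the same `generate` method and be passed to `Planner` in place of the fake.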
Constraints
- No external API calls.
- No OCR or scanned-PDF fallback.
- Local Nemotron generation plus Qwen embedding/reranking adapters are dependency-injected.
- LLM components generate structured JSON only.
- Beamer is rendered deterministically from SlideIR.
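To illustrate what deterministic rendering can mean in practice, here is a minimal sketch that turns a slide record into a Beamer frame with plain string formatting, so the same input always yields the same LaTeX. The `Slide` dataclass and its fields are assumptions made for the example; the package's actual SlideIR schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slide:
    """Hypothetical stand-in for one SlideIR entry (not the package's model)."""

    title: str
    bullets: tuple[str, ...] = ()


_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    # Escape character by character so replacements are never re-escaped.
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_frame(slide: Slide) -> str:
    """Render one frame; no randomness, clock, or model call is involved."""
    lines = [rf"\begin{{frame}}{{{escape_latex(slide.title)}}}"]
    if slide.bullets:
        lines.append(r"\begin{itemize}")
        lines.extend(rf"\item {escape_latex(b)}" for b in slide.bullets)
        lines.append(r"\end{itemize}")
    lines.append(r"\end{frame}")
    return "\n".join(lines)
```

Because the language model only produces structured data and the LaTeX comes from code like this, a deck can be re-rendered or diffed without re-running generation.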
Public API Sketch
from pdf2beamer import PdfToBeamerPipeline, PipelineConfig
config = PipelineConfig(
    model_path="./models/nemotron-3-nano-4b-gguf/NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf",
    embedding_model_path="./models/Qwen3-Embedding-0.6B",
    reranker_model_path="./models/Qwen3-Reranker-0.6B",
    duration_minutes=10,
    audience="technical",
    theme="clean",
)
pipeline = PdfToBeamerPipeline(config)
result = pipeline.generate("paper.pdf", "out/")
Local Models
Real model and PDF backends are optional. The base package imports without installing heavy extraction or model dependencies, and the library never downloads model files at runtime.
Install the base package:
pip install pdf2beamer
With local model download and inference support only:
pip install "pdf2beamer[models]"
With the full local pipeline for native PDFs and real local models:
pip install "pdf2beamer[models,pdf,docling]"
Download default models into ./models/:
pdf2beamer download-models .
Expected local files, auto-detected by --real-models when present:
- Generation: models/nemotron-3-nano-4b-gguf/NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf
- Embedding: models/Qwen3-Embedding-0.6B
- Reranking: models/Qwen3-Reranker-0.6B
You can override paths with --model, --embedding, or --reranker.
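The expected layout can also be checked from Python before starting a long run. This is a minimal sketch under the assumption that the generator is a single GGUF file and the other two models are directories, as listed above; `find_missing_models` is a hypothetical helper, not part of pdf2beamer, and it does not reproduce the CLI's own detection logic.

```python
from pathlib import Path

# Relative paths copied from the expected local layout.
EXPECTED = {
    "generation": "models/nemotron-3-nano-4b-gguf/NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf",
    "embedding": "models/Qwen3-Embedding-0.6B",
    "reranking": "models/Qwen3-Reranker-0.6B",
}


def find_missing_models(root: str = ".") -> list[str]:
    """Return the roles whose expected file or directory is absent under root."""
    base = Path(root)
    missing = []
    for role, rel in EXPECTED.items():
        path = base / rel
        # The generator is one GGUF file; the other two are directories.
        present = path.is_file() if rel.endswith(".gguf") else path.is_dir()
        if not present:
            missing.append(role)
    return missing
```

An empty list means all three assets are in place relative to the given root.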
Model files are local assets and should not be committed. Store them under models/; .gitignore excludes models/ and common model-weight formats.
Download Models From Hugging Face
The models extra includes Hugging Face download tooling. If your Hugging Face account needs access to a model, authenticate once:
hf auth login
Download the expected local model layout:
pdf2beamer download-models .
Equivalent manual Hugging Face commands:
mkdir -p models/nemotron-3-nano-4b-gguf \
    models/Qwen3-Embedding-0.6B \
    models/Qwen3-Reranker-0.6B
hf download nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF \
    NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf \
    --local-dir models/nemotron-3-nano-4b-gguf
hf download Qwen/Qwen3-Embedding-0.6B \
    --local-dir models/Qwen3-Embedding-0.6B
hf download Qwen/Qwen3-Reranker-0.6B \
    --local-dir models/Qwen3-Reranker-0.6B
Quick local check:
test -f models/nemotron-3-nano-4b-gguf/NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf
test -d models/Qwen3-Embedding-0.6B
test -d models/Qwen3-Reranker-0.6B
git check-ignore -v models/nemotron-3-nano-4b-gguf/NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf
Then run with real local models. Use --no-compile if you only want the editable out/main.tex file:
uv run --extra pdf --extra docling --extra models pdf2beamer generate paper.pdf --real-models --no-compile
LaTeX Compilation
pdf2beamer always writes out/main.tex. To also produce out/main.pdf, install a TeX distribution that provides the latexmk command, then run without --no-compile.
Debian/Ubuntu:
sudo apt update
sudo apt install latexmk texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended
Windows:
winget install --id MiKTeX.MiKTeX --exact
winget install --id StrawberryPerl.StrawberryPerl --exact
MiKTeX provides the TeX toolchain, and latexmk needs Perl on Windows. Restart the terminal after installation so the updated PATH is visible.
macOS:
brew install --cask mactex-no-gui
Check that latexmk is available:
latexmk --version
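If you generated with --no-compile and want to build the PDF later from a script, something like the following could work. It is a sketch of a manual build, not a description of what pdf2beamer runs internally; the helper names are hypothetical, and the latexmk options used (`-pdf`, `-interaction=nonstopmode`, `-outdir=`) are standard latexmk flags.

```python
import shutil
from pathlib import Path


def latexmk_available() -> bool:
    """True when a latexmk executable is found on PATH."""
    return shutil.which("latexmk") is not None


def latexmk_command(tex_file: str) -> list[str]:
    """Build a latexmk invocation that writes the PDF next to the .tex file."""
    tex = Path(tex_file)
    return [
        "latexmk",
        "-pdf",                      # build with pdflatex
        "-interaction=nonstopmode",  # do not pause for input on errors
        f"-outdir={tex.parent.as_posix()}",
        tex.as_posix(),
    ]
```

When `latexmk_available()` returns true, passing `latexmk_command("out/main.tex")` to `subprocess.run(..., check=True)` should produce out/main.pdf.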
Compile during generation:
uv run --extra pdf --extra docling --extra models pdf2beamer generate paper.pdf --real-models
Fake-model command for lightweight local development:
pdf2beamer generate paper.pdf \
    --use-fake-models \
    --duration 10 \
    --output out/
Real local-model command:
pdf2beamer generate paper.pdf \
    --real-models \
    --duration 10 \
    --audience technical \
    --output out/
Structured GGUF Output
The GGUF generator is exposed as LocalNemotronGenerator and loads a local Nemotron instruct GGUF through llama-cpp-python.
For ArgumentGraph and SlideIR, it first tries Instructor's local
llama-cpp-python integration:
instructor.patch(
    create=llama.create_chat_completion_openai_v1,
    mode=instructor.Mode.JSON_SCHEMA,
)
This path returns Pydantic response models directly and retries validation
failures. If Instructor or the OpenAI-compatible llama.cpp method is unavailable,
the generator falls back to llama.cpp response_format JSON schema/object mode,
then to strict JSON parsing.
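The fallback order described above can be pictured as a list of strategies tried in sequence, ending with strict JSON parsing. The sketch below shows only that control flow using the standard library; `first_successful` and `parse_strict_json` are hypothetical names, and the real generator's error handling and retry behaviour are not shown in this README.

```python
import json
from typing import Any, Callable, Sequence

Strategy = Callable[[str], dict[str, Any]]


def parse_strict_json(text: str) -> dict[str, Any]:
    """Accept only a single JSON object with no surrounding prose."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def first_successful(strategies: Sequence[Strategy], prompt: str) -> dict[str, Any]:
    """Try each strategy in order and return the first result that does not raise."""
    errors = []
    for strategy in strategies:
        try:
            return strategy(prompt)
        except Exception as exc:  # broad on purpose in this sketch
            name = getattr(strategy, "__name__", "strategy")
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

Ordering the strategies from most to least structured means the stricter path is preferred whenever its dependencies are installed, while the last resort still rejects anything that is not valid JSON.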
CLI controls:
pdf2beamer generate paper.pdf \
    --real-models \
    --instructor \
    --instructor-max-retries 2 \
    --no-compile \
    --output out/
Disable Instructor and use llama.cpp response format fallback:
pdf2beamer generate paper.pdf --real-models --no-instructor --output out/