latex2arxiv
Validates arXiv compatibility and cleans your LaTeX project in one command.
If you submit papers to arXiv — especially from Overleaf — this tool is for you. Drop in a .zip, get an arXiv-ready .zip back, with pre-flight checks that catch submission-blocking issues before you upload.
On a real statistics paper: 950 → 40 files, 82 MB → 3 MB.
latex2arxiv paper.zip --compile
main tex: paper.tex
remove: .DS_Store
remove: cover_letter.md
remove: paper.aux
remove: figures/old_unused.pdf
... (43 more)
[warn] \today used in \date — arXiv may rebuild the PDF and the date will change
Done → paper_arxiv.zip
Summary: 47 removed, 12 kept | 79.1 MB → 3.2 MB | 0 errors, 1 warning
Compiling paper.tex ...
PDF → paper_arxiv.pdf
The cleaned demo's PDF is attached to every GitHub Release as demo_project_arxiv.pdf — see the output without installing.
What it does
- File pruning: keeps only files reachable from your main .tex; removes everything else (build artifacts, editor files, cover letters, unused figures)
- arXiv compatibility checks: [error] for shell-escape packages (minted, pythontex); [warn] for biblatex without .bbl, output > 50 MB, problematic filenames, and other gotchas
- Comment + draft cleanup: strips % ... comments; removes \todo{}, \hl{}, \note{}, \fixme{}, \begin{comment} blocks, \iffalse...\fi blocks, and draft-only packages
- \pdfoutput=1 auto-injection: arXiv requires it; easy to forget
- BibTeX normalization: canonical field ordering, deduplication, private-field strip (requires bibtexparser)
- Image resizing (optional): caps longest side at N pixels via Pillow
- Custom revision-markup rules (optional): YAML config; brace-balanced matcher correctly handles \deleted{see \cite{x}}
- --compile: runs pdflatex and opens the cleaned PDF for review
Dependency tracking respects \input, \include, \subfile, \includegraphics, \graphicspath, and \bibliography. Commented-out commands are ignored.
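To make the dependency-tracking idea concrete, here is a minimal Python sketch of static dependency extraction. It is not the tool's actual implementation: the regex approach and the names DEP_PATTERN and find_dependencies are assumptions for illustration, and \graphicspath (which takes nested braces) is left out to keep the example short.

```python
import re

# Commands whose brace argument names a file the project depends on.
# The command list comes from the paragraph above; the regex is an assumption.
DEP_PATTERN = re.compile(
    r"\\(input|include|subfile|includegraphics|bibliography)"
    r"(?:\[[^\]]*\])?"   # optional [options], e.g. [width=...]
    r"\{([^}]*)\}"       # the file argument
)

# An unescaped % starts a comment that runs to the end of the line.
COMMENT = re.compile(r"(?<!\\)%.*")


def find_dependencies(tex_source: str) -> list[tuple[str, str]]:
    """Return (command, argument) pairs, skipping commented-out commands."""
    deps = []
    for line in tex_source.splitlines():
        live = COMMENT.sub("", line)  # drop the comment part of the line
        for match in DEP_PATTERN.finditer(live):
            command, argument = match.group(1), match.group(2)
            # \bibliography{a,b} may list several files.
            for name in argument.split(","):
                deps.append((command, name.strip()))
    return deps
```

A pruner built on this would resolve each argument to a path (adding .tex or .bib where the extension is omitted), recurse into the included .tex files, and keep only what it reaches.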
How does this compare to arxiv_latex_cleaner?
arxiv_latex_cleaner is the established tool in this space — Google-backed, ~5k★, years of usage. If you want the most battle-tested option, use it.
Where latex2arxiv is different
- Pre-flight checks with severity levels. [error] blocks submission and exits non-zero (CI-friendly); [warn] flags risk. Nothing else in this space does this.
- One-command zip-in / zip-out workflow. Drop a .zip, get a .zip back, optionally compile and preview the PDF. No directory dance, no manual repack.
- Brace-balanced config matcher. \deleted{see \cite{x}} and \added{some \emph{nested} text} work correctly; naive regex-based cleaners silently leave nested content behind.
- \pdfoutput=1 auto-injection and BibTeX normalization out of the box.
Where arxiv_latex_cleaner is stronger
- Maturity — thousands of papers cleaned, larger contributor pool, more edge cases discovered.
- Ghostscript-based PDF compression — we don't bundle this.
- PNG → JPG conversion — we don't do this.
Full feature comparison
| | latex2arxiv | arxiv_latex_cleaner |
|---|---|---|
| Output format | .zip → .zip | Cleaned directory |
| Pre-flight [error] / [warn] | ✅ | ❌ |
| Non-zero exit on errors | ✅ | ❌ |
| --compile preview | ✅ | ❌ |
| Auto-detect main .tex | ✅ | ❌ |
| Brace-balanced config | ✅ | ❌ |
| BibTeX normalization | ✅ | ❌ |
| \pdfoutput=1 injection | ✅ | ❌ |
| --dry-run | ✅ | ❌ |
| Built-in --demo | ✅ | ❌ |
| Image resizing (Pillow) | ✅ | ✅ |
| PDF compression (Ghostscript) | ❌ | ✅ |
| PNG → JPG conversion | ❌ | ✅ |
| Maturity | New (0.5.0) | ~5k★, years |
Installation
pip install latex2arxiv
On macOS, if you get an externally-managed-environment error, use pipx:
brew install pipx
pipx install latex2arxiv
Or from source:
git clone https://github.com/YuZh98/latex2arxiv
cd latex2arxiv
pip install .
pdflatex is required only for --compile (install via TeX Live or MacTeX).
Once installed, try the built-in demo to see the tool in action — no input file needed:
latex2arxiv --demo --compile
This processes a bundled self-documenting paper and opens the cleaned PDF.
Usage
latex2arxiv input.zip [output.zip] [--main MAIN_TEX] [--resize PX] [--config FILE] [--compile]
| Flag | Description |
|---|---|
| --main FILENAME | Specify the main .tex file (e.g. JASA_main.tex). Auto-detected via \documentclass if omitted. |
| --resize PX | Resize images so longest side ≤ PX pixels (e.g. --resize 1600). Requires Pillow. |
| --config FILE | YAML config file for custom removal rules (see below). |
| --compile | Run pdflatex on the output and open the resulting PDF. |
| --dry-run | Preview what would be removed/processed without writing any output. |
| --demo | Run the built-in demo project (no input file needed). |
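For intuition about the --main auto-detection, the sketch below picks the .tex file that contains an uncommented \documentclass. It is an illustration under assumptions, not the tool's code: the function name detect_main_tex and the tie-break (shallowest path, then alphabetical) are hypothetical.

```python
import re
from typing import Optional

# \documentclass at any position on a line, provided no % precedes it on that line.
DOCUMENTCLASS = re.compile(r"^[^%\n]*\\documentclass", re.MULTILINE)


def detect_main_tex(files: dict[str, str]) -> Optional[str]:
    """Pick the .tex file that declares a document class outside a comment.

    `files` maps project-relative paths to file contents. When several files
    qualify, the shallowest path wins, then alphabetical order (an assumption).
    """
    candidates = [
        path
        for path, text in files.items()
        if path.endswith(".tex") and DOCUMENTCLASS.search(text)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.count("/"), p))
```

If a project holds more than one complete document (for example a paper and a response letter), a heuristic like this can pick the wrong file, which is the case the --main flag covers.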
Examples
latex2arxiv paper.zip # auto-detect main, basic conversion
latex2arxiv paper.zip out.zip --main main.tex --compile
latex2arxiv paper.zip --resize 1600 --compile # shrink images
latex2arxiv paper.zip --config arxiv_config.yaml # custom rules
latex2arxiv paper.zip --dry-run # preview without writing
latex2arxiv --demo --compile # run the built-in demo
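The --resize rule (longest side ≤ PX) comes down to a small aspect-preserving calculation. The sketch below shows that arithmetic and one way to apply it with Pillow. The names capped_size and resize_file are hypothetical, and the tool may handle resampling and file formats differently.

```python
def capped_size(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_file(path: str, max_side: int) -> None:
    """Resize an image file in place (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so capped_size works without Pillow

    with Image.open(path) as img:
        new_size = capped_size(img.width, img.height, max_side)
        resized = img.resize(new_size) if new_size != img.size else None
    if resized is not None:
        resized.save(path)
```

With --resize 1600, a 3200 × 1800 figure would come out at 1600 × 900, while an 800 × 600 figure would be left alone.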
The tool exits non-zero if any pre-flight error fires (e.g. \usepackage{minted}) — useful for CI gating. Warnings do not affect the exit code.
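The severity model can be pictured as a table of rules, each tagged error or warn, with only errors affecting the exit code. The following is a simplified sketch of that idea, not the tool's rule set: the RULES table, preflight, and exit_code are hypothetical names, only two of the documented checks are shown, and comments are not stripped first.

```python
import re

# (severity, pattern, message). Both rules are taken from checks described in
# this README; the table shape itself is an assumption of the sketch.
RULES = [
    (
        "error",
        re.compile(r"\\usepackage(?:\[[^\]]*\])?\{[^}]*\b(?:minted|pythontex)\b[^}]*\}"),
        "shell-escape package",
    ),
    (
        "warn",
        re.compile(r"\\date\{[^}]*\\today"),
        r"\today used in \date",
    ),
]


def preflight(tex_source: str) -> list[tuple[str, str]]:
    """Return (severity, message) for every rule that fires."""
    return [
        (severity, message)
        for severity, pattern, message in RULES
        if pattern.search(tex_source)
    ]


def exit_code(findings: list[tuple[str, str]]) -> int:
    """Non-zero only when at least one error fired; warnings are ignored."""
    return 1 if any(severity == "error" for severity, _ in findings) else 0
```

In a CI job, the process exit status is then enough to fail the build: a warning-only run still returns 0.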
Custom removal rules (--config)
For revision markup and other project-specific cleanup, create a YAML config file. A template is in arxiv_config.yaml.
# Remove command AND its argument (text is lost)
commands_to_delete:
  - \deleted
  - \revision

# Remove command but KEEP its argument text
commands_to_unwrap:
  - \color{red}      # \color{red}text → text
  - \textcolor{red}  # \textcolor{red}{text} → text
  - \added           # \added{new text} → new text

# Remove entire environments
environments_to_delete:
  - response

# Raw regex replacements
replacements:
  - pattern: '\\added\{([^}]*)\}'
    replacement: '\1'
The config parser is built in (no extra dependencies). The brace-balanced matcher correctly handles nested commands like \deleted{see \cite{x}}.
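To show why brace balancing matters, here is a minimal sketch of a matcher that counts brace depth instead of relying on a pattern like [^}]*, which stops at the first closing brace. It illustrates the technique only. The names find_balanced and apply_command are hypothetical, and details such as whitespace between the command and its brace are not handled.

```python
def find_balanced(text: str, open_idx: int) -> int:
    """Return the index of the '}' matching the '{' at open_idx, or -1."""
    depth = 0
    i = open_idx
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the next character (covers \{ and \})
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


def apply_command(text: str, command: str, keep_argument: bool) -> str:
    """Delete command{...} (keep_argument=False) or unwrap it (True)."""
    out = []
    pos = 0
    needle = command + "{"
    while True:
        start = text.find(needle, pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        open_idx = start + len(command)
        close_idx = find_balanced(text, open_idx)
        if close_idx == -1:  # unbalanced braces: leave the rest untouched
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:start])
        if keep_argument:
            inner = text[open_idx + 1:close_idx]
            out.append(apply_command(inner, command, True))  # handle nesting
        pos = close_idx + 1
```

Here apply_command(text, r"\deleted", False) corresponds to a commands_to_delete entry and apply_command(text, r"\added", True) to a commands_to_unwrap entry.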
Known limitations
Dynamically constructed filenames — \includegraphics{\figpath/fig1} cannot be resolved statically and the image will be deleted. Expand path macros before running.
\subfile vs \input path resolution — \input/\include paths resolve relative to the project root; \subfile paths resolve relative to the subfile's own directory. Unusual nested setups may cause images to be incorrectly pruned; use --compile to verify.
Inline \verb|...| — comment-stripping and draft-removal don't currently protect inline \verb|...|. A % or \todo{...} inside \verb|...| may get mangled. Standard verbatim, lstlisting, and minted block environments are protected.
--compile is a local sanity check — a successful local compile doesn't guarantee arXiv will compile it. arXiv pins specific TeX Live versions. Always check the arXiv submission preview after uploading.
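For the first limitation above (dynamically constructed filenames), one workaround is to expand simple path macros in the source before running the tool. The sketch below handles only zero-argument \newcommand / \renewcommand definitions used inside \includegraphics arguments. It is a hypothetical pre-processing helper, not part of latex2arxiv.

```python
import re

# Zero-argument definitions such as \newcommand{\figpath}{figures}.
SIMPLE_MACRO = re.compile(r"\\(?:re)?newcommand\{?\\([A-Za-z]+)\}?\{([^{}]*)\}")

# \includegraphics with optional [options]; group 2 is the path argument.
GRAPHICS = re.compile(r"(\\includegraphics(?:\[[^\]]*\])?\{)([^}]*)(\})")


def expand_path_macros(tex_source: str) -> str:
    """Replace simple macros inside \\includegraphics paths with their bodies."""
    macros = dict(SIMPLE_MACRO.findall(tex_source))

    def expand(match: re.Match) -> str:
        path = match.group(2)
        for name, body in macros.items():
            # (?![A-Za-z]) keeps \fig from matching inside \figpath.
            path = re.sub(r"\\" + name + r"(?![A-Za-z])", lambda _m: body, path)
        return match.group(1) + path + match.group(3)

    return GRAPHICS.sub(expand, tex_source)
```

After expansion, \includegraphics{\figpath/fig1} reads \includegraphics{figures/fig1}, which a static dependency scan can resolve.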
Project structure
converter.py          # CLI entry point
pipeline/
  tex.py              # Comment stripping, draft annotation removal
  bibtex.py           # BibTeX normalization
  deps.py             # Dependency graph (tex includes, images, bib files)
  images.py           # Image resizing
  config.py           # User-defined removal rules
arxiv_config.yaml     # Sample config file