Validates arXiv compatibility and cleans a LaTeX .zip project in one command
latex2arxiv
Submit to arXiv without the headache. One command cleans your project, catches rejection-causing errors, and walks you through the upload.
latex2arxiv paper.zip --compile # clean + verify PDF
latex2arxiv paper.zip --compile --guide # + step-by-step upload instructions
latex2arxiv paper/ --compile # directory input
latex2arxiv https://github.com/u/p.git # git URL input
Your original project is never modified. All output goes to a new _arxiv.zip file.
Try the built-in demo:
pip install latex2arxiv
latex2arxiv --demo --compile --guide
This processes a bundled self-documenting paper, opens the cleaned PDF, and writes a step-by-step arXiv upload guide with copy-paste-ready metadata. The cleaned demo's PDF is attached to every GitHub Release as demo_project_arxiv.pdf.
Before / After
On a real statistics paper (arXiv:2504.11630): 934 → 40 files, 80.6 MB → 3.1 MB.
| Before (Overleaf export) | After (latex2arxiv output) |
|---|---|
| Images/ | Images/ |
| JASA_main.tex | JASA_main.tex |
| JASA_main_backup.tex | ref.bib |
| main_bak_svm.tex | Supplementary_Materials.tex |
| cover_letter.md | |
| response.tex | |
| ref.bib | |
| JASA_main.aux/.log/.bbl/.pdf | |
| jasa_comments/, jasa_revision/ | |
| ... (and ~930 more) | |
| 934 files, 80.6 MB | 40 files, 3.1 MB |
Who is this for?
- First time submitting to arXiv? You have a LaTeX project that compiles locally, but you're not sure what arXiv will accept. latex2arxiv cleans your project, catches errors that would cause rejection, and writes a step-by-step upload guide so you know exactly what to paste where. → latex2arxiv paper.zip --compile --guide
- You wrote your paper in Overleaf and need a clean, arXiv-ready zip without manually pruning files. → Overleaf → arXiv quickstart
- You want to gate a paper repo's CI on arXiv compliance so a bad merge can't slip through. → --dry-run + non-zero exit on [error] (details)
- Your paper uses custom revision-tracking macros (\added, \deleted, \textcolor{red}{...}) that you need stripped before submission. → Custom removal rules
What it does
| Feature | What it does |
|---|---|
| 📦 One command, any input | Accepts a .zip, directory, or git URL; outputs an arXiv-ready .zip; optionally compiles and opens the PDF for review |
| ✂️ Prunes your project to submission-ready | Keeps only files reachable from your main .tex; removes build artifacts, editor files, cover letters, unused figures |
| 🧹 Cleans your .tex | Strips comments, removes \todo{} / \hl{} / draft packages, handles nested braces correctly (\deleted{see \cite{x}} works) |
| 🚨 Catches submission blockers before you upload | [error] for shell-escape packages that will fail on arXiv (minted, pythontex); [warn] for biblatex without .bbl, missing index files, oversized output, undefined citations, problematic filenames (full list) |
| 🗺️ Guides you through upload | --guide extracts title, authors, abstract, page/figure/table counts and writes a step-by-step arXiv upload walkthrough |
Also: BibTeX normalization, \pdfoutput=1 injection, image resizing (Pillow), --dry-run preview, --demo for first-run, --flatten for single-file output, --json for CI/tooling integration.
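The comment stripping listed in the feature table can be illustrated with a short sketch. This is not the tool's actual code: the function names are hypothetical, verbatim-style environments are ignored, and the choice to keep a bare `%` after trailing comments (so TeX's end-of-line spacing is unchanged) is an assumption.

```python
def strip_comment(line: str) -> str:
    """Remove the text of a trailing LaTeX comment from one line."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped character (covers \% and \\)
            continue
        if line[i] == "%":
            # Keep the % itself: it suppresses the end-of-line space in TeX.
            return line[: i + 1]
        i += 1
    return line


def strip_comments(tex: str) -> str:
    """Strip comment text from every line; drop whole-line comments."""
    kept = []
    for line in tex.splitlines():
        stripped = strip_comment(line)
        if stripped.strip() == "%":
            continue  # the line held nothing but a comment
        kept.append(stripped)
    return "\n".join(kept)
```

An escaped percent sign such as `95\%` survives, because the scanner skips whatever character follows a backslash.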
Dependency tracking respects \input, \include, \subfile, \includegraphics, \graphicspath, and \bibliography. Commented-out commands are ignored.
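As a rough illustration of that dependency scan, the sketch below collects the targets of the commands named above while skipping commented-out ones. The regexes and the function name `referenced_paths` are assumptions made for this example, `\graphicspath` is left out for brevity, and the real implementation may work differently.

```python
import re

# Commands named in the paragraph above; each takes its target in braces.
_REF = re.compile(
    r"\\(input|include|subfile|includegraphics|bibliography)"
    r"(?:\[[^\]]*\])?"  # optional [options], e.g. [width=\linewidth]
    r"\{([^}]*)\}"
)
# An unescaped % starts a comment (simplification: \\% is not handled).
_COMMENT = re.compile(r"(?<!\\)%.*")


def referenced_paths(tex: str) -> list[tuple[str, str]]:
    """Return (command, target) pairs, ignoring commented-out commands."""
    found = []
    for line in tex.splitlines():
        line = _COMMENT.sub("", line)
        for command, argument in _REF.findall(line):
            # \bibliography{a,b} may list several files at once.
            for target in argument.split(","):
                found.append((command, target.strip()))
    return found
```

A full pruner would repeat this scan on every file it discovers, starting from the main .tex, and keep only the files reached that way.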
Upload guide (--guide)
Pass --guide and latex2arxiv writes a plain-text file alongside your output zip with everything you need for the arXiv upload form:
arXiv Upload Guide
Your metadata (copy-paste ready):
Title:
Statistical Modeling of Combinatorial Response Data
Authors:
Yu Zheng, Malay Ghosh, Leo Duan
Abstract:
There is a rich literature for modeling binary and polychotomous responses...
Comments:
53 pages, 13 figures, 6 tables
Step 1: Start a new submission or replace an existing one
Step 2: Choose license
Step 3: Select category
Step 4: Upload files (arXiv may warn about .sty; ignore it)
Step 5: Check processing
Step 6: Fill in metadata (paste from above)
Step 7: Preview and submit
Files in your zip:
JASA_main.tex ← main file
ref.bib
Supplementary_Materials.tex
Images/
...
No more guessing what goes where.
Works everywhere
Terminal (one command, full pipeline):
latex2arxiv paper.zip --compile --guide
CI (gate your paper repo on arXiv compliance):
- run: pip install latex2arxiv && latex2arxiv paper.zip --dry-run
AI agents (Claude, Cursor, or Copilot validate and fix issues in conversation):
pip install "latex2arxiv[mcp]"
{"mcpServers": {"latex2arxiv": {"command": "latex2arxiv-mcp"}}}
Installation
pip install latex2arxiv
On macOS, install via Homebrew (no Python toolchain required):
brew tap YuZh98/latex2arxiv
brew install latex2arxiv
Note: The first brew install may appear to hang for 3–5 minutes while compiling Pillow's C extensions. This is normal: Homebrew builds Python packages from source and suppresses progress output. Use brew install --verbose latex2arxiv to see detailed build output.
Or, if you get an externally-managed-environment error from pip, use pipx:
brew install pipx
pipx install latex2arxiv
Or from source:
git clone https://github.com/YuZh98/latex2arxiv
cd latex2arxiv
pip install .
pdflatex is required only for --compile (install via TeX Live or MacTeX).
Usage
latex2arxiv input [output.zip] [options]
input can be a .zip file, a directory of LaTeX sources, or a git URL (https or ssh). Directories are zipped internally; git URLs are cloned with --depth 1.
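A minimal sketch of how the three input kinds might be told apart, and how a directory could be zipped, follows. Both function names and the URL prefixes checked are assumptions for illustration only; the tool's own detection logic is not documented here.

```python
import zipfile
from pathlib import Path


def classify_input(arg: str) -> str:
    """Return 'git', 'zip', or 'dir' for a command-line input argument."""
    # Assumed prefixes for https and ssh git URLs.
    if arg.startswith(("https://", "http://", "ssh://", "git@")):
        return "git"
    if Path(arg).suffix.lower() == ".zip":
        return "zip"
    return "dir"


def zip_directory(src: Path, dest: Path) -> None:
    """Write every file under src into dest, using paths relative to src."""
    # dest should live outside src so the archive does not include itself.
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(src).as_posix())
```

For the git case, the shallow clone described above corresponds to running `git clone --depth 1 <url> <dir>` before zipping the checkout.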
| Flag | Description |
|---|---|
| --main FILENAME | Specify the main .tex file (e.g. JASA_main.tex). Auto-detected via \documentclass if omitted. |
| --resize PX | Resize images so longest side ≤ PX pixels (e.g. --resize 1600). Requires Pillow. |
| --config FILE | YAML config file for custom removal rules (see below). |
| --compile | Run pdflatex on the output and open the resulting PDF. |
| --guide | Write a detailed arXiv upload guide (metadata + step-by-step instructions) to a text file alongside the output. |
| --dry-run | Preview what would be removed/processed without writing any output. |
| --flatten | Inline every \input / \include / \subfile into the main .tex for single-file output. Details. |
| --json | Emit a machine-readable JSON summary on stdout; route progress to stderr. Schema. |
| --demo | Run the built-in demo project (no input file needed). |
| --version | Print version and exit. |
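To make the \documentclass auto-detection concrete, here is a small sketch that lists candidate main files. The function name and the in-memory `sources` mapping are hypothetical, and how the tool chooses among several candidates is not specified here.

```python
import re

# Match \documentclass only when no % appears earlier on the same line.
_DOCCLASS = re.compile(r"^[^%\n]*\\documentclass\b", re.MULTILINE)


def find_main_candidates(sources: dict[str, str]) -> list[str]:
    """Return names of .tex files that contain an uncommented \\documentclass."""
    return sorted(
        name
        for name, text in sources.items()
        if name.endswith(".tex") and _DOCCLASS.search(text)
    )
```

A project that keeps backup copies of its main file, such as JASA_main_backup.tex, would yield more than one candidate. Passing --main removes the ambiguity in that situation.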
Examples
latex2arxiv paper.zip # zip input
latex2arxiv paper/ # directory input
latex2arxiv https://github.com/user/paper.git # git URL input
latex2arxiv paper.zip out.zip --main main.tex --compile
latex2arxiv paper.zip --resize 1600 --compile # shrink images
latex2arxiv paper.zip --config arxiv_config.yaml # custom rules
latex2arxiv paper.zip --compile --guide # full pipeline + upload guide
latex2arxiv paper.zip --dry-run # preview without writing
latex2arxiv --demo --compile --guide # run the built-in demo
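The --resize 1600 example caps the longest side of each image at 1600 pixels. The arithmetic behind such a cap can be sketched as below; the function name is hypothetical, and leaving smaller images untouched is an assumption about the tool's behavior.

```python
def fit_longest_side(width: int, height: int, max_px: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_px."""
    longest = max(width, height)
    if longest <= max_px:
        return width, height  # assumed: images are never upscaled
    scale = max_px / longest
    # Round to whole pixels and never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For instance, a 4000×3000 image with a cap of 1600 comes out as 1600×1200, so the aspect ratio is preserved.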
Pre-flight checks
Before producing the output zip, latex2arxiv validates the project against arXiv's LaTeX submission guide. [error] lines block submission (the tool exits non-zero, useful for CI gating); [warn] lines are advisory and do not affect the exit code.
$ latex2arxiv paper.zip --dry-run
[error] \usepackage{minted} requires shell-escape - arXiv compiles without it; this submission will fail to build
[error] \usepackage{psfig} - arXiv no longer supports the psfig package
[warn] \today used in \date - arXiv may rebuild the PDF and the date will change
[warn] .eps image found: photo.eps - pdflatex does not support .eps; convert to .pdf or .png
[warn] \printindex used but no .ind file at root - build locally and re-run latex2arxiv
Summary: 2 errors, 7 warnings
Either [error] line would have caused arXiv to reject the submission after upload. The exit code is non-zero on errors, so a CI step like latex2arxiv paper.zip --dry-run fails the build before the bad submission ever leaves the repo.
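For pipelines driven from Python rather than a shell step, the same gate might be sketched as follows. It relies only on the --dry-run flag and the [error] / [warn] line prefixes shown above. The helper names are hypothetical, and reading both stdout and stderr is an assumption, since the stream used for these lines is not stated.

```python
import subprocess


def count_findings(output: str) -> dict[str, int]:
    """Count [error] and [warn] lines in pre-flight output."""
    counts = {"error": 0, "warn": 0}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("[error]"):
            counts["error"] += 1
        elif line.startswith("[warn]"):
            counts["warn"] += 1
    return counts


def gate(project: str) -> int:
    """Run the dry-run check and return the process exit code."""
    result = subprocess.run(
        ["latex2arxiv", project, "--dry-run"],
        capture_output=True,
        text=True,
    )
    counts = count_findings(result.stdout + result.stderr)
    print(f"{counts['error']} errors, {counts['warn']} warnings")
    return result.returncode
```

A caller could end with `sys.exit(gate("paper.zip"))` so that the CI job fails exactly when the tool itself exits non-zero.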
See docs/pre-flight.md for the full list of checks and silent fixes.
Custom removal rules (--config)
For revision markup and other project-specific cleanup, create a YAML config file. A template is in arxiv_config.yaml.
# Remove command AND its argument (text is lost)
commands_to_delete:
  - \deleted
  - \revision
# Remove command but KEEP its argument text
commands_to_unwrap:
  - \color{red}       # \color{red}text → text
  - \textcolor{red}   # \textcolor{red}{text} → text
  - \added            # \added{new text} → new text
# Remove entire environments
environments_to_delete:
  - response
# Raw regex (last resort; prefer the verbs above when they fit).
replacements:
  - pattern: '\\textcolor\{[^}]*\}\{([^}]*)\}'
    replacement: '\1'
The brace-balanced matcher correctly handles nested commands like \deleted{see \cite{x}}. Unknown top-level keys produce a warning, so typos like command_to_delete (singular) no longer silently no-op.
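To show why brace balancing matters for nested commands, here is a minimal sketch of a matcher that either deletes a command with its argument or unwraps it. It illustrates the general technique only. The names `apply_command` and `_match_brace` are hypothetical and do not come from the tool's source.

```python
def _match_brace(text: str, open_idx: int) -> int:
    """Return the index of the } closing the { at open_idx, or -1."""
    depth = 0
    i = open_idx
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip escaped characters such as \{ and \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


def apply_command(text: str, command: str, keep_argument: bool) -> str:
    """Delete `command{...}` or, if keep_argument, replace it by its argument."""
    out = []
    i = 0
    needle = command + "{"
    while True:
        start = text.find(needle, i)
        if start == -1:
            out.append(text[i:])
            return "".join(out)
        open_idx = start + len(command)
        close_idx = _match_brace(text, open_idx)
        if close_idx == -1:  # unbalanced braces: leave the rest untouched
            out.append(text[i:])
            return "".join(out)
        out.append(text[i:start])
        if keep_argument:
            out.append(text[open_idx + 1 : close_idx])
        i = close_idx + 1
```

By contrast, a pattern like `\\deleted\{[^}]*\}` stops at the first closing brace, so on `\deleted{see \cite{x}}` it leaves a stray `}` behind.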
latex2arxiv vs. arxiv_latex_cleaner
arxiv_latex_cleaner is the incumbent (Google-backed, mature, and cleans well). The key difference: it won't tell you that \usepackage{minted} will fail on arXiv, won't produce the .zip you upload, and has no exit code for CI gating.
| | latex2arxiv | arxiv_latex_cleaner |
|---|---|---|
| Output format | Any input → .zip | Cleaned directory |
| Pre-flight [error] / [warn] (details) | ✅ | ❌ |
| Upload walkthrough (--guide) | ✅ | ❌ |
| Non-zero exit on errors | ✅ | ❌ |
| --compile preview | ✅ | ❌ |
| Auto-detect main .tex | ✅ | ❌ |
| Brace-balanced config | ✅ | ❌ |
| BibTeX normalization | ✅ | ❌ |
| --dry-run | ✅ | ❌ |
| Built-in --demo | ✅ | ❌ |
| Image resizing (Pillow) | ✅ | ✅ |
| MCP server (AI agent integration) | ✅ | ❌ |
| GitHub Action + pre-commit hook | ✅ | ❌ |
| PDF compression (Ghostscript) | ❌ | ✅ |
| PNG → JPG conversion | ❌ | ✅ |
| Maturity | 7 regression fixtures, live pdflatex+biber end-to-end CI | ~5k★, years |
Integrations
| Surface | Status | Details |
|---|---|---|
| CLI | ✅ | pip install latex2arxiv |
| GitHub Action | ✅ | action.yml |
| pre-commit hook | ✅ | latex2arxiv-dryrun |
| MCP server (AI agents) | ✅ | pip install "latex2arxiv[mcp]" (setup) |
| VS Code extension | | Planned |
| Homebrew formula | ✅ | brew tap YuZh98/latex2arxiv && brew install latex2arxiv |
Known limitations
Dynamically constructed filenames: \includegraphics{\figpath/fig1} cannot be resolved statically and the image will be deleted. Expand path macros before running.
\subfile vs \input path resolution: \input/\include paths resolve relative to the project root; \subfile paths resolve relative to the subfile's own directory. Unusual nested setups may cause images to be incorrectly pruned; use --compile to verify.
--compile is a local sanity check: a successful local compile doesn't guarantee arXiv will compile it. arXiv pins specific TeX Live versions. Always check the arXiv submission preview after uploading.
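The \subfile vs \input rule above can be read as choosing a different base directory for each reference. The sketch below encodes one such reading. The function name and the `via_subfile` flag are hypothetical, and the tool's actual resolution logic may differ in edge cases.

```python
import posixpath


def resolve(target: str, containing_file: str, via_subfile: bool) -> str:
    """Return the project-relative path that `target` refers to.

    containing_file: project-relative path of the .tex file holding the reference.
    via_subfile: True if that file was pulled in with \\subfile.
    """
    # Assumed rule: subfile content resolves next to the subfile itself,
    # everything else resolves from the project root.
    base = posixpath.dirname(containing_file) if via_subfile else ""
    return posixpath.normpath(posixpath.join(base, target))
```

The same target can land in two different places depending on how its file was included, which is how a figure in a nested setup may end up looking unused. Checking the output with --compile catches that case.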