latex2arxiv

Validates arXiv compatibility and cleans your LaTeX project in one command — zip in, zip out.

If you submit papers to arXiv, this tool is for you. Drop in a .zip, get a new arXiv-ready .zip back — your input is never overwritten — with pre-flight checks that catch submission-blocking issues before you upload.

latex2arxiv paper.zip --compile

Try the built-in demo:

pip install latex2arxiv
latex2arxiv --demo --compile

This processes a bundled self-documenting paper and opens the cleaned PDF. The cleaned demo's PDF is attached to every GitHub Release as demo_project_arxiv.pdf — see the output without installing.

Before / After

On a real statistics paper: 934 → 40 files, 80.6 MB → 3.1 MB.

Before (Overleaf export): 934 files, 80.6 MB

  • 📁 Images/
  • 📄 JASA_main.tex
  • 📄 JASA_main_backup.tex
  • 📄 main_bak_svm.tex
  • 📄 cover_letter.md
  • 📄 response.tex
  • 📄 ref.bib
  • 📄 JASA_main.aux/.log/.bbl/.pdf
  • 📁 jasa_comments/, jasa_revision/
  • ... (and ~930 more)

After (latex2arxiv output): 40 files, 3.1 MB

  • 📁 Images/
  • 📄 JASA_main.tex[^main]
  • 📄 ref.bib
  • 📄 Supplementary_Materials.tex[^supp]

Who is this for?

  • You wrote your paper in Overleaf and need a clean, arXiv-ready zip without manually pruning files. → Overleaf → arXiv quickstart
  • You want to gate a paper repo's CI on arXiv compliance so a bad merge can't slip through. → --dry-run + non-zero exit on [error] (details)
  • Your paper uses custom revision-tracking macros (\added, \deleted, \textcolor{red}{...}) that you need stripped before submission. → Custom removal rules

What it does

| Feature | What it does |
| --- | --- |
| 📦 One-command zip-in / zip-out | No directory dance, no manual repack; optionally compiles and opens the PDF for review |
| ✂️ Prunes your project to submission-ready | Keeps only files reachable from your main .tex; removes build artifacts, editor files, cover letters, unused figures |
| 🧹 Cleans your .tex | Strips comments, removes \todo{} / \hl{} / draft packages, handles nested braces correctly (\deleted{see \cite{x}} works) |
| 🚨 Catches submission blockers before you upload | [error] for shell-escape packages that will fail on arXiv (minted, pythontex); [warn] for biblatex without .bbl, missing index files, oversized output, problematic filenames (full list under Pre-flight checks) |

Also: BibTeX normalization, \pdfoutput=1 injection, image resizing (Pillow), --dry-run preview, --demo for first-run.

Dependency tracking respects \input, \include, \subfile, \includegraphics, \graphicspath, and \bibliography. Commented-out commands are ignored.
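As a rough illustration of the dependency scan described above, here is a minimal Python sketch. It is not latex2arxiv's actual code: the names `DEP_COMMAND`, `strip_comments`, and `find_dependencies` are hypothetical, \graphicspath handling is left out, and the comment stripping is simplified (it does not handle verbatim blocks or a `\\` directly before `%`).

```python
import re

# Commands whose brace argument names a file dependency.
# An optional [..] argument (as in \includegraphics[width=...]{fig}) is skipped.
DEP_COMMAND = re.compile(
    r"\\(input|include|subfile|includegraphics|bibliography)"
    r"\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)


def strip_comments(tex: str) -> str:
    """Drop everything from an unescaped % to the end of each line."""
    return "\n".join(re.sub(r"(?<!\\)%.*", "", line) for line in tex.splitlines())


def find_dependencies(tex: str) -> list[tuple[str, str]]:
    """Return (command, argument) pairs for commands that are not commented out."""
    deps = []
    for match in DEP_COMMAND.finditer(strip_comments(tex)):
        command, argument = match.group(1), match.group(2)
        # \bibliography accepts a comma-separated list of .bib names.
        names = argument.split(",") if command == "bibliography" else [argument]
        deps.extend((command, name.strip()) for name in names)
    return deps
```

Anything a scan like this does not report as reachable is a candidate for pruning, which is why commented-out commands have to be removed before matching.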

latex2arxiv vs. arxiv_latex_cleaner

arxiv_latex_cleaner is the incumbent — Google-backed, mature, and cleans well. The key difference: it won't tell you that \usepackage{minted} will fail on arXiv, won't produce the .zip you upload, and has no exit code for CI gating.

latex2arxiv arxiv_latex_cleaner
Output format .zip Cleaned directory
Pre-flight [error] / [warn] (details)
Non-zero exit on errors
--compile preview
Auto-detect main .tex
Brace-balanced config
BibTeX normalization
Auto \pdfoutput=1 injection
--dry-run
Built-in --demo
Image resizing (Pillow)
PDF compression (Ghostscript)
PNG → JPG conversion
Maturity 150 tests, 7 regression fixtures, live pdflatex+biber end-to-end CI ~5k★, years

Installation

pip install latex2arxiv

On macOS, if you get an externally-managed-environment error, use pipx:

brew install pipx
pipx install latex2arxiv

Or from source:

git clone https://github.com/YuZh98/latex2arxiv
cd latex2arxiv
pip install .

pdflatex is required only for --compile (install via TeX Live or MacTeX).

Once installed, try the built-in demo to see the tool in action:

latex2arxiv --demo --compile

Usage

latex2arxiv input [output.zip] [--main MAIN_TEX] [--resize PX] [--config FILE] [--compile]

input can be a .zip file, a directory of LaTeX sources, or a git URL (https or ssh). Directories are zipped internally; git URLs are cloned with --depth 1.

| Flag | Description |
| --- | --- |
| --main FILENAME | Specify the main .tex file (e.g. JASA_main.tex). Auto-detected via \documentclass if omitted. |
| --resize PX | Resize images so the longest side is ≤ PX pixels (e.g. --resize 1600). Requires Pillow. |
| --config FILE | YAML config file for custom removal rules (see below). |
| --compile | Run pdflatex on the output and open the resulting PDF. |
| --dry-run | Preview what would be removed/processed without writing any output. |
| --demo | Run the built-in demo project (no input file needed). |
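To make the --resize rule concrete, the arithmetic behind "longest side ≤ PX" can be sketched as follows. This is only an illustration: `resized_dimensions` is a hypothetical helper, and the choices to preserve aspect ratio and to leave smaller images untouched are assumptions, not documented behavior.

```python
def resized_dimensions(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so that the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Assumption: images already within the limit are not upscaled.
        return width, height
    scale = max_side / longest
    # Keep the aspect ratio; never let a side collapse to 0 pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With `max_side=1600`, a 3200×2400 image would come out as 1600×1200, while an 800×600 image would be left unchanged under these assumptions.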

Examples

latex2arxiv paper.zip                                  # zip input
latex2arxiv paper/                                     # directory input
latex2arxiv https://github.com/user/paper.git          # git URL input
latex2arxiv paper.zip out.zip --main main.tex --compile
latex2arxiv paper.zip --resize 1600 --compile          # shrink images
latex2arxiv paper.zip --config arxiv_config.yaml       # custom rules
latex2arxiv paper.zip --dry-run                        # preview without writing
latex2arxiv --demo --compile                           # run the built-in demo

Pre-flight checks

Before producing the output zip, latex2arxiv validates the project against arXiv's LaTeX submission guide. [error] lines block submission (the tool exits non-zero, useful for CI gating); [warn] lines are advisory and do not affect the exit code.

Output on a project with several submission issues looks like this:

$ latex2arxiv paper.zip --dry-run
  [error] \usepackage{minted} requires shell-escape — arXiv compiles without it; this submission will fail to build
  [error] \usepackage{psfig} — arXiv no longer supports the psfig package
  [warn]  \today used in \date — arXiv may rebuild the PDF and the date will change
  [warn]  .eps image found: photo.eps — pdflatex does not support .eps; convert to .pdf or .png
  [warn]  \printindex used but no .ind file at root — build locally and re-run latex2arxiv

Summary: 2 errors, 3 warnings

Either [error] line would have caused arXiv to reject the submission after upload. The exit code is non-zero on errors, so a CI step like latex2arxiv paper.zip --dry-run fails the build before the bad submission ever leaves the repo.
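If a CI script needs more than the exit code, the `[error]` / `[warn]` prefixes are easy to tally from captured output. The sketch below assumes only the line format shown in the sample above; `count_severities` and `exit_code_for` are hypothetical helpers, and the specific value 1 stands in for "non-zero".

```python
import re

# A report line starts with optional indentation, then [error] or [warn].
SEVERITY = re.compile(r"^\s*\[(error|warn)\]")


def count_severities(output: str) -> dict[str, int]:
    """Count [error] and [warn] lines in captured tool output."""
    counts = {"error": 0, "warn": 0}
    for line in output.splitlines():
        match = SEVERITY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def exit_code_for(counts: dict[str, int]) -> int:
    """Mirror the documented rule: errors fail the run, warnings do not."""
    return 1 if counts["error"] else 0
```

A pipeline could use the counts to annotate a pull request while still relying on the tool's own exit status to fail the job.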

| Severity | Trigger | Why it matters |
| --- | --- | --- |
| 🛑 error | \usepackage{minted} / pythontex / shellesc | Require --shell-escape; arXiv compiles without it. |
| 🛑 error | \usepackage{psfig} | arXiv no longer supports the psfig package. |
| 🛑 error | \usepackage{fontspec} / unicode-math | Require XeLaTeX or LuaLaTeX; arXiv defaults to pdfLaTeX. |
| ⚠️ warn | \usepackage{xr} or xr-hyper | File paths/locations differ on arXiv; external-document references break. |
| ⚠️ warn | Main .tex not at the submission root | arXiv compiles from root; subdirectory main files aren't found. |
| ⚠️ warn | \printindex / \printglossary / \printnomenclature without matching .ind / .gls / .nls | arXiv doesn't run makeindex or glossary processors; the printed section silently disappears. |
| ⚠️ warn | \usepackage{biblatex} (or \addbibresource) without <main>.bbl shipped | If arXiv can't resolve any .bib file, your submission is blocked. |
| ⚠️ warn | \documentclass[referee] / [doublespace] / \doublespacing | arXiv requires single-spaced submissions. |
| ⚠️ warn | \today inside \date{...} | arXiv may rebuild the PDF; the date will change. |
| ⚠️ warn | \subfile'd document containing \bibliographystyle | Likely a standalone supplement; remove the \subfile line to avoid duplicate bibliography commands. |
| ⚠️ warn | .eps images shipped | pdflatex doesn't support .eps; convert to .pdf or .png. |
| ⚠️ warn | Custom .cls / .sty files | Verify they aren't already provided by TeX Live. |
| ⚠️ warn | Filename has spaces or non-ASCII characters | Breaks \input and \includegraphics resolution. |
| ⚠️ warn | Output .zip larger than 50 MB | arXiv has size limits; consider --resize or splitting supplementary materials. |
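The filename check is the simplest of these to reproduce locally. A minimal sketch, assuming "spaces or non-ASCII characters" means exactly that; `has_problematic_name` and `problematic_files` are hypothetical names, not part of the tool:

```python
def has_problematic_name(path: str) -> bool:
    """True if the path contains a space or any non-ASCII character."""
    return " " in path or not path.isascii()


def problematic_files(paths: list[str]) -> list[str]:
    """Filter a file listing down to the names that would trigger the warning."""
    return [path for path in paths if has_problematic_name(path)]
```

Running this over a project listing before zipping gives a quick list of files to rename.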

In addition to surfacing issues, the conversion silently fixes common pitfalls:

  • Inserts \pdfoutput=1 (or normalizes any \pdfoutput=N) in the main .tex, so arXiv selects pdfLaTeX.
  • Preserves 00README / 00README.XXX files at root for arXiv processor hints.
  • Strips comments and standard draft annotations (\todo, \hl, ...) and packages (todonotes, comment, ...).
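The \pdfoutput fix in the first bullet can be pictured with a short sketch. It is an assumption-laden illustration rather than the tool's implementation: `ensure_pdfoutput` is a hypothetical name, the line is placed at the very top of the file, and a \pdfoutput inside a comment is not treated specially.

```python
import re

# Matches \pdfoutput=N with optional spaces around the equals sign.
PDFOUTPUT = re.compile(r"\\pdfoutput\s*=\s*\d+")


def ensure_pdfoutput(tex: str) -> str:
    """Normalize an existing \\pdfoutput=N to 1, or insert \\pdfoutput=1 at the top."""
    if PDFOUTPUT.search(tex):
        # A callable replacement avoids backslash-escape processing in re.sub.
        return PDFOUTPUT.sub(lambda _match: r"\pdfoutput=1", tex, count=1)
    return "\\pdfoutput=1\n" + tex
```

The function is idempotent: applying it to its own output changes nothing, which matters if a project is processed more than once.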

Custom removal rules (--config)

For revision markup and other project-specific cleanup, create a YAML config file. A template is in arxiv_config.yaml.

# Remove command AND its argument (text is lost)
commands_to_delete:
  - \deleted
  - \revision

# Remove command but KEEP its argument text
commands_to_unwrap:
  - \color{red}       # \color{red}text → text
  - \textcolor{red}   # \textcolor{red}{text} → text
  - \added            # \added{new text} → new text

# Remove entire environments
environments_to_delete:
  - response

# Raw regex (last resort — prefer the verbs above when they fit).
# Recipe: any-color \textcolor → unwrapped text. Won't span nested
# commands like \cite — for those, use one commands_to_unwrap per color.
replacements:
  - pattern: '\\textcolor\{[^}]*\}\{([^}]*)\}'
    replacement: '\1'

The config parser is built in (no extra dependencies). The brace-balanced matcher correctly handles nested commands like \deleted{see \cite{x}}.
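To show why brace balancing matters for nested commands, here is a minimal sketch of a balanced matcher. It illustrates the technique only and makes no claim about how latex2arxiv implements it: `find_group_end` and `apply_command` are hypothetical names, and only the braced form (`\cmd{...}`) is handled, not the `\color{red}text` form.

```python
def find_group_end(text: str, open_index: int) -> int:
    """Given the index of a '{', return the index of its matching '}'."""
    depth = 0
    i = open_index
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the next character (covers \{ and \})
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")


def apply_command(text: str, command: str, keep_argument: bool) -> str:
    """Delete `command{...}` (keep_argument=False) or unwrap it (keep_argument=True)."""
    out = []
    pos = 0
    needle = command + "{"
    while True:
        start = text.find(needle, pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        open_index = start + len(command)  # index of the argument's '{'
        close_index = find_group_end(text, open_index)
        out.append(text[pos:start])
        if keep_argument:
            out.append(text[open_index + 1:close_index])
        pos = close_index + 1
```

Under this sketch, deleting `\deleted` from `We \deleted{see \cite{x}} agree.` removes the whole group including the inner `\cite{x}`, and unwrapping `\textcolor{red}` from `\textcolor{red}{fix \cite{a}} ok` leaves `fix \cite{a} ok`. A pattern such as `[^}]*` would stop at the first closing brace instead.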

Unknown top-level keys produce a warning, so typos like command_to_delete (singular) no longer silently do nothing. A malformed regex in any replacements rule emits a [warn] naming the rule's index, then skips just that rule; other rules still apply.
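The skip-and-warn behavior for a malformed regex can be sketched like this. The function names, the 0-based index, and the exact warning text are assumptions for illustration; only the general behavior (warn with the rule's index, skip that rule, apply the rest) comes from the description above.

```python
import re


def compile_rules(rules: list[dict]) -> tuple[list, list[str]]:
    """Compile replacement rules, skipping any whose pattern is invalid."""
    compiled, warnings = [], []
    for index, rule in enumerate(rules):
        try:
            compiled.append((re.compile(rule["pattern"]), rule["replacement"]))
        except re.error as exc:
            # Hypothetical message format; the real wording may differ.
            warnings.append(f"[warn] replacements[{index}]: invalid regex ({exc})")
    return compiled, warnings


def apply_rules(text: str, compiled: list) -> str:
    """Apply each successfully compiled rule in order."""
    for pattern, replacement in compiled:
        text = pattern.sub(replacement, text)
    return text
```

Given the any-color \textcolor recipe plus a second rule with an unbalanced parenthesis, the first rule would still turn `\textcolor{blue}{hello} world` into `hello world`, and one warning would be reported for the second.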

CI / pre-commit integration

A GitHub Action and pre-commit hook are coming in v0.7.0. See docs/ci.md for details and usage examples.

In the meantime, you can use latex2arxiv directly in any CI script:

- run: pip install latex2arxiv && latex2arxiv paper.zip --dry-run

The exit code is non-zero on [error], so this fails the job automatically.

Known limitations

Dynamically constructed filenames: \includegraphics{\figpath/fig1} cannot be resolved statically and the image will be deleted. Expand path macros before running.

\subfile vs \input path resolution: \input/\include paths resolve relative to the project root; \subfile paths resolve relative to the subfile's own directory. Unusual nested setups may cause images to be incorrectly pruned; use --compile to verify.

--compile is a local sanity check: a successful local compile doesn't guarantee arXiv will compile it. arXiv pins specific TeX Live versions. Always check the arXiv submission preview after uploading.
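The \subfile versus \input limitation is easier to see with concrete paths. The sketch below is one reading of that rule, namely that paths referenced from inside a subfile are resolved against the subfile's directory; both function names and the example paths are hypothetical.

```python
import posixpath


def resolve_from_root(argument: str) -> str:
    # \input / \include: the argument is taken relative to the project root.
    return posixpath.normpath(argument)


def resolve_from_subfile(subfile_path: str, argument: str) -> str:
    # Inside a \subfile'd document: resolve relative to that file's own directory.
    base = posixpath.dirname(subfile_path)
    return posixpath.normpath(posixpath.join(base, argument))
```

For a subfile at `supp/Supplementary_Materials.tex`, a reference to `figs/fig1.pdf` would resolve to `supp/figs/fig1.pdf` under this reading, whereas the same argument in an \input'd file would resolve to `figs/fig1.pdf` at the root. A mismatch between the two is the kind of case where an image could be pruned by mistake.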

[^main]: JASA_main.tex is identified as the main file via auto-detection (or pass --main JASA_main.tex to be explicit).

[^supp]: Supplementary_Materials.tex is kept because it's a \subfile dependency of the main file.
