Skip to main content

Validates arXiv compatibility and cleans a LaTeX .zip project in one command

Project description

latex2arxiv

Python PyPI License: MIT

Validates arXiv compatibility and cleans your LaTeX project in one command.

If you submit papers to arXiv — especially from Overleaf — this tool is for you. Drop in a .zip, get an arXiv-ready .zip back, with pre-flight checks that catch submission-blocking issues before you upload.

On a real statistics paper: 950 → 40 files, 82 MB → 3 MB.

latex2arxiv paper.zip --compile
  main tex: paper.tex
  remove: .DS_Store
  remove: cover_letter.md
  remove: paper.aux
  remove: figures/old_unused.pdf
  ... (43 more)
  [warn] \today used in \date — arXiv may rebuild the PDF and the date will change

Done → paper_arxiv.zip
Summary: 47 removed, 12 kept | 79.1 MB → 3.2 MB | 0 errors, 1 warning

Compiling paper.tex ...
  PDF → paper_arxiv.pdf

The cleaned demo's PDF is attached to every GitHub Release as demo_project_arxiv.pdf — see the output without installing.

What it does

  • File pruning — keeps only files reachable from your main .tex; removes everything else (build artifacts, editor files, cover letters, unused figures)
  • arXiv compatibility checks[error] for shell-escape packages (minted, pythontex); [warn] for biblatex without .bbl, output > 50 MB, problematic filenames, and other gotchas
  • Comment + draft cleanup — strips % ... comments; removes \todo{}, \hl{}, \note{}, \fixme{}, \begin{comment} blocks, \iffalse...\fi blocks, and draft-only packages
  • \pdfoutput=1 auto-injection — arXiv requires it; easy to forget
  • BibTeX normalization — canonical field ordering, deduplication, private-field strip (requires bibtexparser)
  • Image resizing (optional) — caps longest side at N pixels via Pillow
  • Custom revision-markup rules (optional) — YAML config; brace-balanced matcher correctly handles \deleted{see \cite{x}}
  • --compile — runs pdflatex and opens the cleaned PDF for review

Dependency tracking respects \input, \include, \subfile, \includegraphics, \graphicspath, and \bibliography. Commented-out commands are ignored.

How does this compare to arxiv_latex_cleaner?

arxiv_latex_cleaner is the established tool in this space — Google-backed, ~5k★, years of usage. If you want the most battle-tested option, use it.

Where latex2arxiv is different

  • Pre-flight checks with severity levels. [error] blocks submission and exits non-zero (CI-friendly); [warn] flags risk. Nothing else in this space does this.
  • One-command zip-in / zip-out workflow. Drop a .zip, get a .zip back, optionally compile and preview the PDF. No directory dance, no manual repack.
  • Brace-balanced config matcher. \deleted{see \cite{x}} and \added{some \emph{nested} text} work correctly — naive regex-based cleaners silently leave nested content behind.
  • \pdfoutput=1 auto-injection and BibTeX normalization out of the box.

Where arxiv_latex_cleaner is stronger

  • Maturity — thousands of papers cleaned, larger contributor pool, more edge cases discovered.
  • Ghostscript-based PDF compression — we don't bundle this.
  • PNG → JPG conversion — we don't do this.

Full feature comparison

latex2arxiv arxiv_latex_cleaner
Output format .zip.zip Cleaned directory
Pre-flight [error] / [warn]
Non-zero exit on errors
--compile preview
Auto-detect main .tex
Brace-balanced config
BibTeX normalization
\pdfoutput=1 injection
--dry-run
Built-in --demo
Image resizing (Pillow)
PDF compression (Ghostscript)
PNG → JPG conversion
Maturity New (0.5.0) ~5k★, years

Installation

pip install latex2arxiv

On macOS, if you get an externally-managed-environment error, use pipx:

brew install pipx
pipx install latex2arxiv

Or from source:

git clone https://github.com/YuZh98/latex2arxiv
cd latex2arxiv
pip install .

pdflatex is required only for --compile (install via TeX Live or MacTeX).

Once installed, try the built-in demo to see the tool in action — no input file needed:

latex2arxiv --demo --compile

This processes a bundled self-documenting paper and opens the cleaned PDF.

Usage

latex2arxiv input.zip [output.zip] [--main MAIN_TEX] [--resize PX] [--config FILE] [--compile]
Flag Description
--main FILENAME Specify the main .tex file (e.g. JASA_main.tex). Auto-detected via \documentclass if omitted.
--resize PX Resize images so longest side ≤ PX pixels (e.g. --resize 1600). Requires Pillow.
--config FILE YAML config file for custom removal rules (see below).
--compile Run pdflatex on the output and open the resulting PDF.
--dry-run Preview what would be removed/processed without writing any output.
--demo Run the built-in demo project (no input file needed).

Examples

latex2arxiv paper.zip                                  # auto-detect main, basic conversion
latex2arxiv paper.zip out.zip --main main.tex --compile
latex2arxiv paper.zip --resize 1600 --compile          # shrink images
latex2arxiv paper.zip --config arxiv_config.yaml       # custom rules
latex2arxiv paper.zip --dry-run                        # preview without writing
latex2arxiv --demo --compile                           # run the built-in demo

The tool exits non-zero if any pre-flight error fires (e.g. \usepackage{minted}) — useful for CI gating. Warnings do not affect the exit code.

Custom removal rules (--config)

For revision markup and other project-specific cleanup, create a YAML config file. A template is in arxiv_config.yaml.

# Remove command AND its argument (text is lost)
commands_to_delete:
  - \deleted
  - \revision

# Remove command but KEEP its argument text
commands_to_unwrap:
  - \color{red}       # \color{red}text → text
  - \textcolor{red}   # \textcolor{red}{text} → text
  - \added            # \added{new text} → new text

# Remove entire environments
environments_to_delete:
  - response

# Raw regex replacements
replacements:
  - pattern: '\\added\{([^}]*)\}'
    replacement: '\1'

The config parser is built in (no extra dependencies). The brace-balanced matcher correctly handles nested commands like \deleted{see \cite{x}}.

Known limitations

Dynamically constructed filenames\includegraphics{\figpath/fig1} cannot be resolved statically and the image will be deleted. Expand path macros before running.

\subfile vs \input path resolution\input/\include paths resolve relative to the project root; \subfile paths resolve relative to the subfile's own directory. Unusual nested setups may cause images to be incorrectly pruned; use --compile to verify.

Inline \verb|...| — comment-stripping and draft-removal don't currently protect inline \verb|...|. A % or \todo{...} inside \verb|...| may get mangled. Standard verbatim, lstlisting, and minted block environments are protected.

--compile is a local sanity check — a successful local compile doesn't guarantee arXiv will compile it. arXiv pins specific TeX Live versions. Always check the arXiv submission preview after uploading.

Project structure

converter.py        # CLI entry point
pipeline/
    tex.py          # Comment stripping, draft annotation removal
    bibtex.py       # BibTeX normalization
    deps.py         # Dependency graph (tex includes, images, bib files)
    images.py       # Image resizing
    config.py       # User-defined removal rules
arxiv_config.yaml   # Sample config file

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

latex2arxiv-0.5.1.tar.gz (32.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

latex2arxiv-0.5.1-py3-none-any.whl (25.8 kB view details)

Uploaded Python 3

File details

Details for the file latex2arxiv-0.5.1.tar.gz.

File metadata

  • Download URL: latex2arxiv-0.5.1.tar.gz
  • Upload date:
  • Size: 32.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for latex2arxiv-0.5.1.tar.gz
Algorithm Hash digest
SHA256 af4fdad34d00f7f95e46ec4dccf3f02e70d5395fce0064d0900c59322cdfcefc
MD5 ad0542abe22523da470dc7ccf4aae8d4
BLAKE2b-256 5583c0b9d9c8264755640aac8a5a0eb936d7cb2cd5367ba68033d95573e6450d

See more details on using hashes here.

Provenance

The following attestation bundles were made for latex2arxiv-0.5.1.tar.gz:

Publisher: publish.yml on YuZh98/latex2arxiv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file latex2arxiv-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: latex2arxiv-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for latex2arxiv-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 12d7493f0056a79a22918ebf29096ef9eb1aa696a7130133081266d683bad64e
MD5 cf3b1a1d73aa35637d811edcafb21588
BLAKE2b-256 f2df24031cf961f201e9659b4e1834a94425f4dad7f68c6c340a2497a3b6a79d

See more details on using hashes here.

Provenance

The following attestation bundles were made for latex2arxiv-0.5.1-py3-none-any.whl:

Publisher: publish.yml on YuZh98/latex2arxiv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page