Validates arXiv compatibility and cleans a LaTeX .zip project in one command

latex2arxiv

Submit to arXiv without the headache. One command cleans your project, catches rejection-causing errors, and walks you through the upload.

latex2arxiv paper.zip --compile          # clean + verify PDF
latex2arxiv paper.zip --compile --guide  # + step-by-step upload instructions
latex2arxiv paper/ --compile             # directory input
latex2arxiv https://github.com/u/p.git   # git URL input

Your original project is never modified. All output goes to a new _arxiv.zip file.

Try the built-in demo:

pip install latex2arxiv
latex2arxiv --demo --compile --guide

This processes a bundled self-documenting paper, opens the cleaned PDF, and writes a step-by-step arXiv upload guide with copy-paste-ready metadata. The cleaned demo's PDF is attached to every GitHub Release as demo_project_arxiv.pdf.

Before / After

On a real statistics paper (arXiv:2504.11630): 934 → 40 files, 80.6 MB → 3.1 MB.

Before (Overleaf export): 934 files, 80.6 MB
  📁 Images/
  📄 JASA_main.tex
  📄 JASA_main_backup.tex
  📄 main_bak_svm.tex
  📄 cover_letter.md
  📄 response.tex
  📄 ref.bib
  📄 JASA_main.aux/.log/.bbl/.pdf
  📁 jasa_comments/, jasa_revision/
  ... (and ~930 more)

After (latex2arxiv output): 40 files, 3.1 MB
  📁 Images/
  📄 JASA_main.tex
  📄 ref.bib
  📄 Supplementary_Materials.tex

Who is this for?

You've never submitted to arXiv before. Your project compiles locally. arXiv might still reject it for reasons nobody warned you about. latex2arxiv paper.zip --compile --guide flags the rejection-causing issues and writes you a copy-paste-ready upload walkthrough.

You wrote it in Overleaf. The Overleaf export contains hundreds of files and cluttered .tex sources, and you need to tidy everything up safely. See the Overleaf → arXiv quickstart.

You're CI-gating a paper repo. latex2arxiv paper.zip --dry-run exits non-zero on rejection-causing errors. Drop it into your build matrix.

Your paper has revision tracking. \added{}, \deleted{}, and \textcolor{red}{} markup is removed with no manual cleanup. See Custom removal rules.

What it does

Feature | What it does
📦 One command, any input | Accepts a .zip, directory, or git URL; outputs an arXiv-ready .zip; optionally compiles and opens the PDF for review
✂️ Prunes your project to submission-ready | Keeps only files reachable from your main .tex; removes build artifacts, editor files, cover letters, unused figures
🧹 Cleans your .tex | Strips comments, removes \todo{} / \hl{} / draft packages, handles nested braces correctly (\deleted{see \cite{x}} works)
🚨 Catches submission blockers before you upload | [error] for shell-escape packages that will fail on arXiv (minted, pythontex); [warn] for biblatex without .bbl, missing index files, oversized output, undefined citations, problematic filenames (full list in docs/pre-flight.md)
🗺️ Guides you through upload | --guide extracts title, authors, abstract, page/figure/table counts and writes a step-by-step arXiv upload walkthrough

Also: --flatten (single-file output, docs), --json (CI integration, schema), --resize (image downscaling), --dry-run (preview without writing), BibTeX normalization, \pdfoutput=1 injection.

Dependency tracking respects \input, \include, \subfile, \includegraphics, \graphicspath, and \bibliography. Commented-out commands are ignored.
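To make the idea of "reachable from your main .tex" concrete, here is a minimal sketch of how file references could be collected from LaTeX source. It is an illustration only, not latex2arxiv's implementation: the regex approach and the names strip_comments and referenced_files are assumptions, and \graphicspath handling is left out.

```python
import re

# Commands whose brace argument names a file the project depends on.
# The list mirrors the commands named above; the regex approach is an assumption.
DEP_COMMANDS = ("input", "include", "subfile", "includegraphics", "bibliography")

DEP_RE = re.compile(
    r"\\(?:" + "|".join(DEP_COMMANDS) + r")\*?(?:\[[^\]]*\])?\{([^}]*)\}"
)


def strip_comments(tex: str) -> str:
    """Drop everything from an unescaped % to the end of each line."""
    return re.sub(r"(?<!\\)%.*", "", tex)


def referenced_files(tex: str) -> list[str]:
    """Return the arguments of dependency commands, skipping commented-out ones."""
    found = []
    for arg in DEP_RE.findall(strip_comments(tex)):
        # \bibliography{a,b} may list several comma-separated names.
        found.extend(name.strip() for name in arg.split(",") if name.strip())
    return found
```

A full dependency walk would call a function like this on the main file, resolve each name to a path, and repeat for every .tex file it finds; anything never reached is a candidate for pruning.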

Upload guide (--guide)

Pass --guide and latex2arxiv writes a plain-text file alongside your output zip with everything you need for the arXiv upload form:

── arXiv Upload Guide ──

📋 Your metadata (copy-paste ready):

  Title:
    Statistical Modeling of Combinatorial Response Data

  Authors:
    Yu Zheng, Malay Ghosh, Leo Duan

  Abstract:
    There is a rich literature for modeling binary and polychotomous responses...

  Comments:
    53 pages, 13 figures, 6 tables

📌 Step 1: Start a new submission or replace an existing one
📌 Step 2: Choose license
📌 Step 3: Select category
📌 Step 4: Upload files (arXiv may warn about .sty - ignore it)
📌 Step 5: Check processing
📌 Step 6: Fill in metadata (paste from above)
📌 Step 7: Preview and submit

📁 Files in your zip:
    JASA_main.tex ← main file
    ref.bib
    Supplementary_Materials.tex
    Images/
    ...

No more guessing what goes where.

Works everywhere

Terminal (one command, full pipeline):

latex2arxiv paper.zip --compile --guide

CI (gate your paper repo on arXiv compliance):

- run: pip install latex2arxiv && latex2arxiv paper.zip --dry-run

AI agents (Claude, Cursor, or Copilot validate and fix issues in conversation):

pip install "latex2arxiv[mcp]"
{"mcpServers": {"latex2arxiv": {"command": "latex2arxiv-mcp"}}}

Installation

pip install latex2arxiv

If you get an externally-managed-environment error from pip, use pipx:

brew install pipx
pipx install latex2arxiv

On macOS, install via Homebrew (no Python toolchain required):

brew tap YuZh98/latex2arxiv
brew install latex2arxiv

The first brew install builds Pillow from source, which can run for 5+ minutes with no output. Add --verbose to monitor installation progress.

Or from source:

git clone https://github.com/YuZh98/latex2arxiv
cd latex2arxiv
pip install .

pdflatex is required only for --compile (install via TeX Live or MacTeX).

Usage

latex2arxiv input [output.zip] [options]

input can be a .zip file, a directory of LaTeX sources, or a git URL (https or ssh). Directories are zipped internally; git URLs are cloned with --depth 1.
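As a rough illustration of the three input forms, the sketch below classifies an argument from its spelling alone and builds the shallow-clone command. The function names and the exact detection rules are hypothetical; the tool's real detection logic is not documented here.

```python
from urllib.parse import urlparse


def classify_input(arg: str) -> str:
    """Guess the input kind from its spelling alone (no filesystem access)."""
    # Assumption: scp-style "git@host:path" and https/ssh URLs count as git inputs.
    if arg.startswith("git@") or urlparse(arg).scheme in {"https", "ssh"}:
        return "git"
    if arg.lower().endswith(".zip"):
        return "zip"
    return "directory"


def clone_command(url: str, dest: str) -> list[str]:
    """Build a shallow clone command, matching the --depth 1 behaviour described above."""
    return ["git", "clone", "--depth", "1", url, dest]
```

A shallow clone fetches only the latest commit, which is all that is needed to package the current state of a paper.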

Flag | Description
--main FILENAME | Specify the main .tex file (e.g. JASA_main.tex). Auto-detected via \documentclass if omitted.
--resize PX | Resize images so longest side ≤ PX pixels (e.g. --resize 1600). Requires Pillow.
--config FILE | YAML config file for custom removal rules (see below).
--compile | Run pdflatex on the output and open the resulting PDF.
--guide | Write a detailed arXiv upload guide (metadata + step-by-step instructions) to a text file alongside the output.
--dry-run | Preview what would be removed/processed without writing any output.
--flatten | Inline every \input / \include / \subfile into the main .tex for single-file output.
--json | Emit a machine-readable JSON summary on stdout; route progress to stderr.
--demo | Run the built-in demo project (no input file needed).
--version | Print version and exit.
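The inlining that --flatten describes can be sketched as a recursive substitution. This is a simplified illustration under stated assumptions, not the tool's code: the project is modelled as a dict from path to text, the function name flatten is hypothetical, commented-out commands are not skipped, and the preamble of a \subfile is not stripped.

```python
import re

# \includegraphics is not matched because "{" must follow the command name directly.
INCLUDE_RE = re.compile(r"\\(?:input|include|subfile)\{([^}]+)\}")


def flatten(name: str, files: dict[str, str], _seen: frozenset = frozenset()) -> str:
    """Recursively replace include commands with the text of the files they name."""
    if name in _seen:
        raise ValueError(f"circular include: {name}")

    def inline(match: re.Match) -> str:
        target = match.group(1).strip()
        if not target.endswith(".tex"):
            target += ".tex"       # LaTeX normally appends .tex when it is omitted
        if target not in files:
            return match.group(0)  # leave unresolved references untouched
        return flatten(target, files, _seen | {name})

    return INCLUDE_RE.sub(inline, files[name])
```

Tracking the chain of files already being expanded guards against a file that includes itself, directly or through another file.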

Examples

latex2arxiv paper.zip                                  # zip input
latex2arxiv paper/                                     # directory input
latex2arxiv https://github.com/user/paper.git          # git URL input
latex2arxiv paper.zip out.zip --main main.tex --compile
latex2arxiv paper.zip --resize 1600 --compile          # shrink images
latex2arxiv paper.zip --config arxiv_config.yaml       # custom rules
latex2arxiv paper.zip --compile --guide                # full pipeline + upload guide
latex2arxiv paper.zip --dry-run                        # preview without writing
latex2arxiv --demo --compile --guide                   # run the built-in demo
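For the --resize example, the arithmetic behind "longest side ≤ PX" is a single scale factor applied to both dimensions. The helper below is a sketch with a hypothetical name; leaving smaller images untouched is an assumption rather than documented behaviour.

```python
def resized_dimensions(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # assumption: images that already fit are not upscaled
    scale = max_side / longest
    # Clamp to 1 pixel so extreme aspect ratios never produce a zero dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With --resize 1600, a 4000 x 3000 photo would come out at 1600 x 1200 under this rule, while an 800 x 600 figure would be left as-is.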

Pre-flight checks

Before producing the output zip, latex2arxiv validates the project against arXiv's LaTeX submission guide. [error] lines block submission (the tool exits non-zero, useful for CI gating); [warn] lines are advisory and do not affect the exit code.

$ latex2arxiv paper.zip --dry-run
  [error] \usepackage{minted} requires shell-escape - arXiv compiles without it; this submission will fail to build
  [error] \usepackage{psfig} - arXiv no longer supports the psfig package
  [warn]  \today used in \date - arXiv may rebuild the PDF and the date will change
  [warn]  .eps image found: photo.eps - pdflatex does not support .eps; convert to .pdf or .png
  [warn]  \printindex used but no .ind file at root - build locally and re-run latex2arxiv

Summary: 2 errors, 7 warnings

Either [error] line would have caused arXiv to reject the submission after upload. The exit code is non-zero on errors, so a CI step like latex2arxiv paper.zip --dry-run fails the build before the bad submission ever leaves the repo.

See docs/pre-flight.md for the full list of checks and silent fixes.
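If you capture the text report rather than the --json summary, the [error] / [warn] prefixes are easy to tally. The sketch below assumes only the line format shown in the sample output above; the function names are hypothetical, and the exit code of 1 is an assumption (the tool is documented only as exiting non-zero).

```python
import re

# Matches lines such as "  [error] message" or "  [warn]  message".
LINE_RE = re.compile(r"^\s*\[(error|warn)\]\s+(.*)$")


def summarize(output: str) -> dict[str, list[str]]:
    """Group pre-flight messages by severity."""
    findings: dict[str, list[str]] = {"error": [], "warn": []}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            findings[match.group(1)].append(match.group(2))
    return findings


def gate_exit_code(findings: dict[str, list[str]]) -> int:
    """Warnings are advisory; only errors produce a failing status."""
    return 1 if findings["error"] else 0
```

For CI scripts, the --json flag is likely the more robust route, since the text format may change between releases.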

Custom removal rules (--config)

For revision markup and other project-specific cleanup, create a YAML config file. A template is in arxiv_config.yaml.

# Remove command AND its argument (text is lost)
commands_to_delete:
  - \deleted
  - \revision

# Remove command but KEEP its argument text
commands_to_unwrap:
  - \color{red}       # \color{red}text → text
  - \textcolor{red}   # \textcolor{red}{text} → text
  - \added            # \added{new text} → new text

# Remove entire environments
environments_to_delete:
  - response

# Raw regex (last resort; prefer the verbs above when they fit).
replacements:
  - pattern: '\\textcolor\{[^}]*\}\{([^}]*)\}'
    replacement: '\1'

The brace-balanced matcher correctly handles nested commands like \deleted{see \cite{x}}. Unknown top-level keys produce a warning, so typos like command_to_delete (singular) no longer silently no-op.
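To show why brace balancing matters, here is a minimal sketch of a matcher that deletes or unwraps a command by counting braces. It is not the tool's implementation: find_group_end and apply_command are hypothetical names, and whitespace between the command and its opening brace is not handled.

```python
def find_group_end(text: str, open_idx: int) -> int:
    """Return the index just past the brace group that opens at open_idx."""
    depth = 0
    i = open_idx
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the next character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces")


def apply_command(text: str, command: str, keep_argument: bool) -> str:
    """Delete command{...} (keep_argument=False) or unwrap it (keep_argument=True)."""
    out = []
    i = 0
    prefix = command + "{"
    while i < len(text):
        if text.startswith(prefix, i):
            start = i + len(command)           # index of the opening brace
            end = find_group_end(text, start)  # index just past its matching brace
            if keep_argument:
                inner = text[start + 1 : end - 1]
                out.append(apply_command(inner, command, True))
            i = end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

A pattern such as \{[^}]*\} stops at the first closing brace, so on \deleted{see \cite{x}} it would leave a stray } behind. Counting depth avoids that, and the same function covers entries like \textcolor{red} because the fixed first argument is treated as part of the command prefix.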

latex2arxiv vs. arxiv_latex_cleaner

arxiv_latex_cleaner is the incumbent: Google-backed, mature, and good at cleaning. Here's how the two compare on the things that change your workflow.

What only latex2arxiv does

Feature | latex2arxiv | arxiv_latex_cleaner
Pre-flight [error] / [warn] (details) | ✅ | ❌
Upload walkthrough (--guide) | ✅ | ❌
Non-zero exit on errors (CI-gateable) | ✅ | ❌
Outputs the .zip you upload | ✅ | ❌
MCP server (Claude / Cursor / Copilot) | ✅ | ❌
GitHub Action + pre-commit hook | ✅ | ❌
VS Code extension | ✅ | ❌
Multiple input forms (.zip / directory / git URL) | ✅ | ❌
--compile preview | ✅ | ❌
--dry-run | ✅ | ❌
--demo | ✅ | ❌
Auto-detect main .tex | ✅ | ❌
Brace-balanced config | ✅ | ❌

What only arxiv_latex_cleaner does

Feature | latex2arxiv | arxiv_latex_cleaner
PDF compression (Ghostscript) | ❌ | ✅
PNG → JPG conversion | ❌ | ✅

If you need image transcoding for size, run arxiv_latex_cleaner first, or use latex2arxiv --resize PX.

Both do

BibTeX normalization · image resizing (Pillow).

Maturity

latex2arxiv | arxiv_latex_cleaner
v1.0 production-stable · 380 tests · Python 3.10–3.13 matrix · live pdflatex+biber end-to-end CI · 10 regression fixtures | ~5k★, years in production

Integrations

Surface | Status | Details
CLI | ✅ | pip install latex2arxiv
GitHub Action | ✅ | action.yml
pre-commit hook | ✅ | latex2arxiv-dryrun
MCP server (AI agents) | ✅ | pip install "latex2arxiv[mcp]" (setup)
VS Code extension | ✅ | Marketplace: ext install YuZh98.latex2arxiv
Homebrew formula | ✅ | brew tap YuZh98/latex2arxiv && brew install latex2arxiv

Known limitations

Dynamically constructed filenames: \includegraphics{\figpath/fig1} cannot be resolved statically, so the image will be deleted. Expand path macros before running.

\subfile vs \input path resolution: \input/\include paths resolve relative to the project root; \subfile paths resolve relative to the subfile's own directory. Unusual nested setups may cause images to be incorrectly pruned; use --compile to verify.

--compile is a local sanity check: a successful local compile doesn't guarantee arXiv will compile it. arXiv pins specific TeX Live versions. Always check the arXiv submission preview after uploading.
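The path-resolution rule in the second limitation can be pictured with a small helper. This sketch reflects one reading of that rule (references inside a file loaded with \subfile are taken relative to that file's directory, everything else relative to the project root); the function name and signature are hypothetical.

```python
import posixpath


def resolve_reference(ref: str, containing_file: str, loaded_via: str) -> str:
    """Resolve a referenced path to a project-relative path.

    containing_file is the project-relative path of the file holding the reference;
    loaded_via is the command that pulled that file in ("input", "include", "subfile").
    """
    # Assumption: only \subfile switches the base directory away from the project root.
    base = posixpath.dirname(containing_file) if loaded_via == "subfile" else ""
    return posixpath.normpath(posixpath.join(base, ref))
```

The same reference can therefore point at two different files depending on how its parent was loaded, which is how an image in a nested layout might be pruned by mistake.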

FAQ

1. arXiv rejected my submission even though latex2arxiv said it was clean. Pre-flight catches the documented submission-blocking patterns. arXiv pins specific TeX Live versions and occasionally surfaces new edge cases, so always run the arXiv submission preview after upload. If you hit a reproducible miss, file an issue with your project zip.

2. What's the difference between [error] and [warn]? Errors block submission and make the tool exit non-zero; use them to gate CI. Warnings are advisory: the build will likely succeed on arXiv but a human should look. Example: missing .bbl is a warn (arXiv will run BibTeX); \usepackage{minted} is an error (shell-escape isn't allowed).

3. My main .tex isn't being auto-detected correctly. Auto-detection ranks files containing \documentclass by \input reference count. For ambiguous projects (response letters next to the paper, multiple \documentclass files), pass --main paper.tex explicitly.

4. Will this modify my original files? No. All output goes to a new _arxiv.zip (or whatever path you pass). The source project is read-only.

5. My CI step keeps failing on what I thought were just warnings. Warnings don't fail CI. If your build is failing, it's an [error]; read the message. Use --json for a machine-readable summary.

6. Why does brew install hang for 5+ minutes? Homebrew compiles Pillow's C extensions from source and suppresses progress output. Add --verbose to see what's happening.
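The ranking described in FAQ 3 can be sketched as follows. This is one interpretation, not the tool's code: "reference count" is read as the number of \input-style commands a candidate contains, ties are broken alphabetically, and commented-out \documentclass lines are not filtered. The name detect_main is hypothetical.

```python
import re
from typing import Optional

INPUT_RE = re.compile(r"\\(?:input|include|subfile)\{([^}]+)\}")


def detect_main(files: dict[str, str]) -> Optional[str]:
    """Pick the \\documentclass file that pulls in the most other files."""
    candidates = [name for name, text in files.items() if r"\documentclass" in text]
    if not candidates:
        return None
    # max() keeps the first maximal item, so sorting first gives an alphabetical tie-break.
    return max(sorted(candidates), key=lambda name: len(INPUT_RE.findall(files[name])))
```

A response letter that has its own \documentclass but no \input lines would lose to the paper under this rule. When two candidates score the same, passing --main explicitly is the safer choice.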


⭐ Found this useful? Star on GitHub; it helps others find the tool.

🐛 Issues or feature requests: github.com/YuZh98/latex2arxiv/issues

📦 Install: pip install latex2arxiv · brew install latex2arxiv (after brew tap YuZh98/latex2arxiv)

🎬 Try the demo: latex2arxiv --demo --compile --guide

Made by Hugh Zheng · MIT License
