
Convert a LaTeX .zip project into an arXiv-ready .zip

latex2arxiv

License: MIT

A command-line tool that converts a LaTeX .zip project into an arXiv-ready .zip in one command.

latex2arxiv paper.zip --main main.tex --compile

Works with any LaTeX .zip — including projects exported directly from Overleaf.


🚀 Try it in 30 seconds — a self-documenting demo is included:

pip install latex2arxiv
latex2arxiv --demo --compile

This opens a PDF that explains exactly what the converter does and shows the cleaned output.


What it does

| Stage | Action |
| --- | --- |
| File pruning | Removes unused .tex, .bib, and image files, along with all non-essential files (build artifacts, editor files, cover letters, etc.) |
| Comment stripping | Removes % ... comments from all .tex files |
| Draft cleanup | Removes \todo{}, \hl{}, \note{}, \fixme{}, \begin{comment} blocks, \iffalse...\fi blocks, and draft-only packages |
| BibTeX normalization | Canonical field ordering, deduplication, private field removal |
| \pdfoutput=1 | Injected before \documentclass if missing (required by arXiv) |
| Image resizing | Optional: resize images so longest side ≤ N pixels (helps keep submission size manageable) |
| Custom rules | Optional: remove or unwrap user-defined commands via a config file |
| Pre-flight checks | Flags arXiv compatibility issues: shell-escape packages (minted, pythontex) as errors; biblatex without .bbl, output > 50 MB, and problematic filenames as warnings |
| Compile check | Optional: compiles with pdflatex and opens the PDF for review |
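To make the comment-stripping and \pdfoutput=1 stages concrete, here is a minimal Python sketch of the two ideas. It is not the tool's actual code: the function names `strip_comments` and `inject_pdfoutput` are hypothetical, and the sketch ignores verbatim-like environments and the rarer `\\%` case.

```python
import re

# An unescaped % and everything after it on the line.
# The lookbehind skips "\%", which is a literal percent sign in LaTeX.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comments(tex: str) -> str:
    # Hypothetical helper: drop % comments line by line.
    kept = []
    for line in tex.splitlines():
        match = _COMMENT.search(line)
        if match is None:
            kept.append(line)
        elif line[: match.start()].strip() == "":
            continue  # the whole line was a comment, so drop it
        else:
            # Keep a bare % so the end-of-line spacing stays the same.
            kept.append(line[: match.start()] + "%")
    return "\n".join(kept)


def inject_pdfoutput(tex: str) -> str:
    # Hypothetical helper: add \pdfoutput=1 right before \documentclass.
    if r"\pdfoutput=1" in tex:
        return tex
    return tex.replace(r"\documentclass", "\\pdfoutput=1\n\\documentclass", 1)
```

Stripping comments first means a commented-out \documentclass cannot be mistaken for the real one when the flag is injected.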

Dependency tracking respects \input, \include, \subfile, \includegraphics, \begin{overpic}, and \bibliography. Commented-out commands are ignored.
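The dependency scan described above can be sketched with a couple of regular expressions. This is an illustration under assumptions, not the implementation in deps.py: `find_dependencies` is a hypothetical name, and the sketch only handles braced arguments (not `\input file`) and does not add file extensions.

```python
import re

# Commands whose braced argument names a file the project depends on.
_DEP = re.compile(
    r"\\(input|include|subfile|includegraphics|bibliography)"
    r"\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)
# \begin{overpic}[opts]{file}: the file is the argument after the environment name.
_OVERPIC = re.compile(r"\\begin\{overpic\}\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")
_COMMENT = re.compile(r"(?<!\\)%.*")


def find_dependencies(tex: str) -> set[str]:
    # Remove comments first so commented-out commands are ignored.
    live = "\n".join(_COMMENT.sub("", line) for line in tex.splitlines())
    deps = set()
    for _command, arg in _DEP.findall(live):
        # \bibliography{a,b} may list several files in one argument.
        deps.update(name.strip() for name in arg.split(",") if name.strip())
    deps.update(name.strip() for name in _OVERPIC.findall(live))
    return deps
```

Any file in the archive that is not reachable from the main .tex file through such references is a candidate for pruning.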

Real-world results on a statistics paper:

  • 950 files → 40 files
  • 82 MB → 3 MB

Installation

pip install latex2arxiv

On macOS, if you get an externally-managed-environment error, use pipx instead:

brew install pipx
pipx install latex2arxiv

Or from source:

git clone https://github.com/YuZh98/latex2arxiv
cd latex2arxiv
pip install .

pdflatex is required only for the --compile flag (install via TeX Live or MacTeX).

Usage

latex2arxiv input.zip [output.zip] [--main MAIN_TEX] [--resize PX] [--config FILE] [--compile]

| Flag | Description |
| --- | --- |
| --main FILENAME | Specify the main .tex file (e.g. JASA_main.tex). Auto-detected via \documentclass if omitted. |
| --resize PX | Resize images so longest side ≤ PX pixels (e.g. --resize 1600). Requires Pillow. |
| --config FILE | YAML config file for custom removal rules (see below). |
| --compile | Run pdflatex on the output and open the resulting PDF. |
| --dry-run | Preview what would be removed/processed without writing any output. |
| --demo | Run the built-in demo project (no input file needed). |
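The "longest side ≤ PX" rule behind --resize comes down to a single scale factor. The sketch below shows only that arithmetic; `fit_longest_side` is a hypothetical name, and the rounding choice is an assumption rather than the tool's documented behavior.

```python
def fit_longest_side(width: int, height: int, max_px: int) -> tuple[int, int]:
    # Return new (width, height) with the longest side capped at max_px.
    longest = max(width, height)
    if longest <= max_px:
        return width, height  # already small enough, leave untouched
    scale = max_px / longest
    # Scale both sides by the same factor to preserve the aspect ratio.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000 × 3000 image with --resize 1600 would come out at 1600 × 1200 under this rule. With Pillow, the resulting pair could be passed to `Image.resize`.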

Examples

# Basic conversion (auto-detect main file)
latex2arxiv paper.zip

# Specify main file and compile for review
latex2arxiv paper.zip arxiv_ready.zip --main main.tex --compile

# Resize large images to reduce submission size
latex2arxiv paper.zip --resize 1600 --compile

# Apply custom removal rules
latex2arxiv paper.zip --config arxiv_config.yaml --compile

# Preview what would be removed without writing any output
latex2arxiv paper.zip --dry-run

# Run the built-in demo (no input file needed)
latex2arxiv --demo --compile

The tool exits non-zero if any pre-flight error fires (e.g. \usepackage{minted} detected) — useful for CI gating. Warnings do not affect the exit code.
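One way to use the exit code for CI gating is a small wrapper script. The sketch below assumes only what is stated above (non-zero exit on a pre-flight error); `arxiv_gate` and its `command` parameter are hypothetical, the latter existing so the wrapper can be exercised without the tool installed.

```python
import subprocess


def arxiv_gate(zip_path: str, command: tuple[str, ...] = ("latex2arxiv",)) -> bool:
    # Run the converter and report whether it exited with status 0.
    # check=False: inspect the return code instead of raising on failure.
    result = subprocess.run([*command, zip_path], check=False)
    return result.returncode == 0
```

A CI job could call `arxiv_gate("paper.zip")` and fail the build when it returns False. Because warnings do not change the exit code, they would need to be read from the tool's output instead.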

Custom removal rules (--config)

For revision markup and other project-specific cleanup, create a YAML config file. A template is provided in arxiv_config.yaml.

# Remove command AND its argument (text is lost)
commands_to_delete:
  - \deleted
  - \revision

# Remove command but KEEP its argument text
commands_to_unwrap:
  - \color{red}       # \color{red}text → text
  - \textcolor{red}   # \textcolor{red}{text} → text
  - \added            # \added{new text} → new text

# Remove entire environments
environments_to_delete:
  - response

# Raw regex replacements
replacements:
  - pattern: '\\added\{([^}]*)\}'
    replacement: '\1'

No extra dependencies required — the config parser is built in.
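The difference between deleting and unwrapping a command can be sketched as follows. This is a hypothetical illustration, not the code in config.py: `apply_rule` matches braces by counting, so nested arguments survive. A regex such as `\\added\{([^}]*)\}` stops at the first closing brace, so it only suits arguments without nested braces. The sketch ignores escaped braces (`\{`) and the unbraced `\color{red}text` form.

```python
def _braced_arg_end(text: str, start: int) -> int:
    # text[start] must be "{"; return the index just past its matching "}".
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1  # unbalanced braces


def apply_rule(text: str, command: str, keep_argument: bool) -> str:
    # keep_argument=True unwraps the command; False deletes command and argument.
    out = []
    pos = 0
    while True:
        hit = text.find(command + "{", pos)
        if hit == -1:
            out.append(text[pos:])
            return "".join(out)
        arg_start = hit + len(command)
        arg_end = _braced_arg_end(text, arg_start)
        if arg_end == -1:
            out.append(text[pos:])  # leave malformed input untouched
            return "".join(out)
        out.append(text[pos:hit])
        if keep_argument:
            out.append(text[arg_start + 1 : arg_end - 1])
        pos = arg_end
```

Passing `r"\textcolor{red}"` as the command treats the color as part of the command name, which matches the `\textcolor{red}{text} → text` behavior shown in the config comments.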

Caveats

Dynamically constructed filenames — if your code uses a macro for an image path (e.g. \includegraphics{\figpath/fig1}), the tool cannot resolve it statically and will delete the image. Expand macros before running the converter.
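Before running the converter, it can help to list the image paths that contain a macro and expand them. The sketch below is one possible approach; `macro_paths` and `expand_macro` are hypothetical helpers, and `figures` is an example value for `\figpath`.

```python
import re

_GRAPHICS = re.compile(r"\\includegraphics\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")


def macro_paths(tex: str) -> list[str]:
    # Image paths containing a backslash, i.e. built from a macro.
    return [path for path in _GRAPHICS.findall(tex) if "\\" in path]


def expand_macro(tex: str, macro: str, value: str) -> str:
    # Replace the macro with its value; the lookahead avoids touching
    # longer command names that merely start with the same letters.
    pattern = re.escape(macro) + r"(?![A-Za-z])"
    return re.sub(pattern, lambda _match: value, tex)
```

After expansion, `macro_paths` should return an empty list, which suggests every image path can be resolved statically.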

Custom verbatim environments — comments inside standard verbatim, lstlisting, and minted blocks are preserved. Non-standard verbatim-like environments may not be protected.

\subfile vs \input path resolution: image paths in files pulled in with \input or \include are resolved relative to the project root (how LaTeX works). Paths in \subfile documents are resolved relative to the subfile's own directory. Unusual nested path setups may cause images to be incorrectly pruned; use --compile to verify.
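The two resolution rules can be written down in a few lines. This sketch only restates the rule described above; `resolve_image` is a hypothetical name and paths are treated as POSIX-style paths inside the archive.

```python
import posixpath


def resolve_image(image: str, source_file: str, via_subfile: bool) -> str:
    # \subfile documents resolve relative to their own directory;
    # \input and \include files resolve relative to the project root.
    base = posixpath.dirname(source_file) if via_subfile else ""
    return posixpath.normpath(posixpath.join(base, image))
```

So the same `figs/a.png` referenced from `chapters/ch1.tex` points at `figs/a.png` when the chapter is brought in with \input, and at `chapters/figs/a.png` when it is a \subfile.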

BibTeX normalization requires bibtexparser — install with pip install bibtexparser. If not installed, the .bib file is passed through unchanged.
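As a rough picture of what deduplication, private field removal, and canonical field ordering might look like, here is a sketch on plain dictionaries. It does not import bibtexparser; the entry layout with `ID` and `ENTRYTYPE` keys mirrors what bibtexparser 1.x produces. `FIELD_ORDER`, `PRIVATE_FIELDS`, and `normalize` are assumptions made for illustration, not the tool's actual lists.

```python
# Hypothetical canonical order and private fields, chosen for illustration.
FIELD_ORDER = ["author", "title", "journal", "booktitle", "year",
               "volume", "number", "pages", "doi"]
PRIVATE_FIELDS = {"file", "abstract", "keywords", "annote"}
_RANK = {name: i for i, name in enumerate(FIELD_ORDER)}
_HEAD = ("ID", "ENTRYTYPE")


def normalize(entries: list[dict]) -> list[dict]:
    seen = set()
    result = []
    for entry in entries:
        if entry["ID"] in seen:
            continue  # deduplicate: keep the first entry for each citation key
        seen.add(entry["ID"])
        fields = {k: v for k, v in entry.items() if k not in PRIVATE_FIELDS}
        # Known fields first in canonical order, unknown ones alphabetically after.
        body = sorted((k for k in fields if k not in _HEAD),
                      key=lambda k: (_RANK.get(k, len(FIELD_ORDER)), k))
        ordered = {k: fields[k] for k in _HEAD if k in fields}
        ordered.update({k: fields[k] for k in body})
        result.append(ordered)
    return result
```

Removing fields such as local file paths keeps personal information out of the uploaded .bib file.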

--compile is a local sanity check — a successful local compile does not guarantee arXiv will compile it. arXiv uses specific TeX Live versions with fixed package sets. Always check the arXiv submission preview after uploading.

Custom style/class files — if your project includes a .cls or .sty file, the tool keeps it and warns you. Verify it is not already provided by TeX Live; if it is, remove it from your submission to avoid conflicts.

Double-spaced / referee mode — the tool warns if it detects referee, doublespace, or \doublespacing in your source. arXiv requires single-spaced submissions.

\today in \date — arXiv occasionally rebuilds PDFs, which will change the displayed date. The tool warns if it detects \today in \date.
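A check of this kind, plus one possible fix, can be sketched with a regular expression. Both `uses_today_in_date` and `freeze_date` are hypothetical helpers; the sketch assumes the \date argument contains no nested braces.

```python
import re

_DATE = re.compile(r"\\date\s*\{([^}]*)\}")


def uses_today_in_date(tex: str) -> bool:
    # True if any \date{...} argument contains \today.
    return any(r"\today" in arg for arg in _DATE.findall(tex))


def freeze_date(tex: str, fixed: str) -> str:
    # Replace \today inside \date{...} with a fixed string.
    return _DATE.sub(lambda m: m.group(0).replace(r"\today", fixed), tex)
```

Freezing the date before submission means a later rebuild on arXiv shows the same date as the original PDF.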

Project structure

converter.py        # CLI entry point
pipeline/
    tex.py          # Comment stripping, draft annotation removal
    bibtex.py       # BibTeX normalization
    deps.py         # Dependency graph (tex includes, images, bib files)
    images.py       # Image resizing
    config.py       # User-defined removal rules
arxiv_config.yaml   # Sample config file
