Convert arXiv papers to clean Markdown. Particularly useful for prompting LLMs.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

arxiv2md

Convert arXiv papers to clean Markdown for LLMs

Why?

I got tired of copy-pasting arXiv PDFs/HTML into LLMs and fighting references, TOCs, and token bloat, so I built the equivalent of gitingest.com for arXiv papers.

The trick: just append 2md to "arxiv" in the domain of any arXiv URL, so arxiv.org becomes arxiv2md.org:

https://arxiv.org/abs/2501.11120v1  →  https://arxiv2md.org/abs/2501.11120v1
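The same rewrite can be scripted. Below is a minimal sketch, assuming only the hostname changes and the path is kept as-is, as in the example above. The function name `to_arxiv2md_url` is hypothetical and not part of the package, and accepting `www.arxiv.org` is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def to_arxiv2md_url(arxiv_url):
    """Swap the arxiv.org host for arxiv2md.org, keeping path and query."""
    parts = urlsplit(arxiv_url)
    # Assumption: www.arxiv.org is treated the same as arxiv.org.
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv URL: {arxiv_url!r}")
    return urlunsplit(parts._replace(netloc="arxiv2md.org"))


print(to_arxiv2md_url("https://arxiv.org/abs/2501.11120v1"))
```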

Features

- Zero friction: Change arxiv.org to arxiv2md.org in any arXiv URL (works with /abs/, /html/, /pdf/)
- Section filtering: Remove references, appendix, or select only specific sections
- Clean output: No messy PDFs or broken formatting
- Section tree: Visual overview, click to include/exclude sections
- LLM-optimized: Token counts, clean citations
- Fast: Cached results, direct HTML parsing

How It Works

arxiv2md is fast because it takes advantage of arXiv's HTML format for papers. Instead of parsing PDFs (slow, error-prone), we directly parse the structured HTML that arXiv provides for newer papers. This gives us:

- Clean section boundaries and hierarchies
- Proper math rendering (MathML → Markdown)
- Reliable table extraction
- Fast processing (no OCR or PDF parsing)

The HTML is converted to Markdown using BeautifulSoup4, with custom logic for handling citations, math equations, and paper structure.
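To illustrate the general idea of walking structured HTML and emitting Markdown, here is a rough sketch. It is not the project's converter: it uses the standard library's `html.parser` instead of BeautifulSoup4 so that it runs without dependencies, it handles only headings and paragraphs, and all names in it are hypothetical.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class SimpleMarkdownConverter(HTMLParser):
    """Turn headings and paragraphs into Markdown blocks (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished Markdown blocks
        self._prefix = None  # Markdown prefix of the open block, or None
        self._text = []      # text fragments collected for the open block

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self._prefix = "#" * HEADING_LEVELS[tag] + " "
            self._text = []
        elif tag == "p":
            self._prefix = ""
            self._text = []

    def handle_data(self, data):
        # Text inside inline tags such as <em> is kept; the tags are dropped.
        if self._prefix is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prefix is not None and (tag in HEADING_LEVELS or tag == "p"):
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:
                self.blocks.append(self._prefix + text)
            self._prefix = None


def html_to_markdown(html):
    parser = SimpleMarkdownConverter()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


print(html_to_markdown("<h2>1 Introduction</h2><p>Hello <em>world</em>.</p>"))
```

A real converter also has to deal with citations, math, and tables, which is where the custom logic mentioned above comes in.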

Usage

Web App

Visit arxiv2md.org and paste any arXiv URL, or change arxiv.org to arxiv2md.org in the address of any arXiv page in your browser.

CLI

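# Install from PyPI (the package is published as arxiv2markdown)
pip install arxiv2markdown

# Or install from a local clone of the repository
pip install -e .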

# Basic usage
arxiv2md 2501.11120v1 -o paper.md

# Only include specific sections
arxiv2md 2501.11120v1 --section-filter-mode include --sections "Abstract,Introduction" -o -

# Remove references and TOC
arxiv2md 2501.11120v1 --remove-refs --remove-toc -o -

# Include YAML frontmatter with paper metadata
arxiv2md 2501.11120v1 --frontmatter -o paper.md
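If you want to drive the CLI from a script, something like the following sketch may help. It only uses the flags shown above, and it assumes that `arxiv2md` is on your PATH and that `-o -` writes the Markdown to standard output. The helper names `build_command` and `convert` are hypothetical.

```python
import subprocess


def build_command(paper_id, output="-", sections=None,
                  remove_refs=False, remove_toc=False, frontmatter=False):
    """Assemble an arxiv2md argument list from the flags shown above."""
    cmd = ["arxiv2md", paper_id]
    if sections:
        cmd += ["--section-filter-mode", "include",
                "--sections", ",".join(sections)]
    if remove_refs:
        cmd.append("--remove-refs")
    if remove_toc:
        cmd.append("--remove-toc")
    if frontmatter:
        cmd.append("--frontmatter")
    cmd += ["-o", output]
    return cmd


def convert(paper_id, **options):
    """Run the CLI and return its stdout (assumes "-o -" means stdout)."""
    options["output"] = "-"
    result = subprocess.run(build_command(paper_id, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(build_command("2501.11120v1", remove_refs=True, remove_toc=True))
```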

API

Two GET endpoints for programmatic access:

# JSON response (with metadata)
curl "https://arxiv2md.org/api/json?url=2312.00752"

# Raw markdown
curl "https://arxiv2md.org/api/markdown?url=2312.00752"

Parameters:

| Param | Default | Description |
| --- | --- | --- |
| `url` | required | arXiv URL or ID |
| `remove_refs` | `true` | Remove references |
| `remove_toc` | `true` | Remove table of contents |
| `remove_citations` | `true` | Remove inline citations |
| `frontmatter` | `false` | Prepend YAML frontmatter with paper metadata (`/api/markdown` only) |
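From Python, the endpoints and parameters above could be wrapped roughly as follows. This is a sketch, not an official client: the helper names are hypothetical, and it assumes that boolean parameters are passed as the lowercase strings `true` and `false`, matching how the defaults are written in the table.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://arxiv2md.org/api"


def build_api_url(paper, fmt="markdown", **options):
    """Build a request URL for /api/markdown or /api/json."""
    if fmt not in ("markdown", "json"):
        raise ValueError("fmt must be 'markdown' or 'json'")
    query = {"url": paper}
    for name, value in options.items():
        # Assumption: booleans are sent as lowercase "true"/"false".
        query[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{API_BASE}/{fmt}?{urlencode(query)}"


def fetch_markdown(paper, **options):
    """Fetch raw Markdown for a paper (performs a network request)."""
    with urlopen(build_api_url(paper, "markdown", **options), timeout=30) as resp:
        return resp.read().decode("utf-8")


print(build_api_url("2312.00752", remove_refs=False, frontmatter=True))
```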

Rate limit: 30 requests/minute per IP.
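When fetching many papers, a small client-side throttle can help you stay under that limit. The sketch below spaces calls at least 2 seconds apart (60 s / 30 requests), which is stricter than the limit requires. How the server counts its window is not documented here, so treat this as a conservative assumption. The class name is hypothetical.

```python
import time


class MinIntervalThrottle:
    """Space calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute=30, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._next_allowed = None    # earliest time the next call may run

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


throttle = MinIntervalThrottle(max_per_minute=30)
throttle.wait()  # returns immediately on the first call
```

Call `throttle.wait()` before each API request.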

Section Filtering

- Exclude mode (default): Remove unwanted sections like References or Appendix
- Include mode: Extract only what you need, like "Abstract,Introduction,Conclusion"

The section tree in the UI lets you click sections to toggle them in/out.
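The two modes can be pictured with a short sketch. This is only an illustration of include versus exclude filtering, not the project's code: it assumes a paper is a list of (title, body) pairs and that titles are matched case-insensitively, and the function name is hypothetical.

```python
def filter_sections(sections, selected, mode="exclude"):
    """Keep or drop the (title, body) pairs whose title is in `selected`."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    # Assumption: titles are compared case-insensitively, ignoring padding.
    wanted = {name.strip().lower() for name in selected}
    keep_if_listed = mode == "include"
    return [
        (title, body)
        for title, body in sections
        if (title.strip().lower() in wanted) == keep_if_listed
    ]


paper = [("Abstract", "..."), ("Introduction", "..."), ("References", "...")]
print([title for title, _ in filter_sections(paper, ["References"])])
```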

Development

# Run locally
python -m venv .venv
source .venv/bin/activate
pip install -e ".[server]"
uvicorn server.main:app --reload --app-dir src

# Run tests
pip install -e ".[dev]"
pytest tests

Deployment

One-command deployment to DigitalOcean with Docker, Nginx, and SSL:

git clone https://github.com/timf34/arxiv2md.git /root/arxiv2md
cd /root/arxiv2md
chmod +x deploy.sh
sudo ./deploy.sh

Contributing

PRs welcome! Fork the repo, create a feature branch, add tests if applicable, and submit a PR.

License

MIT

Inspired by gitingest for digesting Git repos.

Star this repo if you find it useful!

Project details

Release history

This version

0.1.0

Mar 10, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

arxiv2markdown-0.1.0-py3-none-any.whl (1.8 MB)

Uploaded Mar 10, 2026 Python 3

File details

Details for the file arxiv2markdown-0.1.0-py3-none-any.whl.

File metadata

Download URL: arxiv2markdown-0.1.0-py3-none-any.whl
Upload date: Mar 10, 2026
Size: 1.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for arxiv2markdown-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `5adabc46ca4e1a55921304df0b5d284efa7f5ec1d7c7684e96b19f5452355232` |
| MD5 | `65a31200b409c233f46eff64565ecd5e` |
| BLAKE2b-256 | `e4db758b5c589be8187e856e0820cf67a8565af99f8b6b6ee380217e459b0814` |
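To check a downloaded wheel against the SHA256 digest listed above, a short script along these lines should work. The function names are hypothetical, and the path in the usage comment assumes the wheel sits in the current directory.

```python
import hashlib

# SHA256 digest listed above for arxiv2markdown-0.1.0-py3-none-any.whl
EXPECTED_SHA256 = "5adabc46ca4e1a55921304df0b5d284efa7f5ec1d7c7684e96b19f5452355232"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected.lower()


# Usage, assuming the wheel is in the current directory:
# verify_wheel("arxiv2markdown-0.1.0-py3-none-any.whl")
```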
