
Command-line Papers Downloader. Citation extraction and PDF naming automation.


arXiv-dl

Command-line Research Paper Downloader for arXiv.org, ECVA & CVF Open Access.

Disclaimer: This is a highly opinionated command-line tool for downloading papers. It prioritizes ease of use for researchers. It is not an official arXiv project.

What does it do?

  • Download papers from arXiv, ECCV, CVPR, ICCV, and WACV via a simple CLI.
  • Speed up downloads with aria2.
  • Retrieve the paper's metadata such as:
    • Title, Abstract, Year
    • Authors
    • Comments (Conference acceptance info)
    • Repository URLs
    • BibTeX Citation
  • Automatically maintain a list of local papers and their metadata in a JSON file.
  • Configure the desired download destination via an environment variable or a command-line argument.
  • Give every downloaded paper a standardized filename for easy browsing.

Why?

  • Save the time and effort of downloading and organizing papers on your machine.
  • Speed up downloads by using multiple parallel connections.
  • A local paper list is handy for quick lookup, note-taking, and citations.

How to install it?

This is a command-line tool. Use pip to install the package globally and you are good to go!

  • Pre-requisite: Python 3.x
python3 -m pip install -U arxiv-dl

[!NOTE] After installation, make sure the installation path is included in your PATH environment variable (tips: here). If you have difficulty finding or setting the PATH, there is a recommended way of installing standalone command-line tools; follow its instructions when installing arxiv-dl.

Optionally, install aria2c for multi-connection download speedup.

  • macOS: brew install aria2
  • Linux: sudo snap install aria2c
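To confirm that the installation steps above worked, a small check like the following can report whether the paper command and the optional aria2c binary are visible on your PATH. This is only a sketch: find_tools is a hypothetical helper name, not part of arxiv-dl, and it relies solely on the standard library's shutil.which.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str] = ("paper", "aria2c")) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        # A missing "paper" usually means the pip install location is not on PATH;
        # a missing "aria2c" only means downloads will not use multiple connections.
        print(f"{name}: {path or 'not found on PATH'}")
```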

How to use it?

After installation, you may use the command paper in your shell to download papers. (Legacy commands arxiv-dl and getpaper are equivalent to the command paper.)

paper [OPTIONS] TARGET(s)

Use in your shell:

# download a single TARGET
$ paper 1512.03385

# download multiple TARGETs separated by space
$ paper 2103.15538 2304.04415 https://arxiv.org/abs/1512.03385

Supported types of download TARGETs:


✅ Supported, 🚧 Not Yet Supported, ❌ Not Supported

  • ArXiv
    • ✅ ArXiv ID: 1512.03385 or arXiv:1512.03385
    • ✅ Legacy ArXiv ID: alg-geom/9708001 or cs/0002001, etc.
    • ✅ ArXiv Abstract Page URL: https://arxiv.org/abs/1512.03385
    • ✅ ArXiv PDF Page URL: https://arxiv.org/pdf/1512.03385.pdf
    • ✅ ArXiv HTML Page URL: https://arxiv.org/html/2506.15442
  • CVF Open Access (CVPR, ICCV, WACV)
    • ✅ CVF Abstract Page URL: https://openaccess.thecvf.com/content/**/html/**/*.html
    • ✅ CVF PDF Page URL: https://openaccess.thecvf.com/content/**/papers/**/*.pdf
  • ECVA (ECCV)
    • ✅ ECVA Abstract Page URL: https://www.ecva.net/html/**/*.php
    • ❌ ECVA PDF Page URL: https://www.ecva.net/papers/**/*.pdf
  • NeurIPS
    • 🚧 NeurIPS Abstract Page URL
    • 🚧 NeurIPS PDF Page URL
  • OpenReview
    • 🚧 TODO
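The supported TARGET forms listed above can be told apart with a few regular expressions. The sketch below is illustrative only and is not the parser arxiv-dl actually uses: classify_target, the labels it returns, and the exact patterns are assumptions derived from the examples in the list. In particular, ECVA pages are matched by their .php extension alone.

```python
import re
from typing import Optional

# Modern IDs such as 1512.03385, optionally prefixed with "arXiv:" or suffixed with a version.
_ARXIV_NEW_ID = re.compile(r"^(?:arxiv:)?\d{4}\.\d{4,5}(?:v\d+)?$", re.IGNORECASE)
# Legacy IDs such as alg-geom/9708001 or cs/0002001.
_ARXIV_LEGACY_ID = re.compile(
    r"^(?:arxiv:)?[a-z-]+(?:\.[a-z]{2})?/\d{7}(?:v\d+)?$", re.IGNORECASE
)
# URL forms marked as supported in the list above.
_URL_RULES = [
    ("arxiv-url", re.compile(r"^https?://arxiv\.org/(?:abs|pdf|html)/\S+$")),
    (
        "cvf-url",
        re.compile(
            r"^https?://openaccess\.thecvf\.com/content/\S+/"
            r"(?:html/\S+\.html|papers/\S+\.pdf)$"
        ),
    ),
    ("ecva-abstract-url", re.compile(r"^https?://www\.ecva\.net/\S+\.php$")),
]


def classify_target(target: str) -> Optional[str]:
    """Return a label for a recognized TARGET, or None if no pattern matches."""
    target = target.strip()
    if _ARXIV_NEW_ID.match(target):
        return "arxiv-id"
    if _ARXIV_LEGACY_ID.match(target):
        return "arxiv-legacy-id"
    for label, pattern in _URL_RULES:
        if pattern.match(target):
            return label
    return None  # e.g. ECVA PDF URLs, which are listed as not supported
```

A classifier like this could be used to filter a list of identifiers before handing them to paper, so that unsupported forms are reported up front.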

Frequently used OPTIONS:

  • -v, --verbose (optional): Enable verbose mode and print full details.
  • -d, --download-dir (optional): Specify a one-time download directory. This option overrides both the default download directory and the one specified in the environment variable ARXIV_DOWNLOAD_FOLDER.
  • -n, --n-threads (optional): Specify the number of parallel connections used by aria2.

[!TIP] More options are available; run paper -h to see them all.
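If you drive the CLI from a script, the options above can be assembled into an argument list before calling the command. The following is a minimal sketch: build_paper_command is a hypothetical helper, not part of arxiv-dl, and it assumes that -d and -n each take their value as the next argument, following the paper [OPTIONS] TARGET(s) synopsis.

```python
from typing import List, Optional, Sequence


def build_paper_command(
    targets: Sequence[str],
    download_dir: Optional[str] = None,
    n_threads: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Build an argv list of the form: paper [OPTIONS] TARGET(s)."""
    if not targets:
        raise ValueError("at least one TARGET is required")
    cmd = ["paper"]
    if verbose:
        cmd.append("-v")
    if download_dir is not None:
        cmd += ["-d", download_dir]
    if n_threads is not None:
        cmd += ["-n", str(n_threads)]
    cmd += list(targets)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids shell-quoting problems with directory names that contain spaces.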

Use it in your code:

from arxiv_dl import download_paper

download_paper(target="1512.03385", download_dir=".", set_verbose_level="silent")
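To fetch several papers from code, the call above can be wrapped in a loop. The sketch below reuses only the keyword arguments shown in the example (target, download_dir, set_verbose_level). download_many and its downloader parameter are hypothetical, and the sketch assumes that a failed download raises an exception, which the documentation here does not confirm.

```python
from typing import Callable, Dict, Iterable, Optional


def download_many(
    targets: Iterable[str],
    download_dir: str = ".",
    downloader: Optional[Callable[..., object]] = None,
) -> Dict[str, Optional[Exception]]:
    """Download each target in turn; map it to None on success or to the exception raised."""
    if downloader is None:
        # Imported lazily so the helper can be exercised without arxiv-dl installed.
        from arxiv_dl import download_paper as downloader

    results: Dict[str, Optional[Exception]] = {}
    for target in targets:
        try:
            downloader(
                target=target,
                download_dir=download_dir,
                set_verbose_level="silent",
            )
            results[target] = None
        except Exception as exc:
            # Keep going so that one bad target does not stop the whole batch.
            results[target] = exc
    return results
```

For example, download_many(["1512.03385", "2103.15538"], download_dir="papers") returns a dictionary that you can scan for non-None values to see which targets failed.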

Configurations

Default Download Destination

  • Without any configuration, all papers are downloaded to $HOME/Downloads/ArXiv_Papers, where $HOME is the current user's home directory.

Set Your Custom Download Destination (Optional)

You may configure your preferred download destination once and for all via an environment variable. This will override the default download destination. To do that, include the following line in your .bashrc or .zshrc file:

export ARXIV_DOWNLOAD_FOLDER="YOUR/PATH/TO/ANY/FOLDER"
  • Every time you use the paper command, the download destination is resolved in the following order of priority:
    1. Command-line option -d (highest priority)
    2. Environment variable ARXIV_DOWNLOAD_FOLDER
    3. Default download destination (lowest priority)
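The priority order above amounts to a simple fallback chain. The sketch below illustrates that logic and is not the tool's actual implementation: resolve_download_dir is a hypothetical name, and the env parameter exists only so the function can be tried without changing your real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_download_dir(
    cli_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the download directory: -d option, then ARXIV_DOWNLOAD_FOLDER, then the default."""
    env = os.environ if env is None else env
    if cli_dir:
        # 1. Command-line option -d (highest priority)
        return Path(cli_dir).expanduser()
    env_dir = env.get("ARXIV_DOWNLOAD_FOLDER")
    if env_dir:
        # 2. Environment variable ARXIV_DOWNLOAD_FOLDER
        return Path(env_dir).expanduser()
    # 3. Default download destination (lowest priority)
    return Path.home() / "Downloads" / "ArXiv_Papers"
```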

Set Custom Command Alias (Optional)

  • You can always set your own preferred alias to rename the command or add more options.
  • Include the following line(s) in your .bashrc or .zshrc file to set your preferred alias:
    alias dp="paper"
    alias dpv='paper -v -d "$HOME/Documents/Papers"'
    

Development

Set up development environment

# create a virtual environment
python3 -m venv venv && source venv/bin/activate

# install dependencies
pip install -U -r requirements.txt

# install the package in editable mode & dev dependencies
pip install -e ".[dev]"

Run Tests

pytest

Build the package

make

Clean cache & build artifacts

make clean

License

This project is licensed under the MIT License.
© Mark H. Huang. All rights reserved.
