
Command-line Papers Downloader. Citation extraction and PDF naming automation.

Project description

arXiv-dl

Command-line ArXiv, ECVA & CVF Open Access Paper Downloader. [PyPI] [Source]

Disclaimer: This is a highly opinionated command-line tool for downloading papers. It prioritizes ease of use for researchers. This is not an official project.

What does it do?

  • Download papers from ArXiv, ECCV, CVPR, ICCV, and WACV via a simple CLI.
  • Speed up downloads by using aria2.
  • Retrieve the paper's metadata such as:
    • Title, Abstract, Year
    • Authors
    • Comments (Conference acceptance info)
    • Repository URLs
    • BibTeX Citation
  • Automatically maintain a list of local papers and their metadata in a JSON file.
  • Configure the desired download destination via an environment variable or a command-line argument.
  • All downloaded papers get a standardized filename for easy browsing.

Why?

  • Save the time and effort of downloading and organizing papers on your machine.
  • Speed up the download process by using multiple parallel connections.
  • A local paper list is handy for quick lookup, note-taking, and citations.

How to install it?

This is a command-line tool. Use pip to install the package globally, and you are good to go!

  • Pre-requisite: Python 3.x
python3 -m pip install --upgrade arxiv-dl

NOTE: After installation, make sure the installation path is included in your PATH variable. If you have difficulty finding or setting the PATH, there is a recommended way of installing standalone command-line tools; follow its instructions when installing arxiv-dl.

Optionally, install aria2c for download speedup.

  • macOS: brew install aria2
  • Linux: sudo snap install aria2c

How to use it?

After installation, use the paper command in your shell to download papers. (The getpaper and arxiv-dl commands are equivalent aliases.)

paper [OPTIONS] TARGET

Usage Examples:

# download a single TARGET
$ paper 1512.03385

# download multiple TARGETs at once
$ paper 1512.03385 2103.15538 2304.04415

Supported types of TARGET:

✅ Supported, 🚧 Not Yet Supported, ❌ Not Supported

  • ArXiv
    • ✅ ArXiv ID: 1512.03385
    • ✅ ArXiv Abstract Page URL: https://arxiv.org/abs/1512.03385
    • ✅ ArXiv PDF Page URL: https://arxiv.org/pdf/1512.03385.pdf
  • CVF Open Access (CVPR, ICCV, WACV)
    • ✅ CVF Abstract Page URL: https://openaccess.thecvf.com/content/**/html/**/*.html
    • ✅ CVF PDF Page URL: https://openaccess.thecvf.com/content/**/papers/**/*.pdf
  • ECVA (ECCV)
    • 🚧 ECVA Abstract Page URL: https://www.ecva.net/html/**/*.php
    • ❌ ECVA PDF Page URL: https://www.ecva.net/papers/**/*.pdf
  • NeurIPS
    • 🚧 NeurIPS Abstract Page URL
    • 🚧 NeurIPS PDF Page URL
  • OpenReview
    • 🚧 TODO
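
As a rough illustration of the supported TARGET forms listed above, the following Python sketch sorts a string into one of the ✅ categories. The function name `classify_target`, the labels it returns, and the regular expression are assumptions made for this example; they are not taken from arxiv-dl's source, which may recognize targets differently.

```python
import re
from urllib.parse import urlparse

# Assumption: modern arXiv IDs look like YYMM.NNNNN (4 or 5 digits after
# the dot), optionally followed by a version suffix such as "v2".
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def classify_target(target: str) -> str:
    """Return a coarse label for a TARGET string (hypothetical helper)."""
    if ARXIV_ID.match(target):
        return "arxiv-id"

    url = urlparse(target)
    host, path = url.netloc.lower(), url.path

    if host == "arxiv.org" and path.startswith("/abs/"):
        return "arxiv-abs"
    if host == "arxiv.org" and path.startswith("/pdf/"):
        return "arxiv-pdf"
    if host == "openaccess.thecvf.com" and path.startswith("/content/"):
        if "/html/" in path and path.endswith(".html"):
            return "cvf-abs"
        if "/papers/" in path and path.endswith(".pdf"):
            return "cvf-pdf"

    # Everything else (including the 🚧 and ❌ entries) is treated as unsupported.
    return "unsupported"


for t in ("1512.03385",
          "https://arxiv.org/abs/1512.03385",
          "https://arxiv.org/pdf/1512.03385.pdf"):
    print(t, "->", classify_target(t))
```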

Supported OPTIONS:

  • -v, --verbose (optional): Print paper metadata.
  • -p, --pdf_only (optional): Download the PDF only, without creating Markdown notes.
  • -d, --download_dir (optional): Specify a one-time download directory. This option overrides both the default download directory and the one set in the environment variable ARXIV_DOWNLOAD_FOLDER.
  • -n, --n_threads (optional): Specify the number of parallel connections to be used by aria2.
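
If you want to call the tool from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. The sketch below uses only the short flags documented above; the helper name `build_paper_command` and its keyword arguments are hypothetical and exist only for this example.

```python
import shlex
from typing import List, Optional


def build_paper_command(
    targets: List[str],
    verbose: bool = False,
    pdf_only: bool = False,
    download_dir: Optional[str] = None,
    n_threads: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for the `paper` command (hypothetical helper)."""
    cmd = ["paper"]
    if verbose:
        cmd.append("-v")                     # print paper metadata
    if pdf_only:
        cmd.append("-p")                     # skip Markdown notes
    if download_dir is not None:
        cmd += ["-d", download_dir]          # one-time download directory
    if n_threads is not None:
        cmd += ["-n", str(n_threads)]        # parallel connections for aria2
    return cmd + list(targets)


cmd = build_paper_command(["1512.03385", "2103.15538"], verbose=True, n_threads=8)
print(shlex.join(cmd))  # paper -v -n 8 1512.03385 2103.15538

# To actually run it (requires arxiv-dl to be installed and on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell-quoting problems when a download directory contains spaces.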

Configurations

Default Download Destination

  • Without any configuration, all papers are downloaded to $HOME/Downloads/ArXiv_Papers.

Set Your Custom Download Destination (Optional)

You may configure your preferred download destination once and for all via an environment variable. This will override the default download destination. To do that, include the following line in your .bashrc or .zshrc file:

export ARXIV_DOWNLOAD_FOLDER="YOUR/PATH/TO/ANY/FOLDER"
  • Every time you use the paper command, the download destination is chosen in the following order of priority:
    1. Command-line option -d
    2. Environment variable ARXIV_DOWNLOAD_FOLDER
    3. Default download destination
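
The priority order above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not arxiv-dl's actual code: the function name `resolve_download_dir` and its parameters are assumptions, and only the variable name ARXIV_DOWNLOAD_FOLDER and the default path come from this README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_download_dir(
    cli_dir: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Pick the download directory by the documented priority (hypothetical helper)."""
    # 1. Command-line option -d wins when given.
    if cli_dir:
        return Path(cli_dir).expanduser()

    # 2. Otherwise fall back to the environment variable.
    env_dir = env.get("ARXIV_DOWNLOAD_FOLDER")
    if env_dir:
        return Path(env_dir).expanduser()

    # 3. Otherwise use the default destination under the home directory.
    return Path.home() / "Downloads" / "ArXiv_Papers"


print(resolve_download_dir("/tmp/papers", {"ARXIV_DOWNLOAD_FOLDER": "/data/papers"}))
print(resolve_download_dir(None, {"ARXIV_DOWNLOAD_FOLDER": "/data/papers"}))
print(resolve_download_dir(None, {}))
```

Taking the environment as a parameter keeps the function easy to test without modifying the real process environment.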

Set Custom Command Alias (Optional)

  • You can always set your own preferred alias to rename the command or add more options.
  • Include the following line(s) in your .bashrc or .zshrc file to set your preferred alias:
    alias dp="paper"
    alias dpv="paper -v -d $HOME/Documents/Papers"
    

Development

Set up development environment

python3 -m venv venv && \
source venv/bin/activate && \
pip install -e ".[dev]"

Run Tests

pytest

Build the package

make

Clean cache & build artifacts

make clean

License

MIT License - Copyright (c) 2021-2024 Mark H. Huang
