Command-line arXiv Papers Downloader. Citation extraction and PDF naming automation.

arXiv-dl

Command-line arXiv paper downloader.

Disclaimer: This is a highly opinionated CLI tool for downloading papers, designed to be easy to use. It is not an official arXiv project.

Features

  • Download papers from arXiv.org via a simple command-line interface.
  • Speed up downloads by using aria2c (optional).
  • Automatically maintain a local list of downloaded papers.
  • Retrieve the paper's metadata and citation:
    • Paper Title
    • Authors
    • Abstract
    • Comments (Conference acceptance info)
    • Source Code Links
    • Citation (BibTeX)
  • Configure the desired download destination via environment variables.
  • Each downloaded paper is named by its arXiv ID and title, with whitespace removed.
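
The naming rule in the last bullet can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `paper_filename`, the underscore separator, and the stripping of punctuation are all assumptions, since the exact filename format is not specified here.

```python
import re


def paper_filename(arxiv_id: str, title: str) -> str:
    """Build a PDF filename from an arXiv ID and a title (hypothetical format)."""
    # Drop punctuation that is awkward in filenames; keep word characters,
    # whitespace, and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", title)
    # Replace every run of whitespace with a single underscore (assumed separator).
    compact = "_".join(cleaned.split())
    return f"{arxiv_id}_{compact}.pdf"


print(paper_filename("1512.03385", "Deep Residual Learning for Image Recognition"))
# prints 1512.03385_Deep_Residual_Learning_for_Image_Recognition.pdf
```

Putting the ID first keeps files sortable by submission date, because arXiv IDs begin with the year and month.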

Why?

  • Save the time and effort of downloading, renaming, and organizing papers.
  • Speed up downloads by using parallel connections.
  • The local paper list makes it quick to look up, locate, and cite papers.

Install

This is a command-line tool. Use pip to install the package globally.

  • Pre-requisite: Python 3.x
python3 -m pip install --upgrade arxiv-dl

(Optional) Install aria2c for download speedup.

  • macOS: brew install aria2
  • Linux: sudo snap install aria2c
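
To illustrate how a downloader might hand a PDF to aria2c with several parallel connections, here is a hedged sketch. The helper names `has_aria2c` and `build_aria2c_args` are hypothetical, and nothing here is taken from the tool's source. The aria2c options used (`--max-connection-per-server`, `--split`, `--dir`, `--out`) are standard aria2c options.

```python
import shutil
from typing import List


def has_aria2c() -> bool:
    """Return True when an aria2c executable is found on PATH."""
    return shutil.which("aria2c") is not None


def build_aria2c_args(url: str, out_dir: str, filename: str, n_connections: int = 4) -> List[str]:
    """Build an aria2c argument list for one file (hypothetical helper)."""
    # aria2c accepts 1 to 16 connections per server.
    if not 1 <= n_connections <= 16:
        raise ValueError("n_connections must be between 1 and 16")
    return [
        "aria2c",
        f"--max-connection-per-server={n_connections}",
        f"--split={n_connections}",  # split the file into this many segments
        f"--dir={out_dir}",          # directory to save into
        f"--out={filename}",         # output filename inside that directory
        url,
    ]


print(build_aria2c_args("https://arxiv.org/pdf/1512.03385.pdf", "/tmp/papers", "1512.03385.pdf"))
```

The argument list can be passed to `subprocess.run(args, check=True)` when `has_aria2c()` is true. Otherwise a plain single-connection HTTP download is a reasonable fallback.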

Usage

After installation, the command getpaper should be available in your terminal.

$ getpaper [-h] [-v] [-d DOWNLOAD_DIR] [-n N_THREADS] urls [urls ...]

Options:

  • -v, --verbose (optional): Print paper metadata.
  • -d, --download_dir (optional): Specify a one-time download directory. This option overrides both the default download directory and the one set in the environment variable ARXIV_DOWNLOAD_FOLDER.
  • -n, --n_threads (optional): Specify the number of parallel connections to be used by aria2.
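
The usage line above has the shape that Python's argparse generates, so the interface can be approximated with the sketch below. It is a reconstruction for illustration only. The help strings and the `None` defaults are assumptions, not the tool's actual values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Approximate the getpaper command-line interface (illustrative only)."""
    parser = argparse.ArgumentParser(prog="getpaper")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print paper metadata")
    parser.add_argument("-d", "--download_dir", default=None,
                        help="one-time download directory")
    # The real default for n_threads is not stated in this document; None is a placeholder.
    parser.add_argument("-n", "--n_threads", type=int, default=None,
                        help="number of parallel connections for aria2")
    parser.add_argument("urls", nargs="+",
                        help="arXiv paper IDs or URLs")
    return parser


args = build_parser().parse_args(["-v", "-n", "4", "1512.03385", "2103.15538"])
print(args.verbose, args.n_threads, args.urls)
# prints True 4 ['1512.03385', '2103.15538']
```

Using `nargs="+"` for `urls` is what lets a single invocation download several papers.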

Example:

# Use ArXiv Paper ID
$ getpaper 1512.03385 2103.15538

# Use ArXiv Abstract Page URL
$ getpaper https://arxiv.org/abs/2103.15538

# Use ArXiv PDF Page URL
$ getpaper https://arxiv.org/pdf/1512.03385.pdf
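
All three input forms above carry the same identifier, so they can be normalized with one regular expression. The sketch below is an assumption about how such parsing could work, not the tool's implementation. `extract_arxiv_id` is a hypothetical name, and only new-style IDs (YYMM.NNNN or YYMM.NNNNN) are handled, with any version suffix such as `v2` dropped.

```python
import re

# New-style arXiv identifier: four digits, a dot, then four or five digits,
# not embedded in a longer run of digits.
_ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?!\d)")


def extract_arxiv_id(target: str) -> str:
    """Return the bare arXiv ID found in a paper ID, abstract URL, or PDF URL."""
    match = _ARXIV_ID.search(target)
    if match is None:
        raise ValueError(f"No arXiv ID found in {target!r}")
    return match.group(1)


for target in (
    "1512.03385",
    "https://arxiv.org/abs/2103.15538",
    "https://arxiv.org/pdf/1512.03385.pdf",
):
    print(extract_arxiv_id(target))
```

Old-style identifiers such as `hep-th/9901001` would need a second pattern.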

Configurations

Set Custom Download Destination (Optional)

  • Default Download Destination: ~/Downloads/ArXiv_Papers
  • To set a custom download destination, use the environment variable ARXIV_DOWNLOAD_FOLDER. Include the following line in your .bashrc or .zshrc file:
    export ARXIV_DOWNLOAD_FOLDER=~/Documents/Papers
    
  • Precedence:
    1. Command-line option -d
    2. Environment variable ARXIV_DOWNLOAD_FOLDER
    3. Default download destination
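
The precedence order above can be expressed as a short fallback chain. This is a minimal sketch under stated assumptions: `resolve_download_dir` and `DEFAULT_DIR` are hypothetical names, and expanding `~` with `Path.expanduser` is a design choice made here, not something this document confirms about the tool.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DIR = "~/Downloads/ArXiv_Papers"


def resolve_download_dir(cli_dir: Optional[str] = None,
                         env: Mapping[str, str] = os.environ) -> Path:
    """Pick the download directory: -d option, then environment variable, then default."""
    # `or` skips both None and empty strings, so an empty variable falls through.
    chosen = cli_dir or env.get("ARXIV_DOWNLOAD_FOLDER") or DEFAULT_DIR
    # Expand a leading "~" so the result is usable even when the shell did not expand it.
    return Path(chosen).expanduser()


print(resolve_download_dir("/tmp/papers", {"ARXIV_DOWNLOAD_FOLDER": "/data/papers"}))
print(resolve_download_dir(None, {"ARXIV_DOWNLOAD_FOLDER": "/data/papers"}))
print(resolve_download_dir(None, {}))
```

Passing the environment as a parameter keeps the function easy to test without modifying `os.environ`.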

Set Custom Command Alias (Optional)

  • You can always set your own preferred alias for the default getpaper command.
  • Include the following line(s) in your .bashrc or .zshrc file to set your preferred alias:
    alias dp="getpaper"
    alias dpv='getpaper -v -d "$HOME/Documents/Papers"'
    

Development

Set up development environment

python3 -m venv venv && \
source venv/bin/activate && \
pip install -e ".[dev]"

Run Tests

pytest

Build the package

make

Clean cache & build artifacts

make clean

TODOs

  • Add support for aria2c.
  • Add support for papers on CVF Open Access.
  • Add support for papers on OpenReview.

License

MIT License - Copyright (c) 2021-2022 Mark Huang
