Command-line Papers Downloader. Citation extraction and PDF naming automation.
arXiv-dl
Command-line Paper Downloader for ArXiv, ECVA & CVF Open Access.
Disclaimer: This is a highly opinionated command-line tool for downloading papers. It prioritizes ease of use for researchers. This is not an official ArXiv project.
What does it do?
- Supports downloading papers from ArXiv, ECCV, CVPR, ICCV, and WACV via a simple CLI.
- Supports faster downloads via aria2.
- Retrieves the paper's metadata, such as:
  - Title, Abstract, Year
  - Authors
  - Comments (Conference acceptance info)
  - Repository URLs
  - BibTeX
  - Citation
- Automatically maintains a list of local papers and their metadata in a JSON file.
- Lets you configure the download destination via an environment variable or a command-line argument.
- Gives every downloaded paper a standardized filename for easy browsing.
Why?
- Saves the time and effort of downloading and organizing papers on your machine.
- Speeds up downloads by using multiple parallel connections.
- The local paper list is handy for quick local lookup, note-taking, and citations.
How to install it?
This is a command-line tool. Simply use pip to install the package globally, then you are good to go!
- Prerequisite: Python 3.x
python3 -m pip install -U arxiv-dl
NOTE: After installation, make sure the installation path is included in your PATH variable. If you have difficulty finding or setting the PATH, follow the recommended way of installing standalone command-line tools when installing arxiv-dl.
Optionally, install aria2c to speed up downloads.
- macOS:
brew install aria2
- Linux:
sudo snap install aria2c
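To confirm that the installation steps above worked, a quick check of the PATH can help. The following is a minimal sketch using only the Python standard library; the helper name `find_missing_commands` is hypothetical and not part of arxiv-dl.

```python
import shutil


def find_missing_commands(commands):
    """Return the commands from `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # `paper` is the entry point installed by arxiv-dl; `aria2c` is optional.
    missing = find_missing_commands(["paper", "aria2c"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Both commands are available.")
```

If `paper` is reported as missing, the pip installation directory is probably not on your PATH. A missing `aria2c` only means the optional speedup is unavailable.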
How to use it?
After installation, you may use the command paper in your shell to download papers.
(Legacy commands arxiv-dl and getpaper are equivalent to the command paper.)
paper [OPTIONS] TARGET
Use in your shell:
# download a single TARGET
$ paper 1512.03385
# download multiple TARGETs separated by space
$ paper 2103.15538 2304.04415 https://arxiv.org/abs/1512.03385
Supported types of TARGET:
✅ Supported, 🚧 Not Yet Supported, ❌ Not Supported
- ArXiv
  - ✅ ArXiv ID: 1512.03385
  - ✅ ArXiv Abstract Page URL: https://arxiv.org/abs/1512.03385
  - ✅ ArXiv PDF Page URL: https://arxiv.org/pdf/1512.03385.pdf
- CVF Open Access (CVPR, ICCV, WACV)
  - ✅ CVF Abstract Page URL: https://openaccess.thecvf.com/content/**/html/**/*.html
  - ✅ CVF PDF Page URL: https://openaccess.thecvf.com/content/**/papers/**/*.pdf
- ECVA (ECCV)
  - ✅ ECVA Abstract Page URL: https://www.ecva.net/html/**/*.php
  - ❌ ECVA PDF Page URL: https://www.ecva.net/papers/**/*.pdf
- NeurIPS
  - 🚧 NeurIPS Abstract Page URL
  - 🚧 NeurIPS PDF Page URL
- OpenReview
  - 🚧 TODO
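If you want to pre-check a TARGET before handing it to the command, the supported patterns listed above can be approximated in a few lines. This is only an illustrative sketch: the function `classify_target` and its labels are hypothetical, the optional version suffix on ArXiv IDs is an assumption, and the tool's own parsing may differ.

```python
import re

# New-style ArXiv identifier such as 1512.03385.
# Allowing a version suffix like "v2" is an assumption of this sketch.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def classify_target(target: str) -> str:
    """Map a TARGET string to one of the supported kinds, or 'unsupported'."""
    target = target.strip()
    if ARXIV_ID.match(target):
        return "arxiv_id"
    if target.startswith("https://arxiv.org/abs/"):
        return "arxiv_abs"
    if target.startswith("https://arxiv.org/pdf/"):
        return "arxiv_pdf"
    if target.startswith("https://openaccess.thecvf.com/content/"):
        if "/html/" in target and target.endswith(".html"):
            return "cvf_abs"
        if "/papers/" in target and target.endswith(".pdf"):
            return "cvf_pdf"
    if target.startswith("https://www.ecva.net/html/") and target.endswith(".php"):
        return "ecva_abs"
    # ECVA PDF URLs, NeurIPS, and OpenReview are not supported per the list above.
    return "unsupported"
```

For example, `classify_target("1512.03385")` yields `"arxiv_id"`, while an ECVA PDF URL falls through to `"unsupported"`.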
Supported OPTIONS:
- -v, --verbose (optional): Print paper metadata.
- -p, --pdf_only (optional): Download PDF only without creating Markdown notes.
- -d, --download_dir (optional): Specify a one-time download directory. This option overrides the default download directory or the one specified in the environment variable ARXIV_DOWNLOAD_FOLDER.
- -n, --n_threads (optional): Specify the number of parallel connections to be used by aria2.
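When scripting around the CLI, it can be convenient to assemble the argument list programmatically. The sketch below uses only the options documented above; the helper `build_paper_command` is hypothetical and not part of the package.

```python
import shlex


def build_paper_command(targets, verbose=False, pdf_only=False,
                        download_dir=None, n_threads=None):
    """Build an argument list for the `paper` command from documented options."""
    targets = list(targets)
    if not targets:
        raise ValueError("at least one TARGET is required")
    cmd = ["paper"]
    if verbose:
        cmd.append("--verbose")
    if pdf_only:
        cmd.append("--pdf_only")
    if download_dir is not None:
        cmd += ["--download_dir", str(download_dir)]
    if n_threads is not None:
        cmd += ["--n_threads", str(n_threads)]
    return cmd + targets


if __name__ == "__main__":
    cmd = build_paper_command(["1512.03385", "2103.15538"], verbose=True, n_threads=4)
    # shlex.join (Python 3.8+) renders the list as a copy-pasteable shell command.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `paper` is on your PATH.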
Use it in your code:
from arxiv_dl import download_paper
download_paper(target="1512.03385", download_dir=".", verbose=True)
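To download several papers from code, the call above can be wrapped in a loop. The following is a sketch under assumptions: `download_many` is a hypothetical helper, and it assumes that `download_paper` raises an exception on failure, which the documentation above does not state.

```python
def download_many(targets, download_dir=".", verbose=False, downloader=None):
    """Download each target in turn and return a list of (target, error) pairs."""
    if downloader is None:
        # Imported lazily so this helper can be tested without arxiv-dl installed.
        from arxiv_dl import download_paper
        downloader = download_paper

    failed = []
    for target in targets:
        try:
            # Same keyword arguments as in the example above.
            downloader(target=target, download_dir=download_dir, verbose=verbose)
        except Exception as exc:  # assumption: failures surface as exceptions
            failed.append((target, exc))
    return failed
```

Calling `download_many(["1512.03385", "2103.15538"])` returns an empty list when every download succeeds.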
Configurations
Default Download Destination
- Without any configuration, all papers are downloaded to $HOME/Downloads/ArXiv_Papers.
Set Your Custom Download Destination (Optional)
You may configure your preferred download destination once and for all via an environment variable. This will override the default download destination. To do that, include the following line in your .bashrc or .zshrc file:
export ARXIV_DOWNLOAD_FOLDER="YOUR/PATH/TO/ANY/FOLDER"
- Every time you use the paper command, the download destination is chosen in the following order of priority (highest first):
  - Command-line option -d
  - Environment variable ARXIV_DOWNLOAD_FOLDER
  - Default download destination
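The priority order described above can be sketched as a small function. This is an illustration of the documented behavior, not the package's actual code; `resolve_download_dir` is a hypothetical name, and `Path.home()` stands in for `$HOME`.

```python
import os
from pathlib import Path

# Default destination as documented: $HOME/Downloads/ArXiv_Papers
DEFAULT_DIR = Path.home() / "Downloads" / "ArXiv_Papers"


def resolve_download_dir(cli_dir=None, env=None):
    """Pick the download directory: CLI option, then env variable, then default."""
    env = os.environ if env is None else env
    if cli_dir:
        # 1. The -d / --download_dir option wins when given.
        return Path(cli_dir).expanduser()
    env_dir = env.get("ARXIV_DOWNLOAD_FOLDER")
    if env_dir:
        # 2. Otherwise fall back to the environment variable.
        return Path(env_dir).expanduser()
    # 3. Finally, use the default destination.
    return DEFAULT_DIR
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads from `os.environ`.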
Set Custom Command Alias (Optional)
- You can always set your own preferred alias to rename the command or add more options.
- Include the following line(s) in your .bashrc or .zshrc file to set your preferred alias:
alias dp="paper"
alias dpv="paper -v -d '~/Documents/Papers'"
Development
Set up development environment
# create a virtual environment
python3 -m venv venv && source venv/bin/activate
# install dependencies
pip install -r requirements.txt
# install the package in editable mode & dev dependencies
pip install -e ".[dev]"
Run Tests
pytest
Build the package
make
Clean cache & build artifacts
make clean
License
MIT License - Copyright (c) 2021-2024 Mark H. Huang
File details
Details for the file arxiv_dl-1.2.0.tar.gz.
File metadata
- Download URL: arxiv_dl-1.2.0.tar.gz
- Upload date:
- Size: 401.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 42480ae59ecab634f94f54c2b2d2f50a48d6f6c09bd1b11f7c488bee109b68e4
MD5 | 65e83169de0d2999f1cab3270fabff9d
BLAKE2b-256 | 2ca347557b35594ca96a45b9808b2229b246fbfe6b76d6a34eb68f4c92e5fb24
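If you download a distribution file manually, you can compare its SHA256 digest against the value listed above. A minimal sketch using the standard library's `hashlib`; the helper names `sha256_of` and `matches_sha256` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Check a file against an expected digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `matches_sha256("arxiv_dl-1.2.0.tar.gz", "42480ae5...")` with the full digest from the table should return `True` for an intact download of the source distribution.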
File details
Details for the file arxiv_dl-1.2.0-py3-none-any.whl.
File metadata
- Download URL: arxiv_dl-1.2.0-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 45dc15dac0f16ee99d2800b2732b9c7a54b910caadfd40f1131aed41f4d6d753
MD5 | e53b2845aa4e85cb2f6a6fdacc8063d4
BLAKE2b-256 | 8f8196873bfd440f3720adb4916a6126d4e32a7488242cee0e7f98c04609f04e