PyPaperBot
PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, and SciHub. The tool tries to download papers from different sources, such as the PDF provided by Scholar, Scholar related links, and SciHub. PyPaperBot can also download the BibTeX entry of each paper.
Features
- Download papers given a query
- Download papers given their DOIs
- Download papers given a Google Scholar link
- Generate the BibTeX of the downloaded papers
- Filter downloaded papers by year, journal, and number of citations
Installation
Use pip to install from PyPI:
pip install PyPaperBot
How to use
PyPaperBot arguments:
Arguments | Description | Type |
---|---|---|
--query | Query to make on Google Scholar or Google Scholar page link | string |
--doi | DOI of the paper to download (this option uses only SciHub to download) | string |
--doi-file | .txt file containing the list of DOIs of the papers to download | string |
--scholar-pages | Number of Google Scholar pages to inspect. Each page has a maximum of 10 papers | int |
--dwn-dir | Directory path in which to save the result | string |
--min-year | Minimal publication year of the paper to download | int |
--max-dwn-year | Maximum number of papers to download sorted by year | int |
--max-dwn-cites | Maximum number of papers to download sorted by number of citations | int |
--journal-filter | CSV file path of the journal filter (more info on GitHub) | string |
--restrict | 0: Download only BibTeX - 1: Download only the papers' PDFs | int |
-h | Shows the help | -- |
Note
You can use only one of the arguments in each of the following groups:
- --query, --doi-file, and --doi
- --max-dwn-year and --max-dwn-cites
One of the arguments --scholar-pages, --query, and --doi-file is mandatory. The argument --scholar-pages is mandatory when using --query. The argument --dwn-dir is mandatory.
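The rules above can be sketched as a small pre-flight check. This is not PyPaperBot's own validation code. `check_arguments` is a hypothetical helper, and it reads the rules as "exactly one of --query, --doi, and --doi-file", which is an interpretation of the note rather than something the project states in those words.

```python
from typing import List, Optional


def check_arguments(
    query: Optional[str] = None,
    doi: Optional[str] = None,
    doi_file: Optional[str] = None,
    scholar_pages: Optional[int] = None,
    dwn_dir: Optional[str] = None,
    max_dwn_year: Optional[int] = None,
    max_dwn_cites: Optional[int] = None,
) -> List[str]:
    """Return the list of violated rules; an empty list means the combination is allowed.

    Hypothetical helper that mirrors the README note, not the tool's internals.
    """
    errors = []

    # Assumption: exactly one paper source must be given.
    sources = [
        name
        for name, value in (("--query", query), ("--doi", doi), ("--doi-file", doi_file))
        if value is not None
    ]
    if len(sources) != 1:
        errors.append("use exactly one of --query, --doi-file, and --doi")

    # The two "maximum number of papers" options are mutually exclusive.
    if max_dwn_year is not None and max_dwn_cites is not None:
        errors.append("use only one of --max-dwn-year and --max-dwn-cites")

    # A query needs to know how many Scholar pages to inspect.
    if query is not None and scholar_pages is None:
        errors.append("--scholar-pages is mandatory when using --query")

    # The download directory is always required.
    if dwn_dir is None:
        errors.append("--dwn-dir is mandatory")

    return errors
```

Calling `check_arguments(query="Machine learning", scholar_pages=3, dwn_dir="papers")` returns an empty list, while leaving out `scholar_pages` reports the missing argument.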
The argument --journal-filter requires the path of a CSV file containing a list of journal names, each paired with a boolean that indicates whether to consider that journal (0: don't consider, 1: consider).
The argument --doi-file requires the path of a .txt file containing the list of DOIs of the papers to download, with one DOI per line.
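As an illustration of the two input files, the sketch below writes a DOI list and a journal filter from Python. The function names are hypothetical. The one-DOI-per-line layout comes from the description above, but the CSV details (comma delimiter, no header row, name first and flag second) are assumptions, so check the example file on GitHub before relying on them.

```python
import csv
from pathlib import Path


def write_doi_file(path, dois):
    """Write a plain-text file with one DOI per line."""
    Path(path).write_text("\n".join(dois) + "\n", encoding="utf-8")


def write_journal_filter(path, journals):
    """Write one row per journal: its name and 1 (consider) or 0 (don't consider).

    Assumed layout: comma-separated, no header row. The real tool may expect
    a different delimiter or a header line.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for name, consider in journals.items():
            writer.writerow([name, 1 if consider else 0])
```

For example, `write_journal_filter("journals.csv", {"Journal A": True, "Journal B": False})` produces the rows `Journal A,1` and `Journal B,0`; the journal names here are placeholders.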
SciHub access
If access to SciHub is blocked in your country, consider using a free VPN service like ProtonVPN.
Example
Download a maximum of 30 papers given a query and starting from 2018:
python -m PyPaperBot --query="Machine learning" --scholar-pages=3 --min-year=2018 --dwn-dir="C:\User\example\papers"
Download a paper given the DOI:
python -m PyPaperBot --doi="10.0086/s41037-711-0132-1" --dwn-dir="C:\User\example\papers"
Download papers given a file containing the DOIs:
python -m PyPaperBot --doi-file="C:\User\example\papers\file.txt" --dwn-dir="C:\User\example\papers"
If the python command is not found, try py instead, for example:
py -m PyPaperBot --doi="10.0086/s41037-711-0132-1" --dwn-dir="C:\User\example\papers"
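When calling the tool from a script, one way to avoid the python versus py question is to launch it with the interpreter that is already running. The sketch below assumes PyPaperBot is installed in that interpreter's environment. `build_command` and `run_paperbot` are hypothetical helper names, and only the options listed in the arguments table should be passed.

```python
import subprocess
import sys


def build_command(dwn_dir, **options):
    """Build the argument list for `<current interpreter> -m PyPaperBot`.

    Keyword names use underscores and are turned into the documented flags,
    e.g. scholar_pages=3 becomes --scholar-pages=3.
    """
    command = [sys.executable, "-m", "PyPaperBot"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        command.append(f"{flag}={value}")
    command.append(f"--dwn-dir={dwn_dir}")
    return command


def run_paperbot(dwn_dir, **options):
    """Run PyPaperBot as a subprocess and return the CompletedProcess."""
    # Passing a list (not a shell string) avoids quoting problems with
    # queries and Windows paths that contain spaces or backslashes.
    return subprocess.run(build_command(dwn_dir, **options), check=False)
```

For instance, `run_paperbot("papers", query="Machine learning", scholar_pages=3, min_year=2018)` corresponds to the first command in this section, with a placeholder download directory.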
Contributions
Feel free to contribute to this project by proposing changes, fixes, and enhancements on the dev branch.
To do
- Tests
- Code documentation
- General improvements
Disclaimer
This application is for educational purposes only. I do not take responsibility for what you choose to do with this application.
Donation
If you like this project, you can give me a cup of coffee :)
Download files
Source Distribution
File details
Details for the file PyPaperBot-1.0.tar.gz.
File metadata
- Download URL: PyPaperBot-1.0.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 35058f4ae01cdfab10e162820df23a3cbb378010bdedc9b581eb67343d1948ba |
MD5 | 1f5ab11a541aab28b8dc83faf827551a |
BLAKE2b-256 | c7e44656df4c1628f5feaa5f6e912a2cf42d6890860e1da3c899461bf413f5e1 |
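If you download the source distribution by hand, the published SHA256 digest can be checked with Python's standard hashlib module. This is a minimal sketch: the function names are illustrative, and the local file path is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest listed above for PyPaperBot-1.0.tar.gz.
EXPECTED_SHA256 = "35058f4ae01cdfab10e162820df23a3cbb378010bdedc9b581eb67343d1948ba"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the expected one, ignoring letter case."""
    return sha256_of_file(path) == expected.lower()
```

`matches_published_hash("PyPaperBot-1.0.tar.gz")` returns True only if the local file is byte-for-byte identical to the one the digest was computed from.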