PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, and SciHub.

PyPaperBot

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, and SciHub. The tool tries to download papers from several sources, such as the PDF provided by Scholar, Scholar related links, and SciHub. PyPaperBot can also download the BibTeX of each paper.

Features

  • Download papers given a query
  • Download papers given their DOIs
  • Download papers given a Google Scholar link
  • Generate the BibTeX of the downloaded papers
  • Filter downloaded papers by year, journal, and number of citations

Installation

Use pip to install from PyPI:

pip install PyPaperBot

How to use

PyPaperBot arguments:

Argument | Description | Type
--query | Query to run on Google Scholar, or a Google Scholar page link | string
--doi | DOI of the paper to download (this option uses only SciHub to download) | string
--doi-file | Path of a .txt file containing the list of DOIs of the papers to download | string
--scholar-pages | Number of Google Scholar pages to inspect. Each page has a maximum of 10 papers | int
--dwn-dir | Directory path in which to save the results | string
--min-year | Minimum publication year of the papers to download | int
--max-dwn-year | Maximum number of papers to download, sorted by year | int
--max-dwn-cites | Maximum number of papers to download, sorted by number of citations | int
--journal-filter | Path of the CSV file with the journal filter (more info on GitHub) | string
--restrict | 0: download only the BibTeX; 1: download only the papers' PDFs | int
-h | Show the help | --

Note

You can use only one of the arguments in each of the following groups:

  • --query, --doi-file, and --doi
  • --max-dwn-year and --max-dwn-cites

One of the arguments --query, --doi-file, and --doi is mandatory. The argument --scholar-pages is mandatory when using --query. The argument --dwn-dir is mandatory.
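The constraints above can be sketched with Python's argparse. This is a hypothetical illustration, not PyPaperBot's own parser: the function names are made up for the example, and it assumes that the mandatory paper source is exactly one of --query, --doi, and --doi-file.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented rules (not PyPaperBot's code)."""
    parser = argparse.ArgumentParser(prog="paperbot-sketch")

    # Assumption: exactly one paper source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--query", type=str)
    source.add_argument("--doi", type=str)
    source.add_argument("--doi-file", type=str)

    # At most one of the two download limits may be given.
    limit = parser.add_mutually_exclusive_group()
    limit.add_argument("--max-dwn-year", type=int)
    limit.add_argument("--max-dwn-cites", type=int)

    parser.add_argument("--scholar-pages", type=int)
    parser.add_argument("--dwn-dir", type=str, required=True)
    return parser


def parse_args(argv):
    """Parse argv and enforce the rule that --query needs --scholar-pages."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.query is not None and args.scholar_pages is None:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--scholar-pages is required when using --query")
    return args
```

With this sketch, passing both --query and --doi, or both download limits, is rejected by argparse itself, while the --scholar-pages rule needs the explicit check because it depends on another argument.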

The argument --journal-filter requires the path of a CSV file containing a list of journal names, each paired with a boolean that indicates whether to consider that journal (0: don't consider, 1: consider). See the example on GitHub.
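As a rough illustration of how such a filter could be read, the sketch below parses journal/flag pairs and returns the journals marked with 1. The exact layout of the real file (delimiter, header row) is not stated here, so the comma delimiter, the two-column layout, and the function name are assumptions; check the example file on GitHub before relying on them.

```python
import csv
import io


def journals_to_consider(csv_text: str, delimiter: str = ",") -> set:
    """Return the journal names flagged with 1 (assumed layout: name, flag)."""
    keep = set()
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, flag = row[0].strip(), row[1].strip()
        # Any flag other than "1" (including a header cell) is ignored.
        if flag == "1":
            keep.add(name)
    return keep
```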

The argument --doi-file requires the path of a .txt file containing the DOIs of the papers to download, one DOI per line. See the example on GitHub.
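A DOI file in this one-per-line layout can be produced with a few lines of Python. The helper name below is hypothetical, and the DOI in the usage note is the placeholder used elsewhere on this page.

```python
from pathlib import Path


def write_doi_file(dois, path) -> int:
    """Write one DOI per line to path and return how many were written."""
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [d.strip() for d in dois if d.strip()]
    Path(path).write_text("".join(d + "\n" for d in cleaned), encoding="utf-8")
    return len(cleaned)
```

For example, write_doi_file(["10.0086/s41037-711-0132-1"], "file.txt") creates a file that can then be passed to --doi-file.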

SciHub access

If access to SciHub is blocked in your country, consider using a free VPN service such as ProtonVPN.

Example

Download a maximum of 30 papers given a query and starting from 2018:

python -m PyPaperBot --query="Machine learning" --scholar-pages=3 --min-year=2018 --dwn-dir="C:\User\example\papers"

Download a paper given the DOI:

python -m PyPaperBot --doi="10.0086/s41037-711-0132-1" --dwn-dir="C:\User\example\papers"

Download papers given a file containing the DOIs:

python -m PyPaperBot --doi-file="C:\User\example\papers\file.txt" --dwn-dir="C:\User\example\papers"

If it doesn't work, try using py instead of python, i.e.:

py -m PyPaperBot --doi="10.0086/s41037-711-0132-1" --dwn-dir="C:\User\example\papers"
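When calling the tool from another Python script, the python versus py question can be sidestepped by using the interpreter that is already running. The sketch below only builds the argument list from the options documented above; build_command is a hypothetical helper, not part of PyPaperBot.

```python
import sys


def build_command(dwn_dir, query=None, scholar_pages=None,
                  doi=None, doi_file=None, min_year=None):
    """Build an argument list for `python -m PyPaperBot` (hypothetical helper)."""
    sources = [s for s in (query, doi, doi_file) if s is not None]
    if len(sources) != 1:
        raise ValueError("pass exactly one of query, doi, doi_file")
    if query is not None and scholar_pages is None:
        raise ValueError("scholar_pages is required with query")

    # sys.executable is the interpreter running this script.
    cmd = [sys.executable, "-m", "PyPaperBot"]
    if query is not None:
        cmd += [f"--query={query}", f"--scholar-pages={scholar_pages}"]
    elif doi is not None:
        cmd.append(f"--doi={doi}")
    else:
        cmd.append(f"--doi-file={doi_file}")
    if min_year is not None:
        cmd.append(f"--min-year={min_year}")
    cmd.append(f"--dwn-dir={dwn_dir}")
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids shell quoting problems with values such as "Machine learning".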

Contributions

Feel free to contribute to this project by proposing changes, fixes, and enhancements on the dev branch.

To do

  • Tests
  • Code documentation
  • General improvements

Disclaimer

This application is for educational purposes only. I do not take responsibility for what you choose to do with this application.

Download files

Source Distribution

PyPaperBot-0.9.5.1.tar.gz (10.8 kB view details)

Uploaded Source

File details

Details for the file PyPaperBot-0.9.5.1.tar.gz.

File metadata

  • Download URL: PyPaperBot-0.9.5.1.tar.gz
  • Upload date:
  • Size: 10.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for PyPaperBot-0.9.5.1.tar.gz
Algorithm | Hash digest
SHA256 | 5784525d1e590ddbaecb52bfd3dca1b526e5eca70f86329a0b1871cb39a57768
MD5 | 8782923aaa449b7863c369e680dbeeee
BLAKE2b-256 | 08412f2aa9373bd38209c4d88e33aac3f986e69d11461fdb53efb6b0757707b2
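A downloaded archive can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a minimal sketch; the function names and the local file path in the usage note are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex) -> bool:
    """Compare a file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("PyPaperBot-0.9.5.1.tar.gz", "5784525d1e590ddbaecb52bfd3dca1b526e5eca70f86329a0b1871cb39a57768") should return True for an intact copy of the source distribution saved under that name.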
