sci-dl: download SciHub PDFs programmatically
Features
- configuration file support.
- search by Google Scholar (coming soon).
- download using DOI.
- custom SciHub mirror URL.
- proxy support.
- failure retry.
- CAPTCHA support (coming soon).
- a Python library that can be embedded in your program.
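The "failure retry" feature can be pictured as a small loop around a call that may fail. The following is a minimal sketch under that assumption, not sci-dl's actual implementation; `retry` and its `delay` parameter are hypothetical names introduced here, and "retries" is taken to mean extra attempts after the first one.

```python
import time


def retry(func, retries=5, delay=0.0):
    """Call func() and, on failure, try again up to `retries` more times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as error:  # a real downloader would catch narrower errors
            last_error = error
            if attempt < retries:
                time.sleep(delay)  # optional pause before the next attempt
    # every attempt failed: surface the last error to the caller
    raise last_error
```

Re-raising the last error keeps the original failure visible to the caller once all attempts are used up.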
Installation
Windows
coming soon.
macOS
coming soon.
Linux
coming soon.
use pip
pip install sci-dl
build using setuptools
git clone https://github.com/soultoolman/sci-dl.git
cd sci-dl
python setup.py install
build using pyinstaller
pip install pyinstaller
git clone https://github.com/soultoolman/sci-dl.git
cd sci-dl
pyinstaller --name sci-dl -i sci-dl.icns --add-data locale:locale sci_dl.py
# on Windows, use ';' instead of ':' in --add-data
# pyinstaller --name sci-dl -i sci-dl.icns --add-data locale;locale sci_dl.py
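The two pyinstaller commands above differ only in the --add-data separator (':' on Linux and macOS, ';' on Windows), which is the same character Python exposes as os.pathsep. A build script could pick it automatically, as in this sketch; `add_data_arg` is a hypothetical helper, not part of sci-dl or PyInstaller.

```python
import os


def add_data_arg(source, dest):
    """Join source and destination with the platform's path-list separator."""
    return f"{source}{os.pathsep}{dest}"


# Same arguments as the command shown above, with the separator chosen per platform.
command = [
    "pyinstaller",
    "--name", "sci-dl",
    "-i", "sci-dl.icns",
    "--add-data", add_data_arg("locale", "locale"),
    "sci_dl.py",
]
print(" ".join(command))
```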
Usage
use as command line software
- initialize the configuration file
sci-dl init-config
follow the prompts to create the configuration file.
- download using DOI
sci-dl dl -d '10.1016/j.neuron.2012.02.004'
# 10.1016/j.neuron.2012.02.004 is the DOI of the article you want to download
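DOIs are often copied as resolver links rather than bare identifiers. The sketch below shows one way to reduce such input to the bare form used in the command above. It is an illustration only: `normalize_doi`, the prefix list, and the pattern are assumptions made here, and the pattern is a loose heuristic rather than a full DOI validator.

```python
import re

# Heuristic shape of a DOI: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(text):
    """Strip a known resolver prefix and return the bare DOI, or raise ValueError."""
    doi = text.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi
```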
use as Python library
sci_dl.SciDlError is raised when an error occurs.
if you don't use a proxy
from sci_dl import SciHub, Dl

doi = '10.1016/j.neuron.2012.02.004'
sh = SciHub('https://sci-hub.se')
dl = Dl(5)  # 5 is the number of retries on failure

# get matchmaker response
matchmaker_url = sh.get_matchmaker_url(doi)
matchmaker_response = dl.dl(matchmaker_url)

# parse PDF url
pdf_url = sh.parse_pdf_url(matchmaker_response.text)

# get PDF response
pdf_response = dl.dl(pdf_url)

# save the PDF content in one go ('xxx' is the output file path)
with open('xxx', 'wb') as handle:
    handle.write(pdf_response.content)

# or save it chunk by chunk
chunk_size = 1024
with open('xxx', 'wb') as handle:
    for chunk in pdf_response.iter_content(chunk_size):
        handle.write(chunk)
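The steps above (matchmaker URL, matchmaker response, PDF URL, PDF response, save) can be wrapped in one function. This is a sketch rather than part of sci-dl: `download_pdf` is a hypothetical helper that takes the SciHub and Dl objects as arguments and relies only on the methods shown above.

```python
def download_pdf(sh, dl, doi, path, chunk_size=1024):
    """Resolve `doi` through `sh`, fetch the PDF with `dl`, and stream it to `path`."""
    # ask the mirror for the page that embeds the PDF
    matchmaker_url = sh.get_matchmaker_url(doi)
    matchmaker_response = dl.dl(matchmaker_url)
    # extract the PDF location from that page and request it
    pdf_url = sh.parse_pdf_url(matchmaker_response.text)
    pdf_response = dl.dl(pdf_url)
    # write the body chunk by chunk
    with open(path, "wb") as handle:
        for chunk in pdf_response.iter_content(chunk_size):
            handle.write(chunk)
    return path
```

With the objects from the example above, a call would look like download_pdf(sh, dl, doi, 'paper.pdf'). Passing sh and dl in keeps the mirror URL, retry count, and proxy choice with the caller.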
if you use a proxy
from sci_dl import SciHub, Dl, Proxy

doi = '10.1016/j.neuron.2012.02.004'
sh = SciHub('https://sci-hub.se')
proxy = Proxy(
    protocol='socks5',
    user=None,
    password=None,
    host='127.0.0.1',
    port=6153
)
dl = Dl(5, proxy=proxy)  # 5 is the number of retries on failure

# get matchmaker response
matchmaker_url = sh.get_matchmaker_url(doi)
matchmaker_response = dl.dl(matchmaker_url)

# parse PDF url
pdf_url = sh.parse_pdf_url(matchmaker_response.text)

# get PDF response
pdf_response = dl.dl(pdf_url)

# save the PDF content in one go ('xxx' is the output file path)
with open('xxx', 'wb') as handle:
    handle.write(pdf_response.content)

# or save it chunk by chunk
chunk_size = 1024
with open('xxx', 'wb') as handle:
    for chunk in pdf_response.iter_content(chunk_size):
        handle.write(chunk)
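The Proxy fields above (protocol, user, password, host, port) correspond to the parts of a proxy URL. As an illustration of that mapping, and not a description of how sci-dl handles the proxy internally, the sketch below formats them into a URL and into the scheme-to-URL mapping that HTTP clients such as requests accept. `proxy_url` is a hypothetical helper.

```python
from urllib.parse import quote


def proxy_url(protocol, host, port, user=None, password=None):
    """Format proxy settings as a URL such as socks5://127.0.0.1:6153."""
    auth = ""
    if user is not None:
        # percent-encode credentials so characters like '@' cannot break the URL
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{protocol}://{auth}{host}:{port}"


url = proxy_url("socks5", "127.0.0.1", 6153)
proxies = {"http": url, "https": url}
```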
Download files
Source Distribution: sci-dl-0.0.2.tar.gz (36.7 kB)
Built Distribution: sci_dl-0.0.2-py3-none-any.whl (6.4 kB)