book118 Document Downloader

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

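Document Downloader

Downloads PDF documents from book118.

Approach

Crawl the document page to collect the image links
Download the images
Combine the images into a PDF file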
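
The last step above can be sketched roughly as follows. This is a minimal illustration, not the project's own code: it assumes the downloaded pages sit in a cache directory as files named by page number with a `.png` extension (both hypothetical), and it uses Pillow's built-in PDF writer, whereas the project also lists reportlab among its dependencies and may well build the PDF differently. The helper names `page_order`, `sorted_pages` and `images_to_pdf` are made up for this example.

```python
import re
from pathlib import Path
from typing import List


def page_order(path: Path) -> int:
    """Sort key: the page number embedded in a file name such as '12.png'."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else 0


def sorted_pages(cache_dir: Path) -> List[Path]:
    """Return cached page images in numeric order (so 10 comes after 2)."""
    return sorted(cache_dir.glob("*.png"), key=page_order)


def images_to_pdf(pages: List[Path], output: Path) -> None:
    """Combine page images into one PDF, one image per page (requires Pillow)."""
    # Imported lazily so the helpers above work without Pillow installed.
    from PIL import Image

    if not pages:
        raise ValueError("no page images to combine")
    # PDF pages cannot carry an alpha channel, so convert everything to RGB.
    images = [Image.open(page).convert("RGB") for page in pages]
    first, rest = images[0], images[1:]
    # save_all + append_images writes a multi-page PDF.
    first.save(output, "PDF", save_all=True, append_images=rest)
```

Sorting numerically matters because a plain string sort would place `10.png` before `2.png` and scramble the page order.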

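Parameters

Parameter	Description	Required
`-h`, `--help`	Show help	❌
`-u`, `--url`	URL of the web page of the document to download	✔
`-o`, `--output`	Output file name; defaults to the document title plus .pdf	❌
`-p`, `--proxy`	Proxy address to use (defaults to the values of the `HTTP_PROXY` and `HTTPS_PROXY` environment variables); pass `-p ''` to force no proxy	❌
`-f`, `--force`	Force a fresh download and ignore the cache	❌
`-t`, `--thread`	Number of threads to use; defaults to 10 if not specified	❌
`-s`, `--safe`	Enable this if the server rejects your requests; it forces a single thread and lengthens the interval between requests and downloads	❌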
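
For readers curious how such an interface could be declared, here is a hedged sketch using the standard-library `argparse` module. The flags and defaults come from the table above; the function names `build_parser`, `resolve_proxies` and `effective_threads` are invented for illustration, and the package's real parser may be organized differently.

```python
import argparse
import os
from typing import Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags from the table (argparse adds -h/--help itself)."""
    parser = argparse.ArgumentParser(prog="documentDownloader")
    parser.add_argument("-u", "--url", required=True,
                        help="URL of the document page to download")
    parser.add_argument("-o", "--output", default=None,
                        help="output file name (default: document title + .pdf)")
    parser.add_argument("-p", "--proxy", default=None,
                        help="proxy address; pass '' to disable proxies")
    parser.add_argument("-f", "--force", action="store_true",
                        help="re-download and ignore the cache")
    parser.add_argument("-t", "--thread", type=int, default=10,
                        help="number of download threads")
    parser.add_argument("-s", "--safe", action="store_true",
                        help="single thread with longer request intervals")
    return parser


def resolve_proxies(proxy: Optional[str]) -> Dict[str, str]:
    """Map the -p value to a scheme -> proxy address dictionary."""
    if proxy is None:
        # Flag not given: fall back to the environment variables.
        found = {"http": os.environ.get("HTTP_PROXY"),
                 "https": os.environ.get("HTTPS_PROXY")}
        return {scheme: addr for scheme, addr in found.items() if addr}
    if proxy == "":
        return {}  # -p '' means: do not use any proxy
    return {"http": proxy, "https": proxy}


def effective_threads(args: argparse.Namespace) -> int:
    """Safe mode overrides -t and forces a single thread."""
    return 1 if args.safe else args.thread
```

Keeping `None` (flag absent) distinct from `''` (flag given but empty) is what lets `-p ''` override the environment variables.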

Usage

Using the package published on PyPI

python3 -m pip install documentDownloader

Once installed, the documentDownloader command can be used directly.

For example: documentDownloader -u https://max.book118.com/html/2020/0109/5301014320002213.shtm -o '单身人群专题研究报告-2019' -p http://127.0.0.1:1080 -f -t 20

Using main.py from the source directly

Clone the project, or download a version from the releases page.

Install Python 3
Install the dependencies (Pillow, reportlab, requests): python -m pip install -r requirements.txt
Run it with python3 main.py

For example: python main.py -u https://max.book118.com/html/2020/0109/5301014320002213.shtm -o '单身人群专题研究报告-2019' -p http://127.0.0.1:1080 -f -t 20

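This project is intended only for learning about web crawlers and related topics. Please support legitimately published books.
(That said, many of the PDFs on book118 are probably pirated too.)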

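Contributors

Changelog

2019-01-29: The Book118 website was updated; changed the corresponding code. @JodeZer
2020-01-09: Refactored the code; added multithreaded downloading for speed; added proxy support; added building the PDF directly from an existing cache; the PDF page size is now detected automatically from the image size. @OhYee
2020-05-25: Published to PyPI.
2021-10-18: The Book118 website was updated; changed part of the code. The default exported PDF file name is now the document title. Added a notice for documents whose full text cannot be previewed for free. Set the request interval to 2 seconds (in testing, intervals shorter than 2 seconds often returned an empty address). Added a "slow download" option to avoid being rejected by the server for downloading too fast. @alxt17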
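
The behaviors named in these entries (cached pages, multithreaded download, and a slower single-threaded mode) can be sketched as below. This is an illustrative assumption about how they might fit together, not the project's implementation: `download_all`, the injected `fetch` callable and the `<page number>.png` cache naming are all hypothetical. With the requests library, `fetch` could be something like `lambda url: requests.get(url, timeout=30).content`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Tuple


def download_all(
    urls: List[str],
    cache_dir: Path,
    fetch: Callable[[str], bytes],
    threads: int = 10,
    force: bool = False,
    safe: bool = False,
    interval: float = 2.0,
) -> List[Path]:
    """Download every URL into cache_dir and return the files in page order."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Safe mode forces a single worker regardless of the requested count.
    workers = 1 if safe else max(1, threads)

    def download_one(item: Tuple[int, str]) -> Path:
        index, url = item
        target = cache_dir / f"{index}.png"
        if target.exists() and not force:
            return target  # reuse the cached page
        target.write_bytes(fetch(url))
        if safe:
            time.sleep(interval)  # pause between requests in slow mode
        return target

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so pages stay sorted
        # even when the downloads finish out of order.
        return list(pool.map(download_one, enumerate(urls, start=1)))
```

Passing `fetch` in as a parameter keeps the sketch free of network code, so the caching and ordering logic can be exercised with a fake function.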


Release history

1.1.0 (this version): Oct 19, 2021
1.0.0: May 25, 2020
0.0.2: May 25, 2020

Download files

Source Distribution

documentDownloader-1.1.0.tar.gz (8.0 kB)

Uploaded Oct 19, 2021 Source

File details

Details for the file documentDownloader-1.1.0.tar.gz.

File metadata

Download URL: documentDownloader-1.1.0.tar.gz
Upload date: Oct 19, 2021
Size: 8.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for documentDownloader-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ff014fa13f54ee81f0f9cbe23f03e1093787bc34727dc30b91ed5aeccde0370d`
MD5	`f236dd4dba5eb2aacf381e8bfade6ea6`
BLAKE2b-256	`1338ba1e084627eeeb1b13845ee1f93e7b31abd831713abbcdbe348ee4cb8ed1`
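
A downloaded archive can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch; the helper names `sha256_of` and `matches_published_hash` are made up for this example, and the local file path is whatever you saved the archive as.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for documentDownloader-1.1.0.tar.gz (from the table above).
EXPECTED_SHA256 = "ff014fa13f54ee81f0f9cbe23f03e1093787bc34727dc30b91ed5aeccde0370d"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For instance, `matches_published_hash(Path("documentDownloader-1.1.0.tar.gz"))` should return `True` for an intact copy of the archive.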
