feapder is a Python crawler framework that supports distributed crawling, batch collection, task-loss protection, and a rich set of alerts
FEAPDER
Introduction
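feapder is an easy-to-learn yet powerful Python crawler framework. Its usage resembles scrapy, which makes it straightforward to migrate from scrapy. The framework ships with three built-in spiders:
- AirSpider: a lightweight spider with a low learning cost. Use it when the data volume is small and neither resumable crawling nor distributed collection is required.
- Spider: a Redis-based distributed spider suited to large-scale data collection. It supports resumable crawling, spider alerts, and automatic data storage.
- BatchSpider: a distributed batch spider. Prefer it for data that has to be collected periodically.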
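Besides resumable crawling, data-loss protection, and monitoring with alerts, feapder also supports browser-rendered downloads and custom storage pipelines, which makes it easy to connect other databases. The default database is MySQL: data is stored automatically, with no pipeline code to write.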
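To make the idea of a custom storage pipeline more concrete, here is a minimal sketch of what such a component might look like. It does not import feapder: the class name `JsonLinesPipeline` and the `save_items(table, items)` method shape are assumptions made for illustration, so check the official documentation for the real pipeline interface before relying on them.

```python
import json
from collections import defaultdict


class JsonLinesPipeline:
    """Hypothetical pipeline (not a feapder class): keeps items per table
    as JSON lines in memory instead of writing them to MySQL."""

    def __init__(self):
        # table name -> list of JSON-encoded rows
        self.lines = defaultdict(list)

    def save_items(self, table, items):
        """Store a batch of dict items for `table`; return True on success."""
        try:
            # Encode the whole batch first so a failure stores nothing.
            encoded = [
                json.dumps(item, ensure_ascii=False, sort_keys=True)
                for item in items
            ]
        except (TypeError, ValueError):
            # Report failure so a caller could retry the batch instead of losing it.
            return False
        self.lines[table].extend(encoded)
        return True


if __name__ == "__main__":
    pipeline = JsonLinesPipeline()
    ok = pipeline.save_items("news", [{"id": 1, "title": "hello"}])
    print(ok, pipeline.lines["news"])
```

Returning a boolean rather than raising is one possible design: it lets the caller decide whether to retry a failed batch, which fits the data-loss protection described above.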
Pronunciation: [ˈfiːpdə]
- Official documentation: http://feapder.com
- Documentation mirror for users in China: https://boris-code.gitee.io/feapder
- GitHub: https://github.com/Boris-code/feapder
- Changelog: https://github.com/Boris-code/feapder/releases
Requirements:
- Python 3.6.0+
- Works on Linux, Windows, macOS
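A script that depends on feapder could check the interpreter version up front. The sketch below only encodes the minimum version stated above; the helper name `python_is_supported` is made up for this example.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 6, 0)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version tuple meets the stated minimum."""
    return tuple(version_info[:3]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("feapder requires Python 3.6.0 or newer")
    print("Python version OK")
```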
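Installation
From PyPI:
Standard edition:
pip3 install feapder
Full edition:
pip3 install "feapder[all]"
Difference between the standard and full editions:
- The full edition supports in-memory deduplication
The full edition may fail to install. If it does, refer to the Installation issues page.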
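The general idea behind in-memory deduplication can be sketched in a few lines. This is an illustration of the technique only, not feapder's implementation: the class `MemoryDedup` and its methods are hypothetical, and a plain `set` of MD5 fingerprints is used here for simplicity.

```python
import hashlib


class MemoryDedup:
    """Hypothetical in-memory deduplicator; not feapder's own class."""

    def __init__(self):
        # Fingerprints of every URL seen so far.
        self._seen = set()

    @staticmethod
    def fingerprint(url):
        """Map a URL to a fixed-length hex digest."""
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def add(self, url):
        """Record `url`; return True if it is new, False if it is a duplicate."""
        fp = self.fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


if __name__ == "__main__":
    dedup = MemoryDedup()
    for url in ["https://www.baidu.com", "https://www.baidu.com"]:
        print(url, "new" if dedup.add(url) else "duplicate")
```

A set grows with the number of distinct URLs, so for very large crawls a more compact structure would usually be preferred; the sketch keeps the simplest form.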
Quick start
Create a spider
feapder create -s first_spider
The generated spider code looks like this:
import feapder


class FirstSpider(feapder.AirSpider):
    def start_requests(self):
        yield feapder.Request("https://www.baidu.com")

    def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider().start()
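Running it directly prints the following (the last two log messages read "parser waiting for tasks ..." and "no tasks, spider finished"):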
Thread-2|2021-02-09 14:55:11,373|request.py|get_response|line:283|DEBUG|
-------------- FirstSpider.parse request for ----------------
url = https://www.baidu.com
method = GET
body = {'timeout': 22, 'stream': True, 'verify': False, 'headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36'}}
<Response [200]>
Thread-2|2021-02-09 14:55:11,610|parser_control.py|run|line:415|DEBUG| parser 等待任务 ...
FirstSpider|2021-02-09 14:55:14,620|air_spider.py|run|line:80|INFO| 无任务,爬虫结束
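The prefixed log lines in the output above appear to follow a pipe-separated layout: source name, timestamp, file, function, line number, level, message. Assuming that layout holds, a small parser for filtering or collecting such lines could look like the sketch below. `LogRecord` and `parse_log_line` are names invented for this example and are not part of feapder.

```python
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    source: str          # thread or spider name, e.g. "Thread-2"
    timestamp: datetime
    filename: str
    function: str
    line: int
    level: str
    message: str


def parse_log_line(text):
    """Parse one 'name|time|file|func|line:N|LEVEL|message' log line."""
    # Split at most 6 times so pipes inside the message are preserved.
    parts = text.split("|", 6)
    if len(parts) != 7 or not parts[4].startswith("line:"):
        raise ValueError("unrecognized log line: {!r}".format(text))
    source, stamp, filename, function, line, level, message = parts
    return LogRecord(
        source=source,
        timestamp=datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
        filename=filename,
        function=function,
        line=int(line[len("line:"):]),
        level=level,
        message=message.strip(),
    )


if __name__ == "__main__":
    sample = "Thread-2|2021-02-09 14:55:11,373|request.py|get_response|line:283|DEBUG|"
    print(parse_log_line(sample))
```

Lines without the prefix, such as the `url = ...` continuation lines, raise `ValueError`, so a caller can treat them as part of the preceding record.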
The code works as follows:
- start_requests: produces tasks
- parse: parses the data
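The division of labor between the two methods can be illustrated with a toy model. The sketch below does not import feapder and performs no network access: `Request`, `ToySpider`, and `fetch` are stand-ins invented for this example, and the real framework's scheduling is considerably more involved.

```python
from collections import deque


class Request:
    """Hypothetical stand-in for feapder.Request: it only carries a URL."""

    def __init__(self, url):
        self.url = url


class ToySpider:
    """Toy scheduler: queue what start_requests yields, then parse each task."""

    def start_requests(self):
        raise NotImplementedError

    def parse(self, request, response):
        raise NotImplementedError

    def fetch(self, request):
        # Placeholder for the download step; a real framework does HTTP here.
        return "<fake response for {}>".format(request.url)

    def start(self):
        tasks = deque(self.start_requests())  # produce tasks
        while tasks:
            request = tasks.popleft()
            response = self.fetch(request)
            # parse may return None or an iterable of follow-up requests.
            for follow_up in self.parse(request, response) or ():
                tasks.append(follow_up)


class FirstSpider(ToySpider):
    def __init__(self):
        self.seen = []

    def start_requests(self):
        yield Request("https://www.baidu.com")

    def parse(self, request, response):
        # Parse step: here it only records what it was given.
        self.seen.append((request.url, response))


if __name__ == "__main__":
    spider = FirstSpider()
    spider.start()
    print(spider.seen)
```

When the queue is empty the loop ends, which mirrors the "no tasks, spider finished" message in the log output.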
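Community
Knowledge Planet (知识星球):
The group shares practical crawler know-how from time to time, covering topics including but not limited to JS reverse-engineering techniques, analysis of crawler frameworks, and general crawling techniques.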
File details
Details for the file feapder-1.5.1.tar.gz.
File metadata
- Download URL: feapder-1.5.1.tar.gz
- Upload date:
- Size: 123.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 12d237b095d6438a7984067032d468b611149448165600432c81827cda03c72e
MD5 | 524ee34953d95e3d38c2951d00fa611c
BLAKE2b-256 | 678a1d32c843293327042857a02512f5877e4ea85c27c09821f34c61be74ffa9
File details
Details for the file feapder-1.5.1-py3-none-any.whl.
File metadata
- Download URL: feapder-1.5.1-py3-none-any.whl
- Upload date:
- Size: 136.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 48ff9f4cc9be893ac3d3bdf08d7ca2d177dd952e6e830fd4d6bc04cf234af287
MD5 | b55f4176c71fa6db70d4e638cb7edb2a
BLAKE2b-256 | b9a7690aa0db988933c9cf0291f4396a8ad4d9a745d068ce764a8c399f3f5298