feapder is a Python crawler framework that supports distributed crawling, batch collection, task-loss prevention, and rich alerting

FEAPDER

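Introduction

feapder is an easy-to-learn yet powerful Python crawler framework. Its usage is similar to Scrapy, which makes it easy to migrate from Scrapy. The framework ships with three built-in spiders: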

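  • AirSpider is lightweight and has a low learning curve. Use it when the data volume is small and neither resumable crawling nor distributed collection is needed.

  • Spider is a Redis-based distributed spider suited to collecting massive amounts of data. It supports resumable crawling, crawler alerts, automatic data storage, and more.

  • BatchSpider is a distributed batch spider. For data that needs to be collected periodically, prefer this spider.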
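
The selection guidance above can be summarized as a small decision helper. This is only an illustrative sketch: `choose_spider` and its parameters are hypothetical names and not part of feapder; only the three class names come from the text.

```python
def choose_spider(distributed: bool = False,
                  resumable: bool = False,
                  periodic: bool = False) -> str:
    """Return the name of the spider class suggested by the guidance above.

    Hypothetical helper for illustration; feapder does not provide it.
    """
    if periodic:
        # Data that must be re-collected on a schedule: the batch spider.
        return "BatchSpider"
    if distributed or resumable:
        # Large jobs needing distribution or resumable crawling: the Redis-based spider.
        return "Spider"
    # Small jobs with neither requirement: the lightweight spider.
    return "AirSpider"


for needs in ({}, {"distributed": True}, {"periodic": True}):
    print(needs, "->", choose_spider(**needs))
```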

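feapder supports resumable crawling, data-loss prevention, monitoring and alerting, browser-rendered downloads, and automatic data storage into MySQL or MongoDB. Other storage backends can be connected by writing a pipeline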
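
Connecting another storage backend usually comes down to a class that receives a table name and a batch of items. The sketch below is a stand-in written without importing feapder: the class name `JsonLinesPipeline`, the `save_items(table, items)` signature, and the boolean return convention are assumptions for illustration. Consult the feapder pipeline documentation for the actual base class and how a pipeline is registered.

```python
import json
import os
import tempfile
from typing import Dict, List


class JsonLinesPipeline:
    """Hypothetical pipeline that appends items to <output_dir>/<table>.jsonl."""

    def __init__(self, output_dir: str):
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)

    def save_items(self, table: str, items: List[Dict]) -> bool:
        """Write one JSON object per line; return True on success (assumed convention)."""
        path = os.path.join(self.output_dir, f"{table}.jsonl")
        try:
            with open(path, "a", encoding="utf-8") as f:
                for item in items:
                    f.write(json.dumps(item, ensure_ascii=False) + "\n")
        except OSError:
            # Returning False tells the caller that the batch was not stored.
            return False
        return True


# Usage: store two items in a temporary directory.
out_dir = tempfile.mkdtemp()
pipeline = JsonLinesPipeline(out_dir)
ok = pipeline.save_items("news", [{"title": "a"}, {"title": "b"}])
print("saved:", ok)
```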

Pronunciation: [ˈfiːpdə]

Requirements:

  • Python 3.6.0+
  • Works on Linux, Windows, macOS
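
A quick way to check the interpreter against the stated minimum is sketched below. The helper names `python_ok` and `feapder_installed` are hypothetical; the code relies only on the standard library.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6, 0)  # minimum version stated in the requirements above


def python_ok(version=None) -> bool:
    """Return True if the given (or the running) interpreter meets the minimum version."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= MIN_PYTHON


def feapder_installed() -> bool:
    """Return True if a top-level module named `feapder` can be found."""
    return importlib.util.find_spec("feapder") is not None


print("Python OK:", python_ok())
print("feapder installed:", feapder_installed())
```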

Installation

From PyPI:

Standard version:

pip3 install feapder

Full version:

pip3 install feapder[all]

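Difference between the standard and full versions:

  1. The full version supports memory-based deduplication

Installing the full version may fail. If it does, refer to the installation problems page (安装问题)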
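
Memory-based deduplication generally means remembering a fingerprint of every value already seen and dropping repeats. Below is a minimal sketch of that idea using a Python set of SHA-256 digests. `MemoryDedup` and its methods are hypothetical names, and this is not feapder's actual implementation; a production implementation may use a more compact structure.

```python
import hashlib
from typing import Iterable, List


class MemoryDedup:
    """Hypothetical in-memory deduplicator keyed on SHA-256 fingerprints."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def _fingerprint(value: str) -> str:
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    def add(self, value: str) -> bool:
        """Record value; return True if it is new, False if it was seen before."""
        fp = self._fingerprint(value)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True

    def filter_new(self, values: Iterable[str]) -> List[str]:
        """Keep only values not seen before, preserving their order."""
        return [v for v in values if self.add(v)]


dedup = MemoryDedup()
urls = ["https://a.example/1", "https://a.example/2", "https://a.example/1"]
unique = dedup.filter_new(urls)
print(unique)
```

The set lives only in the current process, so the seen-state is lost on restart and is not shared between workers.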

Quick start

Create a spider

feapder create -s first_spider

The generated spider code looks like this:

import feapder


class FirstSpider(feapder.AirSpider):
    def start_requests(self):
        yield feapder.Request("https://www.baidu.com")

    def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider().start()

Run it directly and it prints the following:

Thread-2|2021-02-09 14:55:11,373|request.py|get_response|line:283|DEBUG|
                -------------- FirstSpider.parse request for ----------------
                url  = https://www.baidu.com
                method = GET
                body = {'timeout': 22, 'stream': True, 'verify': False, 'headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36'}}

<Response [200]>
Thread-2|2021-02-09 14:55:11,610|parser_control.py|run|line:415|DEBUG| parser 等待任务...
FirstSpider|2021-02-09 14:55:14,620|air_spider.py|run|line:80|INFO| 无任务,爬虫结束

The code works as follows:

  1. start_requests: produces tasks (the requests to crawl)
  2. parse: parses the data from each response
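
To make this division of labour concrete, here is a toy, standard-library-only stand-in for the loop a framework might run: it drains the requests produced by `start_requests` and hands each one, together with a response, to `parse`. `ToyRequest`, `ToySpider`, `run`, and `fake_download` are hypothetical names and do not reflect feapder's internals, which use worker threads and real HTTP downloads.

```python
from collections import deque, namedtuple

# Hypothetical stand-in for a request object; it only carries a URL.
ToyRequest = namedtuple("ToyRequest", ["url"])


def fake_download(request):
    """Pretend to fetch the URL; returns a canned body instead of doing network I/O."""
    return "<html>body of {}</html>".format(request.url)


class ToySpider:
    def __init__(self):
        self.results = []

    def start_requests(self):
        # Produce tasks: each yielded request is one unit of work.
        yield ToyRequest("https://www.baidu.com")

    def parse(self, request, response):
        # Parse data: here we only record the URL and the response length.
        self.results.append((request.url, len(response)))

    def run(self):
        queue = deque(self.start_requests())
        while queue:
            request = queue.popleft()
            response = fake_download(request)
            # parse may return follow-up requests; ours returns None.
            for follow_up in self.parse(request, response) or ():
                queue.append(follow_up)


spider = ToySpider()
spider.run()
print(spider.results)
```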

Recommended crawler tools

  1. CAPTCHA recognition library: https://github.com/sml2h3/ddddocr
  2. Online crawler toolbox: http://www.spidertools.cn

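WeChat donation

If you find this project helpful, you can buy the author a coffee as encouragement 🍹

You can also make friends with the author to get help with problems you run into while using it

Donation QR code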

Community

Knowledge Planet (知识星球): 17321694 Author's WeChat: boris_tm QQ group: 750614606

When adding the author as a contact, include the note: feapder

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

feapder-1.7.3b1.tar.gz (170.2 kB)

Uploaded Source

Built Distribution

feapder-1.7.3b1-py3-none-any.whl (181.2 kB)

Uploaded Python 3

File details

Details for the file feapder-1.7.3b1.tar.gz.

File metadata

  • Download URL: feapder-1.7.3b1.tar.gz
  • Upload date:
  • Size: 170.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for feapder-1.7.3b1.tar.gz
Algorithm Hash digest
SHA256 4e39395f48d9ae8f588b4dd0b50ed0a9221f981c68a58e7b6db062158d871a06
MD5 cd1b0ebc8214eb47667a509ff4229449
BLAKE2b-256 c90a2be09bc5733ff09bd006794d1eb23aff61c66a751fdef1bc0e02a1ffa15b
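
The published digests can be used to check a downloaded file. Below is a minimal sketch using Python's `hashlib`: the helper names `sha256_of` and `verify` and the local file path are assumptions, and the expected value is the SHA256 digest listed above for the source distribution.

```python
import hashlib

# SHA256 digest listed above for feapder-1.7.3b1.tar.gz
EXPECTED_SHA256 = "4e39395f48d9ae8f588b4dd0b50ed0a9221f981c68a58e7b6db062158d871a06"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

After downloading the archive, `verify("feapder-1.7.3b1.tar.gz")` (path assumed to be in the current directory) should return True for an intact file.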

File details

Details for the file feapder-1.7.3b1-py3-none-any.whl.

File metadata

  • Download URL: feapder-1.7.3b1-py3-none-any.whl
  • Upload date:
  • Size: 181.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for feapder-1.7.3b1-py3-none-any.whl
Algorithm Hash digest
SHA256 04868936d74e31b0948eaadb7c1f6419275e916963acadec5c4c51d763512cec
MD5 1d19e4f120e385ee5887dba5715d7a22
BLAKE2b-256 70c906ace4ad291f9b0e16978b6d5a41859238a27b839925244d70fef3bae377
