Asynchronous crawler micro-framework based on Python.
hoopa
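Introduction
hoopa is a lightweight, fast, asynchronous distributed crawler framework.
- Supports priority queues backed by memory, redis, or rabbitmq
- Supports HTTP libraries such as aiohttp, httpx, and requests
- Supports resuming an interrupted crawl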
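To illustrate the in-memory priority queue mentioned in the list above, here is a minimal sketch built on the standard library's `asyncio.PriorityQueue`. It is not hoopa's own queue implementation. The function name `drain_in_priority_order`, the "lower number first" ordering, and the example URLs with query strings are assumptions made for this sketch.

```python
import asyncio
import itertools


async def drain_in_priority_order(items):
    """Push (priority, url) pairs into a queue and pop them back in priority order."""
    queue = asyncio.PriorityQueue()
    counter = itertools.count()  # tie-breaker: equal priorities keep insertion order
    for priority, url in items:
        # Assumption of this sketch: a lower number is dequeued first.
        await queue.put((priority, next(counter), url))

    ordered = []
    while not queue.empty():
        _, _, url = await queue.get()
        ordered.append(url)
    return ordered


pending = [
    (5, "http://httpbin.org/get?page=2"),
    (1, "http://httpbin.org/get"),
    (5, "http://httpbin.org/get?page=3"),
]
result = asyncio.run(drain_in_priority_order(pending))
print(result)
```

A redis or rabbitmq backend would replace the in-process queue with a shared one so that several workers can pull from the same frontier; the ordering idea stays the same.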
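The project is still under development and testing. Do not use it in production. If you find a problem, please open an issue.
Documentation: https://fishtn.github.io/hoopa/
Requirements: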
- Python 3.7.0+
- Works on Linux, Windows, macOS
Installation
# For Linux & Mac
pip install -U hoopa[uvloop]
# For Windows
pip install -U hoopa
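The install commands differ because uvloop does not support Windows. As a rough illustration of how an optional uvloop dependency is commonly handled, the sketch below enables uvloop when it can be imported and otherwise keeps the default asyncio event loop. The helper name `install_fast_event_loop` is hypothetical, and this is not necessarily how hoopa itself selects its event loop.

```python
def install_fast_event_loop():
    """Use uvloop when it is installed; otherwise keep the default asyncio loop."""
    try:
        import uvloop  # only importable if the optional extra was installed
    except ImportError:
        return "asyncio"
    uvloop.install()  # makes asyncio create uvloop event loops from now on
    return "uvloop"


loop_backend = install_fast_event_loop()
print(loop_backend)
```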
Getting started
Create a spider
hoopa create -s first_spider
Then add the URL: http://httpbin.org/get
import hoopa


class FirstSpider(hoopa.Spider):
    name = "first"
    start_urls = ["http://httpbin.org/get"]

    def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider.start()
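For readers new to callback-style spiders, the following standalone sketch shows the general pattern the example relies on: a class-level `start()` fetches each entry in `start_urls` and passes the result to `parse`. Everything here is hypothetical (`MiniSpider`, `DemoSpider`, the stubbed `fetch`, the dict used as a response), and no network request is made. It does not describe hoopa's internals.

```python
import asyncio


class MiniSpider:
    """Hypothetical stand-in for a spider base class; not hoopa's implementation."""

    start_urls = []

    async def fetch(self, url):
        # Stub: a real framework would issue an HTTP request here.
        await asyncio.sleep(0)
        return {"url": url, "status": 200}

    def parse(self, request, response):
        raise NotImplementedError

    async def _run(self):
        # Fetch all start URLs concurrently, then hand each result to parse().
        responses = await asyncio.gather(*(self.fetch(u) for u in self.start_urls))
        for url, response in zip(self.start_urls, responses):
            self.parse(url, response)

    @classmethod
    def start(cls):
        asyncio.run(cls()._run())


seen = []


class DemoSpider(MiniSpider):
    start_urls = ["http://httpbin.org/get"]

    def parse(self, request, response):
        # Record what the callback received instead of printing it.
        seen.append((request, response["status"]))


DemoSpider.start()
print(seen)
```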
TODO
- Monitoring platform
- Remote deployment
- Task scheduling
Acknowledgements
Built Distribution
hoopa-0.0.7-py3-none-any.whl (48.3 kB)