Asynchronous crawler micro-framework based on Python.
hoopa
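Introduction
hoopa is a lightweight, fast, asynchronous distributed crawler framework.
- Supports priority queues backed by memory or Redis
- Supports HTTP libraries such as aiohttp, httpx, and requests
- Supports resuming an interrupted crawl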
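To make the priority-queue idea concrete, here is a minimal in-memory sketch built only on the standard library. It is not hoopa's own implementation: the names `drain_by_priority` and `crawl_order` are hypothetical, and the convention that a lower number is dequeued first is an assumption of this sketch.

```python
import asyncio
import itertools


async def drain_by_priority(items):
    """Queue (priority, url) pairs and return the URLs in dequeue order."""
    queue = asyncio.PriorityQueue()
    # Tie-breaker: equal priorities come out in insertion order.
    counter = itertools.count()
    for priority, url in items:
        # Assumption of this sketch: a lower number is dequeued first.
        await queue.put((priority, next(counter), url))

    ordered = []
    while not queue.empty():
        _, _, url = await queue.get()
        ordered.append(url)
    return ordered


def crawl_order(items):
    # Hypothetical helper: run the coroutine from synchronous code.
    return asyncio.run(drain_by_priority(items))
```

For example, `crawl_order([(2, "b"), (1, "a"), (2, "c")])` returns the URLs with priority 1 first, followed by the two priority-2 entries in the order they were added.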
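Synchronous and asynchronous code are both supported. If you are not used to async, you can write synchronous code instead, but note that you must not perform blocking operations inside asynchronous methods.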
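The warning about blocking operations applies to any asyncio program, not only to hoopa. A minimal sketch, using only the standard library, of one common way to keep a blocking call out of the event loop is shown here. The function names are hypothetical, and `time.sleep` stands in for whatever blocking call you need to make; nothing here is part of hoopa's API.

```python
import asyncio
import time


def blocking_fetch(delay):
    """Stand-in for a blocking call, e.g. a synchronous HTTP request."""
    time.sleep(delay)
    return delay


async def fetch_without_blocking(delay):
    # Hand the blocking call to the default thread pool so the
    # event loop stays free to run other coroutines.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_fetch, delay)


async def run_all(delays):
    # The calls overlap instead of running one after another.
    return await asyncio.gather(*(fetch_without_blocking(d) for d in delays))


def demo(delays):
    # Hypothetical helper: returns the results and the elapsed wall time.
    start = time.perf_counter()
    results = asyncio.run(run_all(delays))
    return results, time.perf_counter() - start
```

Calling `time.sleep` directly inside `fetch_without_blocking` would stall every other coroutine for the full delay; with the executor, several calls can wait concurrently.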
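The project is still under development and testing, so do not use it in production. If you find a problem, please open an issue.
Documentation: https://fishtn.github.io/hoopa/
Requirements: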
- Python 3.7.0+
- Works on Linux, Windows, macOS
Installation
# For Linux & Mac
pip install -U hoopa[uvloop]
# For Windows
pip install -U hoopa
Getting started
Create a spider
hoopa create -s first_spider
Then add the URL: http://httpbin.org/get
import hoopa


class FirstSpider(hoopa.Spider):
    name = "first"
    start_urls = ["http://httpbin.org/get"]

    def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider.start()
TODO
- Monitoring platform
- Remote deployment
- Task scheduling
Acknowledgements