An asynchronous crawler micro-framework based on Python.
hoopa
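Introduction
hoopa is a lightweight, fast, asynchronous distributed crawler framework.
- Supports priority queues backed by memory or Redis
- Supports HTTP libraries such as aiohttp, httpx, and requests
- Supports resuming an interrupted crawl from where it stopped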
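To picture what an in-memory priority queue of requests looks like, here is a minimal sketch using only the standard library's `asyncio.PriorityQueue`. It is not hoopa's implementation: the `demo` function, the "lower number is served first" rule, and the query strings on the URLs are assumptions made for illustration.

```python
import asyncio


async def demo():
    # Stand-in for a crawler's request queue; hoopa's own queue classes may differ.
    queue = asyncio.PriorityQueue()
    jobs = [
        (2, "http://httpbin.org/get?page=2"),
        (0, "http://httpbin.org/get"),
        (1, "http://httpbin.org/get?page=1"),
    ]
    for order, (priority, url) in enumerate(jobs):
        # Assumption: a lower number is dequeued first. The insertion counter
        # breaks ties so equal priorities come out in FIFO order.
        await queue.put((priority, order, url))

    drained = []
    while not queue.empty():
        _priority, _order, url = await queue.get()
        drained.append(url)
    return drained


if __name__ == "__main__":
    for url in asyncio.run(demo()):
        print(url)
```

A Redis-backed queue would typically follow the same idea with a sorted set, so that several worker processes can share one queue.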
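hoopa is compatible with both synchronous and asynchronous code. If you are not used to async, you can write synchronous code instead, but note that you must not perform blocking operations inside asynchronous methods.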
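The warning about blocking inside asynchronous methods applies to any asyncio program. The sketch below uses only the standard library, not hoopa's API: `blocking_fetch` and `handler` are hypothetical names, and `time.sleep` stands in for any synchronous call such as a `requests` request. It shows one common way to keep the event loop responsive, which is to hand the blocking call to a thread pool with `loop.run_in_executor`.

```python
import asyncio
import time


def blocking_fetch(delay):
    # Hypothetical stand-in for synchronous, blocking work.
    time.sleep(delay)
    return delay


async def handler(delay):
    loop = asyncio.get_running_loop()
    # Calling blocking_fetch(delay) directly here would stall every other task.
    # Running it in the default thread pool keeps the event loop free.
    return await loop.run_in_executor(None, blocking_fetch, delay)


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(handler(0.2) for _ in range(3)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, round(elapsed, 2))
```

Because the three calls run in separate threads, the total time should be close to a single delay rather than the sum of all three.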
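The project is still under development and testing, so please do not use it in production. If you find a problem, feel free to open an issue.
Documentation: https://fishtn.github.io/hoopa/
Requirements: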
- Python 3.7.0+
- Works on Linux, Windows, macOS
Installation
# For Linux & Mac
pip install -U hoopa[uvloop]
# For Windows
pip install -U hoopa
Getting started
Create a spider
hoopa create -s first_spider
Then add the URL: http://httpbin.org/get
import hoopa


class FirstSpider(hoopa.Spider):
    name = "first"
    start_urls = ["http://httpbin.org/get"]

    def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider.start()
TODO
- Monitoring platform
- Remote deployment
- Task scheduling
Acknowledgements