Asynchronous crawler micro-framework based on Python.
hoopa
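Introduction
hoopa is a lightweight, fast, asynchronous distributed crawler framework.
- Supports priority queues backed by memory or redis
- Supports HTTP libraries such as aiohttp, httpx, and requests
- Supports resuming an interrupted crawl from where it stopped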
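To illustrate what an in-memory priority queue does for a crawler, here is a minimal sketch built only on the standard library's `asyncio.PriorityQueue`. It is not hoopa's internal queue: the function names are hypothetical, and the convention that a lower number means higher priority is an assumption made for this example.

```python
import asyncio
import itertools

# Hypothetical helper names; this is NOT hoopa's own queue implementation.
_counter = itertools.count()


async def drain_in_priority_order(items):
    """Push (priority, url) pairs, then pop them lowest number first."""
    queue = asyncio.PriorityQueue()
    for priority, url in items:
        # The counter breaks ties, so equal priorities keep insertion order.
        await queue.put((priority, next(_counter), url))

    ordered = []
    while not queue.empty():
        _, _, url = await queue.get()
        ordered.append(url)
    return ordered


def demo_priority_order():
    # Assumption for this sketch: a lower number means "crawl sooner".
    items = [
        (5, "http://httpbin.org/get?page=2"),
        (1, "http://httpbin.org/get"),
        (5, "http://httpbin.org/get?page=3"),
    ]
    return asyncio.run(drain_in_priority_order(items))
```

A redis-backed queue would typically keep the same ordering idea in a shared store so that several workers can pull from it, but how hoopa does this is not described here.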
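hoopa works with both synchronous and asynchronous code. If you are not used to async, you can write synchronous code instead, but note that you must not perform blocking operations inside asynchronous methods.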
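The warning about blocking inside asynchronous methods applies to any asyncio program. The sketch below uses only the standard library, not hoopa's API. It shows one common way to keep the event loop responsive: hand the blocking call to a worker thread with `loop.run_in_executor`. `blocking_work` is a hypothetical stand-in for any synchronous call.

```python
import asyncio
import time


def blocking_work(seconds):
    """Stand-in for a blocking call (file I/O, a synchronous HTTP client, ...)."""
    time.sleep(seconds)
    return seconds


async def offload(seconds):
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool so the
    # event loop stays free to schedule other coroutines meanwhile.
    return await loop.run_in_executor(None, blocking_work, seconds)


async def run_concurrently():
    start = time.perf_counter()
    results = await asyncio.gather(offload(0.3), offload(0.3), offload(0.3))
    elapsed = time.perf_counter() - start
    return results, elapsed


def demo_offload():
    return asyncio.run(run_concurrently())
```

Calling `time.sleep(0.3)` directly inside each coroutine would instead run the three waits one after another, because nothing else can progress while the loop is blocked.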
This framework is for personal use; stability is not guaranteed, so do not use it in production.
Documentation: https://fishtn.github.io/hoopa/
Requirements:
- Python 3.7.0+
- Works on Linux, Windows, macOS
Installation
# For Linux & Mac
pip install -U hoopa[uvloop]
# For Windows
pip install -U hoopa
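After installing, a quick way to confirm which packages are importable is a small standard-library check. This is only a sketch: the helper names are hypothetical, and it assumes the `uvloop` extra installs an importable module named `uvloop`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def describe_environment():
    # uvloop is only expected on Linux and macOS (assumption based on the
    # install commands above); on Windows this entry will usually be False.
    return {
        "hoopa": is_installed("hoopa"),
        "uvloop": is_installed("uvloop"),
    }


if __name__ == "__main__":
    print(describe_environment())
```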
Getting started
Create a spider
hoopa create -s first_spider
Then add the URL: http://httpbin.org/get
import hoopa


class FirstSpider(hoopa.Spider):
    name = "first"
    start_urls = ["http://httpbin.org/get"]

    def parse(self, request, response):
        # Print the response received for each start URL
        print(response)


if __name__ == "__main__":
    FirstSpider.start()
TODO
- Monitoring platform
- Remote deployment
- Task scheduling
Acknowledgements
File details
Details for the file hoopa-0.1.18.tar.gz.
File metadata
- Download URL: hoopa-0.1.18.tar.gz
- Upload date:
- Size: 39.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9f87016f76aff5e2ef4f8dca17a0e0296acf4afc1d1aabd7d2e49cc8b9e747f3
MD5 | c82fd0e7e42f93d0efd83adf2a53c627
BLAKE2b-256 | 4d6e8946524bf90281fa96fb85e3f19c6ab2e1152273528544a7858d2265783d
File details
Details for the file hoopa-0.1.18-py3-none-any.whl.
File metadata
- Download URL: hoopa-0.1.18-py3-none-any.whl
- Upload date:
- Size: 52.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9c86ea135c98881e3b05fffca851fbceb1819b19a73231ccb93784714ecdf6a5
MD5 | e2a9db9dcb22875b9d1abbe99a3a64a7
BLAKE2b-256 | d8f46f1eb64e617fc7eb364be87345b25adab181b08fea7ca26d73380bc99fb1