
An asynchronous crawler micro-framework written in Python.


hoopa

Introduction

hoopa is a lightweight, fast, asynchronous distributed crawler framework.

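  • Supports priority queues backed by memory, Redis, or RabbitMQ
  • Supports HTTP libraries such as aiohttp, httpx, and requests
  • Supports resuming an interrupted crawl from where it stopped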
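
To illustrate what a memory-backed priority queue does for a crawler, here is a minimal sketch built only on the standard library's `asyncio.PriorityQueue`. It is not hoopa's implementation: the function names and the "lower number is served first" rule are assumptions made for this example.

```python
import asyncio
import itertools


async def drain_by_priority(items):
    """Push (priority, url) pairs into a queue, then pop them in priority order."""
    queue = asyncio.PriorityQueue()
    # Tie-breaker so that equal priorities come out in insertion order.
    counter = itertools.count()
    for priority, url in items:
        # Assumption of this sketch: a lower number means "crawl sooner".
        await queue.put((priority, next(counter), url))

    order = []
    while not queue.empty():
        _, _, url = await queue.get()
        order.append(url)
    return order


def crawl_order(items):
    """Synchronous helper that runs the coroutine to completion."""
    return asyncio.run(drain_by_priority(items))


if __name__ == "__main__":
    print(crawl_order([
        (5, "http://httpbin.org/get?page=2"),
        (1, "http://httpbin.org/get"),
    ]))
```

A Redis or RabbitMQ backend would replace the in-process queue with a shared one so that several workers can pull from the same ordered set of requests.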

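hoopa works with both synchronous and asynchronous code. If you are not comfortable with async, you can write synchronous code instead, but note that you must not perform blocking operations inside an asynchronous method.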
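
The warning about blocking calls applies to any asyncio program, so it can be shown without hoopa itself. The sketch below uses only the standard library; `blocking_io`, `handle`, and `heartbeat` are hypothetical names for this example. It moves a blocking call onto the default thread pool with `loop.run_in_executor`, which lets a second coroutine keep running in the meantime.

```python
import asyncio
import time


def blocking_io(seconds):
    """Stand-in for a blocking call, such as a synchronous driver or file read."""
    time.sleep(seconds)
    return "done"


async def handle():
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool
    # so the event loop stays free for other coroutines.
    return await loop.run_in_executor(None, blocking_io, 0.05)


async def heartbeat(ticks):
    # Only makes progress while the event loop is not blocked.
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(0.01)
        count += 1
    return count


async def main():
    # Both coroutines run concurrently on the same loop.
    return await asyncio.gather(handle(), heartbeat(3))


def run_demo():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

Calling `time.sleep` directly inside `handle` would instead stall every other coroutine on the loop until it returned.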

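The project is still under development and testing, so please do not use it in production. If you find a problem, you are welcome to open an issue.

Documentation: https://fishtn.github.io/hoopa/

Requirements: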

  • Python 3.7.0+
  • Works on Linux, Windows, macOS

Installation

# For Linux & Mac
pip install -U hoopa[uvloop]

# For Windows
pip install -U hoopa

Getting started

Create a spider

hoopa create -s first_spider

Then add the URL: http://httpbin.org/get

import hoopa


class FirstSpider(hoopa.Spider):
    name = "first"
    start_urls = ["http://httpbin.org/get"]

    def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider.start()

TODO

  • [ ] Monitoring platform
  • [ ] Remote deployment
  • [ ] Task scheduling

Acknowledgements

Files for hoopa, version 0.0.10

  • hoopa-0.0.10-py3-none-any.whl (48.6 kB), file type: Wheel, Python version: py3
