An asynchronous crawler micro-framework based on Python.

Project description

hoopa

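Introduction

hoopa is a lightweight, fast, asynchronous distributed crawler framework.

  • Supports priority queues backed by memory, redis, or rabbitmq
  • Supports aiohttp and httpx
  • Supports resuming an interrupted crawl from where it stopped

The project is still under development and testing. Do not use it in production.

Documentation: https://fishtn.github.io/hoopa/

Requirements: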
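To illustrate what a priority queue of pending requests means in practice, here is a minimal in-memory sketch built only on the standard library's asyncio.PriorityQueue. It is not hoopa's implementation: the function name drain_by_priority, the (priority, url) tuple layout, and the "lower number is served first" convention are all assumptions made for this example.

```python
import asyncio
import itertools


async def drain_by_priority(items):
    """Queue (priority, url) pairs and return the URLs in the order they pop.

    Assumption for this sketch: a lower number means a higher priority.
    """
    queue = asyncio.PriorityQueue()
    # Tie-breaker so that equal priorities come out in insertion order.
    counter = itertools.count()
    for priority, url in items:
        await queue.put((priority, next(counter), url))

    ordered = []
    while not queue.empty():
        _, _, url = await queue.get()
        ordered.append(url)
    return ordered


if __name__ == "__main__":
    pending = [
        (2, "http://httpbin.org/get?page=2"),
        (0, "http://httpbin.org/get"),
        (1, "http://httpbin.org/get?page=1"),
    ]
    for url in asyncio.run(drain_by_priority(pending)):
        print(url)
```

A redis or rabbitmq backend would keep the same ordering idea but store the queue outside the process, which is what lets several workers share it.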

  • Python 3.7.0+
  • Works on Linux, Windows, macOS

Installation

# For Linux & Mac
pip install -U "hoopa[uvloop]"

# For Windows
pip install -U hoopa
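The uvloop extra is listed only for Linux and Mac because uvloop does not run on Windows. The source does not say whether hoopa selects the event loop itself, so the sketch below is only a generic pattern for your own asyncio code: use uvloop when it can be imported, otherwise fall back to the default loop. The helper names new_event_loop and ping are hypothetical.

```python
import asyncio


def new_event_loop():
    """Return a uvloop event loop if uvloop is installed, else asyncio's default."""
    try:
        import uvloop  # optional dependency; not available on Windows
    except ImportError:
        return asyncio.new_event_loop()
    return uvloop.new_event_loop()


async def ping():
    # Trivial coroutine used only to show that the chosen loop runs.
    await asyncio.sleep(0)
    return "ok"


if __name__ == "__main__":
    loop = new_event_loop()
    try:
        print(loop.run_until_complete(ping()))
    finally:
        loop.close()
```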

Getting started

Create a spider

hoopa create -s first_spider

Then add the URL: http://httpbin.org/get

import hoopa


class FirstSpider(hoopa.Spider):
    name = "first"
    start_urls = ["http://httpbin.org/get"]

    async def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider().start()
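For readers new to callback-style crawlers, the flow in the example above (each entry in start_urls is fetched, then handed to the async parse callback) can be sketched with the standard library alone. SketchSpider below is a hypothetical stand-in and not hoopa.Spider: its fetch method fakes the HTTP call, and the run method and seen attribute are assumptions made for this illustration.

```python
import asyncio


class SketchSpider:
    """Hypothetical stand-in (not hoopa.Spider) showing the start_urls -> parse flow."""

    name = "sketch"
    start_urls = ["http://httpbin.org/get"]

    def __init__(self):
        self.seen = []

    async def fetch(self, url):
        # Placeholder for a real HTTP request; no network I/O happens here.
        await asyncio.sleep(0)
        return {"url": url, "status": 200}

    async def parse(self, request, response):
        # Record what was received; a real spider would extract data here.
        self.seen.append((request, response["status"]))

    async def run(self):
        for url in self.start_urls:
            response = await self.fetch(url)
            await self.parse(url, response)

    def start(self):
        asyncio.run(self.run())
        return self.seen


if __name__ == "__main__":
    print(SketchSpider().start())
```

A real framework would run many fetches concurrently and pull requests from a queue. The sequential loop here is kept deliberately simple.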

Acknowledgements

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

hoopa-0.0.5-py3-none-any.whl (45.8 kB)

Uploaded Python 3