An out-of-the-box lightweight asynchronous crawler framework

smallder: an out-of-the-box crawler framework

Introduction

smallder is a lightweight crawler framework that works out of the box.

GitHub: https://github.com/Ntrashh/smallder

Requirements

  • Python 3.7.0+
  • Works on Linux, Windows, macOS
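
A script that depends on smallder can fail early when run on an unsupported interpreter. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is made up for this illustration and is not part of smallder.

```python
import sys

MIN_PYTHON = (3, 7, 0)  # minimum version listed in the requirements above


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or the running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); tuples compare element by element.
    return tuple(version_info[:3]) >= minimum


# Stop before importing anything that needs a newer interpreter.
if not meets_minimum():
    raise SystemExit("Python 3.7.0 or newer is required")
```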

Installation

pip3 install smallder

Usage

Create a spider

smallder create -s demo_spider

from typing import Any
from smallder import Spider, Request, Response

class Demo(Spider):
    name = "demo"
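    fastapi = True  # controls the built-in statistics API
    redis_task_key = ""  # task pool key; if set, tasks are read directly from redis and make_request_for_redis must be overridden
    start_urls = []
    max_retry: int = 10  # number of retries
    # thread_count = 0       # total number of threads; defaults to twice the number of CPU cores
    # batch_size = 0         # how many entries to fetch from redis per batch; not needed when redis is not used
    # pipline_mode = "list"  # two modes: single stores items one at a time, list stores several at once; defaults to single
    # pipline_batch = 100    # only takes effect when pipline_mode=list; number of items passed to the pipeline per batch, defaults to 100
    # save_failed_request = False  # save failed requests to redis; no need to enable when redis is not used
    custom_settings = {
        # "middleware_settings": {}, # middleware settings
        # "mysql": "",  # "mysql://xxx:xxxxx@host:port/db_name"
        # "redis": "" # "redis://xxx:xxxxx@host:port/db_name"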
    }

    # def __init__(self, param):
    #     self.param = param

    def parse(self, response: Response) -> Any:
        self.log.info(response)

    def download_middleware(self, request: Request) -> Request:
        request.headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/108.0.0.0 Safari/537.36"
        }
        return request


if __name__ == "__main__":
    Demo.start()
    # Demo.start(param="value")  # pass a parameter through to __init__
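
The comments on `pipline_mode` and `pipline_batch` describe two ways for items to reach the pipeline: one at a time (`single`) or in groups of up to `pipline_batch` items (`list`). The sketch below illustrates that grouping idea in plain Python. It is not smallder's implementation: `group_items` is a hypothetical helper, and delivering a final partial batch is an assumption made here.

```python
from typing import Any, Iterable, Iterator, List, Union


def group_items(
    items: Iterable[Any], mode: str = "single", batch: int = 100
) -> Iterator[Union[Any, List[Any]]]:
    """Yield items one by one ("single") or in lists of up to `batch` ("list")."""
    if mode == "single":
        yield from items
        return
    if mode != "list":
        raise ValueError(f"unknown mode: {mode!r}")
    if batch < 1:
        raise ValueError("batch must be at least 1")
    buffer: List[Any] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == batch:
            yield buffer
            buffer = []
    if buffer:  # assumption: a trailing partial batch is still delivered
        yield buffer
```

With five items and `batch=2`, this helper produces two full batches followed by one batch holding the remaining item, so a pipeline written for `list` mode should probably not rely on every batch being full.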

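The commented-out `mysql` and `redis` entries in `custom_settings` use URL-style connection strings. To show how such a string breaks down into its parts, the sketch below splits one with the standard library's `urllib.parse.urlsplit`. The function name `split_connection_url` and the sample credentials are made up for this illustration; whether smallder parses its settings this way is not stated here.

```python
from urllib.parse import urlsplit


def split_connection_url(url: str) -> dict:
    """Break a scheme://user:password@host:port/db string into its parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # int, or None when the URL has no port
        "database": parts.path.lstrip("/"),
    }


# Sample values only; replace them with real connection details.
info = split_connection_url("redis://user:secret@127.0.0.1:6379/0")
```

Passwords that contain characters such as `@` or `/` generally need to be percent-encoded before they are placed in a URL of this form.
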
If you have any questions or suggestions about smallder, feel free to contact me.

WeChat:

wechat

Email: yinghui0214@163.com
