
A simple, easy-to-use box of Scrapy plugins

Project description

A simple, handy collection of Scrapy plugins.

Downloader middlewares

  • Proxy downloader middlewares

    # Option 1: route ordinary requests through a proxy (PROXY_ADDR is your proxy API URL)
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.RandomProxyDownloaderMiddleware': 740,
    }

    # Option 2: route Splash requests through a proxy; both order numbers must be below 723
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.SplashProxyDownloaderMiddleware': 160,
        'scrapy_box.SplashRetryProxyMiddleware': 162,
    }
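The two options differ in where the proxy has to be set. A rough sketch, assuming the address fetched from PROXY_ADDR is a plain host:port string: Scrapy's built-in HttpProxyMiddleware honors request.meta['proxy'], whereas a Splash request is fetched by the Splash server, so the proxy must travel as Splash's `proxy` render argument (with scrapy-splash, Splash arguments live under meta['splash']['args']). The "below 723" rule is presumably because scrapy-splash's recommended settings put SplashCookiesMiddleware at 723 and SplashMiddleware at 725, and Scrapy calls process_request in increasing order. The helpers apply_plain_proxy and apply_splash_proxy are hypothetical; this is not scrapy_box's code.

```python
from typing import Any, Dict


def apply_plain_proxy(meta: Dict[str, Any], proxy: str) -> None:
    """Route an ordinary request through `proxy` (hypothetical helper).

    Scrapy's built-in HttpProxyMiddleware reads request.meta['proxy'].
    """
    if "://" not in proxy:
        proxy = "http://" + proxy  # assumption: bare host:port means an HTTP proxy
    meta["proxy"] = proxy


def apply_splash_proxy(meta: Dict[str, Any], proxy: str) -> None:
    """Ask Splash to fetch the page through `proxy` (hypothetical helper).

    scrapy-splash keeps Splash arguments in meta['splash']['args'];
    Splash's render endpoints accept a `proxy` argument.
    """
    if "://" not in proxy:
        proxy = "http://" + proxy
    splash = meta.setdefault("splash", {})
    splash.setdefault("args", {})["proxy"] = proxy


# Plain dicts stand in for request.meta here.
plain_meta: Dict[str, Any] = {}
apply_plain_proxy(plain_meta, "10.0.0.5:8080")
print(plain_meta)  # {'proxy': 'http://10.0.0.5:8080'}

splash_meta: Dict[str, Any] = {"splash": {"args": {"wait": 1}}}
apply_splash_proxy(splash_meta, "10.0.0.5:8080")
print(splash_meta)  # {'splash': {'args': {'wait': 1, 'proxy': 'http://10.0.0.5:8080'}}}
```

Setting the proxy on the Splash request itself would route only the scraper-to-Splash hop, which is why the Splash case needs the separate middleware.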
  • Downloader middleware that retries bad redirects with the original URL

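    # URL fragments that mark a bad redirect: if the redirect target contains any of them,
    # the request is retried with the original URL
    ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']
    # Note: the order number must be greater than 600 (the order of Scrapy's built-in RedirectMiddleware)
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.ErrorRedirectMiddleware': 601,
    }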
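Scrapy calls process_response from the highest order number to the lowest, so a middleware above 600 sees the raw 3xx response before RedirectMiddleware follows it. A minimal sketch of the decision such a middleware could make, assuming it inspects the Location header of a 3xx response; redirect_retry_url is a hypothetical name and this is not the package's implementation.

```python
from typing import Iterable, Optional
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_retry_url(
    original_url: str,
    status: int,
    location: Optional[str],
    snippets: Iterable[str],
) -> Optional[str]:
    """Return the URL to retry, or None to let the redirect proceed.

    Hypothetical helper: if a 3xx response points to a URL containing any
    of `snippets` (ERROR_REDIRECT_URL_SNIPPET), retry the original URL.
    """
    if status not in REDIRECT_CODES or not location:
        return None
    target = urljoin(original_url, location)  # Location may be relative
    if any(snippet in target for snippet in snippets):
        return original_url
    return None


snippets = ["redirect", "retry"]
print(redirect_retry_url("https://example.com/item/1", 302, "/retry?from=1", snippets))
# https://example.com/item/1
print(redirect_retry_url("https://example.com/item/1", 301, "/item/1/", snippets))
# None
```

Inside a real process_response, a non-None result would typically become `return request.replace(dont_filter=True)` so the duplicate filter does not drop the retry; a retry cap is advisable to avoid looping forever.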
  • Downloader middleware that retries a request when the response contains certain strings

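    # Strings to look for; if the response contains any one of them, the request is retried
    # ('访问过于频繁,请稍后再试' is Chinese for "Too many requests, please try again later")
    ERROR_RESPONSE_SNIPPET = ['Are you a robot?', '访问过于频繁,请稍后再试']
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.ErrorResponseMiddleware': 540,
    }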
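A sketch of the check this middleware describes, assuming it matches the decoded response text and gives up after a fixed number of attempts. The max_retries cap and the retries_so_far counter are hypothetical; Scrapy's own RetryMiddleware tracks the same thing with the RETRY_TIMES setting and request.meta['retry_times'].

```python
from typing import Iterable


def should_retry_body(
    text: str,
    snippets: Iterable[str],
    retries_so_far: int,
    max_retries: int = 2,
) -> bool:
    """Decide whether to retry a response that looks like a block page.

    Hypothetical helper mirroring ERROR_RESPONSE_SNIPPET: retry when any
    snippet occurs in the response text, but stop after `max_retries`.
    """
    if retries_so_far >= max_retries:
        return False
    return any(snippet in text for snippet in snippets)


snippets = ["Are you a robot?", "访问过于频繁,请稍后再试"]
print(should_retry_body("<h1>Are you a robot?</h1>", snippets, retries_so_far=0))  # True
print(should_retry_body("<h1>Welcome</h1>", snippets, retries_so_far=0))  # False
print(should_retry_body("访问过于频繁,请稍后再试", snippets, retries_so_far=2))  # False
```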

Pipelines

  • MongoDB batch-insert pipeline

    MONGO_ADDR = ['mongo_cluster_addr1', 'mongo_cluster_addr2']
    MONGO_USERNAME = 'root'
    MONGO_PASSWORD = 'xxx'
    MONGO_INSERT_SIZE = 100
    MONGO_DB = 'xxx'
    MONGO_COLLECTION = 'xxxx'
    ITEM_PIPELINES = {
       'scrapy_box.MongoBatchInsertPipeline': 300,
    }
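A batch-insert pipeline needs a buffer: items accumulate in memory and are written in one call when the buffer is full, and the remainder is flushed when the spider closes. This sketch assumes MONGO_INSERT_SIZE is that batch size. To keep it runnable without a database the writer is injected; in a real pipeline it would be something like pymongo's `collection.insert_many`. BatchBuffer is a hypothetical name, not scrapy_box's class.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


class BatchBuffer:
    """Collect items and hand them to `write_batch` in groups of `size`."""

    def __init__(self, size: int, write_batch: Callable[[List[Item]], None]) -> None:
        if size < 1:
            raise ValueError("size must be >= 1")
        self.size = size
        self.write_batch = write_batch
        self._items: List[Item] = []

    def add(self, item: Item) -> None:
        # Counterpart of process_item: buffer, then flush once the batch is full.
        # Copy the item, since pymongo's insert_many adds an _id key to each document.
        self._items.append(dict(item))
        if len(self._items) >= self.size:
            self.flush()

    def flush(self) -> None:
        # Counterpart of close_spider: write whatever is left, if anything.
        if self._items:
            batch, self._items = self._items, []
            self.write_batch(batch)


batches: List[List[Item]] = []
buf = BatchBuffer(size=100, write_batch=batches.append)  # stand-in for collection.insert_many
for i in range(250):
    buf.add({"n": i})
buf.flush()
print([len(b) for b in batches])  # [100, 100, 50]
```

Forgetting the final flush in close_spider silently drops the last partial batch, which is the main pitfall of this design.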
  • MySQL asynchronous insert pipeline

    MYSQL_HOST = '192.168.0.43'
    MYSQL_USERNAME = 'root'
    MYSQL_PASSWORD = 'xxx'
    MYSQL_DB = 'xxx'
    MYSQL_CHARSET = 'utf8'
    MYSQL_TABLE = 'xxx'
    ITEM_PIPELINES = {
       'scrapy_box.MysqlInsertPipeline': 300,
    }
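A common way to make MySQL writes asynchronous in Scrapy is Twisted's `adbapi.ConnectionPool`, whose `runInteraction` runs a function on a thread pool; whether scrapy_box does it this way is an assumption. The runnable part below only builds the parameterized INSERT for one item, assuming item keys map directly to column names in MYSQL_TABLE; build_insert is a hypothetical helper.

```python
from typing import Any, Dict, Tuple


def build_insert(table: str, item: Dict[str, Any]) -> Tuple[str, Tuple[Any, ...]]:
    """Build a parameterized INSERT for `item` (hypothetical helper).

    Values use %s placeholders, which pymysql and MySQLdb accept, so the
    driver escapes them; identifiers are backtick-quoted.
    """
    if not item:
        raise ValueError("item has no fields")
    columns = list(item)
    for name in [table, *columns]:
        if "`" in name:
            raise ValueError(f"unsafe identifier: {name!r}")
    cols = ", ".join(f"`{c}`" for c in columns)
    marks = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO `{table}` ({cols}) VALUES ({marks})"
    return sql, tuple(item[c] for c in columns)


sql, params = build_insert("xxx", {"title": "hello", "price": 9.5})
print(sql)     # INSERT INTO `xxx` (`title`, `price`) VALUES (%s, %s)
print(params)  # ('hello', 9.5)

# Assumed wiring with Twisted (not executed here):
#   pool = adbapi.ConnectionPool("pymysql", host=..., user=..., password=...,
#                                database=..., charset="utf8")
#   pool.runInteraction(lambda cursor: cursor.execute(sql, params))
```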


Download files


Source Distribution

scrapy_box-1.4.2.tar.gz (7.9 kB)

File details

Details for the file scrapy_box-1.4.2.tar.gz.

File metadata

  • Download URL: scrapy_box-1.4.2.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.4

File hashes

Hashes for scrapy_box-1.4.2.tar.gz
  • SHA256: e87b5a89dee62a07640263d61f723048cc33fc892a67c8d29aacf5cc8e4ddbb2
  • MD5: d60fd8d4fb707b26780113497bed991b
  • BLAKE2b-256: 564fcf90782c890cd420f825281dfdef533d9e6c6bff8c7092523f17a8754f89
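To check a downloaded archive against the SHA256 digest listed above, a short standard-library sketch follows; it assumes the archive sits in the current directory under its published file name.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "e87b5a89dee62a07640263d61f723048cc33fc892a67c8d29aacf5cc8e4ddbb2"


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("scrapy_box-1.4.2.tar.gz")
    if archive.exists():
        print("OK" if verify(archive) else "HASH MISMATCH")
```

pip can enforce the same check in hash-checking mode via a requirements-file line such as `scrapy_box==1.4.2 --hash=sha256:e87b5a89dee62a07640263d61f723048cc33fc892a67c8d29aacf5cc8e4ddbb2`.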

