
A simple, easy-to-use box of Scrapy plugins


Downloader middlewares

  • Proxy downloader middleware

    # Plain proxy URL
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.RandomProxyDownloaderMiddleware': 740,
    }
    # Splash proxy URL; the priority numbers must be below 723
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.SplashProxyDownloaderMiddleware': 160,
        'scrapy_box.SplashRetryProxyMiddleware': 162,
    }
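  • Downloader middleware that retries erroneous redirects

    # URL snippets: when a redirect URL contains one of these, the original URL is retried
    ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']
    # Note: the priority number must be greater than 600
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.ErrorRedirectMiddleware': 601,
    }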
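  • Middleware that retries when the response contains certain strings

    # Strings to look for; if the response contains any one of them, the request is retried
    # (the second example string means "Access too frequent, please try again later")
    ERROR_RESPONSE_SNIPPET = ['Are you a robot?', '访问过于频繁,请稍后再试']
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.ErrorResponseMiddleware': 540,
    }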
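To illustrate the idea behind the string-based retry middleware, here is a minimal sketch of how such a downloader middleware could work. It is not the scrapy_box implementation: the class name `SnippetRetryMiddleware`, the `max_retries` parameter, and the `snippet_retry_times` meta key are hypothetical. It relies only on the standard Scrapy hooks `from_crawler` and `process_response`, and on `Request.replace`.

```python
class SnippetRetryMiddleware:
    """Hypothetical sketch: retry a request when the response text contains a marker string."""

    def __init__(self, snippets, max_retries=3):
        self.snippets = list(snippets)
        self.max_retries = max_retries  # assumed cap, not a scrapy_box setting

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when it builds the middleware; settings are read here.
        return cls(crawler.settings.getlist("ERROR_RESPONSE_SNIPPET"))

    def process_response(self, request, response, spider):
        # Non-text responses have no usable .text, so fall back to an empty string.
        text = getattr(response, "text", "")
        if not any(snippet in text for snippet in self.snippets):
            return response  # nothing suspicious, pass the response on

        retries = request.meta.get("snippet_retry_times", 0)
        if retries >= self.max_retries:
            return response  # give up and let the spider see the response

        # Returning a Request from process_response makes Scrapy reschedule it;
        # dont_filter=True keeps the duplicate filter from dropping the retry.
        retry_request = request.replace(dont_filter=True)
        retry_request.meta["snippet_retry_times"] = retries + 1
        return retry_request
```

A class like this would be enabled through `DOWNLOADER_MIDDLEWARES` in the same way as the entries above, using its own import path.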

Pipelines

  • MongoDB batch-insert pipeline

    MONGO_ADDR = ['mongo_cluster_addr1', 'mongo_cluster_addr2']
    MONGO_USERNAME = 'root'
    MONGO_PASSWORD = 'xxx'
    MONGO_INSERT_SIZE = 100
    MONGO_DB = 'xxx'
    MONGO_COLLECTION = 'xxxx'
    ITEM_PIPELINES = {
       'scrapy_box.MongoBatchInsertPipeline': 300,
    }
  • MySQL asynchronous-insert pipeline

    MYSQL_HOST = '192.168.0.43'
    MYSQL_USERNAME = 'root'
    MYSQL_PASSWORD = 'xxx'
    MYSQL_DB = 'xxx'
    MYSQL_CHARSET = 'utf8'
    MYSQL_TABLE = 'xxx'
    ITEM_PIPELINES = {
       'scrapy_box.MysqlInsertPipeline': 300,
    }
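The batch-insert idea can be sketched as follows. This is an illustrative outline under stated assumptions, not the code of `MongoBatchInsertPipeline`: the class name `BatchInsertPipeline` is hypothetical, `batch_size` stands in for `MONGO_INSERT_SIZE`, and the collection is passed in as any object that offers `insert_many` (a PyMongo collection does). The sketch uses the standard Scrapy pipeline hooks `process_item` and `close_spider`.

```python
class BatchInsertPipeline:
    """Hypothetical sketch: buffer scraped items and write them in batches."""

    def __init__(self, collection, batch_size=100):
        self.collection = collection  # any object with insert_many(list_of_dicts)
        self.batch_size = batch_size  # plays the role of MONGO_INSERT_SIZE
        self.buffer = []

    def process_item(self, item, spider):
        # Copy the item into a plain dict so later changes do not affect the buffer.
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self._flush()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Write whatever is left when the spider finishes.
        self._flush()

    def _flush(self):
        if self.buffer:
            self.collection.insert_many(self.buffer)
            self.buffer = []
```

Batching trades a small delay for fewer round trips to the database. The flush in `close_spider` matters because the last batch is usually smaller than `batch_size` and would otherwise be lost.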
