
A simple, easy-to-use Scrapy plugin box


Downloader middlewares

  • Proxy downloader middleware

    # Regular proxy URL
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.RandomProxyDownloaderMiddleware': 740,
    }
    # Splash proxy URL; the priority numbers must be below 723
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.SplashProxyDownloaderMiddleware': 160,
        'scrapy_box.SplashRetryProxyMiddleware': 162,
    }
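  • Downloader middleware that retries on error redirects

    # URL snippets: if a redirect target URL contains one of these, the request is retried with the original URL
    ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']
    # Note: the priority number must be greater than 600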
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.ErrorRedirectMiddleware': 601,
    }
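  • Middleware that retries when the response contains certain strings

    # Strings to check for; if the response contains any one of them, the request is retried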
    ERROR_RESPONSE_SNIPPET = ['Are your a robot?', '访问过于频繁,请稍后再试']
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.ErrorResponseMiddleware': 540,
    }
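
The two retry conditions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual scrapy_box implementation: the helper names, the `SnippetRetryMiddleware` class, the `max_retries` cap, and the `snippet_retry_times` meta key are all hypothetical. The class is duck-typed against Scrapy's downloader middleware interface so the block runs without Scrapy installed.

```python
from typing import Iterable

# Settings values copied from the configuration examples above.
ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']
ERROR_RESPONSE_SNIPPET = ['Are your a robot?', '访问过于频繁,请稍后再试']


def redirect_needs_retry(location: str, snippets: Iterable[str]) -> bool:
    """Return True if the redirect target URL contains any configured snippet."""
    return any(snippet in location for snippet in snippets)


def response_needs_retry(body: str, snippets: Iterable[str]) -> bool:
    """Return True if the response text contains any configured string."""
    return any(snippet in body for snippet in snippets)


class SnippetRetryMiddleware:
    """Hypothetical middleware: re-issues a request when the body matches."""

    def __init__(self, snippets, max_retries=3):
        self.snippets = list(snippets)
        self.max_retries = max_retries  # assumed cap, to avoid endless retries

    def process_response(self, request, response, spider):
        retries = request.meta.get('snippet_retry_times', 0)
        if retries < self.max_retries and response_needs_retry(
                response.text, self.snippets):
            # Returning a request from process_response reschedules it;
            # dont_filter=True keeps the duplicate filter from dropping it.
            retry_request = request.replace(dont_filter=True)
            retry_request.meta['snippet_retry_times'] = retries + 1
            return retry_request
        return response


blocked = response_needs_retry('<p>Are your a robot?</p>',
                               ERROR_RESPONSE_SNIPPET)
bounced = redirect_needs_retry('https://example.com/redirect?to=/login',
                               ERROR_REDIRECT_URL_SNIPPET)
```

The substring checks are kept as plain functions so they can be exercised without a running crawler. The retry cap is a design choice of this sketch; the source does not say how scrapy_box limits retries.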

Pipelines

  • MongoDB batch-insert pipeline

    MONGO_ADDR = ['mongo_cluster_addr1', 'mongo_cluster_addr2']
    MONGO_USERNAME = 'root'
    MONGO_PASSWORD = 'xxx'
    MONGO_INSERT_SIZE = 100
    MONGO_DB = 'xxx'
    MONGO_COLLECTION = 'xxxx'
    ITEM_PIPELINES = {
       'scrapy_box.MongoBatchInsertPipeline': 300,
    }
  • MySQL asynchronous insert pipeline

    MYSQL_HOST = '192.168.0.43'
    MYSQL_USERNAME = 'root'
    MYSQL_PASSWORD = 'xxx'
    MYSQL_DB = 'xxx'
    MYSQL_CHARSET = 'utf8'
    MYSQL_TABLE = 'xxx'
    ITEM_PIPELINES = {
       'scrapy_box.MysqlInsertPipeline': 300,
    }
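
A setting such as `MONGO_INSERT_SIZE = 100` suggests that items are buffered and written in groups. The sketch below shows one way such batching could work; it is an assumption about the general technique, not the code of `MongoBatchInsertPipeline`. The class name `BatchInsertPipeline` and the injected `insert_many` callable are hypothetical. With PyMongo, a collection's `insert_many` method could be passed in as that callable.

```python
from typing import Callable, Dict, List


class BatchInsertPipeline:
    """Hypothetical pipeline: buffers items and writes them in batches."""

    def __init__(self, insert_many: Callable[[List[Dict]], object],
                 batch_size: int = 100):
        self.insert_many = insert_many  # e.g. a collection's insert_many
        self.batch_size = batch_size
        self.buffer: List[Dict] = []

    def process_item(self, item, spider=None):
        # Store a plain-dict copy so later changes to the item are not written.
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return item

    def close_spider(self, spider=None):
        # Write whatever is left when the crawl ends.
        self.flush()

    def flush(self):
        if self.buffer:
            self.insert_many(self.buffer)
            self.buffer = []  # start a fresh list; the written one is untouched


# Usage: a list's append method stands in for the database write.
written_batches = []
pipeline = BatchInsertPipeline(written_batches.append, batch_size=2)
for n in range(5):
    pipeline.process_item({'n': n})
pipeline.close_spider()
batch_sizes = [len(batch) for batch in written_batches]
```

Flushing in `close_spider` matters here: without it, the last partial batch (one item in the usage example) would never be written.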
