
A simple, easy-to-use box of Scrapy plugins

Project description

SCRAPY_BOX

A simple, handy box of Scrapy plugins

Downloader middlewares

  • Proxy downloader middlewares

    # Option 1: plain proxy; PROXY_ADDR is the URL of your proxy API
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.RandomProxyDownloaderMiddleware': 740,
    }

    # Option 2: proxy for Splash requests; the order values must be below 723
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.SplashProxyDownloaderMiddleware': 160,
       'scrapy_box.SplashRetryProxyMiddleware': 162,
    }

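  • Downloader middleware that retries the original URL when a request is redirected to an error page

    # URL fragments: if a request is redirected to a URL containing any of these,
    # the original URL is retried instead
    ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']
    # Note: the order value must be greater than 600
    # (Scrapy's built-in RedirectMiddleware sits at 600)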
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.ErrorRedirectMiddleware': 601,
    }
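  • Downloader middleware that retries a request when the response contains certain strings

    # Strings to look for; if the response contains any of them, the request is retried
    # ('访问过于频繁,请稍后再试' means "Too many requests, please try again later")
    ERROR_RESPONSE_SNIPPET = ['Are you a robot?', '访问过于频繁,请稍后再试']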
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_box.ErrorResponseMiddleware': 540,
    }
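The two proxy options above presumably differ in where the proxy ends up on the request. The sketch below is a guess at the technique, not scrapy_box's code: plain Scrapy requests get `request.meta['proxy']`, which Scrapy's built-in `HttpProxyMiddleware` (order 750) honors, consistent with the 740 value above. scrapy-splash requests carry render arguments in `request.meta['splash']['args']`, and Splash accepts a `proxy` argument; the recommended scrapy-splash setup places `SplashCookiesMiddleware` at 723, which may explain the "below 723" rule. `FakeRequest` and `RandomProxySketch` are hypothetical names, and the proxy pool is passed in directly instead of being fetched from `PROXY_ADDR`.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request; only .url and .meta are used."""
    url: str
    meta: dict = field(default_factory=dict)


class RandomProxySketch:
    """Hypothetical sketch (not scrapy_box's code): assign a random proxy per request."""

    def __init__(self, proxies, rng=None):
        # Assumption: the proxy pool is given here; scrapy_box fetches it from PROXY_ADDR.
        self.proxies = list(proxies)
        self.rng = rng if rng is not None else random.Random()

    def process_request(self, request, spider=None):
        proxy = self.rng.choice(self.proxies)
        if "splash" in request.meta:
            # Splash request: pass the proxy as a Splash render argument.
            request.meta["splash"].setdefault("args", {})["proxy"] = proxy
        else:
            # Plain request: Scrapy's HttpProxyMiddleware reads meta["proxy"].
            request.meta["proxy"] = proxy
        return None  # let the remaining middlewares process the request
```

Returning `None` from `process_request` is what tells Scrapy to keep passing the request down the middleware chain.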
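For the error-redirect middleware above, the order value above 600 means its `process_response` sees the 3xx response before Scrapy's `RedirectMiddleware` follows it. A minimal sketch of the check such a middleware might perform, assuming it compares the `Location` target against `ERROR_REDIRECT_URL_SNIPPET`; `redirect_hits_snippet` is a hypothetical helper, not part of scrapy_box. When it returns `True`, a middleware would plausibly return `request.replace(dont_filter=True)` so the duplicate filter does not drop the retried original URL.

```python
from urllib.parse import urljoin

# Redirect statuses that Scrapy's RedirectMiddleware follows.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_hits_snippet(status, request_url, location, snippets):
    """Hypothetical check: does this 3xx response redirect to a URL containing a snippet?"""
    if status not in REDIRECT_STATUSES or not location:
        return False
    # Location may be relative, so resolve it against the original request URL.
    target = urljoin(request_url, location)
    return any(snippet in target for snippet in snippets)
```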
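The response-snippet middleware above boils down to two decisions: does the body contain a blocking string, and is there retry budget left. A minimal sketch under stated assumptions: the body is the decoded text (in Scrapy, `response.text`), the counter lives in the `retry_times` meta key that Scrapy's own `RetryMiddleware` also uses, and the limit of 2 mirrors Scrapy's default `RETRY_TIMES`. `needs_retry` and `retry_meta` are hypothetical names. Recent Scrapy versions (2.5+) also ship `scrapy.downloadermiddlewares.retry.get_retry_request`, which handles the counter and stats for you.

```python
ERROR_RESPONSE_SNIPPET = ['Are you a robot?', '访问过于频繁,请稍后再试']


def needs_retry(body, snippets=ERROR_RESPONSE_SNIPPET):
    """Hypothetical check: does the decoded response body contain any blocking snippet?"""
    return any(snippet in body for snippet in snippets)


def retry_meta(meta, max_retries=2):
    """Copy of meta with the retry counter bumped, or None when retries are exhausted."""
    retries = meta.get("retry_times", 0) + 1
    if retries > max_retries:
        return None
    new_meta = dict(meta)
    new_meta["retry_times"] = retries
    return new_meta
```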

Pipelines

  • Pipeline that batch-inserts items into MongoDB

    MONGO_ADDR = ['mongo_cluster_addr1', 'mongo_cluster_addr2']
    MONGO_USERNAME = 'root'
    MONGO_PASSWORD = 'xxx'
    # Number of items per batch insert
    MONGO_INSERT_SIZE = 100
    MONGO_DB = 'xxx'
    MONGO_COLLECTION = 'xxxx'

    ITEM_PIPELINES = {
       'scrapy_box.MongoBatchInsertPipeline': 300,
    }

  • Pipeline that inserts items into MySQL asynchronously

    MYSQL_HOST = '192.168.0.43'
    MYSQL_USERNAME = 'root'
    MYSQL_PASSWORD = 'xxx'
    MYSQL_DB = 'xxx'
    MYSQL_CHARSET = 'utf8'
    MYSQL_TABLE = 'xxx'

    ITEM_PIPELINES = {
       'scrapy_box.MysqlInsertPipeline': 300,
    }
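The MongoDB batch-insert pipeline above trades per-item round trips for one bulk write every `MONGO_INSERT_SIZE` items. A minimal sketch of that buffering technique, not scrapy_box's implementation: `ListCollection` is a hypothetical stand-in for a pymongo `Collection` (whose real `insert_many(documents)` method it imitates), and `BatchInsertSketch` is a hypothetical name. The `close_spider` flush matters: without it, the last partial batch would be silently lost.

```python
class ListCollection:
    """Hypothetical stand-in for a pymongo Collection; records each insert_many batch."""

    def __init__(self):
        self.batches = []

    def insert_many(self, docs):
        self.batches.append(list(docs))


class BatchInsertSketch:
    """Hypothetical sketch: buffer items and bulk-insert them every batch_size items."""

    def __init__(self, collection, batch_size=100):
        self.collection = collection
        self.batch_size = batch_size  # plays the role of MONGO_INSERT_SIZE
        self.buffer = []

    def process_item(self, item, spider=None):
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return item  # Scrapy pipelines return the item for later pipelines

    def close_spider(self, spider=None):
        self.flush()  # write out the final, partial batch

    def flush(self):
        if self.buffer:
            self.collection.insert_many(self.buffer)
            self.buffer = []
```

With 250 items and a batch size of 100, this writes two batches of 100 during the crawl and the remaining 50 when the spider closes.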
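Any MySQL insert pipeline like the one above has to turn an item into an INSERT statement for `MYSQL_TABLE`. A minimal sketch of that step, assuming column names come from the item keys and values go through `%s` placeholders (the pymysql/MySQLdb parameter style) so the driver does the escaping. The asynchronous part is not shown; a common Scrapy approach is Twisted's `adbapi.ConnectionPool(...).runInteraction(...)`, which runs the blocking driver call in a thread pool, but whether scrapy_box does this is an assumption. `build_insert` and `quote_ident` are hypothetical helpers.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_insert(table, item):
    """Hypothetical helper: turn an item dict into (sql, params) for cursor.execute."""
    columns = list(item.keys())
    col_sql = ", ".join(quote_ident(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))  # values are bound by the driver
    sql = f"INSERT INTO {quote_ident(table)} ({col_sql}) VALUES ({placeholders})"
    return sql, tuple(item[c] for c in columns)
```

For example, `build_insert('xxx', {'title': 'hi', 'price': 9.5})` yields ``"INSERT INTO `xxx` (`title`, `price`) VALUES (%s, %s)"`` with params `('hi', 9.5)`.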

Download files

Source Distribution

scrapy_box-1.4.0.tar.gz (8.0 kB)

File details

Details for the file scrapy_box-1.4.0.tar.gz.

File metadata

  • Download URL: scrapy_box-1.4.0.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.4

File hashes

Hashes for scrapy_box-1.4.0.tar.gz
  • SHA256: 6fb701e4e1f9b502ed658e8e86b12c3017dfe6faa99860a45391fbd033fde3f4
  • MD5: 53fdadf18d44113662e55f7d066d3f65
  • BLAKE2b-256: e7d8802777ff7898ab691108633b95aa1268b1be90e1e4fff963a8d4feabed33
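To check a downloaded archive against the SHA256 digest listed above, you can hash it locally (`pip hash <file>` does the same from the command line). A minimal sketch with Python's standard `hashlib`, reading the file in chunks so large files do not have to fit in memory; the function name `sha256_of` and the local file path are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SHA256 = "6fb701e4e1f9b502ed658e8e86b12c3017dfe6faa99860a45391fbd033fde3f4"

# Usage (assumes the sdist was downloaded to the current directory):
# assert sha256_of("scrapy_box-1.4.0.tar.gz") == EXPECTED_SHA256
```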
