SCRAPY_BOX
A simple, easy-to-use box of Scrapy plugins.
Downloader middlewares
Proxy downloader middleware

```python
# Plain proxy URL
PROXY_ADDR = 'your proxy API'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_box.RandomProxyDownloaderMiddleware': 740,
}

# Splash proxy URL; the priority numbers must be below 723
PROXY_ADDR = 'your proxy API'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_box.SplashProxyDownloaderMiddleware': 160,
    'scrapy_box.SplashRetryProxyMiddleware': 162,
}
```
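The priority 740 fits Scrapy's ordering rules: `process_request` hooks run in ascending priority, and the built-in `HttpProxyMiddleware` sits at 750, so a middleware at 740 can set `request.meta['proxy']` before it runs. The sketch below illustrates only that idea. It is not scrapy_box's actual code: the class name `RandomProxySketch`, the in-memory proxy list, and the stand-in request object are assumptions made for illustration, and the source does not show how the package consumes `PROXY_ADDR`.

```python
import random
from types import SimpleNamespace


class RandomProxySketch:
    """Hypothetical middleware-shaped class; NOT scrapy_box's implementation."""

    def __init__(self, proxies, rng=None):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)
        self.rng = rng or random.Random()

    def process_request(self, request, spider=None):
        # Pick one proxy per request and record it where Scrapy looks for it.
        request.meta['proxy'] = self.rng.choice(self.proxies)
        # Returning None tells Scrapy to keep processing the request.
        return None


# Stand-in for scrapy.Request so the sketch runs without Scrapy installed.
PROXIES = ['http://10.0.0.1:8080', 'http://10.0.0.2:8080']  # made-up addresses
request = SimpleNamespace(url='https://example.com', meta={})
middleware = RandomProxySketch(PROXIES, rng=random.Random(0))
result = middleware.process_request(request)
```

In a real project the proxy list would come from wherever the proxy API exposes it, and the class would be registered in `DOWNLOADER_MIDDLEWARES` as shown above.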
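Downloader middleware that retries erroneous redirects

    # URL snippets to watch for: if a request is redirected to a URL containing
    # any of them, the original URL is retried
    ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']

    # Note: the priority number must be greater than 600
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.ErrorRedirectMiddleware': 601,
    }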
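The "greater than 600" requirement matches how Scrapy orders response hooks: `process_response` runs in descending priority, and the built-in `RedirectMiddleware` sits at 600, so a middleware at 601 sees the 3xx response before the redirect is followed. The check itself might look like the sketch below. The function names and the exact matching rule (substring match against the resolved `Location` URL) are assumptions, not the package's confirmed behavior.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307, 308)
ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']  # illustrative values


def redirect_target(request_url, status, headers):
    """Return the absolute redirect target, or None if this is not a redirect."""
    if status in REDIRECT_STATUSES and 'Location' in headers:
        return urljoin(request_url, headers['Location'])
    return None


def should_retry_original(request_url, status, headers,
                          snippets=ERROR_REDIRECT_URL_SNIPPET):
    """True when the redirect points at a URL containing a flagged snippet."""
    target = redirect_target(request_url, status, headers)
    return target is not None and any(s in target for s in snippets)


bad = should_retry_original(
    'https://example.com/item/1', 302, {'Location': '/error/redirect?from=1'})
good = should_retry_original(
    'https://example.com/item/1', 302, {'Location': '/item/1/'})
plain = should_retry_original('https://example.com/item/1', 200, {})
```

When the check returns true, a middleware would typically re-issue a copy of the original request instead of passing the redirect down the chain.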
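Downloader middleware that retries when the response contains certain strings

    # Strings to check for: if the response contains any of them, the request is retried
    ERROR_RESPONSE_SNIPPET = ['Are you a robot?', '访问过于频繁,请稍后再试']

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.ErrorResponseMiddleware': 540,
    }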
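As a rough illustration of the check described above, the following sketch decodes a response body and looks for any configured snippet. The helper name `needs_retry` and the UTF-8 default are assumptions; the package may decode or match differently.

```python
ERROR_RESPONSE_SNIPPET = ['Are you a robot?', '访问过于频繁,请稍后再试']  # illustrative


def needs_retry(body, snippets=ERROR_RESPONSE_SNIPPET, encoding='utf-8'):
    """Return True if the decoded body contains any of the snippets."""
    # errors='replace' keeps the check from crashing on badly encoded pages.
    text = body.decode(encoding, errors='replace')
    return any(snippet in text for snippet in snippets)


blocked = needs_retry(b'<html><body>Are you a robot?</body></html>')
throttled = needs_retry(ERROR_RESPONSE_SNIPPET[1].encode('utf-8'))
normal = needs_retry(b'<html><body>Product list</body></html>')
```

A middleware built on this would return a fresh copy of the request from `process_response` when the check is true, and the untouched response otherwise.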
Pipelines
Batch-insert pipeline for MongoDB

```python
MONGO_ADDR = ['mongo_cluster_addr1', 'mongo_cluster_addr2']
MONGO_USERNAME = 'root'
MONGO_PASSWORD = 'xxx'
MONGO_INSERT_SIZE = 100
MONGO_DB = 'xxx'
MONGO_COLLECTION = 'xxxx'

ITEM_PIPELINES = {
    'scrapy_box.MongoBatchInsertPipeline': 300,
}
```
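Batch insertion usually means buffering items and writing them in groups, which is presumably what `MONGO_INSERT_SIZE` controls. The sketch below shows that buffering pattern with a pluggable sink so it runs without a database. The `BatchBuffer` class is hypothetical and is not taken from scrapy_box; with PyMongo, the sink would typically be `collection.insert_many`.

```python
class BatchBuffer:
    """Collect items and hand them to `sink` in groups of `size`."""

    def __init__(self, sink, size=100):
        if size < 1:
            raise ValueError("size must be at least 1")
        self.sink = sink
        self.size = size
        self.items = []

    def add(self, item):
        # Copy the item so later mutation by the spider does not affect the batch.
        self.items.append(dict(item))
        if len(self.items) >= self.size:
            self.flush()

    def flush(self):
        # Swap the buffer out before calling the sink so a failing sink
        # cannot cause the same batch to be written twice.
        if self.items:
            batch, self.items = self.items, []
            self.sink(batch)


batches = []                      # stand-in for a MongoDB collection
buffer = BatchBuffer(batches.append, size=2)
for n in range(5):
    buffer.add({'n': n})
buffer.flush()                    # a pipeline would do this in close_spider
batch_sizes = [len(b) for b in batches]
```

The final `flush()` matters: without it, the last partial batch would be lost when the spider closes.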
Asynchronous insert pipeline for MySQL

```python
MYSQL_HOST = '192.168.0.43'
MYSQL_USERNAME = 'root'
MYSQL_PASSWORD = 'xxx'
MYSQL_DB = 'xxx'
MYSQL_CHARSET = 'utf8'
MYSQL_TABLE = 'xxx'

ITEM_PIPELINES = {
    'scrapy_box.MysqlInsertPipeline': 300,
}
```
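An insert pipeline has to turn each item into an SQL statement for the configured table. The sketch below shows one way to build a parameterized `INSERT` from an item's fields. The helper `build_insert` is hypothetical, and the source does not show how scrapy_box builds its queries or achieves asynchrony. A common approach in Scrapy projects is to run such statements through Twisted's `adbapi.ConnectionPool.runInteraction`, but that is an assumption here.

```python
import re

_IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')


def _check_identifier(name):
    # Table and column names cannot be passed as query parameters,
    # so they are validated before being placed in the SQL text.
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def build_insert(table, item):
    """Return (sql, params) for inserting `item` into `table`."""
    columns = sorted(item)  # sorted for a stable column order
    column_sql = ', '.join(f'`{_check_identifier(c)}`' for c in columns)
    placeholders = ', '.join(['%s'] * len(columns))
    sql = (f'INSERT INTO `{_check_identifier(table)}` '
           f'({column_sql}) VALUES ({placeholders})')
    return sql, tuple(item[c] for c in columns)


sql, params = build_insert('books', {'title': 'Dune', 'price': 9.5})
```

The values travel separately as `params`, so the database driver handles quoting and escaping rather than string formatting.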