A simple, easy-to-use box of Scrapy plugins
Project description
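A simple, handy box of Scrapy plugins: downloader middlewares and item pipelines that are enabled from the project's settings.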
Downloader middlewares
Proxy downloader middleware
    # Plain proxy URL
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.RandomProxyDownloaderMiddleware': 740,
    }

    # Splash proxy URL; the priority numbers must be under 723
    PROXY_ADDR = 'your proxy API'
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.SplashProxyDownloaderMiddleware': 160,
        'scrapy_box.SplashRetryProxyMiddleware': 162,
    }
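To illustrate what a random-proxy downloader middleware typically does, here is a minimal sketch. It is not scrapy_box's actual implementation: the class name `RandomProxySketch` and the `fetch_proxies` callable (standing in for a call to the `PROXY_ADDR` API) are assumptions made for this example. The only Scrapy convention relied on is that the built-in `HttpProxyMiddleware` (priority 750 in Scrapy's defaults) reads the proxy from `request.meta["proxy"]`, which is why a priority such as 740 lets the proxy be set beforehand.

```python
import random
from types import SimpleNamespace


class RandomProxySketch:
    """Hypothetical stand-in, not scrapy_box's actual middleware."""

    def __init__(self, fetch_proxies):
        # fetch_proxies is an assumed callable that would query PROXY_ADDR
        # and return a list of proxy URLs such as "http://10.0.0.1:8080".
        self.fetch_proxies = fetch_proxies

    def process_request(self, request, spider):
        proxies = self.fetch_proxies()
        if proxies:
            # Scrapy's HttpProxyMiddleware picks the proxy up from this key.
            request.meta["proxy"] = random.choice(proxies)
        return None  # let the request continue through the middleware chain


# Usage with a stand-in request object (a real scrapy.Request also has .meta).
proxy_pool = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
request = SimpleNamespace(meta={})
middleware = RandomProxySketch(lambda: proxy_pool)
middleware.process_request(request, spider=None)
print(request.meta["proxy"])
```

Injecting the fetch function keeps the sketch testable without network access; a real middleware would more likely read `PROXY_ADDR` from the crawler settings.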
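Downloader middleware that retries on error redirects
    # URL snippets: when a request is redirected to a URL containing any of
    # these, it is retried using the original URL
    ERROR_REDIRECT_URL_SNIPPET = ['redirect', 'retry']

    # Note: the priority number must be greater than 600
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.ErrorRedirectMiddleware': 601,
    }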
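The "greater than 600" requirement presumably follows from Scrapy's ordering rules: the built-in `RedirectMiddleware` sits at 600 in the defaults, and `process_response()` is called in decreasing priority order, so a middleware at 601 sees the raw 3xx response before the redirect is followed. The check itself might look like the sketch below. The function name `is_error_redirect`, the status set, and the example URLs are assumptions for illustration, not the library's code.

```python
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_error_redirect(status, location, snippets):
    """Return True if a redirect response points at a URL containing a snippet."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    return any(snippet in location for snippet in snippets)


snippets = ["redirect", "retry"]  # same values as ERROR_REDIRECT_URL_SNIPPET above
print(is_error_redirect(302, "https://example.com/retry?from=/page/1", snippets))  # True
print(is_error_redirect(302, "https://example.com/page/2", snippets))              # False
print(is_error_redirect(200, "", snippets))                                        # False
```

When the check returns True, a middleware would typically return a copy of the original request (with duplicate filtering disabled) instead of passing the redirect response on.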
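Middleware that retries when the response contains certain strings
    # Strings to look for; if the response contains any one of them,
    # the request is retried
    ERROR_RESPONSE_SNIPPET = [
        'Are you a robot?',
        '访问过于频繁,请稍后再试',  # "Too many requests, please try again later"
    ]
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_box.ErrorResponseMiddleware': 540,
    }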
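The matching rule described in the comment (retry if any configured string occurs in the response) can be sketched as a small helper. The name `needs_retry` and the UTF-8 decoding default are assumptions for this example; how scrapy_box decodes the body is not documented here.

```python
def needs_retry(body, snippets, encoding="utf-8"):
    """Return True if the response body contains any of the given snippets."""
    # Response bodies arrive as bytes; decode leniently so that a stray
    # invalid byte does not abort the check.
    text = body.decode(encoding, errors="ignore") if isinstance(body, bytes) else body
    return any(snippet in text for snippet in snippets)


snippets = ["Are you a robot?", "访问过于频繁,请稍后再试"]
blocked = "<html><body>访问过于频繁,请稍后再试</body></html>".encode("utf-8")
normal = b"<html><body>Product list</body></html>"
print(needs_retry(blocked, snippets))  # True
print(needs_retry(normal, snippets))   # False
```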
Pipelines
MongoDB batch-insert pipeline
    MONGO_ADDR = ['mongo_cluster_addr1', 'mongo_cluster_addr2']
    MONGO_USERNAME = 'root'
    MONGO_PASSWORD = 'xxx'
    MONGO_INSERT_SIZE = 100
    MONGO_DB = 'xxx'
    MONGO_COLLECTION = 'xxxx'
    ITEM_PIPELINES = {
        'scrapy_box.MongoBatchInsertPipeline': 300,
    }
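Batch insertion generally means buffering items and writing them in groups, here presumably of `MONGO_INSERT_SIZE`. The sketch below shows that buffering pattern in isolation. `BatchBuffer` and `flush_fn` are hypothetical names; in a real pipeline `flush_fn` could be pymongo's `Collection.insert_many`, and the final `flush()` would be called from `close_spider`. This is an illustration of the technique, not the library's code.

```python
class BatchBuffer:
    """Collect items and hand them to flush_fn in batches of `size`."""

    def __init__(self, flush_fn, size=100):
        self.flush_fn = flush_fn  # assumed callable, e.g. collection.insert_many
        self.size = size
        self.items = []

    def add(self, item):
        self.items.append(dict(item))
        if len(self.items) >= self.size:
            self.flush()

    def flush(self):
        # Write whatever is buffered, then start a fresh list.
        if self.items:
            self.flush_fn(self.items)
            self.items = []


# Usage: record the batches in a list instead of writing to a database.
batches = []
buffer = BatchBuffer(batches.append, size=100)
for i in range(250):
    buffer.add({"n": i})
buffer.flush()  # what close_spider would do for the remaining items
print([len(batch) for batch in batches])  # [100, 100, 50]
```

The explicit final flush matters: without it, any items left in a partially filled buffer would be lost when the spider closes.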
MySQL asynchronous insert pipeline
    MYSQL_HOST = '192.168.0.43'
    MYSQL_USERNAME = 'root'
    MYSQL_PASSWORD = 'xxx'
    MYSQL_DB = 'xxx'
    MYSQL_CHARSET = 'utf8'
    MYSQL_TABLE = 'xxx'
    ITEM_PIPELINES = {
        'scrapy_box.MysqlInsertPipeline': 300,
    }
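An insert pipeline has to turn each item into an `INSERT` statement for `MYSQL_TABLE`. One way to do that is sketched below, under the assumption that item field names match the table's column names; `build_insert` is a hypothetical helper, not part of scrapy_box. Values are passed as parameters rather than interpolated into the SQL string.

```python
def build_insert(table, item):
    """Build a parameterized INSERT statement from a dict-like item."""
    columns = list(item.keys())
    column_sql = ", ".join(f"`{column}`" for column in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO `{table}` ({column_sql}) VALUES ({placeholders})"
    return sql, [item[column] for column in columns]


sql, params = build_insert("articles", {"title": "hello", "views": 3})
print(sql)     # INSERT INTO `articles` (`title`, `views`) VALUES (%s, %s)
print(params)  # ['hello', 3]
```

Table and column names are interpolated directly, so they must come from trusted configuration. For the asynchronous part, a common approach in Scrapy projects is to run such statements through Twisted's `adbapi.ConnectionPool`; whether scrapy_box does so is not stated on this page.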
Project details
Download files
Source Distribution
scrapy_box-1.4.1.tar.gz (7.9 kB)
File metadata
- Download URL: scrapy_box-1.4.1.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | e384b2ea598088ab9e790a53c1ce7c32370cd4b63d10b4b93bd880503c37f395 |
MD5 | 6783ab3ca44853bebde49b9b3fcf85fd |
BLAKE2b-256 | f687c4c8649d40382213ba8320564f439508b14402bcb09dd2c88098d28cdc5c |