New: a Scrapy pipeline that deduplicates items by a combination of target fields
Project description
Handles several PDF watermark issues
- Project page on PyPI: https://pypi.org/project/scrapy_bloomerfiler/
- Installation:
pip install scrapy-bloomerfiler
- Register the pipeline in the Scrapy settings:
ITEM_PIPELINES = {'scrapy_bloomerfiler.bloomerfilerpipeline': 400}
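To illustrate what "deduplicating by a combination of target fields" means, here is a minimal, stdlib-only sketch. It assumes (from the package name) that duplicates are detected with a Bloom filter. The names `BloomFilter`, `FieldComboDedupPipeline` and `fingerprint` are hypothetical and are not this package's API. The Redis settings suggest the real package keeps its filter in Redis, whereas this sketch keeps it in memory.

```python
import hashlib
import json

try:  # Prefer Scrapy's own exception when Scrapy is installed.
    from scrapy.exceptions import DropItem
except ImportError:  # Stand-in so the sketch also runs without Scrapy.
    class DropItem(Exception):
        """Fallback for scrapy.exceptions.DropItem."""


class BloomFilter:
    """Hypothetical in-memory Bloom filter over a bytearray of num_bits bits."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> bool:
        """Set the key's bits; return True if all of them were already set (probably seen)."""
        already_seen = True
        for pos in self._positions(key):
            byte_index, bit = divmod(pos, 8)
            if not self.bits[byte_index] & (1 << bit):
                already_seen = False
                self.bits[byte_index] |= 1 << bit
        return already_seen


class FieldComboDedupPipeline:
    """Hypothetical pipeline: drop items whose target-field combination was seen before."""

    def __init__(self, aim_fields, num_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        # Sort the field names so the fingerprint does not depend on set iteration order.
        self.aim_fields = sorted(aim_fields)
        self.bloom = BloomFilter(num_bits, num_hashes)

    def fingerprint(self, item) -> str:
        # Serialize only the target fields; sort_keys also normalizes nested dicts.
        combo = {field: item.get(field) for field in self.aim_fields}
        return json.dumps(combo, sort_keys=True, ensure_ascii=False, default=str)

    def process_item(self, item, spider=None):
        if self.bloom.add(self.fingerprint(item)):
            raise DropItem(f"duplicate item on fields {self.aim_fields}")
        return item
```

Two items that differ only in fields outside the target set (for example `url`) produce the same fingerprint, so the second one is dropped. A Bloom filter can return false positives, so a small fraction of genuinely new items may also be dropped. The filter size and hash count control that rate.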
- scrapy.cfg: add REDIS_HOST / REDIS_PORT / REDIS_DB / REDIS_PASSWORD for each environment
Test (development) environment:
[redis_cfg_dev]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***
Production environment:
[redis_cfg_prod]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***
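scrapy.cfg is an INI-style file, so Python's `configparser` can read these sections. The sketch below selects `[redis_cfg_prod]` or `[redis_cfg_dev]` depending on an `if_prod` flag. The function names `redis_config_from_text` and `load_redis_config` and the returned dict layout are hypothetical; the package may read the file differently.

```python
import configparser
from pathlib import Path


def redis_config_from_text(cfg_text: str, if_prod: bool) -> dict:
    """Parse the Redis settings for the chosen environment from scrapy.cfg text."""
    # interpolation=None so a "%" in a password is not treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names as written (REDIS_HOST); configparser lower-cases them by default.
    parser.optionxform = str
    parser.read_string(cfg_text)
    section = parser["redis_cfg_prod" if if_prod else "redis_cfg_dev"]
    return {
        "host": section.get("REDIS_HOST"),
        "port": section.getint("REDIS_PORT"),
        "db": section.getint("REDIS_DB"),
        # Treat an empty REDIS_PASSWORD as "no password".
        "password": section.get("REDIS_PASSWORD") or None,
    }


def load_redis_config(cfg_path: str = "scrapy.cfg", if_prod: bool = False) -> dict:
    """Read scrapy.cfg from disk and return the selected environment's Redis settings."""
    return redis_config_from_text(Path(cfg_path).read_text(encoding="utf-8"), if_prod)
```

The extra sections coexist with the `[settings]` and `[deploy]` sections that scrapy.cfg usually contains.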
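- Scrapy settings (settings.py): add
IF_PROD: whether to use the production configuration, e.g. True
Data_Size: expected data volume, e.g. 1000*10000 (ten million); typical scales range from millions to tens of millions
Aim_Set: the item fields used for deduplication, a <set>, e.g. {"title", "all_json"}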
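Data_Size presumably determines how large the Bloom filter must be. The source does not say how the package uses it, so the sketch below only shows the standard optimal-sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, applied to the example settings. The target error rate `error_rate` and the helper name `bloom_parameters` are assumptions.

```python
import math

# Example settings.py values from the list above (Data_Size written as 1000*10000).
IF_PROD = True
Data_Size = 1000 * 10000
Aim_Set = {"title", "all_json"}


def bloom_parameters(data_size: int, error_rate: float = 0.001) -> tuple:
    """Return (num_bits, num_hashes) for a Bloom filter expected to hold data_size keys.

    Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    if data_size <= 0 or not 0 < error_rate < 1:
        raise ValueError("data_size must be > 0 and 0 < error_rate < 1")
    num_bits = math.ceil(-data_size * math.log(error_rate) / math.log(2) ** 2)
    num_hashes = max(1, round(num_bits / data_size * math.log(2)))
    return num_bits, num_hashes


# About 144 million bits (roughly 17 MiB) and 10 hash functions at a 0.1% error rate.
bits, hashes = bloom_parameters(Data_Size)
```

Because the bit count grows linearly with Data_Size, underestimating the data volume raises the false-positive rate (and thus the number of wrongly dropped items) as the filter fills.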
- Environment variables (optional):
IF_PROD: whether to use the production configuration, e.g. True
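Environment variables are always strings, so an IF_PROD value of "False" must be parsed rather than passed to `bool()`, which would treat any non-empty string as true. The sketch below assumes, without confirmation from the source, that the environment variable overrides the IF_PROD setting when both are present. `resolve_if_prod` is a hypothetical helper.

```python
import os

# Strings accepted as "true" (an assumption; the package may accept a different set).
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_if_prod(settings_value=None, environ=None) -> bool:
    """Hypothetical helper: an IF_PROD environment variable overrides the settings value."""
    environ = os.environ if environ is None else environ
    raw = environ.get("IF_PROD")
    if raw is not None:
        # Parse the string explicitly: bool("False") would be True.
        return raw.strip().lower() in _TRUTHY
    return bool(settings_value)
```

Inside Scrapy, the settings value could come from `crawler.settings.getbool("IF_PROD")`, which already handles string values such as "False".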
Download files
Source Distribution
Hashes for scrapy_bloomerfiler-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | a30b4014c966821f3a2fc19e3175e5c5c620a3bba02831a6ff50af412f8e5eda
MD5 | 112561aa53bd8b22a0a8c1ade5285d85
BLAKE2b-256 | b1f36346bdb77ab1bb1ee6ac069a48f410accdc51f138e685cfff5a2eda24ad2
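The published digests let you check a downloaded archive before installing it. This sketch computes the SHA-256 of a local file with `hashlib` and compares it with the SHA256 value listed above. The file path and the helper names `sha256_of` and `verify` are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for scrapy_bloomerfiler-0.1.0.tar.gz.
EXPECTED_SHA256 = "a30b4014c966821f3a2fc19e3175e5c5c620a3bba02831a6ff50af412f8e5eda"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("scrapy_bloomerfiler-0.1.0.tar.gz")
    if archive.exists():
        print("OK" if verify(str(archive)) else "HASH MISMATCH")
```

pip can enforce the same check during installation through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).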