Adds a Scrapy pipeline that deduplicates items by a combination of target fields
Project description
Handles several PDF watermarking tasks
- PyPI project page: https://pypi.org/project/scrapy_bloomerfiler/
- Install:
pip install scrapy_bloomerfiler
- Register the pipeline in the Scrapy pipeline configuration:
ITEM_PIPELINES = {'scrapy_bloomerfiler.bloomerfilerpipeline': 400}
scrapy.cfg
Add the settings REDIS_HOST / REDIS_PORT / REDIS_DB / REDIS_PASSWORD
Test environment
[redis_cfg_dev]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***
Production environment
[redis_cfg_prod]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***
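The two sections above can be selected at runtime according to IF_PROD. The following is a minimal sketch using Python's standard `configparser`, not the package's own loader: the function name `load_redis_config`, the returned dictionary keys, and the sample host, port, and password values are all assumptions made for illustration.

```python
import configparser

# Placeholder values for illustration only; the real file keeps its own secrets.
SAMPLE_CFG = """
[redis_cfg_dev]
REDIS_HOST = 127.0.0.1
REDIS_PORT = 6379
REDIS_DB = 0
REDIS_PASSWORD = dev-secret

[redis_cfg_prod]
REDIS_HOST = 10.0.0.5
REDIS_PORT = 6379
REDIS_DB = 1
REDIS_PASSWORD = prod-secret
"""


def load_redis_config(cfg_text, if_prod):
    """Return the Redis settings from the dev or prod section (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names upper-case, as written in the file
    parser.read_string(cfg_text)
    section = "redis_cfg_prod" if if_prod else "redis_cfg_dev"
    cfg = parser[section]
    return {
        "host": cfg["REDIS_HOST"],
        "port": cfg.getint("REDIS_PORT"),
        "db": cfg.getint("REDIS_DB"),
        "password": cfg["REDIS_PASSWORD"],
    }


if __name__ == "__main__":
    print(load_redis_config(SAMPLE_CFG, if_prod=False))
```

For a real project, `parser.read("scrapy.cfg")` would take the place of `read_string`, and the resulting dictionary matches the keyword arguments that a Redis client constructor typically accepts.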
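- Add to the Scrapy settings:
IF_PROD whether to use the production configuration, e.g. True
Data_Size expected data volume, e.g. 1000*10000 / 百万级 (millions) / 千万级 (tens of millions) / 千万 (ten million)
Aim_Set the fields used as the deduplication key, Aim_Set <set>, e.g. {"title","all_json"}
- Environment variable parameters (optional):
IF_PROD whether to use the production configuration, e.g. True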
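To illustrate how Data_Size and Aim_Set could drive deduplication, here is a minimal in-memory Bloom filter sketch. It is not the package's implementation: the names `bloom_parameters`, `BloomFilter`, `fingerprint`, and `DedupSketch`, the 0.1% false-positive target, and the SHA-256 double-hashing scheme are all assumptions. The package itself is configured against Redis, so its bit array presumably lives there rather than in process memory.

```python
import hashlib
import json
import math


def bloom_parameters(expected_items, false_positive_rate=0.001):
    """Standard Bloom filter sizing: m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2."""
    num_bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
    )
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


class BloomFilter:
    """In-memory bit array; a Redis-backed variant would use per-bit commands."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        """Insert key; return True if every bit was already set (probable duplicate)."""
        seen = True
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def fingerprint(item, aim_set):
    """Serialize only the Aim_Set fields, in a stable order, as the dedup key."""
    subset = {field: item.get(field) for field in sorted(aim_set)}
    return json.dumps(subset, sort_keys=True, ensure_ascii=False, default=str)


class DedupSketch:
    """Hypothetical stand-in for the pipeline's duplicate check."""

    def __init__(self, aim_set, data_size):
        self.aim_set = set(aim_set)
        self.bloom = BloomFilter(*bloom_parameters(data_size))

    def is_duplicate(self, item):
        return self.bloom.add(fingerprint(item, self.aim_set))


if __name__ == "__main__":
    dedup = DedupSketch({"title", "all_json"}, data_size=10_000)
    first = {"title": "t1", "all_json": "{}", "url": "u1"}
    again = {"title": "t1", "all_json": "{}", "url": "u2"}
    print(dedup.is_duplicate(first), dedup.is_duplicate(again))
```

Inside a Scrapy pipeline, `process_item` would typically raise `scrapy.exceptions.DropItem` when `is_duplicate` returns True and return the item otherwise. Because a Bloom filter can report false positives, a small fraction of genuinely new items may be dropped; sizing the filter from Data_Size keeps that rate near the chosen target.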
Hashes for scrapy_bloomerfiler-0.1.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f036b80a5d9cdc0080e552d73d995d8768d7508f13a84c260bf41defa831b9e1
MD5 | 12571b82b773363849fdbb15e34cebfc
BLAKE2b-256 | 5e7fd2d3534a9754a6580d1f960105ec1dac0cdcb122c6169630b0dfc3cd598d