Adds a Scrapy item pipeline that deduplicates items by a combination of target fields.
Project description
Handles several PDF watermark issues.
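The field-combination deduplication described above can be sketched as a Scrapy-style item pipeline. This is a minimal illustration under stated assumptions, not this package's implementation: the class names `InMemoryBloomFilter` and `FieldComboDedupPipeline` are hypothetical, the filter lives in process memory rather than in Redis, and the bit and hash counts are placeholder defaults. The pipeline builds a stable fingerprint from the fields listed in `Aim_Set` and drops an item when that fingerprint has probably been seen before.

```python
import hashlib

try:
    # Use Scrapy's real exception when Scrapy is installed.
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so the sketch also runs without Scrapy
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class InMemoryBloomFilter:
    """Hypothetical minimal Bloom filter backed by a bytearray (not Redis)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class FieldComboDedupPipeline:
    """Hypothetical pipeline: drop items whose Aim_Set field values were already seen."""

    def __init__(self, aim_set, num_bits=1 << 20, num_hashes=7):
        # Sort field names so the fingerprint does not depend on set order.
        self.fields = sorted(aim_set)
        self.bloom = InMemoryBloomFilter(num_bits, num_hashes)

    @classmethod
    def from_crawler(cls, crawler):
        # Assumption: the dedup fields come from the Aim_Set setting.
        return cls(crawler.settings.get("Aim_Set"))

    def fingerprint(self, item) -> str:
        # Join "name=value" pairs in a fixed order; repr() keeps types distinct.
        parts = [f"{name}={item.get(name)!r}" for name in self.fields]
        return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

    def process_item(self, item, spider):
        key = self.fingerprint(item)
        if key in self.bloom:
            raise DropItem(f"duplicate item for fields {self.fields}")
        self.bloom.add(key)
        return item
```

Fields outside `Aim_Set` (such as a URL) do not affect the fingerprint, so two items with the same `title` and `all_json` count as duplicates. Because a Bloom filter can return false positives, a genuinely new item is occasionally dropped; the rate depends on how the filter is sized relative to the data volume.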
- PyPI page: https://pypi.org/project/scrapy_bloomerfiler/
- Installation:
pip install scrapy_bloomerfiler
- Register the pipeline in ITEM_PIPELINES in the Scrapy settings:
ITEM_PIPELINES = {'scrapy_bloomerfiler.bloomerfilerpipeline': 400}
- In scrapy.cfg, add REDIS_HOST / REDIS_PORT / REDIS_DB / REDIS_PASSWORD for each environment:
Test environment:
[redis_cfg_dev]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***
Production environment:
[redis_cfg_prod]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***
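- Add to the Scrapy settings:
IF_PROD: whether to use the production configuration, e.g. True
Data_Size: expected data volume, e.g. 1000*10000, or a scale such as millions / tens of millions / ten million
Aim_Set: fields that form the deduplication key, type <set>, e.g. {"title", "all_json"}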
- Environment variable (optional):
IF_PROD: whether to use the production configuration, e.g. True
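The configuration steps above split the Redis connection details into a test section and a production section of scrapy.cfg, with IF_PROD choosing between them. A minimal sketch of that selection using the standard-library `configparser` follows. The helper names are hypothetical, and the precedence rule (an `IF_PROD` environment variable, when present, overrides the Scrapy setting) is an assumption; this page does not state which source wins.

```python
import configparser
import os


def parse_bool(value) -> bool:
    """Interpret True, "True", "1", "yes", "on" as booleans."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def resolve_if_prod(settings_value=False, env=None) -> bool:
    """Assumption: an IF_PROD environment variable overrides the setting."""
    env = os.environ if env is None else env
    if "IF_PROD" in env:
        return parse_bool(env["IF_PROD"])
    return parse_bool(settings_value)


def load_redis_config(cfg_text: str, if_prod: bool) -> dict:
    """Pick [redis_cfg_prod] or [redis_cfg_dev] from scrapy.cfg content."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)  # for a real file: parser.read("scrapy.cfg")
    section = parser["redis_cfg_prod" if if_prod else "redis_cfg_dev"]
    return {
        "host": section["REDIS_HOST"],
        "port": section.getint("REDIS_PORT"),
        "db": section.getint("REDIS_DB"),
        # An empty password value is treated as "no password".
        "password": section.get("REDIS_PASSWORD") or None,
    }
```

For example, `load_redis_config(open("scrapy.cfg").read(), resolve_if_prod(True))` would return the `[redis_cfg_prod]` values unless the environment sets `IF_PROD` to a false value.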
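Data_Size matters because a Bloom filter has to be sized in advance for the number of items it will hold. The sketch below applies the standard sizing formulas, m = ceil(-n ln p / (ln 2)^2) bits and k = round((m / n) ln 2) hash functions. The target false-positive rate `p = 0.001` is an assumption; this page does not say which error rate or sizing rule the package actually uses.

```python
import math


def bloom_parameters(n: int, p: float = 0.001):
    """Return (bits, hash_count) for n expected items at false-positive rate p."""
    if n <= 0 or not 0 < p < 1:
        raise ValueError("n must be positive and 0 < p < 1")
    # Optimal number of bits for n items at error rate p.
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    # Optimal number of hash functions for that bit count.
    k = max(1, round(m / n * math.log(2)))
    return m, k


# Data_Size = 1000*10000, i.e. ten million items.
bits, hashes = bloom_parameters(1000 * 10000)
print(bits, hashes, f"{bits / 8 / 1024 ** 2:.1f} MiB")
```

For ten million items at p = 0.001, this works out to roughly 1.44 × 10^8 bits (about 17 MiB) and 10 hash functions. Halving p adds only about 1.44 bits per item, so the expected volume drives memory far more than the error rate does.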
Download files
Source Distribution
File details
Details for the file scrapy_bloomerfiler-0.1.2.tar.gz
File metadata
- Download URL: scrapy_bloomerfiler-0.1.2.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | f036b80a5d9cdc0080e552d73d995d8768d7508f13a84c260bf41defa831b9e1
MD5 | 12571b82b773363849fdbb15e34cebfc
BLAKE2b-256 | 5e7fd2d3534a9754a6580d1f960105ec1dac0cdcb122c6169630b0dfc3cd598d