
Adds a Scrapy pipeline that deduplicates items by a combination of target fields.

Project description

Handles several PDF watermark issues.

pip install scrapy-bloomerfiler
  • Register the pipeline in the Scrapy settings:
ITEM_PIPELINES = {'scrapy_bloomerfiler.bloomerfilerpipeline': 400}
  • Add REDIS_HOST / REDIS_PORT / REDIS_DB / REDIS_PASSWORD to scrapy.cfg:
Test environment
[redis_cfg_dev]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***

Production environment
[redis_cfg_prod]
REDIS_HOST = ***
REDIS_PORT = ***
REDIS_DB = ***
REDIS_PASSWORD = ***
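scrapy.cfg is an INI-style file, so the two sections above can be read with Python's standard configparser. The sketch below is hypothetical: load_redis_cfg is not part of the package, and it assumes the section is chosen by IF_PROD, with an explicit argument (for example the value from the Scrapy settings) taking precedence over the IF_PROD environment variable.

```python
import configparser
import os


def load_redis_cfg(path="scrapy.cfg", prod=None):
    """Hypothetical helper: return REDIS_* values from the dev or prod section."""
    if prod is None:
        # Assumption: fall back to the optional IF_PROD environment variable
        prod = os.environ.get("IF_PROD", "False").strip().lower() == "true"
    section = "redis_cfg_prod" if prod else "redis_cfg_dev"

    parser = configparser.ConfigParser()
    # read() returns the list of files it parsed successfully
    if not parser.read(path):
        raise FileNotFoundError(path)

    # Option lookups are case-insensitive, so REDIS_HOST matches as written
    cfg = parser[section]
    return {
        "host": cfg.get("REDIS_HOST"),
        "port": cfg.getint("REDIS_PORT"),
        "db": cfg.getint("REDIS_DB"),
        # An empty password is treated as "no password"
        "password": cfg.get("REDIS_PASSWORD") or None,
    }
```

The returned dict maps directly onto the keyword arguments a Redis client usually takes (host, port, db, password), which keeps the pipeline code independent of where the values came from.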
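  • Add to the Scrapy settings:
IF_PROD: whether to use the production configuration, e.g. True
Data_Size: expected data volume, e.g. 1000*10000 (on the order of millions to tens of millions of items)
Aim_Set: the fields used for deduplication, given as a set, e.g. {"title", "all_json"}
  • Environment variables (optional):
IF_PROD: whether to use the production configuration, e.g. True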
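The description above does not show the pipeline's internals. As an illustration of the technique the package name suggests (a Bloom filter over a fingerprint of the Aim_Set fields, sized from Data_Size), here is a minimal in-memory sketch. The class and function names, the 0.1% error rate, SHA-256 double hashing and JSON fingerprinting are all assumptions; the package itself presumably keeps its filter in Redis, given the REDIS_* settings.

```python
import hashlib
import json
import math

try:
    from scrapy.exceptions import DropItem
except ImportError:  # lets the sketch run without Scrapy installed
    class DropItem(Exception):
        pass


class BloomFilter:
    """In-memory Bloom filter; a stand-in for a Redis-backed one."""

    def __init__(self, capacity, error_rate=0.001):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.size = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from one SHA-256 digest
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, key):
        """Set the key's bits; return True if all were already set (probable duplicate)."""
        seen = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def fingerprint(item, aim_set):
    # Combine only the target fields, in a stable (sorted) order
    values = {field: item.get(field) for field in sorted(aim_set)}
    return json.dumps(values, sort_keys=True, ensure_ascii=False, default=str)


class FieldComboDedupPipeline:
    """Hypothetical pipeline: drops items whose Aim_Set fields were seen before."""

    def __init__(self, aim_set, data_size):
        self.aim_set = set(aim_set)
        self.bloom = BloomFilter(capacity=data_size)

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(settings.get("Aim_Set"), settings.getint("Data_Size"))

    def process_item(self, item, spider):
        if self.bloom.add(fingerprint(item, self.aim_set)):
            raise DropItem(f"duplicate on fields {sorted(self.aim_set)}")
        return item
```

A Bloom filter never lets a true duplicate through, but it can occasionally drop a new item as a false positive, at roughly the configured error rate once Data_Size items have been added. That is why the filter is sized from the expected data volume rather than a fixed default.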



Download files


Source Distribution

scrapy_bloomerfiler-0.1.2.tar.gz (4.7 kB)


File details

Details for the file scrapy_bloomerfiler-0.1.2.tar.gz.

File metadata

  • Download URL: scrapy_bloomerfiler-0.1.2.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.0

File hashes

Hashes for scrapy_bloomerfiler-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        f036b80a5d9cdc0080e552d73d995d8768d7508f13a84c260bf41defa831b9e1
MD5           12571b82b773363849fdbb15e34cebfc
BLAKE2b-256   5e7fd2d3534a9754a6580d1f960105ec1dac0cdcb122c6169630b0dfc3cd598d

