Redis Cluster for Scrapy.
Project description
scrapy-redis, cluster edition
This project is based on the original scrapy-redis project, with the following changes:
- Added Redis Sentinel connection support
- Added Redis Cluster connection support
- Added Bloomfilter deduplication
Installation
pip install scrapy-redis-sentinel --user
Configuration example
All settings from the original scrapy-redis are supported. Priority: Sentinel mode > Cluster mode > standalone mode.
# ----------------------------------------Bloomfilter settings-------------------------------------
# Number of hash functions to use, default 6
BLOOMFILTER_HASH_NUMBER = 6
# Size of the Redis bitmap used by the Bloomfilter, as a power of two: 30 means 2^30 bits = 128 MB.
# Default 30. (For reference, 2^22 bits = 512 KB can deduplicate roughly 1.3 million URLs.)
BLOOMFILTER_BIT = 30
# Whether to enable dedupe debug mode, default False (off)
DUPEFILTER_DEBUG = False
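To make the two Bloomfilter settings concrete, here is a minimal in-memory sketch of how such a filter behaves. This is not the library's implementation: the real filter stores its bitmap in Redis (via SETBIT/GETBIT), whereas this stand-in uses a local bytearray, and the class and method names are illustrative.

```python
import hashlib


class BloomFilterSketch:
    """In-memory stand-in for a Redis-bitmap Bloom filter (illustrative only)."""

    def __init__(self, bit=22, hash_number=6):
        self.m = 1 << bit              # total bits, cf. BLOOMFILTER_BIT
        self.k = hash_number           # hash functions, cf. BLOOMFILTER_HASH_NUMBER
        self.bitmap = bytearray(self.m // 8)

    def _offsets(self, value: str):
        # Derive k bit offsets from salted SHA-1 digests of the value.
        for seed in range(self.k):
            digest = hashlib.sha1(f"{seed}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def _get(self, offset):
        return (self.bitmap[offset // 8] >> (offset % 8)) & 1

    def _set(self, offset):
        self.bitmap[offset // 8] |= 1 << (offset % 8)

    def exists(self, value: str) -> bool:
        # Can return a false positive, never a false negative.
        return all(self._get(o) for o in self._offsets(value))

    def insert(self, value: str):
        for o in self._offsets(value):
            self._set(o)
```

A larger `bit` value lowers the false-positive rate at the cost of Redis memory; more hash functions slow each check but sharpen discrimination up to a point.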
# ----------------------------------------Redis standalone mode-------------------------------------
# Redis standalone address
REDIS_HOST = "172.25.2.25"
REDIS_PORT = 6379
# Standalone mode connection parameters
REDIS_PARAMS = {
"password": "password",
"db": 0
}
# ----------------------------------------Redis Sentinel mode-------------------------------------
# Redis Sentinel addresses
REDIS_SENTINELS = [
('172.25.2.25', 26379),
('172.25.2.26', 26379),
('172.25.2.27', 26379)
]
# REDIS_SENTINEL_PARAMS: Sentinel mode connection parameters
REDIS_SENTINEL_PARAMS = {
"service_name": "mymaster",
"password": "password",
"db": 0
}
# ----------------------------------------Redis Cluster mode-------------------------------------
# Redis cluster startup nodes
REDIS_STARTUP_NODES = [
{"host": "172.25.2.25", "port": "6379"},
{"host": "172.25.2.26", "port": "6379"},
{"host": "172.25.2.27", "port": "6379"},
]
# REDIS_CLUSTER_PARAMS: cluster mode connection parameters
REDIS_CLUSTER_PARAMS = {
"password": "password"
}
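The precedence stated above (Sentinel > Cluster > standalone) can be sketched as a simple selection function over the settings keys. This is illustrative only: the function name is hypothetical, and the real selection happens inside the library's connection module.

```python
def pick_redis_mode(settings: dict) -> str:
    """Return which connection mode a settings dict like the ones above selects.

    Illustrative sketch: mirrors the documented precedence
    Sentinel > Cluster > standalone; not the library's actual API.
    """
    if settings.get("REDIS_SENTINELS"):
        return "sentinel"
    if settings.get("REDIS_STARTUP_NODES"):
        return "cluster"
    return "standalone"
```

So a project that defines both `REDIS_SENTINELS` and `REDIS_HOST` connects through Sentinel, and the standalone settings are ignored.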
# ----------------------------------------Other Scrapy settings-------------------------------------
# Keep the queues scrapy-redis uses in Redis, so a crawl can be paused and resumed
# (i.e. do not clear the Redis queues)
SCHEDULER_PERSIST = True
# Scheduler
SCHEDULER = "scrapy_redis_sentinel.scheduler.Scheduler"
# Basic dedupe filter
DUPEFILTER_CLASS = "scrapy_redis_sentinel.dupefilter.RedisDupeFilter"
# BloomFilter
# DUPEFILTER_CLASS = "scrapy_redis_sentinel.dupefilter.RedisBloomFilter"
# Enable Redis-based stats collection
STATS_CLASS = "scrapy_redis_sentinel.stats.RedisStatsCollector"
# Queue class used to order crawl requests
# Default: priority ordering (Scrapy's default), implemented with a sorted set — neither FIFO nor LIFO.
# SCHEDULER_QUEUE_CLASS = 'scrapy_redis_sentinel.queue.SpiderPriorityQueue'
# Optional: first in, first out (FIFO)
# SCHEDULER_QUEUE_CLASS = 'scrapy_redis_sentinel.queue.SpiderQueue'
# Optional: last in, first out (LIFO)
# SCHEDULER_QUEUE_CLASS = 'scrapy_redis_sentinel.queue.SpiderStack'
Note: the standalone settings take no effect when cluster mode is used.
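The three queue classes above differ only in pop order. The following sketch imitates their semantics with in-process data structures (a heap for the sorted set, plain lists for the Redis lists); the variable names are illustrative, not part of the library.

```python
import heapq
from collections import deque

# Requests tagged with a Scrapy priority (higher = scheduled sooner).
requests = [("a", 0), ("b", 10), ("c", 0)]

# SpiderPriorityQueue: sorted-set semantics — pop by priority,
# with ties resolved by score order rather than strict FIFO/LIFO.
heap = [(-prio, url) for url, prio in requests]
heapq.heapify(heap)
priority_order = [heapq.heappop(heap)[1] for _ in range(len(heap))]

# SpiderQueue: FIFO semantics, as a Redis list used with LPUSH/RPOP.
fifo = deque(url for url, _ in requests)
fifo_order = [fifo.popleft() for _ in range(len(fifo))]

# SpiderStack: LIFO semantics, as a Redis list used with LPUSH/LPOP.
lifo = [url for url, _ in requests]
lifo_order = [lifo.pop() for _ in range(len(lifo))]
```

Here the priority queue pops "b" first (priority 10), the FIFO queue pops in arrival order, and the stack pops the most recent request first.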
Using in spiders
Change the import path of RedisSpider.
Original scrapy-redis usage:
from scrapy_redis.spiders import RedisSpider
class Spider(RedisSpider):
...
scrapy-redis-sentinel usage:
from scrapy_redis_sentinel.spiders import RedisSpider
class Spider(RedisSpider):
...
File details
Details for the file scrapy-redis-sentinel-0.7.2.tar.gz.
File metadata
- Download URL: scrapy-redis-sentinel-0.7.2.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 02c12eae22777b5c622f57440dae95d1342c56c5cf65bbc9693a99be9ed58b65
MD5 | c4a761423260b8f8ee1a3716a4ab3086
BLAKE2b-256 | 6718d4a8b495982d36679a16591cb0b9c38eab45adad579a2ca02a19a09fda6d
File details
Details for the file scrapy_redis_sentinel-0.7.2-py2.py3-none-any.whl.
File metadata
- Download URL: scrapy_redis_sentinel-0.7.2-py2.py3-none-any.whl
- Upload date:
- Size: 17.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 394507acc94c23be45d95c6206743014122683f6ace3832069104134941e3a4d
MD5 | 3018d3bdee71b29e6e2511fc6785230f
BLAKE2b-256 | 997bf90943c6c0827ade2b9de8a625a0930b954e912264558026829fb9c50bbe