A tool to push a spider's failed URLs from MongoDB back to Redis
Project description
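A small tool for pushing failed URLs back to Redis.
Requirements:
1. The spider stores URLs that still fail after multiple retries in MongoDB.
2. In MongoDB, each failed URL is stored under the key "url", i.e. {"url": "www.xxx.com"}.
3. The spider is configured to read its start_urls from Redis.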
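The requirements above imply a simple flow: read the "url" field of each failed-URL document from MongoDB, then push it onto the spider's start_urls key in Redis. The following is a minimal sketch of that flow, not the tool's actual implementation. The function names `iter_failed_urls` and `push_urls` are hypothetical, and the sketch assumes the start_urls key is a Redis list. To stay runnable without a database, the functions accept any object exposing a pymongo-style `find(filter)` method or a redis-py-style `lpush(key, value)` method.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional


def iter_failed_urls(
    collection: Any, condition: Optional[Mapping[str, Any]] = None
) -> Iterator[str]:
    """Yield the "url" field of every document matching `condition`.

    `collection` is assumed to behave like a pymongo Collection,
    i.e. it has a find(filter) method returning dict-like documents.
    """
    for doc in collection.find(dict(condition or {})):
        url = doc.get("url")
        if url:  # skip documents that lack a usable "url" key
            yield url


def push_urls(redis_client: Any, start_urls_key: str, urls: Iterable[str]) -> int:
    """LPUSH each URL onto `start_urls_key` and return how many were pushed.

    Assumption: the spider reads its start URLs from a Redis list.
    """
    pushed = 0
    for url in urls:
        redis_client.lpush(start_urls_key, url)
        pushed += 1
    return pushed
```

With real connections, `collection` could be a pymongo collection such as `MongoClient(host, port)["test_db"]["test_data"]` and `redis_client` a `redis.Redis(host="127.0.0.1", port=6379, db=0)` instance, with the key taken from the "spiders" mapping in the config (for example "S1:start_urls").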
Installation:
# Check the exact version; the package is in the dist folder
$ pip install pushurls.tar.gz
Usage:
# Start directly:
$ pushurls
# Specify a config file:
$ pushurls /root/push_fail_urls_set.json
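Config file format (recommended: run the tool directly and let it generate the config file automatically, so the settings do not have to be entered again next time):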
{
  "from": [
    {
      "adder_sep": ">>>",
      "condition": {},
      "db": "test_db",
      "from_collection": "test_data",
      "fromdb_str": "127.0.0.1.amazon.test_data",
      "host": "127.0.0.1",
      "password": "123456",
      "port": 27017,
      "source": "admin",
      "url_head": "",
      "url_tail": "**-fixed-**test_url",
      "user": "root"
    }
  ],
  "to": [
    {
      "db": "0",
      "host": "127.0.0.1",
      "port": 6379,
      "spiders": {
        "spider_name1": "S1:start_urls",
        "spider_name2": "S2:start_urls"
      },
      "todb_str": "127.0.0.1:6379.0"
    }
  ]
}
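As an illustration of how the connection fields in such a config could be consumed, here is a minimal sketch that loads the JSON and derives connection URLs from one "from" entry and one "to" entry. It is not the tool's own code. The helper names `load_config`, `mongo_uri` and `redis_url` are hypothetical, and treating "source" as the MongoDB authentication database (`authSource`) is an assumption. Fields whose meaning the description does not spell out ("adder_sep", "url_head", "url_tail") are deliberately left untouched.

```python
import json
from urllib.parse import quote_plus


def load_config(text: str) -> dict:
    """Parse the config JSON and check that "from" and "to" are non-empty lists."""
    config = json.loads(text)
    for section in ("from", "to"):
        if not isinstance(config.get(section), list) or not config[section]:
            raise ValueError(f'config needs a non-empty "{section}" list')
    return config


def mongo_uri(src: dict) -> str:
    """Build a MongoDB connection URI from one entry of the "from" list.

    Assumption: "source" names the authentication database (authSource).
    """
    user = quote_plus(src["user"])          # escape reserved characters
    password = quote_plus(src["password"])
    return (
        f"mongodb://{user}:{password}@{src['host']}:{src['port']}"
        f"/?authSource={src['source']}"
    )


def redis_url(dst: dict) -> str:
    """Build a redis:// URL from one entry of the "to" list."""
    return f"redis://{dst['host']}:{dst['port']}/{dst['db']}"
```

URLs in this form are accepted by `pymongo.MongoClient(...)` and `redis.Redis.from_url(...)` respectively, which keeps the config parsing separate from the actual connection setup.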