A tool that pushes a spider's failed URLs from MongoDB back to Redis

Project description

A small tool that re-pushes failed URLs back to Redis.


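Requirements:

1. The spider stores URLs that still fail after multiple retries in MongoDB.

2. In MongoDB, each failed URL is stored under the key "url", i.e. {"url": "www.xxx.com"}.

3. The spider is configured to read its start_urls from Redis.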
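The requirements above imply a simple flow: read documents that carry a "url" key and push each URL onto the Redis list a spider reads its start URLs from. The following is a minimal sketch of that flow, not the tool's actual implementation. The function name `repush_failed_urls` is hypothetical, and the client is assumed to expose a redis-py style `lpush(name, *values)` method.

```python
from typing import Any, Iterable, Mapping


def repush_failed_urls(docs: Iterable[Mapping[str, Any]], client: Any, redis_key: str) -> int:
    """Push the "url" field of each document onto the Redis list `redis_key`.

    `docs` is any iterable of mappings shaped like {"url": "www.xxx.com"}.
    `client` is assumed to offer lpush(name, *values), as redis-py does.
    Documents without a non-empty "url" value are skipped.
    Returns the number of URLs pushed.
    """
    pushed = 0
    for doc in docs:
        url = doc.get("url")
        if not url:
            continue  # nothing to re-push for this document
        client.lpush(redis_key, url)
        pushed += 1
    return pushed
```

With the real libraries, `docs` could come from pymongo's `collection.find({}, {"url": 1})` and `client` could be a `redis.Redis(host="127.0.0.1", port=6379, db=0)` instance. Whether the spider expects `lpush`, `rpush`, or a set depends on how it consumes its start URLs, so treat the choice of `lpush` here as an assumption.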


Installation:

# The exact file name depends on the version; the package is in the dist folder
$ pip install pushurls.tar.gz

Usage:

# Start directly:
$ pushurls

# Specify a configuration file:
$ pushurls /root/push_fail_urls_set.json

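Configuration file format (recommended: run the tool directly and let it generate the configuration file automatically, so the settings do not have to be entered again next time):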

{
  "from": [
    {
      "adder_sep": ">>>",
      "condition": {},
      "db": "test_db",
      "from_collection": "test_data",
      "fromdb_str": "127.0.0.1.amazon.test_data",
      "host": "127.0.0.1",
      "password": "123456",
      "port": 27017,
      "source": "admin",
      "url_head": "",
      "url_tail": "**-fixed-**test_url",
      "user": "root"
    }
  ],
  "to": [
    {
      "db": "0",
      "host": "127.0.0.1",
      "port": 6379,
      "spiders": {
        "spider_name1": "S1:start_urls",
        "spider_name2": "S2:start_urls"
      },
      "todb_str": "127.0.0.1:6379.0"
    }
  ]
}
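To make the configuration layout more concrete, here is a hedged sketch of how a script might read such a file. It is not taken from the pushurls source. The helper names `mongo_uri` and `redis_targets` are hypothetical, and the sketch assumes that "source" names the MongoDB authentication database (authSource) and that "db" under "to" is the Redis database number.

```python
import json
from urllib.parse import quote_plus


def mongo_uri(src: dict) -> str:
    """Build a MongoDB connection URI from one entry of the "from" list.

    Assumption: "source" is the authentication database (authSource).
    """
    auth = ""
    if src.get("user"):
        user = quote_plus(src["user"])
        password = quote_plus(src.get("password", ""))
        auth = f"{user}:{password}@"
    uri = f"mongodb://{auth}{src['host']}:{src['port']}/"
    if src.get("source"):
        uri += f"?authSource={quote_plus(src['source'])}"
    return uri


def redis_targets(config: dict) -> list:
    """Return (host, port, db, spider, redis_key) for every spider under "to"."""
    targets = []
    for dst in config["to"]:
        for spider, key in dst["spiders"].items():
            targets.append((dst["host"], dst["port"], int(dst["db"]), spider, key))
    return targets


# A trimmed copy of the example configuration, parsed from JSON text.
example = json.loads("""
{"from": [{"db": "test_db", "from_collection": "test_data",
           "host": "127.0.0.1", "port": 27017,
           "user": "root", "password": "123456", "source": "admin"}],
 "to": [{"db": "0", "host": "127.0.0.1", "port": 6379,
         "spiders": {"spider_name1": "S1:start_urls",
                     "spider_name2": "S2:start_urls"}}]}
""")

print(mongo_uri(example["from"][0]))
for target in redis_targets(example):
    print(target)
```

The sketch leaves "adder_sep", "condition", "url_head", "url_tail", "fromdb_str", and "todb_str" untouched, because their exact meaning is not described here.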

Download files

Source Distribution

pushurls-0.0.3.tar.gz (6.6 kB)

Uploaded Source

Built Distribution

pushurls-0.0.3-py3-none-any.whl (8.0 kB)

Uploaded Python 3
