Scrapy Spider Stats to MongoDB Extension

Scrapy SpiderStats Extension

An extension that stores Scrapy spider stats in MongoDB, for use in crawler monitoring and statistics.

Installation

pip install scrapy-spiderstats-extension

Usage

Enable SpiderStats in the settings.py configuration file:

EXTENSIONS = {
    "scrapyspiderstats.SpiderStats": 0
}
STATS_MONGODB_URI = "mongodb://localhost:27017"
STATS_MONGODB_DB = "scrapy"
STATS_MONGODB_COL = "spiderstats"
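
Once the extension is enabled, the stored documents can be read back with any MongoDB client. The following is a minimal sketch, assuming pymongo is installed and the URI, database, and collection names match the settings above. The helpers `connect_collection` and `latest_snapshot` are hypothetical names for illustration and are not part of this package.

```python
def connect_collection(uri="mongodb://localhost:27017",
                       db="scrapy", col="spiderstats"):
    """Open the collection named by the STATS_MONGODB_* settings."""
    # Imported lazily so latest_snapshot() stays usable without pymongo.
    from pymongo import MongoClient

    return MongoClient(uri)[db][col]


def latest_snapshot(collection, spider_name):
    """Return the newest stats document for a spider, or None."""
    # Assumption: every document carries "spider_name" and "record_time",
    # as in the stored-result examples in this README.
    return collection.find_one(
        {"spider_name": spider_name},
        sort=[("record_time", -1)],  # newest record first
    )
```

With a running MongoDB instance, `latest_snapshot(connect_collection(), "test")` would return the most recent record for the spider named `test`.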

Stored results

Startup state

{
    "_id":"5fb23d9cbaf515d71d3a9c6c",
    "log_count/INFO":9,
    "start_time":"2020-11-16T08:51:40.705Z",
    "stats_id":"2b55df7b46a548269ca603bb7ad889b2",
    "spider_name":"test",
    "pages":0,
    "pagerate":0,
    "items":0,
    "itemrate":0,
    "record_time":"2020-11-16T08:51:40.706Z"
}

Recording state

{
    "_id":"5fb23dd8baf515d71d3a9c6d",
    "log_count/INFO":12,
    "start_time":"2020-11-16T08:51:40.705Z",
    "stats_id":"2b55df7b46a548269ca603bb7ad889b2",
    "spider_name":"test",
    "pages":510,
    "pagerate":510,
    "items":0,
    "itemrate":0,
    "record_time":"2020-11-16T08:52:40.713Z",
    "log_count/DEBUG":1034,
    "scheduler/enqueued/redis":521,
    "scheduler/dequeued/redis":520,
    "downloader/request_count":520,
    "downloader/request_method_count/GET":520,
    "downloader/request_bytes":239235,
    "downloader/response_count":510,
    "downloader/response_status_count/200":510,
    "downloader/response_bytes":110675,
    "response_received_count":510,
    "downloader/exception_count":3,
    "downloader/exception_type_count/twisted-internet-error-TimeoutError":3,
    "retry/count":3,
    "retry/reason_count/twisted-internet-error-TimeoutError":3
}
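
For monitoring, a record like the one above can be reduced to a few ratios. This is a sketch only: the function name `request_health` and the choice of ratios are illustrative assumptions, and the field names are taken from the example document.

```python
def request_health(snapshot):
    """Derive simple ratios from one stats document (a plain dict)."""
    requests = snapshot.get("downloader/request_count", 0)
    responses = snapshot.get("downloader/response_count", 0)
    retries = snapshot.get("retry/count", 0)
    if requests == 0:
        # The startup record has no downloader counters yet.
        return {"response_ratio": None, "retry_ratio": None}
    return {
        "response_ratio": responses / requests,  # responses per request sent
        "retry_ratio": retries / requests,       # retries per request sent
    }
```

For the record above this gives 510/520 responses per request and 3/520 retries per request; a rising retry ratio across records would be one signal worth alerting on.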

Finished state

{
    "_id":"5fb23e2ebaf515d71d3a9c6f",
    "log_count/INFO":16,
    "start_time":"2020-11-16T08:51:40.705Z",
    "stats_id":"2b55df7b46a548269ca603bb7ad889b2",
    "spider_name":"test",
    "pages":1000,
    "pagerate":6,
    "items":0,
    "itemrate":0,
    "record_time":"2020-11-16T08:54:06.125Z",
    "log_count/DEBUG":2015,
    "scheduler/enqueued/redis":1007,
    "scheduler/dequeued/redis":1007,
    "downloader/request_count":1007,
    "downloader/request_method_count/GET":1007,
    "downloader/request_bytes":463763,
    "downloader/response_count":1000,
    "downloader/response_status_count/200":1000,
    "downloader/response_bytes":216997,
    "response_received_count":1000,
    "downloader/exception_count":7,
    "downloader/exception_type_count/twisted.internet.error.TimeoutError":7,
    "retry/count":7,
    "retry/reason_count/twisted.internet.error.TimeoutError":7,
    "elapsed_time_seconds":145.420645,
    "finish_time":"2020-11-16T08:54:06.125Z",
    "finish_reason":"finished"
}
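
The three examples share one `stats_id`, so the records of a run can be grouped on that field. The sketch below summarizes each run from a list of documents. It assumes that `stats_id` identifies a single run and that only the final record contains `finish_reason`, which is what the examples suggest but is not confirmed by the package documentation. `summarize_runs` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize_runs(docs):
    """Group stats documents by stats_id and summarize each run."""
    runs = defaultdict(list)
    for doc in docs:
        runs[doc["stats_id"]].append(doc)

    summary = {}
    for stats_id, records in runs.items():
        # record_time sorts correctly as datetime or as an ISO 8601 string.
        records.sort(key=lambda d: d["record_time"])
        last = records[-1]
        summary[stats_id] = {
            "spider_name": last["spider_name"],
            "records": len(records),
            "pages": last["pages"],
            # Assumption: only the final record carries finish_reason.
            "finished": "finish_reason" in last,
            "finish_reason": last.get("finish_reason"),
        }
    return summary
```

A run whose newest record is old and has no `finish_reason` may have crashed or been killed, which makes this summary a simple starting point for a monitoring check.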

Download files

Source Distribution

scrapy-spiderstats-extension-0.0.2.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

scrapy_spiderstats_extension-0.0.2-py2.py3-none-any.whl (3.9 kB)

Uploaded Python 2, Python 3

File details

Details for the file scrapy-spiderstats-extension-0.0.2.tar.gz.

File metadata

  • Download URL: scrapy-spiderstats-extension-0.0.2.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.0

File hashes

Hashes for scrapy-spiderstats-extension-0.0.2.tar.gz
Algorithm Hash digest
SHA256 e274fa360ecaf7367c30669d6a6db399a0fd11ec201f6c77bc1d9fc6b691c20e
MD5 b37aa35b96afde2789f502cefa4222e2
BLAKE2b-256 2bd927e9d7eddcaae5d77af02834447f48c7684b86e077ee1419ce49d8e72935

File details

Details for the file scrapy_spiderstats_extension-0.0.2-py2.py3-none-any.whl.

File metadata

  • Download URL: scrapy_spiderstats_extension-0.0.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 3.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.0

File hashes

Hashes for scrapy_spiderstats_extension-0.0.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 8230e3c41df447913ea67d73662163aaee1bcd4500e9237724006fd7ca219e6f
MD5 bd87397e5d7d2f03a26875ec65b5f16e
BLAKE2b-256 051cf6bbf699258e7c09268382e6e33a86808492a60509afa4b4765c7a2cd7c8
