Scrapy Spider Stats to MongoDB Extension
An extension that stores Scrapy spider stats in MongoDB, useful for spider monitoring and statistics.
Installation
pip3 install scrapy-spiderstats-extension
Usage
Enable the SpiderStats extension in settings.py:
EXTENSIONS = {
"scrapyspiderstats.SpiderStats": 0
}
STATS_MONGODB_URI = "mongodb://localhost:27017"
STATS_MONGODB_DB = "scrapy"
STATS_MONGODB_COL = "spiderstats"
Stored documents
Startup record
{
"_id": ObjectId("5fb23d9cbaf515d71d3a9c6c"),
"log_count/INFO": NumberInt("9"),
"start_time": ISODate("2020-11-16T08:51:40.705Z"),
"StatsId": "2b55df7b46a548269ca603bb7ad889b2",
"spider_name": "test",
"pages": NumberInt("0"),
"pagerate": 0,
"items": NumberInt("0"),
"itemrate": 0,
"record_time": ISODate("2020-11-16T08:51:40.706Z")
}
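Every document written during one crawl shares the same StatsId (a 32-character hex string), so all snapshots of a run can be grouped together. A minimal sketch of that idea; generating the id with uuid4 is an assumption for illustration, not necessarily how the extension does it:

```python
import uuid

# One id per crawl run; every stats snapshot carries it. uuid4().hex yields
# a 32-character hex string matching the StatsId format shown above.
run_id = uuid.uuid4().hex

# Each periodic snapshot stores the same StatsId plus the current counters.
snapshots = [
    {"StatsId": run_id, "spider_name": "test", "pages": p}
    for p in (0, 510, 1000)
]
```

Querying the collection by StatsId then returns the full history of a single run.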
Interval record
{
"_id": ObjectId("5fb23dd8baf515d71d3a9c6d"),
"log_count/INFO": NumberInt("12"),
"start_time": ISODate("2020-11-16T08:51:40.705Z"),
"StatsId": "2b55df7b46a548269ca603bb7ad889b2",
"spider_name": "test",
"pages": NumberInt("510"),
"pagerate": 510,
"items": NumberInt("0"),
"itemrate": 0,
"record_time": ISODate("2020-11-16T08:52:40.713Z"),
"log_count/DEBUG": NumberInt("1034"),
"scheduler/enqueued/redis": NumberInt("521"),
"scheduler/dequeued/redis": NumberInt("520"),
"downloader/request_count": NumberInt("520"),
"downloader/request_method_count/GET": NumberInt("520"),
"downloader/request_bytes": NumberInt("239235"),
"downloader/response_count": NumberInt("510"),
"downloader/response_status_count/200": NumberInt("510"),
"downloader/response_bytes": NumberInt("110675"),
"response_received_count": NumberInt("510"),
"downloader/exception_count": NumberInt("3"),
"downloader/exception_type_count/twisted-internet-error-TimeoutError": NumberInt("3"),
"retry/count": NumberInt("3"),
"retry/reason_count/twisted-internet-error-TimeoutError": NumberInt("3")
}
Finish record
{
"_id": ObjectId("5fb23e2ebaf515d71d3a9c6f"),
"log_count/INFO": NumberInt("16"),
"start_time": ISODate("2020-11-16T08:51:40.705Z"),
"StatsId": "2b55df7b46a548269ca603bb7ad889b2",
"spider_name": "test",
"pages": NumberInt("1000"),
"pagerate": 6,
"items": NumberInt("0"),
"itemrate": 0,
"record_time": ISODate("2020-11-16T08:54:06.125Z"),
"log_count/DEBUG": NumberInt("2015"),
"scheduler/enqueued/redis": NumberInt("1007"),
"scheduler/dequeued/redis": NumberInt("1007"),
"downloader/request_count": NumberInt("1007"),
"downloader/request_method_count/GET": NumberInt("1007"),
"downloader/request_bytes": NumberInt("463763"),
"downloader/response_count": NumberInt("1000"),
"downloader/response_status_count/200": NumberInt("1000"),
"downloader/response_bytes": NumberInt("216997"),
"response_received_count": NumberInt("1000"),
"downloader/exception_count": NumberInt("7"),
"downloader/exception_type_count/twisted.internet.error.TimeoutError": NumberInt("7"),
"retry/count": NumberInt("7"),
"retry/reason_count/twisted.internet.error.TimeoutError": NumberInt("7"),
"elapsed_time_seconds": 145.420645,
"finish_time": ISODate("2020-11-16T08:54:06.125Z"),
"finish_reason": "finished"
}
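The pagerate and itemrate fields suggest per-interval rates, but regardless of how the extension computes them, throughput can be derived directly from consecutive snapshots. A minimal pure-Python sketch, with record_time and pages values copied from the three documents above (the `throughput` helper is illustrative, not part of the extension):

```python
from datetime import datetime

def throughput(snapshots):
    """Pages crawled per second between consecutive snapshots of one run."""
    rates = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        seconds = (cur["record_time"] - prev["record_time"]).total_seconds()
        rates.append((cur["pages"] - prev["pages"]) / seconds)
    return rates

# Values taken from the startup, interval, and finish documents above.
snapshots = [
    {"record_time": datetime(2020, 11, 16, 8, 51, 40, 706000), "pages": 0},
    {"record_time": datetime(2020, 11, 16, 8, 52, 40, 713000), "pages": 510},
    {"record_time": datetime(2020, 11, 16, 8, 54, 6, 125000), "pages": 1000},
]
rates = throughput(snapshots)  # ~8.5 pages/s, then ~5.7 pages/s
```

The crawl ran at roughly 8.5 pages/s in its first minute and slowed to about 5.7 pages/s before finishing, consistent with the retries and timeouts recorded in the final document.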
Source Distribution

Hashes for scrapy-spiderstats-extension-0.0.1.tar.gz

Algorithm | Hash digest
---|---
SHA256 | 13a8f3f551c9671637de1c2930e82a4643eed02e6e5ae3168a7c55c96dd5e6f7
MD5 | b9d1745f65bff0e91f521cfc8b4778dc
BLAKE2b-256 | 0d6216185a59e65a6102859d3f971b3189af7ccbb1de9933817eca0382ab2cf7

Built Distribution

Hashes for scrapy_spiderstats_extension-0.0.1-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 692798e1438a00ae8431dd90bea3b95a1d271f2358d9de2eabcf6915838193ad
MD5 | 7b583a495220ca5a2aaa2ae157254998
BLAKE2b-256 | 1f99a03f95cb59ebc42c98a1da58f40148a3c94e258bbcff7852ce4483f8c868