A tool for parsing Scrapy logfiles periodically and incrementally, designed for ScrapydWeb.
Project description
LogParser: A tool for parsing Scrapy logfiles periodically and incrementally, designed for ScrapydWeb.
Installation
- Use pip:
pip install logparser
- Use git:
git clone https://github.com/my8100/logparser.git
cd logparser
python setup.py install
Usage
To use in Python
View codes
In [1]: from logparser import parse
In [2]: log = """2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)
...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
...: {'downloader/exception_count': 3,
...: 'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 3,
...: 'downloader/request_bytes': 1336,
...: 'downloader/request_count': 7,
...: 'downloader/request_method_count/GET': 7,
...: 'downloader/response_bytes': 1669,
...: 'downloader/response_count': 4,
...: 'downloader/response_status_count/200': 2,
...: 'downloader/response_status_count/302': 1,
...: 'downloader/response_status_count/404': 1,
...: 'dupefilter/filtered': 1,
...: 'finish_reason': 'finished',
...: 'finish_time': datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),
...: 'httperror/response_ignored_count': 1,
...: 'httperror/response_ignored_status_count/404': 1,
...: 'item_scraped_count': 2,
...: 'log_count/CRITICAL': 5,
...: 'log_count/DEBUG': 14,
...: 'log_count/ERROR': 5,
...: 'log_count/INFO': 75,
...: 'log_count/WARNING': 3,
...: 'offsite/domains': 1,
...: 'offsite/filtered': 1,
...: 'request_depth_max': 1,
...: 'response_received_count': 3,
...: 'retry/count': 2,
...: 'retry/max_reached': 1,
...: 'retry/reason_count/twisted.internet.error.TCPTimedOutError': 2,
...: 'scheduler/dequeued': 7,
...: 'scheduler/dequeued/memory': 7,
...: 'scheduler/enqueued': 7,
...: 'scheduler/enqueued/memory': 7,
...: 'start_time': datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}
...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"""
In [3]: d = parse(log, headlines=1, taillines=1)
In [4]: d
Out[4]:
OrderedDict([('head',
'2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)'),
('tail',
'2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)'),
('first_log_time', '2018-10-23 18:28:34'),
('latest_log_time', '2018-10-23 18:29:42'),
('elapsed', '0:01:08'),
('first_log_timestamp', 1540290514),
('latest_log_timestamp', 1540290582),
('datas', []),
('pages', 3),
('items', 2),
('latest_matches',
{'resuming_crawl': '',
'latest_offsite': '',
'latest_duplicate': '',
'latest_crawl': '',
'latest_scrape': '',
'latest_item': '',
'latest_stat': ''}),
('latest_crawl_timestamp', 0),
('latest_scrape_timestamp', 0),
('log_categories',
{'critical_logs': {'count': 5, 'details': []},
'error_logs': {'count': 5, 'details': []},
'warning_logs': {'count': 3, 'details': []},
'redirect_logs': {'count': 1, 'details': []},
'retry_logs': {'count': 2, 'details': []},
'ignore_logs': {'count': 1, 'details': []}}),
('shutdown_reason', 'N/A'),
('finish_reason', 'finished'),
('last_update_timestamp', 1547559048),
('last_update_time', '2019-01-15 21:30:48')])
In [5]: d['elapsed']
Out[5]: '0:01:08'
In [6]: d['pages']
Out[6]: 3
In [7]: d['items']
Out[7]: 2
In [8]: d['finish_reason']
Out[8]: 'finished'
To run as a service
- Make sure that Scrapyd has been installed and started on the current host.
- Start LogParser via command
logparser
. - Visit http://127.0.0.1:6800/logs/stats.json (Assuming the Scrapyd service runs on port 6800.)
- Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get stats of a job in details.
To work with ScrapydWeb for visualization
Check out https://github.com/my8100/scrapydweb for more info.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
logparser-0.8.0.tar.gz
(16.7 kB
view hashes)
Built Distribution
logparser-0.8.0-py3-none-any.whl
(30.3 kB
view hashes)
Close
Hashes for logparser-0.8.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 620f5fd7755baeb94e59de973b96af1e00aadf17ebe6b39441def77c88f54c57 |
|
MD5 | 5875b553e01e39ef843b72a0a8c3fe97 |
|
BLAKE2b-256 | daaa302d0d52f57fb53323375360ef9805e6c25b7a293050ddb04e095c6e9d13 |