
Project description

LogParser: A tool for parsing Scrapy log files periodically and incrementally, designed for ScrapydWeb.


Installation

  • Use pip:
pip install logparser

Note that you may need to execute python -m pip install --upgrade pip first in order to get the latest version of logparser. Alternatively, download the tar.gz file from https://pypi.org/project/logparser/#files and install it via pip install logparser-x.x.x.tar.gz

  • Use git:
pip install --upgrade git+https://github.com/my8100/logparser.git

Or:

git clone https://github.com/my8100/logparser.git
cd logparser
python setup.py install

Usage

To use in Python

In [1]: from logparser import parse

In [2]: log = """2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)
   ...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
   ...: {'downloader/exception_count': 3,
   ...:  'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 3,
   ...:  'downloader/request_bytes': 1336,
   ...:  'downloader/request_count': 7,
   ...:  'downloader/request_method_count/GET': 7,
   ...:  'downloader/response_bytes': 1669,
   ...:  'downloader/response_count': 4,
   ...:  'downloader/response_status_count/200': 2,
   ...:  'downloader/response_status_count/302': 1,
   ...:  'downloader/response_status_count/404': 1,
   ...:  'dupefilter/filtered': 1,
   ...:  'finish_reason': 'finished',
   ...:  'finish_time': datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),
   ...:  'httperror/response_ignored_count': 1,
   ...:  'httperror/response_ignored_status_count/404': 1,
   ...:  'item_scraped_count': 2,
   ...:  'log_count/CRITICAL': 5,
   ...:  'log_count/DEBUG': 14,
   ...:  'log_count/ERROR': 5,
   ...:  'log_count/INFO': 75,
   ...:  'log_count/WARNING': 3,
   ...:  'offsite/domains': 1,
   ...:  'offsite/filtered': 1,
   ...:  'request_depth_max': 1,
   ...:  'response_received_count': 3,
   ...:  'retry/count': 2,
   ...:  'retry/max_reached': 1,
   ...:  'retry/reason_count/twisted.internet.error.TCPTimedOutError': 2,
   ...:  'scheduler/dequeued': 7,
   ...:  'scheduler/dequeued/memory': 7,
   ...:  'scheduler/enqueued': 7,
   ...:  'scheduler/enqueued/memory': 7,
   ...:  'start_time': datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}
   ...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"""

In [3]: odict = parse(log, headlines=1, taillines=1)

In [4]: odict
Out[4]:
OrderedDict([('head',
              '2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)'),
             ('tail',
              '2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)'),
             ('first_log_time', '2018-10-23 18:28:34'),
             ('latest_log_time', '2018-10-23 18:29:42'),
             ('runtime', '0:01:08'),
             ('first_log_timestamp', 1540290514),
             ('latest_log_timestamp', 1540290582),
             ('datas', []),
             ('pages', 3),
             ('items', 2),
             ('latest_matches',
              {'telnet_console': '',
               'resuming_crawl': '',
               'latest_offsite': '',
               'latest_duplicate': '',
               'latest_crawl': '',
               'latest_scrape': '',
               'latest_item': '',
               'latest_stat': ''}),
             ('latest_crawl_timestamp', 0),
             ('latest_scrape_timestamp', 0),
             ('log_categories',
              {'critical_logs': {'count': 5, 'details': []},
               'error_logs': {'count': 5, 'details': []},
               'warning_logs': {'count': 3, 'details': []},
               'redirect_logs': {'count': 1, 'details': []},
               'retry_logs': {'count': 2, 'details': []},
               'ignore_logs': {'count': 1, 'details': []}}),
             ('shutdown_reason', 'N/A'),
             ('finish_reason', 'finished'),
             ('crawler_stats',
              OrderedDict([('source', 'log'),
                           ('last_update_time', '2018-10-23 18:29:41'),
                           ('last_update_timestamp', 1540290581),
                           ('downloader/exception_count', 3),
                           ('downloader/exception_type_count/twisted.internet.error.TCPTimedOutError',
                            3),
                           ('downloader/request_bytes', 1336),
                           ('downloader/request_count', 7),
                           ('downloader/request_method_count/GET', 7),
                           ('downloader/response_bytes', 1669),
                           ('downloader/response_count', 4),
                           ('downloader/response_status_count/200', 2),
                           ('downloader/response_status_count/302', 1),
                           ('downloader/response_status_count/404', 1),
                           ('dupefilter/filtered', 1),
                           ('finish_reason', 'finished'),
                           ('finish_time',
                            'datetime.datetime(2018, 10, 23, 10, 29, 41, 174719)'),
                           ('httperror/response_ignored_count', 1),
                           ('httperror/response_ignored_status_count/404', 1),
                           ('item_scraped_count', 2),
                           ('log_count/CRITICAL', 5),
                           ('log_count/DEBUG', 14),
                           ('log_count/ERROR', 5),
                           ('log_count/INFO', 75),
                           ('log_count/WARNING', 3),
                           ('offsite/domains', 1),
                           ('offsite/filtered', 1),
                           ('request_depth_max', 1),
                           ('response_received_count', 3),
                           ('retry/count', 2),
                           ('retry/max_reached', 1),
                           ('retry/reason_count/twisted.internet.error.TCPTimedOutError',
                            2),
                           ('scheduler/dequeued', 7),
                           ('scheduler/dequeued/memory', 7),
                           ('scheduler/enqueued', 7),
                           ('scheduler/enqueued/memory', 7),
                           ('start_time',
                            'datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)')])),
             ('last_update_time', '2019-03-08 16:53:50'),
             ('last_update_timestamp', 1552035230),
             ('logparser_version', '0.8.1')])

In [5]: odict['runtime']
Out[5]: '0:01:08'

In [6]: odict['pages']
Out[6]: 3

In [7]: odict['items']
Out[7]: 2

In [8]: odict['finish_reason']
Out[8]: 'finished'
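
In practice you would usually read the log text from a file rather than from a string literal. A minimal sketch, assuming a Scrapy job log at a hypothetical path (the path and the headlines/taillines values are placeholders):

from logparser import parse

# Hypothetical path to a Scrapy job log produced by Scrapyd.
with open('logs/demo/test/2018-10-23.log', encoding='utf-8') as f:
    odict = parse(f.read(), headlines=10, taillines=10)

print(odict['runtime'], odict['pages'], odict['items'], odict['finish_reason'])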

To run as a service

  1. Make sure that Scrapyd has been installed and started on the current host.
  2. Start LogParser via the command logparser.
  3. Visit http://127.0.0.1:6800/logs/stats.json (assuming the Scrapyd service runs on port 6800).
  4. Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get the stats of a job in detail (see the sketch below).
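
A minimal sketch of consuming these endpoints with the requests library, assuming Scrapyd runs on 127.0.0.1:6800; the project, spider, and job names are hypothetical placeholders, and the exact JSON layout may differ across LogParser versions:

import requests

# Aggregated stats that LogParser exposes alongside the Scrapyd log files.
stats = requests.get('http://127.0.0.1:6800/logs/stats.json', timeout=10).json()
print(list(stats.keys()))

# Detailed stats of a single job; replace demo/test/2018-10-23 with your own
# project, spider, and job id.
job = requests.get('http://127.0.0.1:6800/logs/demo/test/2018-10-23.json',
                   timeout=10).json()
print(job.get('runtime'), job.get('pages'), job.get('items'))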

To work with ScrapydWeb for visualization

Check out https://github.com/my8100/scrapydweb for more info.

(Screenshot: stats page)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

logparser-0.8.2.tar.gz (23.6 kB)

Built Distribution

logparser-0.8.2-py3-none-any.whl (37.3 kB)

File details

Details for the file logparser-0.8.2.tar.gz.

File metadata

  • Download URL: logparser-0.8.2.tar.gz
  • Upload date:
  • Size: 23.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3

File hashes

Hashes for logparser-0.8.2.tar.gz

  • SHA256: b1720067ba267f13a4c1b08cfdcad5b9ed7e0db95817a46a946de83a5b18f227
  • MD5: 11bed4a1de8927840325600137aa6be4
  • BLAKE2b-256: 0ee1c2679116e29ff09cb5018635297fbe59ab371eae1d9b913b0e9e22a6e974
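
To verify a downloaded file against its published digest, a minimal sketch using only the Python standard library (the filename assumes the tar.gz above sits in the current working directory):

import hashlib

EXPECTED = 'b1720067ba267f13a4c1b08cfdcad5b9ed7e0db95817a46a946de83a5b18f227'

# Stream the file in chunks so large downloads do not load fully into memory.
sha256 = hashlib.sha256()
with open('logparser-0.8.2.tar.gz', 'rb') as f:
    for chunk in iter(lambda: f.read(8192), b''):
        sha256.update(chunk)

# Abort loudly if the download does not match the published SHA256.
assert sha256.hexdigest() == EXPECTED, 'hash mismatch'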

File details

Details for the file logparser-0.8.2-py3-none-any.whl.

File metadata

  • Download URL: logparser-0.8.2-py3-none-any.whl
  • Upload date:
  • Size: 37.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3

File hashes

Hashes for logparser-0.8.2-py3-none-any.whl

  • SHA256: 9d34701ff8ae22596a987eb10defd81c86003860a9289e74f72ee185f7c01db8
  • MD5: ef00fab72ff523182c5d58e0096a9cb0
  • BLAKE2b-256: d67e528828509144799c08e903e0f49cbc037cec4128286eaafa19e312263651
