A tool to parse Scrapyd logs for detailed statistics.

Project description

Scrapyd Log Parser

A high-performance tool that parses Scrapyd logs and generates detailed statistics, surfacing insights that Scrapyd doesn't expose natively.

Features

  • High Performance: Leverages ProcessPoolExecutor for parallel log parsing, maximizing CPU utilization.
  • Organized Output: Centralizes all parsed JSON data into a dedicated scrapydlogparser directory, mirroring your project's structure.
  • Loop Mode: Automatically monitors and re-parses logs at configurable intervals.
  • Smart Cleanup: Automatically removes orphaned JSON files when their corresponding log files are deleted.
  • Incremental Parsing: Only processes new or modified logs by checking file size against existing data.
  • Error Detection: Specifically detects critical unhandled errors and crashes, labeling them for easy identification.

Quick Start

Installation

Install in editable mode for development:

pip install -e .

Basic Usage

Run the parser by pointing it to your Scrapyd logs directory:

scrapyd-logparser /path/to/scrapyd/logs

By default, this will:

  1. Create a scrapydlogparser/ directory inside your logs folder.
  2. Generate individual .json files for every log, mirroring the project/spider structure.
  3. Save a global scrapydlogparser.json summary in the same directory.
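The mirroring in steps 1–2 amounts to a simple path mapping, which can be sketched as follows (a hypothetical helper; the tool's internal naming may differ):

```python
from pathlib import Path

def mirrored_json_path(logs_dir: Path, log_file: Path) -> Path:
    """Map logs/<project>/<spider>/<job>.log to the mirrored
    logs/scrapydlogparser/<project>/<spider>/<job>.json location."""
    rel = log_file.relative_to(logs_dir).with_suffix(".json")
    return logs_dir / "scrapydlogparser" / rel
```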

CLI Options

Option      Shorthand  Description                                                 Default
--interval  -i         Interval in seconds for continuous monitoring (loop mode).  5
--output    -o         Path to the summary JSON file.                              logs/scrapydlogparser/scrapydlogparser.json
--force     -f         Forces a full re-parse of all log files.                    Disabled
--json-dir             Custom directory to store individual JSON files.            logs/scrapydlogparser/

Advanced Usage

Continuous Monitoring (Loop Mode)

To keep your statistics updated in real-time (e.g., every 60 seconds):

scrapyd-logparser ./logs --interval 60
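Conceptually, loop mode is a polling loop: parse, sleep for the interval, repeat. A minimal sketch, where `parse_once` is a stand-in for the actual parsing entry point (hypothetical, not the tool's real API):

```python
import time

def watch(parse_once, interval=5, max_iterations=None):
    """Re-run the parser every `interval` seconds, in the spirit of
    loop mode. `max_iterations` is only for bounded demos/tests;
    the real tool runs until interrupted."""
    runs = 0
    while True:
        parse_once()
        runs += 1
        if max_iterations is not None and runs >= max_iterations:
            break
        time.sleep(interval)
    return runs
```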

Data Structure

The tool transforms your standard Scrapyd logs into a clean, queryable JSON structure:

logs/
├── project/
│   └── spider/
│       └── job.log
└── scrapydlogparser/
    ├── scrapydlogparser.json (Global Summary)
    └── project/
        └── spider/
            └── job.json (Detailed stats)
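Once generated, the summary is plain JSON and easy to query. The sketch below assumes a nesting that mirrors the tree above (project → spider → job); the field names inside each job record ("items", "errors") are assumptions, not the tool's documented schema.

```python
# In practice you would load the real file, e.g.
#   data = json.loads(Path("logs/scrapydlogparser/scrapydlogparser.json").read_text())
# Here we use an inline sample shaped like the directory tree above.
data = {
    "project": {
        "spider": {
            "job": {"items": 120, "errors": 0},
        },
    },
}

# Flatten to (project, spider, job, stats) rows for quick querying.
rows = [
    (project, spider, job, stats)
    for project, spiders in data.items()
    for spider, jobs in spiders.items()
    for job, stats in jobs.items()
]
print(rows)
```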

Project details


Download files

Source Distribution

scrapyd_logparser-0.1.0b1.tar.gz (11.8 kB)

Built Distribution

scrapyd_logparser-0.1.0b1-py3-none-any.whl (10.6 kB)

File details

Details for the file scrapyd_logparser-0.1.0b1.tar.gz.

File metadata

  • Download URL: scrapyd_logparser-0.1.0b1.tar.gz
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for scrapyd_logparser-0.1.0b1.tar.gz
Algorithm Hash digest
SHA256 2dfe43e9a00e60b0ca100f4e319b84a05e1b91754d710bcf4746987040123dd4
MD5 6da226acda610e3b244ed81eea53efb2
BLAKE2b-256 3adfbfa7d9a6fc6b5d5bd69781b1335af575190fac58bca8822b786bd7634070

File details

Details for the file scrapyd_logparser-0.1.0b1-py3-none-any.whl.

File hashes

Hashes for scrapyd_logparser-0.1.0b1-py3-none-any.whl
Algorithm Hash digest
SHA256 dca33dd0106b49d85a8a9f4458ea49df12284fd1ee32ffb0cd07889279209bfe
MD5 76ac11f6c7f9cdb97147c0912a8eb2a9
BLAKE2b-256 8c942a8d77a37ad62e6a1ca59683a82679855e9c264c5dc385822f6877f34283
