# Scrapyd Log Parser

A high-performance tool designed to parse Scrapyd logs and generate detailed statistics, providing deep insights that Scrapyd doesn't provide natively.
## Features
- **High Performance**: Leverages `ProcessPoolExecutor` for parallel log parsing, maximizing CPU utilization.
- **Organized Output**: Centralizes all parsed JSON data into a dedicated `scrapydlogparser` directory, mirroring your project's structure.
- **Loop Mode**: Automatically monitors and re-parses logs at configurable intervals.
- **Smart Cleanup**: Automatically removes orphaned JSON files when their corresponding log files are deleted.
- **Incremental Parsing**: Only processes new or modified logs by checking file size against existing data.
- **Error Detection**: Detects critical unhandled errors and crashes, labeling them for easy identification.
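The size-based incremental check described above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool's actual code; `parsed_sizes` is a hypothetical stand-in for the size data the parser keeps alongside each JSON file:

```python
import os

def needs_reparse(log_path, parsed_sizes):
    """Return True when a log is new or its size differs from the last parse.

    `parsed_sizes` is a hypothetical mapping of log path -> file size
    recorded when that log was last parsed.
    """
    last_size = parsed_sizes.get(log_path)
    return last_size is None or os.path.getsize(log_path) != last_size
```

A log that has not changed since the last run is skipped, so repeated invocations stay cheap even over large log directories.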
## Quick Start

### Installation

Install in editable mode for development:

```bash
pip install -e .
```
### Basic Usage

Run the parser by pointing it to your Scrapyd logs directory:

```bash
scrapyd-logparser /path/to/scrapyd/logs
```
By default, this will:
- Create a `scrapydlogparser/` directory inside your logs folder.
- Generate individual `.json` files for every log, mirroring the project/spider structure.
- Save a global `scrapydlogparser.json` summary in the same directory.
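Once the summary exists, it can be consumed from Python like any JSON file. A minimal sketch, assuming only the default output location documented above (the keys inside the summary depend on your logs and are not specified here):

```python
import json
from pathlib import Path

def load_summary(logs_dir):
    """Load the global summary JSON written by scrapyd-logparser.

    Returns the parsed data, or None if the parser has not run yet.
    The path mirrors the default output location: logs/scrapydlogparser/.
    """
    path = Path(logs_dir) / "scrapydlogparser" / "scrapydlogparser.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

Inspect the returned structure first; its fields are whatever the parser wrote for your logs.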
## CLI Options

| Option | Shorthand | Description | Default |
|---|---|---|---|
| `--interval` | `-i` | Interval in seconds for continuous monitoring (loop mode). | `5` |
| `--output` | `-o` | Path to the summary JSON file. | `logs/scrapydlogparser/scrapydlogparser.json` |
| `--force` | `-f` | Forces a full re-parse of all log files. | Disabled |
| `--json-dir` | | Custom directory to store individual JSON files. | `logs/scrapydlogparser/` |
## Advanced Usage

### Continuous Monitoring (Loop Mode)

To keep your statistics updated in real time (e.g., refreshed every 60 seconds):

```bash
scrapyd-logparser ./logs --interval 60
```
## Data Structure
The tool transforms your standard Scrapyd logs into a clean, queryable JSON structure:
```
logs/
├── project/
│   └── spider/
│       └── job.log
└── scrapydlogparser/
    ├── scrapydlogparser.json    (Global Summary)
    └── project/
        └── spider/
            └── job.json         (Detailed stats)
```
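Because the per-job JSON files mirror the project/spider layout, collecting them is a simple directory walk. A sketch under that layout assumption (the fields inside each `job.json` are not specified here, so this only yields paths):

```python
from pathlib import Path

def iter_job_stats(logs_dir):
    """Yield (project, spider, job_json_path) for every parsed job.

    Walks logs_dir/scrapydlogparser/<project>/<spider>/<job>.json and
    skips the global summary file at the root of scrapydlogparser/.
    """
    root = Path(logs_dir) / "scrapydlogparser"
    for path in root.glob("*/*/*.json"):
        project, spider = path.parts[-3], path.parts[-2]
        yield project, spider, path
```

From there you can open each file with `json.loads(path.read_text())` and aggregate whatever statistics it contains.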
## File details

### scrapyd_logparser-0.1.0b1.tar.gz

- Download URL: scrapyd_logparser-0.1.0b1.tar.gz
- Upload date:
- Size: 11.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `2dfe43e9a00e60b0ca100f4e319b84a05e1b91754d710bcf4746987040123dd4` |
| MD5 | `6da226acda610e3b244ed81eea53efb2` |
| BLAKE2b-256 | `3adfbfa7d9a6fc6b5d5bd69781b1335af575190fac58bca8822b786bd7634070` |
### scrapyd_logparser-0.1.0b1-py3-none-any.whl

- Download URL: scrapyd_logparser-0.1.0b1-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `dca33dd0106b49d85a8a9f4458ea49df12284fd1ee32ffb0cd07889279209bfe` |
| MD5 | `76ac11f6c7f9cdb97147c0912a8eb2a9` |
| BLAKE2b-256 | `8c942a8d77a37ad62e6a1ca59683a82679855e9c264c5dc385822f6877f34283` |