Structured log triage. Stream a log, match against configurable patterns, get a summary.
Project description
logtally
Stream a log, match it against configurable regex patterns, get a summary report. Memory-efficient (handles files larger than RAM), zero-config for common patterns, JSONL output for pipelines.
Why
grep is great when you know exactly what you're looking for. cProfile-style heavyweight log analyzers are overkill for "what broke in the last hour?" logtally is the in-between: point it at a log, get an honest breakdown of what's in there — by pattern, by severity, with a time range — in one command.
It's particularly useful when:
- You inherited a service and want to know what kinds of errors are normal vs. new
- A deployment went sideways and you want to find the noisy patterns fast
- You need to feed structured match data into another tool via JSONL
What it looks like
$ logtally examples/sample-custom.log
[summary]
Total lines: 55
Matched lines: 39
Total matches: 65
Match rate: 70.91%
Time range: 2026-04-29T09:00:01 to 2026-04-29T12:10:01 (3:10:00)
[by severity]
info 7
warn 24
error 33
critical 1
[top 10 patterns]
1. 18 [error ] generic_error (Generic error-level log line)
2. 12 [warn ] generic_warning (Generic warning-level log line)
3. 5 [info ] startup (Service startup signal)
4. 5 [warn ] retry (Operation was retried)
5. 4 [error ] connection_refused (TCP connection refused by remote)
6. 3 [warn ] http_4xx (HTTP 4xx response status)
7. 3 [error ] auth_failure (Failed authentication or authorization)
8. 3 [error ] http_5xx (HTTP 5xx response status)
9. 2 [error ] db_error (Database error)
10. 2 [warn ] rate_limit (Request was rate limited)
Note: a single line can match multiple patterns (e.g. an HTTP 500 line might also be a generic_error), which is why Total matches (65) is higher than Matched lines (39).
Install
From source (only path for now; PyPI publishing is on the roadmap):
git clone https://github.com/amadhusudan/logtally.git
cd logtally
pip install -e .
The only runtime dependency is pyyaml.
Usage
# Default: scan a file with bundled patterns, print summary
logtally app.log
# Show only top 20 patterns
logtally app.log --top 20
# Filter to error and critical only
logtally app.log --min-severity error
# Pipe from stdin
cat app.log | logtally -
journalctl -u myservice | logtally -
# Output every match as JSONL for downstream processing
logtally app.log --json > matches.jsonl
# Use your own pattern set
logtally app.log --config my-patterns.yaml
CLI reference
positional arguments:
source Path to log file. Omit or use '-' for stdin.
options:
--config, -c PATH Path to YAML pattern config. Defaults to bundled patterns.
--top, -n N Number of top patterns to show (default: 10).
--min-severity, -s LEVEL Only count matches at this severity or higher.
Choices: info, warn, error, critical.
--json Output matches as JSONL to stdout instead of summary.
--version, -v Show version and exit.
Try it out
The repo ships with several sample logs in examples/ covering common formats. They double as smoke tests for the bundled patterns:
| File | Format | What's in it |
|---|---|---|
sample-custom.log |
Generic app log — 2026-04-29 09:00:01 LEVEL logger msg |
deadlocks, retries, 4xx/5xx, auth failures, OOM |
sample-syslog.log |
RFC3164 syslog (yearless) — Apr 29 09:00:01 host … |
service lifecycle, OOM, retries |
sample-systemd.log |
journalctl default output |
unit transitions, sshd auth failures, kernel OOM kill |
sample-nginx-access.log |
nginx/Apache Combined Log Format | HTTP 4xx/5xx breakdown, bot scanners |
sample-json.log |
JSONL (zap/zerolog/pino style) | structured logs with level/msg/contextual fields |
# HTTP status breakdown from nginx access logs
logtally examples/sample-nginx-access.log
# Only error/critical from structured JSON logs
logtally examples/sample-json.log --min-severity error
# systemd journal scan piped through, then count critical patterns
logtally examples/sample-systemd.log --json | jq -r '.pattern' | sort | uniq -c
Patterns
logtally ships with a default pattern set covering common signals: HTTP errors, timeouts, auth failures, deadlocks, OOMs, deprecation warnings, retries, and more. See logtally/_data/default.yaml for the full list.
You can write your own pattern file in the same format:
patterns:
- name: my_custom_signal
regex: "checkout flow stuck for user_id=\\d+"
severity: error
description: "Checkout hang"
- name: gateway_5xx
regex: " 5\\d{2} .* upstream "
severity: critical
description: "5xx response from upstream gateway"
Then run:
logtally app.log --config my-patterns.yaml
A pattern can match anywhere in a line (regexes use Python re.search with case-insensitive matching). One line can fire multiple patterns; each fired pattern counts independently in the top-N report.
Design notes
A few choices worth knowing about:
- Streaming, not loading.
logtallyreads one line at a time. A 10 GB log file uses ~10 MB of memory, not 10 GB. - Lenient encoding. Real-world logs contain random non-UTF-8 bytes.
logtallyuseserrors='replace'rather than crashing mid-scan. - No SQL, no database, no daemon. It's a single Python process that reads stdin or a file and prints to stdout. Compose it with shell pipelines.
- JSONL output is the integration story. If you want a dashboard, alerting, or trend analysis, pipe
--jsonoutput into whatever tool already does that well.
Performance
logtally is regex-bound, which means a few simple rules apply:
- More patterns = slower scan (each pattern is tried against each line)
- Anchored or specific patterns are faster than greedy ones
- The bundled default set processes roughly 200K–400K lines/second on a modern laptop CPU
If you're scanning gigabyte-scale logs and want hard numbers, profile your specific config with sentinel-trace.
Testing
pip install -e ".[dev]"
pytest -v
Tests cover pattern compilation, severity filtering, line streaming (including encoding edge cases), aggregation logic, timestamp parsing (ISO, yearless syslog, and Apache/nginx formats), and output formatting. CI runs the suite against Python 3.9-3.12 on every push and pull request.
Roadmap
--bucket 1h— time-bucketed match counts for spotting bursts--follow/-f— tail mode (liketail -f)- Anomaly detection — flag patterns whose rate jumps significantly within a time window
- Multi-file aggregation —
logtally logs/*.logwith per-file breakdown - Pre-built pattern packs for nginx, syslog, Kubernetes events
- PyPI publish
License
Apache 2.0. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file logtally-0.1.0.tar.gz.
File metadata
- Download URL: logtally-0.1.0.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f08bcf20224c679d4d4d1e1204ecc19a0f9dc540ff6138b801d5e5f589633527
|
|
| MD5 |
eeb301d2e86ed2b2eddb7f595b47a0ea
|
|
| BLAKE2b-256 |
14c9f3688cd9e7f2c2e0fef246f422c90c5eab8f364a1e760b0f54da33c273f1
|
File details
Details for the file logtally-0.1.0-py3-none-any.whl.
File metadata
- Download URL: logtally-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f7837e106ec33cb699c484e7244d9195431b6e9bdb66c34835c240e99fbe5f86
|
|
| MD5 |
4e3c4d3a739071329783e08ee262b693
|
|
| BLAKE2b-256 |
bd5b942a6554a437f396e340cab09a8380f51b9760d63061494fa5cea0ad19f8
|