Deterministic log templating on top of Drain3, packaged as an artifact for AI agents.
AgenticAILogAnalyser
A Python port of codag-drain that uses the upstream Python Drain3 package as its grouping engine. It keeps the same CLI surface, the same output shape, and the same evidence-rich artifact, and it can be packaged as a single binary you can drop into any environment.
The intended consumer is an AI agent that needs to read a large log window under a fixed token budget. Instead of feeding the agent 1,400 raw lines, you feed it 8 templates with slot statistics and a few raw examples per group.
What it does
Takes a stream of log lines on stdin, groups near-duplicates with Drain3, and emits one templated line per group with:
- the count of collapsed lines,
- a derived <*> template,
- per-slot stats (min / max / median for numeric slots, distinct values for enums, an auto-detected unit like ms or MB),
- a few raw sample lines.
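The per-slot statistics listed above can be illustrated with a small sketch. This is not the package's profile.py: the function name `profile_slot`, the regex used for unit detection, and the output formatting are assumptions chosen to mirror the bracketed summaries shown in this README.

```python
import re
from statistics import median

# Assumed shape of a numeric slot value: a number with an optional unit suffix.
_NUMERIC = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def _fmt(x: float) -> str:
    """Render 20.0 as '20' but keep 87288.5 as '87288.5'."""
    return str(int(x)) if float(x).is_integer() else str(x)


def profile_slot(values):
    """Summarise the values captured by one <*> slot (hypothetical helper)."""
    parsed = [_NUMERIC.match(v) for v in values]
    if values and all(parsed):
        nums = [float(m.group(1)) for m in parsed]
        units = {m.group(2) for m in parsed}
        # Only report a unit when every value agrees on it.
        unit = units.pop() if len(units) == 1 else ""
        return (
            f"[{_fmt(min(nums))}..{_fmt(max(nums))}{unit} "
            f"p50={_fmt(median(nums))}{unit}]"
        )
    # Non-numeric slot: list distinct values in first-seen order.
    distinct = list(dict.fromkeys(values))
    return "[" + ",".join(distinct) + "]"


print(profile_slot(["20ms", "20ms", "20ms", "8400ms"]))
print(profile_slot(["idle", "downloading", "downloaded", "ready"]))
```

Keeping min, max, and median together is what lets a single outlier (8400ms among 20ms values) stay visible after the lines are collapsed.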
Real-world example
A 1,438-line Kiro IDE log compresses to 8 templates at ~180x compression:
[x1] [WebviewProcessMonitor] Service starting
[x4] update#setState <*> [idle,downloading,downloaded,ready]
[x14] [WebviewProcessMonitor] Tracking webview renderer: pid=<*>, origin=<*>, windowId=<*> [13773..87619 p50=87288.5]
[x1] update#setState checking for updates
[x14] Extension host with pid <*> exited with code: 0, signal: unknown. [13697..89755 p50=73921]
[x1395] No ptyHost heartbeat after 6 seconds
[x8] [WebviewProcessMonitor] Webview renderer process gone: pid=<*>
[x1] Extracting content from 1 URIs
[codag-drain-py] 1438 lines -> 8 templates (179.8x)
The dominant signal, one repeating warning that makes up 97% of the file, is the first thing the model sees instead of being buried. Numeric ranges and enum values are preserved, so outliers and state distributions stay visible.
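The figures in this example can be checked with a few lines of arithmetic. The sketch assumes the reported ratio is simply input lines divided by templates, which is consistent with the summary lines shown in this README but is not confirmed from the source code.

```python
total_lines = 1438   # lines in the Kiro IDE log
templates = 8        # templates emitted
dominant = 1395      # lines collapsed into the ptyHost heartbeat template

ratio = total_lines / templates   # assumed definition of the "x" figure
share = dominant / total_lines    # fraction of the file taken by one template

print(f"{total_lines} lines -> {templates} templates ({ratio:.1f}x)")
print(f"dominant template share: {share:.0%}")
```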
Install
From source:
pip install -e .
From source with the build extra (PyInstaller):
pip install -e ".[build]"
Usage
echo 'worker latency 20ms
worker latency 20ms
worker latency 20ms
worker latency 8400ms' | codag-drain-py --stats
[x4] worker latency <*> [20..8400ms p50=20ms]
[codag-drain-py] 4 lines -> 1 templates (4.0x)
JSON output:
echo 'worker ready shard=1' | codag-drain-py --format json
Choose a grouper:
cat logs.txt | codag-drain-py --grouper drain-stock
NDJSON input:
cat events.ndjson | codag-drain-py --json
Available groupers:
| name | description |
|---|---|
| drain (default) | Drain3 with codag's compact-line tokenizer fallback |
| drain-stock | Drain3 with vanilla whitespace tokenization |
| drain-delimited | Drain with extra punctuation delimiters folded into whitespace |
| drain-fullsearch | Drain similarity over all same-length clusters (no prefix-tree) |
| statistical | Non-Drain control: IDF-weighted anchor co-occurrence |
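To make the drain-fullsearch row concrete, here is a minimal sketch of Drain-style token similarity applied to every cluster of the same token length, with no prefix tree to narrow the candidates. The function names and the 0.4 threshold are assumptions for illustration; the package's grouper.py may differ.

```python
def similarity(template, tokens):
    """Fraction of positions where the template token equals the line token.

    Wildcard positions ("<*>") never equal a concrete token, so they do not
    count as matches in this sketch.
    """
    if not template:
        return 0.0
    matches = sum(1 for t, tok in zip(template, tokens) if t == tok)
    return matches / len(template)


def best_match(templates, tokens, threshold=0.4):
    """Scan all same-length templates and return the most similar one, or None."""
    candidates = [t for t in templates if len(t) == len(tokens)]
    if not candidates:
        return None
    best = max(candidates, key=lambda t: similarity(t, tokens))
    return best if similarity(best, tokens) >= threshold else None


templates = [
    ["worker", "latency", "<*>"],
    ["worker", "ready", "shard=1"],
]
print(best_match(templates, "worker latency 8400ms".split()))
```

A full scan costs more than a prefix-tree lookup, but it cannot miss a cluster because of a variable token near the start of the line.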
Build a single-file binary
./scripts/build_binary.sh
./dist/codag-drain-py --help
PyInstaller bundles the Python interpreter and drain3 into one file under
dist/. Build on each OS / architecture you intend to ship.
Programmatic API
from codag_drain_py import LogLine, TemplaterConfig, template_logs
result = template_logs(
[LogLine(message="latency 20ms"), LogLine(message="latency 8400ms")]
)
print(result.render())
print(result.to_json(indent=2))
TemplateIndex exposes the streaming variant:
from codag_drain_py import LogLine, TemplateIndex
idx = TemplateIndex()
for msg in some_iterator():
idx.push(LogLine(message=msg))
print(idx.templates().render())
Tests
pip install -e ".[dev]"
pytest
Credits
- Drain3 — the underlying log template miner from logpai. We use the published PyPI package directly.
- codag-drain — the Rust project this Python port is modeled on. The compact-line tokenizer fallback, multi-member template derivation, slot profiling, and CLI surface all follow that design.
- Drain paper — He et al., "Drain: An Online Log Parsing Approach with Fixed Depth Tree", ICWS 2017.
License
MIT. See LICENSE.
Layout
src/codag_drain_py/
__init__.py public exports
__main__.py `python -m codag_drain_py`
cli.py argparse + stdin pipeline
compress.py templater entry point + rendering
grouper.py Drain / DrainStock / DrainDelimited / FullSearch / Statistical
input.py heuristic line + NDJSON parsers
lex.py character-class tokenizer + lex template derivation
profile.py slot capture, numeric stats, distinct-value summaries
stream.py TemplateIndex streaming wrapper
template.py whitespace template derivation + capture regex
tests/
test_compress.py
test_input.py
scripts/
build_binary.sh PyInstaller --onefile build
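The layout mentions whitespace template derivation and a capture regex. A rough sketch of that idea follows; `derive_template` and `capture_regex` are hypothetical names, and the real template.py may handle sub-token slots (such as pid=<*>) that this whitespace-only version does not.

```python
import re


def derive_template(lines):
    """Replace every whitespace-separated column that varies with <*>."""
    rows = [line.split() for line in lines]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all lines in a group must have the same token count")
    tokens = [col[0] if len(set(col)) == 1 else "<*>" for col in zip(*rows)]
    return " ".join(tokens)


def capture_regex(template):
    """Build a regex with one capture group per <*> slot."""
    parts = [r"(\S+)" if tok == "<*>" else re.escape(tok) for tok in template.split()]
    return re.compile(r"^\s*" + r"\s+".join(parts) + r"\s*$")


template = derive_template(["worker latency 20ms", "worker latency 8400ms"])
print(template)
print(capture_regex(template).match("worker latency 8400ms").groups())
```

The captured values are what a slot profiler would then summarise as ranges or distinct values.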
MCP server (use as a tool from Kiro / Claude / any MCP client)
The analyser ships with a built-in Model Context Protocol server. Once registered with Kiro or Claude Desktop, your assistant can call it as a tool to compress logs on demand without you piping anything through a shell.
What it exposes
Five tools, all served over stdio:
| tool | description |
|---|---|
| analyse_logs | Compress an inline log body. Returns templated artifact + summary. |
| analyse_log_file | Same but reads the body from a local file path. |
| stream_push | Append lines to a named streaming session. |
| stream_project | Render templates over the accumulated session. |
| stream_reset | Clear a session. |
Each tool accepts the full set of analyser options: grouper, sample_cap,
template_clip, body_format, output_format.
Build the MCP binary
./scripts/build_mcp_binary.sh
This produces a single self-contained binary at dist/agentic-log-analyser-mcp
(~22 MB). It bundles the Python interpreter, the analyser, drain3, and the
MCP SDK — no Python install required on the machine that runs it.
Register with Kiro
Open Kiro's MCP config: run "Open MCP Config" from the Command Palette, or edit
.kiro/settings/mcp.json in your workspace (or ~/.kiro/settings/mcp.json for a
user-wide setup). Add:
{
"mcpServers": {
"agentic-log-analyser": {
"command": "/absolute/path/to/dist/agentic-log-analyser-mcp",
"args": [],
"disabled": false,
"autoApprove": ["analyse_logs", "analyse_log_file", "stream_project"]
}
}
}
There's a ready-to-paste example at examples/mcp_config_kiro.json. Reload the
MCP config from the MCP Server view in the Kiro feature panel.
Register with Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json
(macOS) and merge in:
{
"mcpServers": {
"agentic-log-analyser": {
"command": "/absolute/path/to/dist/agentic-log-analyser-mcp",
"args": []
}
}
}
Restart Claude Desktop. The tools will appear in the tools menu.
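If you would rather script the merge than edit the JSON by hand, something like the following could work for either config file. It is a sketch only: `merge_mcp_server` is a hypothetical helper, not part of this package, and it assumes the file is plain JSON with a top-level "mcpServers" object as shown above.

```python
import json
from pathlib import Path


def merge_mcp_server(config_path, name, command, args=None):
    """Add or update one server entry while leaving other entries untouched."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args or [])}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path to your config file, the server name "agentic-log-analyser", and the absolute path to the binary. Back up the file first, since the helper rewrites it in place.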
Use it from a chat
In Kiro or Claude, just ask:
"Compress this log file and tell me what stands out:
/Users/me/Desktop/logs/cloudtrail_event.txt"
The assistant will pick up analyse_log_file, call it with the path, and
diagnose against the templated artifact instead of the raw bytes.
Debug from the CLI
To run the server manually and tail its output:
./dist/agentic-log-analyser-mcp
It speaks JSON-RPC over stdio. The repo's scripts/smoke_mcp_binary.py shows a
real client roundtrip you can use as a reference.
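For orientation, the messages an MCP client sends over stdio look roughly like the sketch below: an initialize request, an initialized notification, then a tools/call request, each written as one line of JSON. The method names follow the MCP specification; the argument key "path" and the client name are assumptions, so check the tool's input schema (or scripts/smoke_mcp_binary.py) for the real argument names.

```python
import json


def jsonrpc_request(msg_id, method, params):
    """Serialise one JSON-RPC 2.0 request as a single line."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}
    )


def handshake_and_call(log_path):
    """Return the three messages a client would write to the server's stdin."""
    initialize = jsonrpc_request(1, "initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.0"},
    })
    # Notifications carry no "id" and expect no response.
    initialized = json.dumps(
        {"jsonrpc": "2.0", "method": "notifications/initialized"}
    )
    call = jsonrpc_request(2, "tools/call", {
        "name": "analyse_log_file",
        "arguments": {"path": log_path},  # "path" is an assumed argument name
    })
    return [initialize, initialized, call]


for line in handshake_and_call("/Users/me/Desktop/logs/cloudtrail_event.txt"):
    print(line)
```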