AnomaLog

Reproducible log anomaly detection pipelines, from raw logs to deterministic, template-mapped sequences.
AnomaLog turns raw logs into reproducible, model-ready datasets for anomaly detection.
It is designed for research workflows where preprocessing is not incidental, but part of the experimental artifact. Parsing, template mining, labeling, and sequence construction are made explicit, composable, and reproducible.
Benchmark results in log anomaly detection often depend on hidden preprocessing decisions. AnomaLog surfaces those decisions and makes them first-class, enabling fair comparison and repeatable experiments.
Some typical use cases include:
- Comparing anomaly detectors under controlled preprocessing assumptions
- Running ablations over parsers or template miners
- Reproducing published benchmarks from raw logs
⚡ 10-second example
from anomalog.presets import bgl
from anomalog.representations import TemplatePhraseRepresentation
samples = (
    bgl.build()
    .group_by_entity()
    .with_train_fraction(0.8)
    .represent_with(TemplatePhraseRepresentation())
)
This constructs model-ready features from raw logs with a fully specified, reproducible preprocessing pipeline.
Installation
pip install anomalog
Pipeline at a glance
AnomaLog models preprocessing as a deterministic pipeline:
- Source - raw log ingestion and dataset sourcing
- Parsing - structured parsing into typed log records
- Templating - template mining and assignment
- Sequencing - grouping logs into windows
- Representation - model-ready feature extraction
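The determinism of the stages above comes from each one being a pure transformation of its input. A minimal sketch of three of the five stages in plain Python (toy names and a toy digit-masking rule, not the actual AnomaLog internals; real miners like drain3 learn templates rather than applying a fixed regex):

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    raw: str
    template: str | None = None


def parse(lines):
    # Parsing: raw lines -> typed records
    return [LogRecord(raw=line.strip()) for line in lines]


def template(records):
    # Templating: mask variable parts of each message
    return [
        LogRecord(raw=r.raw, template=re.sub(r"\d+", "<:*:>", r.raw))
        for r in records
    ]


def sequence(records, window=2):
    # Sequencing: fixed-size, order-preserving windows, so the
    # same input always yields the same grouping
    return [records[i:i + window] for i in range(0, len(records), window)]


windows = sequence(template(parse(["error 42", "ok 7", "error 99"])))
```

Because every stage is deterministic, rerunning the chain on the same raw logs reproduces the same windows byte for byte.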
The public API is centered on the fluent builder:
dataset = (
    DatasetSpec(...)
    .from_source(...)
    .parse_with(...)
    .label_with(...)
    .template_with(...)
    .build()
)
That produces a templated dataset, which can then be grouped into sequences and converted into model-ready representations.
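Fluent builders of this kind are easiest to reason about when every call returns a new immutable spec, so a preset can be forked for an ablation without mutating the original. A minimal sketch of that pattern (an illustration of the idea with made-up fields, not the real DatasetSpec):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Spec:
    name: str
    parser: str = "none"
    templater: str = "none"

    def parse_with(self, parser: str) -> "Spec":
        # Each builder call returns a fresh, frozen copy
        return replace(self, parser=parser)

    def template_with(self, templater: str) -> "Spec":
        return replace(self, templater=templater)


base = Spec("bgl").parse_with("bgl-parser").template_with("drain3")
ablated = base.template_with("identity")  # fork for an ablation
```

Since `Spec` is frozen, `ablated` is a distinct object and `base` keeps its original template miner.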
Quickstart
The built-in presets cover popular log anomaly detection benchmarks such as BGL and HDFS v1.
A typical workflow is:
- Load a preset dataset (e.g. BGL).
- Build structured logs with parsing and templating.
- Group logs into labeled sequences.
- Convert sequences into model-ready features.
>>> from anomalog import SplitLabel
>>> from anomalog.parsers import IdentityTemplateParser
>>> from anomalog.presets import bgl
>>> from anomalog.representations import TemplatePhraseRepresentation
>>> dataset = bgl.build()
# Presets are ordinary DatasetSpec objects, so preprocessing choices stay visible
>>> bgl.template_parser.name
'drain3'
# Deterministically group logs into sequences with explicit train/test semantics
>>> sequences = dataset.group_by_entity().with_train_fraction(0.8)
# Each event stores (template, parameters, timing delta)
>>> next(iter(sequences))
TemplateSequence(
    events=[
        ('RAS KERNEL FATAL <:*:> <:*:> <:*:>', ['data', 'storage', 'interrupt'], None),
        ('RAS KERNEL FATAL <:*:> <:*:> <:*:>', ['instruction', 'address:', '0x00004ed8'], 523407),
        ...
    ],
    label=1,
    entity_ids=['R00-M0-N0-C:J08-U01'],
    split_label=<SplitLabel.TRAIN: 'train'>,
)
# Convert sequences into n-gram features for modeling
>>> train_samples = sequences.represent_with(
...     TemplatePhraseRepresentation(phrase_ngram_min=1, phrase_ngram_max=2),
... )
>>> next(sample for sample in train_samples if sample.split_label is SplitLabel.TRAIN)
SequenceSample(
    data=Counter({'ras': 49, 'kernel': 49, 'ras kernel': 49, ...}),
    label=1,
    entity_ids=['R00-M0-N0-C:J08-U01'],
    split_label=<SplitLabel.TRAIN: 'train'>,
    window_id=0,
)
# Ablation: disable template mining and use raw log lines directly
>>> ablated_dataset = bgl.template_with(IdentityTemplateParser).build()
Built-in presets are ordinary DatasetSpec objects, so the exact source,
parser, label, and template-mining choices stay visible in code and can be
modified for ablations instead of being hidden behind preprocessed artifacts.
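The Counter shown in the transcript above is a bag of word n-grams over template text. A short sketch of 1- and 2-gram counting with collections.Counter (an illustration of the idea only, not the TemplatePhraseRepresentation implementation; the wildcard-stripping rule here is an assumption):

```python
from collections import Counter


def phrase_counts(templates, ngram_min=1, ngram_max=2):
    """Count word n-grams across template strings."""
    counts = Counter()
    for tmpl in templates:
        # Drop wildcard placeholders and lowercase the remaining words
        words = [w.lower() for w in tmpl.split() if w != "<:*:>"]
        for n in range(ngram_min, ngram_max + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


counts = phrase_counts(["RAS KERNEL FATAL <:*:> <:*:> <:*:>"] * 2)
```

A repeated template bumps every one of its n-grams by the repetition count, which is why the unigram and bigram counts in the transcript all share the same value.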
You can also define a custom dataset:
from pathlib import Path
from anomalog import DatasetSpec
from anomalog.labels import CSVReader
from anomalog.parsers import HDFSV1Parser
from anomalog.sources import LocalZipSource
dataset = (
    DatasetSpec("my-hdfs")
    .from_source(
        LocalZipSource(
            Path("HDFS_v1.zip"),
            raw_logs_relpath=Path("HDFS.log"),
        ),
    )
    .parse_with(HDFSV1Parser())
    .label_with(
        CSVReader(
            relative_path=Path("preprocessed/anomaly_label.csv"),
            entity_column="BlockId",
            label_column="Label",
        ),
    )
    .build()
)
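The CSVReader above maps an entity column to a label column. The same idea can be sketched with the stdlib csv module (a hypothetical helper for illustration, not the AnomaLog class; the "Anomaly"/"Normal" label values follow the HDFS v1 convention):

```python
import csv
import io


def read_entity_labels(csv_text, entity_column, label_column):
    """Map each entity id to a binary label (1 = Anomaly, 0 = Normal)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[entity_column]: int(row[label_column] == "Anomaly")
        for row in reader
    }


labels = read_entity_labels(
    "BlockId,Label\nblk_1,Normal\nblk_2,Anomaly\n",
    entity_column="BlockId",
    label_column="Label",
)
```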
Documentation
The full documentation is organized by task:
- Getting started for the end-to-end workflow, grouping choices, representation stage, and split semantics
- Experiments for config-driven detector runs and recorded artifacts
- Reference for the codebase map and API pages
Experiments
The repository also includes a config-driven experiment layer under
experiments/ for model experimentation on top of AnomaLog preprocessing.
uv run python -m experiments.runners.run_experiment \
--config experiments/configs/runs/bgl_template_frequency.toml
See experiments/README.md for the experiment layout and artifact format.
Development
Contributor setup and local commands are documented in Development.
Project details
File details

Details for the file anomalog-0.3.0.tar.gz.

File metadata

- Download URL: anomalog-0.3.0.tar.gz
- Size: 37.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.6

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3640e49ecc8265bc3814505dea3a0bae8dece93ae7805376e79acc5b0f249385 |
| MD5 | d506e51c7c7d48edb8e8286ebc9db59f |
| BLAKE2b-256 | 9b7bf50a392c9a173171790805902831eb62a2427a9d4513ece938bc44a82052 |
File details

Details for the file anomalog-0.3.0-py3-none-any.whl.

File metadata

- Download URL: anomalog-0.3.0-py3-none-any.whl
- Size: 52.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.6

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | ebe9dbd8c8ba6edaf538526836aa8b22a2abd6308bab2644bbb3c9e9fae17862 |
| MD5 | f613281dc3ffd448231a6a33a9461989 |
| BLAKE2b-256 | 86a5c04f64cb9a1fc47d220e6d6867586efae57ed1b2c9341280bf918729be41 |