Reproducible log anomaly detection pipelines, from raw logs to deterministic, template-mapped sequences

Maintainers

harens

Project description

AnomaLog

AnomaLog turns raw logs into reproducible, model-ready datasets for anomaly detection.

It is designed for research workflows where preprocessing is not incidental, but part of the experimental artifact. Parsing, template mining, labeling, and sequence construction are made explicit, composable, and reproducible.

Benchmark results in log anomaly detection often depend on hidden preprocessing decisions. AnomaLog surfaces those decisions and makes them first-class, enabling fair comparison and repeatable experiments.

Some typical use cases include:

Comparing anomaly detectors under controlled preprocessing assumptions
Running ablations over parsers or template miners
Reproducing published benchmarks from raw logs

⚡ 10-second example

from anomalog.presets import bgl
from anomalog.representations import TemplatePhraseRepresentation

samples = (
    bgl.build()
    .group_by_entity()
    .with_train_fraction(0.8)
    .represent_with(TemplatePhraseRepresentation())
)

This constructs model-ready features from raw logs with a fully specified, reproducible preprocessing pipeline.

Installation

pip install anomalog

Pipeline at a glance

AnomaLog models preprocessing as a deterministic pipeline:

Source - raw log ingestion and dataset sourcing
Parsing - structured parsing into typed log records
Templating - template mining and assignment
Sequencing - grouping logs into sequences (per entity or per window)
Representation - model-ready feature extraction
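
To make the five stages concrete, here is a toy, standard-library-only sketch of the same flow. It does not use AnomaLog and is not how the library is implemented: the log format, the hex-masking rule, and every function name below are assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Source: three made-up raw lines in a hypothetical "timestamp entity message" format.
RAW_LOGS = [
    "1 node-a kernel fatal error at 0x00004ed8",
    "3 node-b cache parity error corrected",
    "7 node-a kernel fatal error at 0x0000beef",
]


def parse(line):
    """Parsing: split a raw line into a typed record."""
    timestamp, entity, message = line.split(" ", 2)
    return {"timestamp": int(timestamp), "entity": entity, "message": message}


def to_template(message):
    """Templating: mask hexadecimal values so variable parts share one template."""
    return re.sub(r"0x[0-9a-fA-F]+", "<*>", message)


def group_by_entity(records):
    """Sequencing: collect each entity's templates in timestamp order."""
    sequences = defaultdict(list)
    for record in sorted(records, key=lambda r: r["timestamp"]):
        sequences[record["entity"]].append(to_template(record["message"]))
    return dict(sequences)


def represent(sequence):
    """Representation: count template occurrences within one sequence."""
    return Counter(sequence)


records = [parse(line) for line in RAW_LOGS]
sequences = group_by_entity(records)
features = {entity: represent(seq) for entity, seq in sequences.items()}
print(features["node-a"])  # Counter({'kernel fatal error at <*>': 2})
```

Every step is a pure function of its input, so rerunning the script on the same raw lines always yields the same features. That is the property the pipeline description above is concerned with.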

The public API is centered on the fluent builder:

dataset = (
    DatasetSpec(...)
    .from_source(...)
    .parse_with(...)
    .label_with(...)
    .template_with(...)
    .build()
)

That produces a templated dataset, which can then be grouped into sequences and converted into model-ready representations.
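
The fluent style can be illustrated with a small immutable builder. This is a sketch of the general pattern only: `ToySpec` and its string-valued fields are hypothetical and say nothing about how `DatasetSpec` is actually implemented.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for a dataset spec; not AnomaLog's DatasetSpec."""

    name: str
    parser: Optional[str] = None
    template_miner: Optional[str] = None

    def parse_with(self, parser: str) -> "ToySpec":
        # Return a modified copy so the original spec is never mutated.
        return replace(self, parser=parser)

    def template_with(self, template_miner: str) -> "ToySpec":
        # Same copy-on-change behaviour for the template-mining choice.
        return replace(self, template_miner=template_miner)


base = ToySpec("toy-bgl").parse_with("bgl").template_with("drain3")
ablated = base.template_with("identity")

print(base.template_miner)     # drain3
print(ablated.template_miner)  # identity
```

Because each call returns a new frozen object, a variant can be derived from a shared base without the two affecting each other. That property is convenient when running ablations over a single preprocessing choice.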

Quickstart

Our presets include popular log anomaly datasets like BGL and HDFS v1.

A typical workflow is:

Load a preset dataset (e.g. BGL).
Build structured logs with parsing and templating.
Group logs into labeled sequences.
Convert sequences into model-ready features.

>>> from anomalog import SplitLabel
>>> from anomalog.parsers import IdentityTemplateParser
>>> from anomalog.presets import bgl
>>> from anomalog.representations import TemplatePhraseRepresentation

>>> dataset = bgl.build()
# Presets are ordinary DatasetSpec objects, so preprocessing choices stay visible
>>> bgl.template_parser.name
'drain3'
# Deterministically group logs into sequences with explicit train/test semantics
>>> sequences = dataset.group_by_entity().with_train_fraction(0.8)

# Each event stores (template, parameters, timing delta)
>>> next(iter(sequences))
TemplateSequence(
    events=[
        ('RAS KERNEL FATAL <:*:> <:*:> <:*:>', ['data', 'storage', 'interrupt'], None),
        ('RAS KERNEL FATAL <:*:> <:*:> <:*:>', ['instruction', 'address:', '0x00004ed8'], 523407),
        ...
    ],
    label=1,
    entity_ids=['R00-M0-N0-C:J08-U01'],
    split_label=<SplitLabel.TRAIN: 'train'>
)

# Convert sequences into n-gram features for modeling
>>> train_samples = sequences.represent_with(
...     TemplatePhraseRepresentation(phrase_ngram_min=1, phrase_ngram_max=2),
... )
>>> next(sample for sample in train_samples if sample.split_label is SplitLabel.TRAIN)
SequenceSample(data=Counter({'ras': 49, 'kernel': 49, 'ras kernel': 49, ...}),
               label=1,
               entity_ids=['R00-M0-N0-C:J08-U01'],
               split_label=<SplitLabel.TRAIN: 'train'>,
               window_id=0)

# Ablation: disable template mining and use raw log lines directly
>>> ablated_dataset = bgl.template_with(IdentityTemplateParser).build()

Built-in presets are ordinary DatasetSpec objects, so the exact source, parser, label, and template-mining choices stay visible in code and can be modified for ablations instead of being hidden behind preprocessed artifacts.
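
As a rough intuition for the n-gram counts shown in the quickstart output, the sketch below counts word n-grams over template text using only the standard library. The tokenisation (lowercasing and dropping `<:*:>` placeholders) is an assumption made for this example; the actual behaviour of `TemplatePhraseRepresentation` may differ.

```python
from collections import Counter


def phrase_ngrams(tokens, ngram_min=1, ngram_max=2):
    """Yield every contiguous n-gram of the tokens, joined by spaces."""
    for n in range(ngram_min, ngram_max + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


def count_phrases(templates, ngram_min=1, ngram_max=2):
    """Count n-grams over the literal words of each template in a sequence."""
    counts = Counter()
    for template in templates:
        # Assumption: lowercase the template and drop wildcard placeholders.
        tokens = [t.lower() for t in template.split() if t != "<:*:>"]
        counts.update(phrase_ngrams(tokens, ngram_min, ngram_max))
    return counts


sequence = ["RAS KERNEL FATAL <:*:> <:*:> <:*:>"] * 2
print(count_phrases(sequence))
```

With two identical events, every unigram and bigram drawn from "ras kernel fatal" is counted twice.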

You can also define a custom dataset:

from pathlib import Path

from anomalog import DatasetSpec
from anomalog.labels import CSVReader
from anomalog.parsers import HDFSV1Parser
from anomalog.sources import LocalZipSource

dataset = (
    DatasetSpec("my-hdfs")
    .from_source(
        LocalZipSource(
            Path("HDFS_v1.zip"),
            raw_logs_relpath=Path("HDFS.log"),
        ),
    )
    .parse_with(HDFSV1Parser())
    .label_with(
        CSVReader(
            relative_path=Path("preprocessed/anomaly_label.csv"),
            entity_column="BlockId",
            label_column="Label",
        ),
    )
    .build()
)

Documentation

The full documentation is organised by task:

Getting started for the end-to-end workflow, grouping choices, representation stage, and split semantics
Experiments for config-driven detector runs and recorded artifacts
Reference for the codebase map and API pages

Experiments

The repository also includes a config-driven experiment layer under experiments/ for model experimentation on top of AnomaLog preprocessing.

Built-in experiment detectors include Template Frequency, Naive Bayes, River, and scoped reimplementations of DeepLog and DeepCASE.

uv run python -m experiments.runners.run_experiment \
  --experiment bgl_entity_chronological_template_frequency

Local execution is the canonical reproducibility path. Experiment runs reuse AnomaLog's dataset-side caches and write deterministic result directories, but detector training and test scoring are intentionally rerun for new config fingerprints.
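
One common way to derive a config fingerprint is to hash a canonical serialisation of the configuration. The sketch below shows that general idea with the standard library. The function name, the config keys, and the 16-character truncation are all assumptions, and the repository's real fingerprinting scheme may work differently.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON encoding so equal configs share one fingerprint."""
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


a = {"detector": "template_frequency", "train_fraction": 0.8}
b = {"train_fraction": 0.8, "detector": "template_frequency"}
c = {"detector": "template_frequency", "train_fraction": 0.7}

print(config_fingerprint(a) == config_fingerprint(b))  # True: key order is irrelevant
print(config_fingerprint(a) == config_fingerprint(c))  # False: a setting changed
```

A fingerprint like this could name a result directory, so that an unchanged config maps to the same location and any changed setting triggers a fresh run.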

To run a curated local suite:

uv run python -m experiments.runners.run_suite \
  --group bgl_deeplog_ccs2017_paper \
  --group hdfs_deeplog_paper \
  --max-parallel 2

The same registry also drives the optional Slurm backend:

uv run python -m experiments.execution.slurm submit \
  --group bgl_deeplog_ccs2017_paper \
  --group hdfs_deeplog_paper

See experiments/README.md for the experiment layout and artifact format.

Development

Contributor setup and local commands are documented in Development.

Project details

Release history

0.6.0 - Jun 11, 2026
0.5.0 - Jun 4, 2026
0.4.0 - Jun 3, 2026 (this version)
0.3.0 - Apr 14, 2026
0.2.0 - Mar 31, 2026

Download files

Source Distribution

anomalog-0.4.0.tar.gz (81.0 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

anomalog-0.4.0-py3-none-any.whl (101.2 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file anomalog-0.4.0.tar.gz.

File metadata

Download URL: anomalog-0.4.0.tar.gz
Upload date: Jun 3, 2026
Size: 81.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.18 (publish subcommand, run in CI on Ubuntu 24.04 "noble")

File hashes

Hashes for anomalog-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`57bc2f8ac1611c0dcdd7eec0af35711ad7a64de7fe6135c15ed93bbbce61a5b8`
MD5	`0a2c940eddd6c469a987ef3ec0494377`
BLAKE2b-256	`a0603f761614057461e32ffcad13e268af57335ea2a63e20e5a7a20065c65206`

File details

Details for the file anomalog-0.4.0-py3-none-any.whl.

File metadata

Download URL: anomalog-0.4.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 101.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.18 (publish subcommand, run in CI on Ubuntu 24.04 "noble")

File hashes

Hashes for anomalog-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8c538d2cf2c2309e02f73d7b826aaf5482fe7d79f9dc2f003dd9ad16537c9d34`
MD5	`04713afa762005d42d4ae7da7eadf507`
BLAKE2b-256	`0e23df9a0bdd2b2979bcfeb54531598bc33b87b59340cdbacd738a197ccba9fa`
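
A downloaded file can be checked against a published SHA256 digest with Python's standard `hashlib` module. The helper names below are illustrative. The demo hashes a temporary file containing the bytes `abc`, whose SHA256 digest is a well-known test vector, so the snippet runs without downloading anything.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring letter case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file so the example does not depend on a download.
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    ok = matches_published_hash(sample, ABC_SHA256)
print(ok)  # True
```

To check a real download, pass the path of the sdist or wheel together with the SHA256 value listed for that file above.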
