
AnomaLog


An orchestration-driven research framework for reproducible log anomaly detection pipelines. Converts raw logs into deterministic, template-mapped sequences ready for controlled detector experiments.

Built on Prefect, AnomaLog emphasises end-to-end reproducibility from raw log ingestion to model-ready sequences.

Motivation

Many log anomaly detection implementations focus primarily on modelling techniques while omitting the full preprocessing pipeline. Parsing details are often described but not fully reproducible from code, and experiments frequently rely on preprocessed datasets without documenting raw log handling.

“The same dataset” is not always the same once parsing choices, windowing rules, entity grouping, and leakage controls are considered.

AnomaLog provides a cache-aware, pipeline-first framework that treats log preprocessing as a first-class research artifact. Every stage in the chain (raw ingestion → parsing → template mining → sequencing) is modular and reproducible, rather than a one-off script with hidden assumptions.

This enables controlled ablation studies, fair model comparisons, and fully repeatable experiments from raw logs. Researchers can focus on modelling choices rather than reverse-engineering preprocessing and experiment glue.

Key Features

  • Deterministic pipeline execution. Workflow stages are fingerprinted and cached so only modified components are recomputed.

  • Protocol-driven modularity. All preprocessing stages implement explicit protocol interfaces, enabling parsers, template miners (e.g. Drain3), and sequencing strategies to be swapped without altering downstream logic.

  • Explicit sequencing strategies. Entity-based, fixed-length, and time-windowed sequences are built with deterministic split controls.

  • Dataset-first workflows. Built-in benchmark presets and custom datasets share the same public interface.

  • Scalable, artifact-first storage. Structured events are persisted in Parquet by default so expensive parsing can be reused.
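The protocol-driven and fingerprinting ideas above can be sketched in plain Python. The names here (`TemplateMiner`, `stage_fingerprint`, the two toy miners) are illustrative stand-ins, not AnomaLog's actual interfaces: any object satisfying the structural protocol can be swapped in without touching downstream logic, and a hash over the stage's configuration gives a deterministic cache key.

```python
import hashlib
import json
import re
from typing import Protocol


class TemplateMiner(Protocol):
    """Illustrative stage interface; AnomaLog's real protocols may differ."""

    def mine(self, message: str) -> str: ...


class IdentityMiner:
    """Trivial miner: every raw message is its own template."""

    def mine(self, message: str) -> str:
        return message


class DigitMaskingMiner:
    """Toy miner: mask digit runs so 'block 42' and 'block 7' share a template."""

    def mine(self, message: str) -> str:
        return re.sub(r"\d+", "<*>", message)


def stage_fingerprint(stage_name: str, params: dict) -> str:
    """Deterministic cache key: hash of the stage name plus its sorted config."""
    payload = json.dumps({"stage": stage_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


# Either miner satisfies the protocol, so downstream stages stay unchanged.
miner: TemplateMiner = DigitMaskingMiner()
assert miner.mine("freeing block 42") == "freeing block <*>"
```

Because the fingerprint depends only on the stage's declared configuration, an unchanged stage hashes to the same key on every run, which is exactly what lets a cache skip recomputation.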

Research Usage

Unlike model-centric repositories that assume preprocessed inputs, AnomaLog makes preprocessing part of the research surface. A typical workflow is:

  1. Materialise a templated dataset (raw → structured → templates).
  2. Generate deterministic sequences under an explicit split protocol.
  3. Plug in any detector that consumes TemplateSequence.

Determinism is a property of the pipeline, not the random number generator. Event ordering is defined by the default dataset backend and preserved through sequencing. This allows for reproducible train/test splits across runs without requiring random seeds.

from anomalog import SplitLabel
from anomalog.presets import bgl

dataset = bgl.build()
sequence_view = dataset.group_by_entity().with_train_fraction(0.2)

for seq in sequence_view:
    if seq.split_label == SplitLabel.TRAIN:
        ...
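The seed-free determinism described above can be illustrated outside AnomaLog. This is a sketch, not AnomaLog's actual split mechanism: deriving the split label from a stable hash of the entity ID gives every run the identical train/test assignment with no random number generator involved.

```python
import hashlib


def split_label(entity_id: str, train_fraction: float = 0.2) -> str:
    """Map an entity ID to TRAIN/TEST deterministically via a stable hash."""
    digest = hashlib.sha256(entity_id.encode()).digest()
    # First 8 bytes as a fraction in [0, 1): same input, same bucket, every run.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "TRAIN" if bucket < train_fraction else "TEST"


labels = {eid: split_label(eid) for eid in ("blk_1", "blk_2", "blk_3")}
# Re-running reproduces identical labels: no seed to record or replay.
assert labels == {eid: split_label(eid) for eid in ("blk_1", "blk_2", "blk_3")}
```

Note the contrast with seeded shuffling: a seed only makes randomness replayable, while a content-derived assignment is stable even if entities are added, reordered, or processed on a different machine.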

Custom Dataset Definition

To add a dataset, define a DatasetSpec by specifying the source, structured parser, optional label alignment, and template parser. This makes dataset provenance and preprocessing assumptions explicit and versionable.

from pathlib import Path

from anomalog import DatasetSpec
from anomalog.labels import CSVReader
from anomalog.parsers import HDFSV1Parser
from anomalog.sources import LocalZipSource

dataset = (
    DatasetSpec("my-hdfs")
    .from_source(LocalZipSource(Path("HDFS_v1.zip"), raw_logs_relpath=Path("HDFS.log")))
    .parse_with(HDFSV1Parser())
    .label_with(
        CSVReader(
            relative_path=Path("preprocessed/anomaly_label.csv"),
            entity_column="BlockId",
            label_column="Label",
        ),
    )
    .build()
)

Built-in presets

from anomalog.presets import bgl, hdfs_v1

bgl_dataset = bgl.build()
hdfs_dataset = hdfs_v1.build()

Preprocessing Ablation Studies

Preprocessing decisions such as the template miner, label alignment, and grouping strategy can be treated as experimental variables rather than hidden implementation details.

from anomalog.parsers import Drain3Parser, IdentityTemplateParser
from anomalog.presets import bgl

drain_dataset = bgl.template_with(Drain3Parser).build()
identity_dataset = bgl.template_with(IdentityTemplateParser).build()
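The effect of the template-miner choice is visible even with toy miners (stdlib only; these are illustrative stand-ins, not AnomaLog's parsers): an identity miner keeps every distinct message as its own template, while a masking miner collapses variable fields, shrinking the template vocabulary a downstream detector must model.

```python
import re

logs = [
    "receiving block blk_1 src 10.0.0.5",
    "receiving block blk_2 src 10.0.0.9",
    "deleting block blk_1",
]

# Identity mining: one template per distinct raw message.
identity_templates = set(logs)

# Masking mining: replace any token containing a digit with a wildcard.
masked_templates = {re.sub(r"\S*\d\S*", "<*>", line) for line in logs}

assert len(identity_templates) == 3
assert masked_templates == {"receiving block <*> src <*>", "deleting block <*>"}
```

Holding everything else in the pipeline fixed and varying only this stage is exactly the kind of controlled ablation the preset `template_with` hook enables.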
