Bayesian Active Learning Isolation Forest

BALIF: Bayesian Active Learning Isolation Forest

Version: 0.1.2
License: MIT

Description

Convert unsupervised tree ensembles into Bayesian Anomaly Detectors (BAD) that can be updated dynamically. BAD models are built on top of the popular PyOD library and keep its original interface, while adding new capabilities for:

  • Weakly Supervised Learning
  • Active Learning
  • Lifelong Learning

Installation

Install BALIF using pip:

pip install balif

Usage

PyOD Compatibility

BAD models maintain the same interface as PyOD, making them easy to integrate into existing workflows. Core methods such as fit(), decision_function(), and predict() work the same way as in standard PyOD models, so you can switch between regular PyOD models and BALIF's Bayesian versions with minimal code changes.

from pyod.models.iforest import IForest
from balif import BADIForest
import numpy as np

# Generate some data
X_inliers = np.random.randn(1000, 5)
X_outliers = np.random.uniform(low=-4, high=4, size=(50, 5))
X_train = np.concatenate([X_inliers, X_outliers], axis=0)

# BAD models follow the PyOD interface
pyod_model = IForest().fit(X_train)
bad_model = BADIForest().fit(X_train)

# Get anomaly scores (higher means more anomalous)
pyod_scores = pyod_model.decision_function(X_train)
bad_scores = bad_model.decision_function(X_train)

# Predict whether points are anomalies (0: inlier, 1: outlier)
pyod_predictions = pyod_model.predict(X_train)
bad_predictions = bad_model.predict(X_train)

Incremental Learning with .update()

BAD models support incremental learning through the .update() method, allowing you to update the model with new data without retraining from scratch:

# New labelled data becomes available
X_new = np.random.randn(100, 5)
y_new = np.array([0] * 90 + [1] * 10)  # 0: normal, >=1: anomaly

# Update the model with the new data
bad_model.update(X_new, y_new)

# The model now incorporates knowledge from both datasets
X_test = np.random.randn(200, 5)  # held-out data to score
updated_scores = bad_model.decision_function(X_test)

Note: For some applications, it might be necessary to recompute the contamination threshold after updating the model, especially if the distribution of your data changes significantly over time.
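
As a rough illustration of the note above, the sketch below recomputes a threshold as the (1 - contamination) quantile of a set of anomaly scores and flags everything above it. `recompute_threshold` is a hypothetical helper, not part of BALIF or PyOD, and the scores are made up; in practice they would come from `bad_model.decision_function(...)` after the update.

```python
import math


def recompute_threshold(scores, contamination=0.1):
    """Return the score above which roughly `contamination` of the points fall.

    Hypothetical helper (not part of BALIF): takes the (1 - contamination)
    quantile with linear interpolation between the two nearest ranks.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    ordered = sorted(scores)
    if not ordered:
        raise ValueError("scores must not be empty")
    # Fractional rank of the requested quantile in the sorted scores.
    rank = (1.0 - contamination) * (len(ordered) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


# Made-up scores standing in for bad_model.decision_function(X) after an update.
updated_scores = [0.1, 0.2, 0.15, 0.3, 0.9, 0.25, 0.05, 0.8, 0.12, 0.18]
threshold = recompute_threshold(updated_scores, contamination=0.2)

# Points scoring strictly above the threshold are flagged as anomalies.
labels = [int(score > threshold) for score in updated_scores]
```

Whether and how the fitted model's own threshold should be overwritten depends on the BALIF version in use, so this sketch only derives labels from the scores and leaves the model untouched.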

Active Learning with the AL Module

BALIF includes an active learning module that helps identify the most informative instances for labeling:

import numpy as np
from balif import active_learning, BADIForest

# Generate data and fit model
X_inliers = np.random.randn(1000, 5)
X_outliers = np.random.uniform(low=-4, high=4, size=(50, 5))
X_train = np.concatenate([X_inliers, X_outliers], axis=0)
model = BADIForest().fit(X_train)

# Get the indices of the top-k most informative points (k = batch_size)
queries_idx = active_learning.get_queries_independent(
    model, X_train, interest_method="margin", batch_size=10
)

The active learning module offers several query strategies:

  • 'margin': Prioritize instances with predictions close to the decision boundary.
  • 'anom': Prioritize instances with a high anomaly score.
  • 'bald': Prioritize instances with high mutual information between the prediction and the model parameters.

Active learning can significantly reduce the labeling effort while maintaining high model performance.
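
To make the first two strategies concrete, here is a minimal sketch of what margin-style and score-style selection amount to on a plain list of anomaly scores. `margin_queries` and `anom_queries` are hypothetical stand-ins written for this illustration; BALIF's own implementation behind `interest_method` may differ, and the scores and threshold are made up.

```python
def margin_queries(scores, threshold, batch_size=10):
    """Indices of the `batch_size` scores closest to `threshold`.

    Hypothetical stand-in for a margin-style strategy: the smaller the
    distance to the decision boundary, the more uncertain the prediction.
    """
    by_margin = sorted(range(len(scores)), key=lambda i: abs(scores[i] - threshold))
    return by_margin[:batch_size]


def anom_queries(scores, batch_size=10):
    """Indices of the `batch_size` highest scores (most anomalous first)."""
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return by_score[:batch_size]


# Made-up anomaly scores and decision threshold.
scores = [0.05, 0.48, 0.95, 0.55, 0.10, 0.70]
uncertain_idx = margin_queries(scores, threshold=0.5, batch_size=2)
anomalous_idx = anom_queries(scores, batch_size=2)
```

The margin strategy spends the labeling budget on borderline points, while the score strategy asks for confirmation of the most suspicious ones; which works better depends on the dataset.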

Batteries included with ODDS datasets

BALIF provides easy access to benchmark anomaly detection datasets from the Outlier Detection DataSets (ODDS) repository:

from balif import odds_datasets

# Show the datasets included from ODDS
for name in odds_datasets.dataset_names:
    X, y = odds_datasets.load(name)
    print(f"DATASET: {name}")
    print(f"X: {X.shape}")
    print(f"contamination: {100*y.mean():.2f}%")
    print()

