Bayesian Active Learning Isolation Forest

BALIF: Bayesian Active Learning Isolation Forest

Version: 0.1.2
License: MIT

Description

Convert unsupervised tree ensembles into Bayesian Anomaly Detectors (BAD) that can be updated dynamically. BAD models are built on top of the popular PyOD library and keep its original interface, while adding new capabilities for:

  • Weakly Supervised Learning
  • Active Learning
  • Lifelong Learning

Installation

Install BALIF using pip:

pip install balif

Usage

PyOD Compatibility

BAD models maintain the same interface as PyOD, making them easy to integrate into existing workflows. Core methods such as fit(), decision_function(), and predict() work the same way as in standard PyOD models, so users can switch between regular PyOD models and BALIF's Bayesian versions with minimal code changes.

from pyod.models.iforest import IForest
from balif import BADIForest
import numpy as np

# Generate some data
X_inliers = np.random.randn(1000, 5)
X_outliers = np.random.uniform(low=-4, high=4, size=(50, 5))
X_train = np.concatenate([X_inliers, X_outliers], axis=0)

# BAD models follow the PyOD interface
pyod_model = IForest().fit(X_train)
bad_model = BADIForest().fit(X_train)

# Get anomaly scores (higher means more anomalous)
pyod_scores = pyod_model.decision_function(X_train)
bad_scores = bad_model.decision_function(X_train)

# Predict whether points are anomalies (0: inlier, 1: outlier)
pyod_predictions = pyod_model.predict(X_train)
bad_predictions = bad_model.predict(X_train)

Incremental Learning with .update()

BAD models support incremental learning through the .update() method, allowing you to update the model with new data without retraining from scratch:

# New labeled data becomes available
X_new = np.random.randn(100, 5)
y_new = np.array([0] * 90 + [1] * 10)  # 0: normal, >=1: anomaly

# Update the model with the new data
bad_model.update(X_new, y_new)

# The model now incorporates knowledge from both datasets
updated_scores = bad_model.decision_function(X_train)

Note: For some applications, it might be necessary to recompute the contamination threshold after updating the model, especially if the distribution of your data changes significantly over time.
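
One way to recompute the threshold mentioned in the note is sketched below. It assumes the threshold is defined the PyOD way, as the score percentile at 100 * (1 - contamination). The helper name recompute_threshold is hypothetical and not part of BALIF, and the random scores are only a stand-in for the output of bad_model.decision_function().

```python
import numpy as np


def recompute_threshold(scores, contamination=0.1):
    """Return the score above which a point is flagged as an anomaly.

    Hypothetical helper (not part of BALIF): it takes the
    100 * (1 - contamination) percentile of the given scores.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    return float(np.percentile(scores, 100.0 * (1.0 - contamination)))


# Stand-in for bad_model.decision_function(X) after an update
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)

threshold = recompute_threshold(scores, contamination=0.05)
labels = (scores > threshold).astype(int)  # 1: anomaly, 0: normal
```

Computing the threshold on recent scores rather than on the original training scores keeps the fraction of flagged points close to the chosen contamination level when the data distribution drifts.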

Active Learning with the AL Module

BALIF includes an active learning module that helps identify the most informative instances for labeling:

import numpy as np
from balif import active_learning, BADIForest

# Generate data and fit model
X_inliers = np.random.randn(1000, 5)
X_outliers = np.random.uniform(low=-4, high=4, size=(50, 5))
X_train = np.concatenate([X_inliers, X_outliers], axis=0)
model = BADIForest().fit(X_train)

# Get the top-k most interesting points
queries_idx = active_learning.get_queries_independent(
    model, X_train, interest_method="margin", batch_size=10
)

The active learning module offers several query strategies:

  • 'margin': Prioritize instances whose predictions lie close to the decision boundary.
  • 'anom': Prioritize instances with a high anomaly score.
  • 'bald': Prioritize instances with high mutual information between the prediction and the model parameters.
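
To make the three criteria concrete, here is a minimal NumPy sketch that computes each of them from sampled anomaly probabilities. It is an illustration of the general ideas, not BALIF's implementation. The function interest_scores and its input layout (one row per sampled model, one column per instance) are assumptions, and 'margin' is assumed to use 0.5 as the decision boundary.

```python
import numpy as np


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def interest_scores(probs, method="margin"):
    """Score instances for labeling; a higher score means more interesting.

    probs has shape (n_sampled_models, n_instances) and holds the anomaly
    probability each sampled model assigns to each instance (assumed layout).
    """
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)
    if method == "margin":
        # Closest to the assumed decision boundary at 0.5
        return -np.abs(mean_p - 0.5)
    if method == "anom":
        # Highest mean anomaly probability
        return mean_p
    if method == "bald":
        # Mutual information: entropy of the mean minus mean of the entropies
        return binary_entropy(mean_p) - binary_entropy(probs).mean(axis=0)
    raise ValueError(f"unknown method: {method!r}")


# Three instances (columns), two sampled models (rows)
probs = np.array([
    [0.5, 0.9, 0.2],
    [0.5, 0.9, 0.9],
])
best_margin = int(np.argmax(interest_scores(probs, "margin")))
best_anom = int(np.argmax(interest_scores(probs, "anom")))
best_bald = int(np.argmax(interest_scores(probs, "bald")))
```

In this toy input, instance 0 sits exactly on the boundary, instance 1 has the highest mean anomaly probability, and instance 2 is the only one on which the sampled models disagree, so each criterion selects a different instance.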

Because labels are requested only for the most informative instances, active learning can reduce the labeling effort needed to reach a given level of model performance.
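
A typical workflow alternates between querying, labeling, and updating. The sketch below is model-agnostic: run_active_learning, query_fn, and oracle_fn are hypothetical names, and the only assumption about the model is that it exposes update(X, y) as in the incremental learning section.

```python
import numpy as np


def run_active_learning(model, X, query_fn, oracle_fn, n_rounds=5):
    """Query, label, and update for up to n_rounds; return labeled indices.

    query_fn(model, X_pool) -> positions in X_pool to label next
    oracle_fn(indices)      -> labels for those rows of X (0: normal, 1: anomaly)
    """
    pool = np.arange(len(X))  # indices of instances not yet labeled
    labeled = []
    for _ in range(n_rounds):
        if pool.size == 0:
            break
        local = np.asarray(query_fn(model, X[pool]), dtype=int)
        chosen = pool[local]  # map pool positions back to rows of X
        model.update(X[chosen], np.asarray(oracle_fn(chosen)))
        labeled.extend(chosen.tolist())
        pool = np.delete(pool, local)  # never query the same instance twice
    return labeled
```

With BALIF, query_fn could wrap the call shown earlier, for example lambda m, X_pool: active_learning.get_queries_independent(m, X_pool, interest_method="margin", batch_size=10), assuming it returns an array of indices as the variable name queries_idx suggests. The oracle_fn callable would look up ground-truth labels or ask a human annotator.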

Batteries Included: ODDS Datasets

BALIF provides easy access to benchmark anomaly detection datasets from the Outlier Detection DataSets (ODDS) repository:

from balif import odds_datasets

# Show the datasets included from ODDS
for name in odds_datasets.dataset_names:
    X, y = odds_datasets.load(name)
    print(f"DATASET: {name}")
    print(f"X: {X.shape}")
    print(f"contamination: {100*y.mean():.2f}%")
    print()
