Python implementation of the bisect algorithm for hidden outlier generation.

Hidden Outlier Generation

A Python library for generating synthetic hidden outliers using the BISECT algorithm.

Background

Douglas Hawkins defined an outlier as "an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism." In two dimensions, outliers are easy to spot. In high-dimensional data (hundreds or thousands of features), however, the notion of "far from everything else" breaks down: pairwise distances concentrate, so every point looks roughly as far away as every other. This curse of dimensionality makes traditional outlier detection unreliable.

Hidden outliers are anomalies that look perfectly normal in the full feature space but reveal their true nature in specific feature subspaces. Consider a financial transaction: the amount, time, and location might each look normal individually, but the specific combination creates an anomaly. These are exactly the outliers that slip through conventional detection methods, and they appear frequently in fraud detection, infrastructure monitoring, and healthcare analytics.

This library implements the BISECT algorithm for generating hidden outliers, combined with manifold learning techniques to make generation tractable in high-dimensional spaces.

How It Works

The BISECT algorithm finds points in the "area of disagreement" between full-space and subspace outlier detection models:

1. Origin Selection: Start from an inlier point (using strategies like weighted sampling toward the boundary)
2. Direction: Pick a random direction by sampling a unit vector from the unit sphere in d dimensions
3. Bisection: Use binary search along that direction to find the boundary where outlier status changes
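
The direction and bisection steps can be illustrated with a minimal NumPy sketch. It is not the library's implementation: `random_unit_direction`, `bisect_boundary`, `max_length`, `tol`, and the toy detector are hypothetical names chosen for illustration, and a single outlier predicate stands in for the comparison between full-space and subspace models.

```python
import numpy as np


def random_unit_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere by normalizing a Gaussian vector."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def bisect_boundary(origin, direction, is_outlier, max_length=10.0, tol=1e-6):
    """Binary-search along origin + t * direction for the point where is_outlier flips.

    Assumes the origin is an inlier and origin + max_length * direction is an outlier.
    """
    lo, hi = 0.0, max_length
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_outlier(origin + mid * direction):
            hi = mid  # boundary is at or before mid
        else:
            lo = mid  # boundary is beyond mid
    return origin + hi * direction


def toy_is_outlier(x):
    """Toy detector (assumption): outlier if farther than 2.0 from the coordinate origin."""
    return bool(np.linalg.norm(x) > 2.0)


rng = np.random.default_rng(0)
origin = np.zeros(3)
direction = random_unit_direction(3, rng)
boundary_point = bisect_boundary(origin, direction, toy_is_outlier)
```

With this toy detector, the returned point sits just outside the radius-2 sphere, which is where the outlier verdict changes along the chosen ray.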

For high-dimensional data, the library supports projecting data onto a learned manifold (via autoencoders or PCA), generating hidden outliers in the tractable latent space, then decoding back to the original space.
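
The encode, generate, decode pipeline described above can be sketched with a plain PCA computed from an SVD. This is an illustration under stated assumptions, not the library's API: `encode`, `decode`, the toy data, and the perturbation that stands in for latent-space generation are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 50 dimensions lying close to a 3-dimensional linear subspace.
latent_true = rng.standard_normal((200, 3))
mixing = rng.standard_normal((3, 50))
data = latent_true @ mixing + 0.01 * rng.standard_normal((200, 50))

# Fit PCA via an SVD of the centered data and keep the top 3 components.
mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
components = vt[:3]  # shape (3, 50), rows are orthonormal


def encode(x):
    """Project points from the original space into the latent space."""
    return (x - mean) @ components.T


def decode(z):
    """Map latent points back to the original space."""
    return z @ components + mean


latent = encode(data)  # shape (200, 3)

# Placeholder for hidden outlier generation in the latent space:
# here one latent point is simply perturbed at random.
candidate_latent = latent[0] + 0.5 * rng.standard_normal(3)
candidate = decode(candidate_latent)  # shape (50,)

# How much information the projection loses on the training data.
reconstruction_error = np.abs(decode(latent) - data).max()
```

The point of the design is that any search over subspaces happens on 3 latent features rather than 50 original ones. An autoencoder would replace `encode` and `decode` with learned non-linear maps.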

What are Hidden Outliers?

Hidden outliers are data points that exhibit different outlier behavior depending on which feature subspace you examine:

H1 (Subspace Hidden): Outlier in some feature subspace but NOT in the full feature space
H2 (Full-space Hidden): Outlier in the full feature space but NOT in any subspace

These are useful for benchmarking outlier detection algorithms, especially subspace-aware methods.
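
To make the H1 and H2 definitions concrete, the sketch below labels a point by comparing a full-space verdict with the verdicts in every proper subspace. It is a toy under stated assumptions: `make_detector` is a hypothetical stand-in for a fitted detector (not a PyOD model), and `hidden_outlier_type` is an illustrative helper rather than part of the library.

```python
from itertools import combinations

import numpy as np


def make_detector(train):
    """Toy detector (assumption): a point is an outlier if it lies farther
    from the training mean than every training point does."""
    center = train.mean(axis=0)
    threshold = np.linalg.norm(train - center, axis=1).max()
    return lambda x: bool(np.linalg.norm(x - center) > threshold)


def hidden_outlier_type(point, data):
    """Return "H1", "H2", or None from full-space and subspace verdicts."""
    n_features = data.shape[1]
    full_outlier = make_detector(data)(point)
    # Check every non-empty proper subset of the features.
    subspace_outlier = any(
        make_detector(data[:, list(cols)])(point[list(cols)])
        for size in range(1, n_features)
        for cols in combinations(range(n_features), size)
    )
    if subspace_outlier and not full_outlier:
        return "H1"  # outlier in some subspace, normal in the full space
    if full_outlier and not subspace_outlier:
        return "H2"  # outlier in the full space, normal in every subspace
    return None


# Feature 0 has a small spread, feature 1 a large one.
data = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 10.0], [0.0, -10.0]])

label_h1 = hidden_outlier_type(np.array([2.0, 0.0]), data)    # extreme only in feature 0
label_h2 = hidden_outlier_type(np.array([1.0, 10.0]), data)   # extreme only in combination
label_none = hidden_outlier_type(np.array([0.0, 0.0]), data)  # ordinary point
```

Enumerating all subspaces as done here grows exponentially with the number of features, so it is only practical for small feature counts.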

Installation

pip install hidden-outlier-generation

Or with uv:

uv add hidden-outlier-generation

Quick Start

import numpy as np
from pyod.models.lof import LOF
from hog_bisect import BisectHOGen

# Example dataset: 200 samples, 5 features (replace with your own data)
data = np.random.default_rng(0).standard_normal((200, 5))

# Create generator
generator = BisectHOGen(
    data=data,
    outlier_detection_method=LOF,
    seed=42
)

# Generate hidden outliers
hidden_outliers = generator.fit_generate(gen_points=50)

print(f"Generated {len(hidden_outliers)} hidden outliers")
generator.print_summary()

Features

Multiple origin strategies: centroid, least outlier, random, weighted
Flexible detection methods: Any PyOD detector (LOF, KNN, IForest, etc.)
Parallel processing: Use n_jobs=-1 for multi-core execution
Reproducible: Seed parameter for deterministic results
Type hints: Full typing support with py.typed marker

API Reference

BisectHOGen

BisectHOGen(
    data: np.ndarray,                    # Input dataset (n_samples, n_features)
    outlier_detection_method=LOF,        # PyOD detector class
    seed: int = 5,                       # Random seed
    max_dimensions: int = 11             # Threshold for random subspace sampling
)

fit_generate()

generator.fit_generate(
    gen_points: int = 100,               # Number of candidate points
    check_fast: bool = True,             # Fast subspace checking
    is_fixed_interval_length: bool = True,
    get_origin_type: str = "weighted",   # Origin strategy
    verbose: bool = False,
    n_jobs: int = 1                      # Parallel workers (-1 for all cores)
) -> np.ndarray                          # Array of hidden outliers

Origin Types

Type	Description
`"centroid"`	Use data mean as origin (deterministic)
`"least outlier"`	Use most normal point (stable)
`"random"`	Random inlier each iteration (diverse)
`"weighted"`	Weighted random toward normal points (recommended)
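
The four strategies in the table can be sketched as follows. This is a hypothetical helper, not the library's code: `pick_origin` and its arguments are illustrative, outlier scores are assumed to be "higher means more outlying", and the weighting formula for `"weighted"` is an assumption that follows the table's description (favoring normal points).

```python
import numpy as np


def pick_origin(inliers, scores, strategy, rng):
    """Choose a bisection origin from inlier points and their outlier scores."""
    if strategy == "centroid":
        return inliers.mean(axis=0)
    if strategy == "least outlier":
        return inliers[np.argmin(scores)]
    if strategy == "random":
        return inliers[rng.integers(len(inliers))]
    if strategy == "weighted":
        # Assumed weighting: the lower the score, the higher the sampling probability.
        weights = scores.max() - scores + 1e-12
        return inliers[rng.choice(len(inliers), p=weights / weights.sum())]
    raise ValueError(f"unknown origin strategy: {strategy!r}")


rng = np.random.default_rng(42)
inliers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
scores = np.array([0.1, 0.5, 0.5, 3.0])

centroid_origin = pick_origin(inliers, scores, "centroid", rng)
stable_origin = pick_origin(inliers, scores, "least outlier", rng)
weighted_origin = pick_origin(inliers, scores, "weighted", rng)
```

The first two strategies return the same origin on every call, while `"random"` and `"weighted"` draw a new one each time, which is what makes the generated points more diverse.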

Examples

The repository includes example scripts in the examples/ directory:

# Clone the repo to access examples
git clone https://github.com/dschulmeist/hidden-outlier-generation
cd hidden-outlier-generation

# Run examples
python examples/basic_usage.py
python examples/compare_origins.py
python examples/compare_detectors.py
python examples/visualize_outliers.py  # requires matplotlib

Development

# Clone and install with dev dependencies
git clone https://github.com/dschulmeist/hidden-outlier-generation
cd hidden-outlier-generation
uv sync --all-extras

# Run tests
uv run pytest

# Run linting
uv run ruff check

See CONTRIBUTING.md for development guidelines and release process.

Research

This library was developed as part of a bachelor's thesis at the Karlsruhe Institute of Technology (KIT) on hidden outlier generation using deep learning and manifold learning. Key findings:

Autoencoders capture non-linear manifold structure better than linear methods like PCA
Hidden outliers can serve as a synthetic positive class for training classifiers, turning unsupervised anomaly detection into a supervised problem
Manifold projection makes high-dimensional hidden outlier generation tractable by reducing the exponential subspace search problem

Experiment notebooks demonstrating these findings are coming soon.
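
As a rough illustration of the second finding, generated outliers can be stacked with the original data to form a labeled training set. The sketch below is not taken from the thesis: `synthetic_outliers` is a random stand-in for generator output, and the 1-nearest-neighbor classifier `predict_1nn` is a hypothetical placeholder for whatever supervised model is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original (unlabeled) data, treated as the negative class.
inliers = rng.standard_normal((100, 4))

# Stand-in for generated hidden outliers (assumption): points shifted in two features.
synthetic_outliers = rng.standard_normal((30, 4)) + np.array([6.0, 6.0, 0.0, 0.0])

# Labeled training set: 0 = inlier, 1 = synthetic outlier.
X = np.vstack([inliers, synthetic_outliers])
y = np.concatenate([
    np.zeros(len(inliers), dtype=int),
    np.ones(len(synthetic_outliers), dtype=int),
])


def predict_1nn(X_train, y_train, queries):
    """Label each query with the class of its nearest training point."""
    dists = np.linalg.norm(queries[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]


queries = np.array([[0.0, 0.0, 0.0, 0.0], [6.0, 6.0, 0.0, 0.0]])
predictions = predict_1nn(X, y, queries)
```

Once the labels exist, any supervised classifier can be trained on `X` and `y` in place of the nearest-neighbor rule.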

License

MIT License - see LICENSE for details.

Citation

If you use this library in your research, please cite:

@software{hidden_outlier_generation,
  title = {Hidden Outlier Generation},
  author = {Schulmeister, David},
  url = {https://github.com/dschulmeist/hidden-outlier-generation},
  year = {2023}
}

Release history

1.0.1 (this version): Jan 13, 2026
1.0.0: Sep 25, 2023
