segmentae

SegmentAE: A Python Library for Anomaly Detection Optimization

Framework Overview

SegmentAE is designed to improve anomaly detection performance by optimizing reconstruction error: it combines clustering methods with tabular autoencoders so that reconstruction error is assessed per cluster. It targets anomaly detection applications in domains such as financial fraud detection, network security, and industrial monitoring.

Key Architectural Features (v2.0+)

Professional Architecture: Clean separation of concerns across the preprocessing, clustering, autoencoder, and optimization modules
Type Safety: Comprehensive Pydantic validation and type hints throughout
Design Patterns: Registry, Strategy, and Template Method patterns
Enum-Based Configuration: Type-safe constants for all parameters
Custom Exceptions: Informative error messages with actionable suggestions
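
The enum-based configuration and informative error messages listed above can be pictured with a small, generic sketch. It does not reproduce SegmentAE's internals: `ScalerChoice`, its string values, and `resolve_scaler` are hypothetical names chosen for illustration, assuming only that a parameter may arrive either as an enum member or as a plain string.

```python
from enum import Enum


class ScalerChoice(Enum):
    """Hypothetical enum (not SegmentAE's ScalerType); values are assumed."""
    MINMAX = "MinMax"
    STANDARD = "Standard"
    ROBUST = "Robust"


def resolve_scaler(value):
    """Accept an enum member or its string value; reject anything else."""
    if isinstance(value, ScalerChoice):
        return value
    try:
        return ScalerChoice(value)  # lookup by value, e.g. "MinMax"
    except ValueError:
        valid = ", ".join(member.value for member in ScalerChoice)
        raise ValueError(f"Unknown scaler {value!r}. Valid options: {valid}") from None


print(resolve_scaler("MinMax").name)               # MINMAX
print(resolve_scaler(ScalerChoice.ROBUST).value)   # Robust
```

Resolving strings to enum members at the boundary means the rest of the code only ever handles a closed set of values, and a typo fails early with a message that lists the valid options.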

Key Features and Capabilities

1. General Applicability on Tabular Datasets

SegmentAE handles a wide range of tabular datasets, which makes it suitable for anomaly detection tasks across different use cases and straightforward to integrate into existing applications.

2. Optimization and Customization

The framework offers complete configurability for each component of the anomaly detection pipeline, including:

Data Preprocessing: Encoding, scaling, and imputation with Pydantic validation
Clustering Algorithms: Registry-based clustering with easy extensibility
Autoencoder Integration: Support for custom Keras/TensorFlow models or built-in implementations

Each component can be fine-tuned to achieve optimal performance tailored to specific use cases.

3. Enhanced Detection Performance

By combining clustering algorithms with autoencoder-based anomaly detection, SegmentAE aims to improve the accuracy and reliability of anomaly detection. Clustering segments the input data into groups with different patterns, and the reconstruction error is then optimized for each cluster separately, which is intended to improve predictive performance.
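
To make the per-cluster idea concrete, here is a minimal pure-Python sketch. It rests on an assumption that is not confirmed by this page: that a cluster's threshold is its mean training reconstruction error multiplied by a threshold ratio. The function names and toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cluster_thresholds(train_errors, train_clusters, threshold_ratio):
    """Assumed rule: threshold = mean training error of the cluster * ratio."""
    grouped = defaultdict(list)
    for err, cluster in zip(train_errors, train_clusters):
        grouped[cluster].append(err)
    return {c: mean(errs) * threshold_ratio for c, errs in grouped.items()}


def flag_anomalies(errors, clusters, thresholds):
    """Flag a row (1) when its error exceeds the threshold of its own cluster."""
    return [int(err > thresholds[c]) for err, c in zip(errors, clusters)]


# Toy reconstruction errors: cluster 0 reconstructs well, cluster 1 is noisier.
train_errors = [0.01, 0.02, 0.03, 0.20, 0.30, 0.40]
train_clusters = [0, 0, 0, 1, 1, 1]
thresholds = cluster_thresholds(train_errors, train_clusters, threshold_ratio=2.0)

# The same error value is judged differently depending on the row's cluster.
new_errors = [0.10, 0.10]
new_clusters = [0, 1]
print(flag_anomalies(new_errors, new_clusters, thresholds))  # [1, 0]
```

In this toy data a single global threshold (global mean 0.16 times 2.0, i.e. 0.32) would flag neither row, whereas the per-cluster thresholds flag the row that is unusual for its own segment.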

Main Development Tools

Major frameworks used to build this project:

Where to Get It

A binary installer (wheel) for the latest released version is available on the Python Package Index (PyPI).

GitHub Project Link: https://github.com/TsLu1s/SegmentAE

Installation

To install this package from the PyPI repository, run the following command:

pip install segmentae

SegmentAE - Technical Components and Pipeline Structure

The SegmentAE framework consists of several integrated components, each playing a critical role in the optimization of anomaly detection through clustering and tabular autoencoders. The pipeline is structured with professional design patterns to ensure seamless data flow and modular customization.

1. Data Preprocessing

Proper preprocessing is crucial for ensuring the quality and consistency of data. The preprocessing module now includes:

Pydantic Validation: Automatic type checking and conversion
Type-Safe Configuration: Enum-based parameter selection
Missing Value Imputation: Simple statistical imputation methods
Normalization: MinMax, Standard, and Robust scaling options
Categorical Encoding: Inverse Frequency, Label, and One-Hot Encoding

Example:

from segmentae.preprocessing import Preprocessing
from segmentae.core import EncoderType, ScalerType

# Type-safe configuration with enums
pr = Preprocessing(
    encoder=EncoderType.IFREQUENCY,  
    scaler=ScalerType.MINMAX,
    imputer="Simple"                 # Strings are also supported
)
pr.fit(X_train)
X_transformed = pr.transform(X_test)
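
The fit/transform split in the example above matters because scaling statistics should come from the training data only. The following toy scaler is an illustration of standard min-max scaling, not SegmentAE's implementation; `MinMaxColumnScaler` is a hypothetical name and it handles a single numeric column.

```python
class MinMaxColumnScaler:
    """Toy single-column min-max scaler with a fit/transform split."""

    def fit(self, values):
        # Learn the range from the training column only.
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        if span == 0:
            # Constant column: map everything to 0.0 to avoid dividing by zero.
            return [0.0 for _ in values]
        return [(v - self.min_) / span for v in values]


train_col = [10.0, 20.0, 30.0]
test_col = [15.0, 40.0]

scaler = MinMaxColumnScaler().fit(train_col)
print(scaler.transform(train_col))  # [0.0, 0.5, 1.0]
print(scaler.transform(test_col))   # [0.25, 1.5]
```

The test value 40.0 maps to 1.5 because it lies outside the training range; values outside [0, 1] after scaling are expected for unseen data and can themselves contribute to a large reconstruction error.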

2. Clustering

Clustering forms the backbone of the SegmentAE framework and is designed for easy extensibility:

Registry Pattern: Clean model registration and instantiation
Type Safety: Pydantic validation for all parameters
Four Algorithms: K-Means, MiniBatch K-Means, Gaussian Mixture, Agglomerative
Extensible Design: Easy to add new clustering algorithms

Example:

from segmentae.clustering import Clustering
from segmentae.core import ClusterModel

cl = Clustering(
    cluster_model=[ClusterModel.KMEANS],  # Enum-based
    n_clusters=3
)
cl.clustering_fit(X_train)

3. Anomaly Detection - Autoencoders

The core of the SegmentAE framework employs advanced autoencoder architectures:

Three Baseline Implementations: Dense, BatchNorm, and Ensemble autoencoders
Custom Model Support: Integrate any Keras/TensorFlow model
Full Customization: Network architecture, training epochs, activation layers, and more
Type-Safe Integration: Validated through protocols

The framework includes three baseline autoencoder algorithms for user application, allowing complete customization of network architecture, training parameters, and activation functions.
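
Validation "through protocols" can be illustrated with Python's `typing.Protocol`. The sketch below is an assumption about what such a check might look like, not the library's definition: `AutoencoderLike` and the requirement of `fit` and `predict` methods are hypothetical.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class AutoencoderLike(Protocol):
    """Hypothetical structural interface: anything with fit() and predict()."""

    def fit(self, input_data: Any) -> Any: ...

    def predict(self, input_data: Any) -> Any: ...


class IdentityAutoencoder:
    """Toy model that 'reconstructs' its input unchanged."""

    def fit(self, input_data):
        return self

    def predict(self, input_data):
        return input_data


class NotAModel:
    """Has neither fit() nor predict(), so it does not satisfy the protocol."""


print(isinstance(IdentityAutoencoder(), AutoencoderLike))  # True
print(isinstance(NotAModel(), AutoencoderLike))            # False
```

A structural check of this kind accepts any object with the right methods, which is what allows a custom model to be plugged in without inheriting from a library base class. Note that `runtime_checkable` only verifies that the methods exist, not their signatures.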

Custom Model Integration: You can build your own Keras-based autoencoder and integrate it into the SegmentAE pipeline (see basic_model.py under Template Example Applications).

Unlabeled Data Support: An application example for fully unlabeled data is provided in unlabeled_application.py (see Template Example Applications).

SegmentAE - Predictive Application

The following example demonstrates the complete workflow from data loading to anomaly detection using a DenseAutoencoder integrated with KMeans clustering.

import pandas as pd
from segmentae import SegmentAE, Preprocessing, Clustering
from segmentae.autoencoders import DenseAutoencoder
from segmentae.core import EncoderType, ScalerType, ClusterModel, ThresholdMetric
from segmentae.data_sources import load_dataset
from segmentae.metrics import metrics_classification
from sklearn.model_selection import train_test_split

############################################################################################
### Data Loading

train, test, target = load_dataset(
    dataset_selection='htru2_dataset',
    split_ratio=0.75
)

test, future_data = train_test_split(test, train_size=0.9, random_state=5)

# Reset indices (required)
train = train.reset_index(drop=True)
test = test.reset_index(drop=True)
future_data = future_data.reset_index(drop=True)

# Separate features and targets
X_train, y_train = train.drop(columns=[target]).copy(), train[target].astype(int)
X_test, y_test = test.drop(columns=[target]).copy(), test[target].astype(int)
X_future_data = future_data.drop(columns=[target]).copy()
y_future_data = future_data[target].astype(int)

############################################################################################
### Preprocessing with Type-Safe Configuration (v2.0+)

pr = Preprocessing(
    encoder=EncoderType.IFREQUENCY,  # Type-safe enum: EncoderType.LABEL, EncoderType.ONEHOT
    scaler=ScalerType.MINMAX,        # Type-safe enum: ScalerType.STANDARD, ScalerType.ROBUST
    imputer=None
)

pr.fit(X=X_train)
X_train = pr.transform(X=X_train)
X_test = pr.transform(X=X_test)
X_future_data = pr.transform(X=X_future_data)

############################################################################################
### Clustering Implementation with Type-Safe Registry

cl_model = Clustering(
    cluster_model=[ClusterModel.KMEANS],  # Type-safe enum: ClusterModel.MINIBATCH_KMEANS, ClusterModel.GMM, ClusterModel.AGGLOMERATIVE
    n_clusters=3
)
cl_model.clustering_fit(X=X_train)

############################################################################################
### Autoencoder Implementation

denseAutoencoder = DenseAutoencoder(
    hidden_dims=[16, 12, 8, 4],
    encoder_activation='relu',
    decoder_activation='relu',
    optimizer='adam',
    learning_rate=0.001,
    epochs=150,
    val_size=0.15,
    stopping_patient=20,
    dropout_rate=0.1,
    batch_size=None
)
denseAutoencoder.fit(input_data=X_train)
denseAutoencoder.summary()

############################################################################################
### Autoencoder + Clustering Integration

sg = SegmentAE(ae_model=denseAutoencoder, cl_model=cl_model)

############################################################################################
### Train Reconstruction with Type-Safe Metric (v2.0+)

sg.reconstruction(
    input_data=X_train,
    threshold_metric=ThresholdMetric.MSE  # Type-safe enum: ThresholdMetric.MAE, ThresholdMetric.RMSE, ThresholdMetric.MAX_ERROR
)

############################################################################################
### Reconstruction Performance Evaluation

results = sg.evaluation(
    input_data=X_test,
    target_col=y_test,
    threshold_ratio=2.0
)

# Access test metadata by cluster
preds_test, recon_metrics_test = sg.preds_test, sg.reconstruction_test

# View global and per-cluster metrics
print(results['global metrics'])
print(results['clusters metrics'])

############################################################################################
### Multiple Threshold Ratio Evaluation

threshold_ratios = [0.75, 1, 1.5, 2, 3, 4]

global_results = pd.concat([
    sg.evaluation(input_data=X_test, target_col=y_test, threshold_ratio=thr)["global metrics"]
    for thr in threshold_ratios
])

print("\nThreshold Optimization Results:")
print(global_results)

############################################################################################
### Anomaly Detection Predictions

best_ratio = global_results.sort_values(by="Accuracy", ascending=False).iloc[0]["Threshold Ratio"]

predictions = sg.detections(
    input_data=X_future_data,
    threshold_ratio=best_ratio
)

# Use the new metrics module for evaluation
final_metrics = metrics_classification(
    y_true=y_future_data,
    y_pred=predictions["Predicted Anomalies"]
)

print("\nFinal Performance Metrics:")
print(f"Accuracy: {final_metrics['Accuracy']}")
print(f"Precision: {final_metrics['Precision']}")
print(f"Recall: {final_metrics['Recall']}")
print(f"F1 Score: {final_metrics['F1 Score']}")
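
The threshold metric options named in the example (MSE, MAE, RMSE, max error) differ in how they summarize the per-feature gap between a row and its reconstruction. The sketch below uses the standard textbook definitions; `row_errors` is a hypothetical helper and may not match how SegmentAE computes these values internally.

```python
from math import sqrt


def row_errors(original, reconstructed):
    """Return MSE, MAE, RMSE and max absolute error for one row."""
    diffs = [abs(o - r) for o, r in zip(original, reconstructed)]
    mse = sum(d * d for d in diffs) / len(diffs)
    return {
        "MSE": mse,
        "MAE": sum(diffs) / len(diffs),
        "RMSE": sqrt(mse),
        "MAX_ERROR": max(diffs),
    }


row = [0.0, 1.0, 2.0, 3.0]
reconstruction = [0.0, 1.0, 2.0, 5.0]  # one feature reconstructed badly
print(row_errors(row, reconstruction))
# {'MSE': 1.0, 'MAE': 0.5, 'RMSE': 1.0, 'MAX_ERROR': 2.0}
```

When a single feature is far off, the max error reacts most strongly and the MAE least, so the choice of metric affects which rows cross a given threshold.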

Grid Search Optimizer

SegmentAE includes a grid search, exposed through the SegmentAE_Optimizer class, to systematically identify the best-performing configuration.

The optimizer evaluates combinations of:

Multiple autoencoders
Different clustering algorithms
Various cluster numbers
Different threshold ratios

Example:

from segmentae.optimization import SegmentAE_Optimizer
from segmentae.core import ClusterModel

# Type-safe enum-based optimization 
optimizer = SegmentAE_Optimizer(
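    # autoencoder1 and autoencoder2 are autoencoder instances defined beforehand (e.g. DenseAutoencoder)
    autoencoder_models=[autoencoder1, autoencoder2],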
    n_clusters_list=[2, 3, 4],
    cluster_models=[ClusterModel.KMEANS, ClusterModel.GMM, ClusterModel.MINIBATCH_KMEANS],  # Type-safe enums
    threshold_ratios=[1, 1.5, 2, 3],
    performance_metric='f1_score'  # or 'Accuracy', 'Precision', 'Recall'
)
# Note: Strings are also supported 
# cluster_models=["KMeans", "GMM", "MiniBatchKMeans"]

# Run grid search
best_model = optimizer.optimize(X_train, X_test, y_test)

# View results
print(f"Best Performance: {optimizer.best_performance}")
print("Best Configuration:")
print(f"  - Clusters: {optimizer.best_n_clusters}")
print(f"  - Threshold: {optimizer.best_threshold_ratio}")
print("\nLeaderboard:")
print(optimizer.leaderboard.head(10))

For a complete optimizer example, see optimizer_application.py (listed under Template Example Applications).
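
Conceptually, the search space described in this section is the Cartesian product of the four option lists. The sketch below shows that enumeration with `itertools.product` and a stand-in scoring function; `grid_search` and `toy_score` are hypothetical names, and a real run would fit and evaluate a pipeline where the toy score is computed.

```python
from itertools import product


def grid_search(autoencoders, cluster_models, n_clusters_list, threshold_ratios, score_fn):
    """Score every combination and return a leaderboard sorted best-first."""
    leaderboard = []
    for ae, cm, k, thr in product(autoencoders, cluster_models, n_clusters_list, threshold_ratios):
        leaderboard.append({
            "autoencoder": ae,
            "cluster_model": cm,
            "n_clusters": k,
            "threshold_ratio": thr,
            "score": score_fn(ae, cm, k, thr),
        })
    leaderboard.sort(key=lambda entry: entry["score"], reverse=True)
    return leaderboard


def toy_score(ae, cm, k, thr):
    """Stand-in for fit-and-evaluate; peaks at 3 clusters and ratio 2."""
    return 1.0 - 0.1 * abs(k - 3) - 0.05 * abs(thr - 2)


board = grid_search(
    autoencoders=["dense", "batchnorm"],
    cluster_models=["KMeans", "GMM"],
    n_clusters_list=[2, 3, 4],
    threshold_ratios=[1, 1.5, 2, 3],
    score_fn=toy_score,
)
print(len(board))  # 48
print(board[0]["n_clusters"], board[0]["threshold_ratio"])  # 3 2
```

The number of evaluations grows multiplicatively (2 x 2 x 3 x 4 = 48 here), so each additional option in any list has a direct cost in training and evaluation time.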

Template Example Applications

1. Basic Custom Model

Use your own Keras autoencoder with SegmentAE:

Example: basic_model.py
Shows custom Sequential model integration
Demonstrates multiple threshold evaluation

2. Baseline Autoencoders

Use built-in DenseAutoencoder or BatchNormAutoencoder:

Example: baseline_models.py
Shows built-in autoencoder usage
Includes model summary and training visualization

3. Grid Search Optimization

Find optimal configuration automatically:

Example: optimizer_application.py
Evaluates multiple autoencoders and clustering configurations
Compares multiple clustering algorithms
Generates performance leaderboard

4. Unlabeled Data Detection

Detect anomalies without ground truth labels:

Example: unlabeled_application.py
Shows reconstruction-only workflow
Useful for production deployment

Interactive Notebooks

For a more interactive experience, explore the project's Jupyter notebooks, which provide step-by-step execution and guidelines.

If you use SegmentAE in your research, please cite:

@software{segmentae2024,
  author = {Luís Fernando Santos},
  title = {SegmentAE: A Python Library for Anomaly Detection Optimization},
  year = {2024},
  publisher = {PyPI},
  url = {https://pypi.org/project/segmentae/}
}

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Luis Santos - LinkedIn

Release history

1.5.26 (Feb 10, 2026, this version)
1.5.25 (Feb 2, 2026)
1.5.21 (Jan 26, 2026)
1.5.20 (Jan 19, 2026)
1.5.16 (Dec 29, 2025)
1.5.10 (Dec 29, 2025)
1.5.0 (Dec 29, 2025)
1.0.27 (Oct 24, 2024)
1.0.26 (Jul 31, 2024)
1.0.21 (Jul 28, 2024)
1.0.20 (Jul 22, 2024)
1.0.10 (Jul 16, 2024)
1.0.0 (Jun 19, 2024)
0.9.0 (Jun 19, 2024)

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

segmentae-1.5.26-py3-none-any.whl (48.7 kB)

Uploaded Feb 10, 2026 Python 3

File details

Details for the file segmentae-1.5.26-py3-none-any.whl.

File metadata

Download URL: segmentae-1.5.26-py3-none-any.whl
Upload date: Feb 10, 2026
Size: 48.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for segmentae-1.5.26-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e4268702762bf1af9b41ff0ae1511d498742f0c0a6b6df52bb30546408c4f48b`
MD5	`c0045aeb2afea1c262f4b2cbfc27b80e`
BLAKE2b-256	`ba98b7804655174bd5d03e84c481bd2e2e29c349bba0bb87c6efffad29c79f77`
