# Mango Autoencoder

A Python library for anomaly detection in time series using neural autoencoders.

## Description

Mango Autoencoder is a specialized tool for time series analysis that uses neural autoencoder networks to detect anomalies and reconstruct data. It is designed to be highly configurable and easy to use, with advanced data processing capabilities.
## Key features
- Flexible Neural Architectures: Supports LSTM, GRU, and RNN
- Anomaly Detection: Automatic identification of anomalous patterns in time series
- Data Reconstruction: Ability to reconstruct missing or corrupted data
- Advanced Processing: Normalization, imputation, and handling of missing values
- Integrated Visualization: Plotting tools for result analysis
- Bidirectional Configuration: Support for bidirectional encoders and decoders
- Mask Handling: Intelligent data processing with custom masks
- New Data Reconstruction: Reconstruct unknown data with iterative improvement
## Installation

```bash
uv add mango-autoencoder
```
### Dependencies
- Python >= 3.10
- TensorFlow >= 2.18.0
- Pandas >= 2.0.3
- Polars >= 1.31.0
- Scikit-learn >= 1.6.1
- Plotly >= 6.2.0
## Basic usage

```python
import numpy as np

from mango_autoencoder import AutoEncoder

# Create autoencoder instance
autoencoder = AutoEncoder()

# Configure and train the model
autoencoder.build_and_train(
    context_window=10,
    data=time_series_data,
    time_step_to_check=[0, 1, 2],
    feature_to_check=[0, 1],
    hidden_dim=64,
    form="lstm",
    epochs=100,
)

# Reconstruct data
reconstruction = autoencoder.reconstruct()
```
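`time_series_data` is not defined by the snippet above; it is assumed to be a 2-D array of shape (timesteps, features). A minimal synthetic stand-in (the sine/cosine channels are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(500)

# Two noisy periodic channels as a stand-in for real sensor readings
time_series_data = np.column_stack([
    np.sin(t / 20) + rng.normal(0, 0.1, t.size),
    np.cos(t / 20) + rng.normal(0, 0.1, t.size),
])

print(time_series_data.shape)  # (500, 2)
```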
## Advanced usage: Reconstructing New Data

The `reconstruct_new_data` method allows you to reconstruct unknown data using a trained model. This is particularly useful for:
- Missing Data Imputation: Fill in missing values in time series
- Data Quality Improvement: Correct corrupted or noisy data
- Iterative Refinement: Improve reconstruction quality through multiple iterations
### Example Usage

```python
from pathlib import Path

from mango_autoencoder import AutoEncoder

# Load a trained model
model = AutoEncoder.load_from_pickle("path/to/model.pkl")

# Set up output directory
reconstruct_output_dir = Path("autoencoder_output/reconstruction")
reconstruct_output_dir.mkdir(parents=True, exist_ok=True)

# Perform reconstruction on new data
reconstructed_results = model.reconstruct_new_data(
    id_columns="source_file",
    data=data,
    iterations=3,
    save_path=str(reconstruct_output_dir),
    reconstruction_diagnostic=True,
)
```
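The `data` argument above is not shown; for this example it is assumed to be a DataFrame with a `source_file` ID column and feature columns that may contain NaNs. A hypothetical sketch of such an input:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

data = pd.DataFrame({
    "source_file": ["sensor_a"] * n,
    "temperature": np.sin(np.arange(n) / 10) + rng.normal(0, 0.05, n),
})

# Knock out roughly 10% of the values to simulate the gaps the model should fill
gaps = rng.random(n) < 0.10
data.loc[gaps, "temperature"] = np.nan
```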
### Parameters

- `data`: Input data (NumPy array, pandas DataFrame, or Polars DataFrame)
- `iterations`: Number of reconstruction iterations (default: 1)
  - Higher iteration counts can improve reconstruction quality for data with many missing values
  - Each iteration uses the previous reconstruction to improve the next one
- `id_columns`: Column(s) that define IDs so that each ID is reconstructed separately
  - Useful when the data contains multiple time series (e.g., different sensors or locations)
  - Can be a string, integer, or list of strings/integers
- `save_path`: Path where reconstruction results and diagnostics are saved
- `reconstruction_diagnostic`: If True, generates error analysis and visualization files
### How It Works

1. Data Validation: checks that the new data has the same features as the training data
2. ID Processing: splits the data by ID columns if specified
3. Iterative Reconstruction: in each iteration the model reconstructs the data, missing values (NaN) are filled with the reconstructed values, and the process repeats to improve reconstruction quality
4. Result Generation: returns the reconstructed data and optionally saves diagnostic files
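The iterative fill-in described above can be sketched in a few lines. This is an illustrative loop, not the library's actual implementation; `reconstruct_fn` stands in for a trained model's prediction step:

```python
import numpy as np

def iterative_fill(data: np.ndarray, reconstruct_fn, iterations: int = 1) -> np.ndarray:
    """Repeatedly reconstruct and re-impute the originally missing entries."""
    missing = np.isnan(data)
    # Seed the gaps with per-feature means so the first pass sees complete input
    filled = np.where(missing, np.nanmean(data, axis=0), data)
    for _ in range(iterations):
        reconstruction = reconstruct_fn(filled)
        # Only overwrite positions that were missing in the original data
        filled[missing] = reconstruction[missing]
    return filled
```

Each pass feeds the previous imputation back through the model, so positions that were observed stay fixed while the missing ones converge toward model-consistent values.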
## Output Files

### Training Phase

When you train a model with `build_and_train()`, the following files are created in the specified `save_path`:

#### Model Files

- `models/model.pkl`: Main model file containing the trained Keras model and training parameters
- `models/{epoch}.pkl`: Checkpoint files saved every `checkpoint` epochs (e.g., `10.pkl`, `20.pkl`)
#### Visualization Files

- `loss_history.html`: Interactive plot showing training and validation loss over epochs

#### Reconstruction Files (if `reconstruction_diagnostic=True`)

- `actual_vs_reconstructed.html`: Interactive plot comparing original vs reconstructed data
- `reconstruction_error.csv`: Detailed reconstruction error data
- `reconstruction_error_summary.csv`: Summary statistics of reconstruction errors
- `reconstruction_error_boxplot.html`: Box plot of reconstruction errors by feature and data split
### Reconstruction Phase (`reconstruct_new_data`)

When using `reconstruct_new_data()`, the following files are created in the specified `save_path`:

#### Reconstruction Results

- `reconstruct_new_data/{id}_reconstruction_results.csv`: Reconstructed data for each ID (or "global" if no IDs)

#### Diagnostic Files (if `reconstruction_diagnostic=True`)

- `reconstruct_new_data/{id}_reconstruction_error.csv`: Reconstruction error data for each ID
- `reconstruct_new_data/{id}_reconstruction_error_summary.csv`: Summary statistics for each ID
- `reconstruct_new_data/{id}_reconstruction_error_boxplot.html`: Box plot of reconstruction errors for each ID
### File Structure Example

```
autoencoder_output/
├── models/
│   ├── model.pkl
│   ├── 10.pkl
│   ├── 20.pkl
│   └── ...
├── loss_history.html
├── actual_vs_reconstructed.html
├── reconstruction_error.csv
├── reconstruction_error_summary.csv
├── reconstruction_error_boxplot.html
└── reconstruct_new_data/
    ├── global_reconstruction_results.csv
    ├── global_reconstruction_error.csv
    ├── global_reconstruction_error_summary.csv
    └── global_reconstruction_error_boxplot.html
```
## Project structure

```
mango_autoencoder/
├── mango_autoencoder/
│   ├── autoencoder.py              # Main autoencoder class
│   ├── modules/
│   │   ├── encoder.py              # Encoding module
│   │   ├── decoder.py              # Decoding module
│   │   └── anomaly_detector.py     # Anomaly detector
│   ├── utils/
│   │   ├── processing.py           # Processing utilities
│   │   ├── plots.py                # Visualization tools
│   │   └── sequences.py            # Sequence processing
│   ├── tests/                      # Unit tests
│   │   └── test_autoencoder.py     # Autoencoder tests
│   └── logging/                    # Logging utilities
├── pyproject.toml                  # Project configuration
└── uv.lock                         # Dependency lock file
```
## File details

Details for the file `mango_autoencoder-0.1.0a2.tar.gz`.

### File metadata

- Download URL: mango_autoencoder-0.1.0a2.tar.gz
- Upload date:
- Size: 153.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5a9235e26b29010d9a1252c04d6c76c39518be3f26c54c55a9ee0b876676c83f` |
| MD5 | `5d6be9227132fbb1f73890a034430199` |
| BLAKE2b-256 | `a3aefb5ca4a4edbd21883398465a0d3cc3acca6b0047d064db4ea0120c0a309d` |
## File details

Details for the file `mango_autoencoder-0.1.0a2-py3-none-any.whl`.

### File metadata

- Download URL: mango_autoencoder-0.1.0a2-py3-none-any.whl
- Upload date:
- Size: 63.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `dadb25f7d86841b1b5d0c43613414daa6742f781224f6781c7353d85df2ffb67` |
| MD5 | `89fc91d8cf121388020020498ec2d823` |
| BLAKE2b-256 | `95f62d4e31ae461241d179fa7f212a10db02f18292e3510166de3067ed5ad79e` |