Prometheus time series anomaly detection with LSTM Autoencoder
This project implements a system for detecting anomalies in time series data collected from Prometheus. It uses an LSTM (Long Short-Term Memory) autoencoder model built with TensorFlow/Keras to learn normal patterns from your metrics and identify deviations. The system includes scripts for data collection, preprocessing, model training, data filtering, and real-time anomaly detection, exposing results via a Prometheus exporter.
GitHub Repository: https://github.com/vpuhoff/prometheus-anomaly-detection-lstm
PyPI Package: https://pypi.org/project/prometheus-anomaly-detection-lstm
Features
- Data Collection: Fetches time series data from a Prometheus instance for specified PromQL queries. The resulting dataset contains day_of_week and hour_of_day columns derived from timestamps.
- Preprocessing: Handles missing values and normalizes/scales values for model training. This stage also ensures that the day_of_week and hour_of_day features are present.
- LSTM Autoencoder Training: Trains an LSTM autoencoder on the full preprocessed dataset.
- Data Filtering: An optional script to apply the trained model to filter out anomalous sequences from a dataset for further analysis.
- Real-time Anomaly Detection: Continuously monitors new data and processes it with the trained model to detect anomalies.
- Prometheus Exporter Integration: Exposes key anomaly detection metrics (e.g., reconstruction error, anomaly flag, per-feature errors) that can be scraped by Prometheus and monitored with tools like Grafana.
- Configurable: All stages are highly configurable via a central config.yaml file.
WIKI: deepwiki
Project Structure
.
├── config.yaml # Central configuration file for all scripts
├── cli.py # Command-line utility to run workflow stages
├── data_collector.py # Script to collect historical data from Prometheus
├── preprocess_data.py # Script to preprocess the collected data
├── train_autoencoder.py # Script to train the LSTM autoencoder
├── filter_anomalous_data.py # Optional script to filter data using the trained model
├── realtime_detector.py # Script for real-time anomaly detection and Prometheus exporter
├── Pipfile # Dependency declarations
├── Pipfile.lock # Locked versions of dependencies
└── README.md # This file
Prerequisites
- Python 3.12.
- Pipenv for managing dependencies.
- A running Prometheus instance (v2.x or later) that is scraping the metrics you want to analyze.
- (Optional) Exporters configured for your Prometheus to collect the desired metrics (e.g., windows_exporter).
Setup & Installation
- Clone the Repository:
  git clone https://github.com/vpuhoff/prometheus-anomaly-detection-lstm
  cd prometheus-anomaly-detection-lstm
- Install Dependencies with Pipenv:
  pipenv install --dev
  After installation you can enter the environment using pipenv shell or run scripts with pipenv run.
- Prometheus Setup: Ensure your Prometheus server is running and accessible. The scripts will query this server based on the URL and PromQL queries defined in config.yaml. The example queries in config.yaml might use metrics from windows_exporter; adapt these to your own available metrics.
Configuration (config.yaml)
The config.yaml file is central to running this project. Key sections include:
- prometheus_url: URL of your Prometheus server.
- queries: Dictionary of PromQL queries with friendly aliases.
- data_settings: Parameters for data_collector.py.
  - collection_periods_iso: (Recommended) A list of specific time ranges to collect data from. This is the best way to create a high-quality training dataset by explicitly including periods of known normal operation and excluding periods with anomalies. If this parameter is present, it will be used instead of the other time settings.
      collection_periods_iso:
        - start: "2025-05-20T10:00:00"
          end: "2025-05-22T18:00:00"
        - start: "2025-05-25T09:00:00"
          end: "2025-05-27T12:00:00"
  - collection_period_hours, start_time_iso, end_time_iso: Legacy parameters for specifying a single data collection window. These are used only if collection_periods_iso is not defined.
  - step, output_filename: Define the data sampling interval and the name of the output Parquet file.
- preprocessing_settings: Parameters for preprocess_data.py (e.g., nan_fill_strategy, scaler_type, processed_output_filename, scaler_output_filename).
- training_settings: Parameters for train_autoencoder.py.
  - model_output_filename: Filename for the trained model.
  - sequence_length, train_split_ratio, epochs, batch_size, learning_rate, early_stopping_patience: Standard training hyperparameters.
  - lstm_units_encoder1, etc.: LSTM autoencoder architecture definition.
- data_filtering_settings: Parameters for the optional filter_anomalous_data.py script.
  - normal_sequences_output_filename: Output file for sequences classified as normal.
  - anomalous_sequences_output_filename: Output file for sequences classified as anomalous.
- real_time_anomaly_detection: Parameters for realtime_detector.py.
  - query_interval_seconds: How often to fetch new data.
  - anomaly_threshold_mse: Crucial! MSE threshold for declaring an anomaly. Tune this based on the error histogram generated during training.
  - exporter_port: Port for the Prometheus exporter.
  - metrics_prefix: Prefix for exposed Prometheus metrics.
Before running any script, review and customize config.yaml thoroughly.
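The precedence between collection_periods_iso and the legacy window settings can be sketched as follows. This is a minimal illustration, not the project's code: resolve_collection_windows is a hypothetical helper, the settings are passed as a plain dict rather than loaded from config.yaml, and the legacy fallback (an explicit start/end pair if both are set, otherwise the most recent collection_period_hours hours, defaulting to 24) is an assumption about how those parameters combine.

```python
from datetime import datetime, timedelta


def resolve_collection_windows(data_settings, now=None):
    """Return a list of (start, end) datetimes to collect data for.

    Hypothetical helper: the real data_collector.py may differ.
    """
    periods = data_settings.get("collection_periods_iso")
    if periods:
        # Explicit periods take precedence over the legacy single-window settings.
        return [
            (datetime.fromisoformat(p["start"]), datetime.fromisoformat(p["end"]))
            for p in periods
        ]

    # Legacy fallback (assumed semantics): explicit start/end if both are set,
    # otherwise the most recent `collection_period_hours` hours.
    start_iso = data_settings.get("start_time_iso")
    end_iso = data_settings.get("end_time_iso")
    if start_iso and end_iso:
        return [(datetime.fromisoformat(start_iso), datetime.fromisoformat(end_iso))]

    end = now or datetime.now()
    hours = data_settings.get("collection_period_hours", 24)  # assumed default
    return [(end - timedelta(hours=hours), end)]


settings = {
    "collection_periods_iso": [
        {"start": "2025-05-20T10:00:00", "end": "2025-05-22T18:00:00"},
        {"start": "2025-05-25T09:00:00", "end": "2025-05-27T12:00:00"},
    ],
    "collection_period_hours": 48,  # ignored because explicit periods are present
}
windows = resolve_collection_windows(settings)
```

Here windows holds two (start, end) pairs, one per listed period; removing collection_periods_iso from the dict would make the helper fall back to a single 48-hour window ending now.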
Usage / Workflow
The project follows a sequential workflow. Each stage can be launched via the cli.py utility:
python cli.py collect # collect data
python cli.py preprocess # preprocess data
python cli.py train # train the model
python cli.py detect # start the real-time detector
The individual stages, which can also be run as standalone scripts, are described below:
Step 1: Data Collection (data_collector.py)
Collect historical data from your Prometheus instance. This script can combine data from multiple time ranges if specified in config.yaml under collection_periods_iso.
python data_collector.py
Output: Raw data Parquet file (e.g., prometheus_metrics_data.parquet) which includes day_of_week and hour_of_day columns.
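As a rough illustration of how the day_of_week and hour_of_day columns can be derived from a sample timestamp, the sketch below uses only the standard library. The calendar_features helper is hypothetical, and both the UTC interpretation and the Monday=0 encoding are assumptions; the project may use a different timezone or numbering.

```python
from datetime import datetime, timezone


def calendar_features(unix_ts):
    """Derive calendar features from a Unix timestamp (seconds).

    Assumptions: timestamps are interpreted in UTC and days are
    numbered Monday=0 .. Sunday=6 (Python's datetime.weekday()).
    """
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return {"day_of_week": dt.weekday(), "hour_of_day": dt.hour}


# Tuesday, 2025-05-20 10:00 UTC
ts = datetime(2025, 5, 20, 10, 0, tzinfo=timezone.utc).timestamp()
features = calendar_features(ts)
```

These two features give the model a way to learn daily and weekly patterns, so that a load level that is normal on a weekday afternoon can still stand out at night or on a weekend.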
Step 2: Data Preprocessing (preprocess_data.py)
Preprocess the collected data (handles NaNs, scales features).
python preprocess_data.py
Outputs: A processed data Parquet file (e.g., processed_metrics_data.parquet) and a saved scaler (e.g., fitted_scaler.joblib).
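The two preprocessing operations named above can be sketched in pure Python. This is an illustration under assumptions, not the project's implementation: forward fill is only one possible nan_fill_strategy, min-max scaling is only one possible scaler_type, and both helper functions are hypothetical.

```python
import math


def forward_fill(values):
    """Replace NaNs with the last valid value; leading NaNs take the first valid value."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        raise ValueError("series contains no valid samples")
    filled, last = [], valid[0]
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


def min_max_scale(values):
    """Scale values to [0, 1] and return them with the fitted (min, max) pair."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant series: avoid division by zero
    return [(v - lo) / span for v in values], (lo, hi)


raw = [float("nan"), 10.0, float("nan"), 30.0, 20.0]
filled = forward_fill(raw)
scaled, fitted_range = min_max_scale(filled)
```

The fitted range plays the role of the saved scaler: new data has to be scaled with the parameters fitted on the training data, otherwise reconstruction errors at detection time are not comparable with those seen during training.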
Step 3: Train Model (train_autoencoder.py)
Train the LSTM autoencoder on the entire preprocessed dataset from Step 2.
python train_autoencoder.py
Outputs:
- A trained Keras model (e.g., lstm_autoencoder_model.keras).
- A training history plot (training_history_loss_...png).
- A reconstruction error histogram (reconstruction_error_histogram_...png). Use this histogram to determine an appropriate value for anomaly_threshold_mse in config.yaml.
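One common way to turn the error distribution into a threshold is to take a high percentile of the reconstruction errors observed on the training data. The sketch below assumes you have those per-sequence MSE values available as a list; threshold_from_errors is a hypothetical helper and the 99th percentile is an arbitrary starting point, not a value recommended by the project.

```python
import statistics


def threshold_from_errors(errors, percentile=99):
    """Return the given percentile (1..99) of the training reconstruction errors."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points between percentiles.
    cut_points = statistics.quantiles(errors, n=100, method="inclusive")
    return cut_points[percentile - 1]


# Made-up training errors, evenly spread between 0.000 and 0.100.
training_errors = [i / 1000 for i in range(101)]
threshold = threshold_from_errors(training_errors, percentile=99)
```

A lower percentile makes the detector more sensitive at the cost of more false alarms; comparing the computed value with the histogram is still worthwhile, since a long tail or a second peak suggests the training data itself contains anomalies.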
Step 4: Real-time Anomaly Detection (realtime_detector.py)
Run the real-time detector using the trained model from Step 3.
- Ensure model_output_filename in training_settings points to your trained model.
- Ensure anomaly_threshold_mse in real_time_anomaly_detection is correctly set based on the histogram from Step 3.
python realtime_detector.py
The detector starts a Prometheus exporter (e.g., on http://localhost:8901/metrics).
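The scoring behind the anomaly flag can be illustrated as follows. This sketch assumes the overall error is the mean of the per-feature MSE values and uses a hard-coded reconstruction as a stand-in for the model's output; reconstruction_errors, the feature aliases, and the threshold value are all hypothetical.

```python
def reconstruction_errors(sequence, reconstruction, feature_names):
    """Return (overall MSE, {feature: MSE}) for one sequence.

    `sequence` and `reconstruction` are lists of time steps, each a list
    with one value per feature.
    """
    n_steps = len(sequence)
    per_feature = {}
    for k, name in enumerate(feature_names):
        squared = [(sequence[t][k] - reconstruction[t][k]) ** 2 for t in range(n_steps)]
        per_feature[name] = sum(squared) / n_steps
    overall = sum(per_feature.values()) / len(per_feature)
    return overall, per_feature


sequence = [[0.0, 1.0], [0.0, 1.0]]
reconstruction = [[0.0, 0.0], [0.0, 0.5]]  # stand-in for the autoencoder output
overall, per_feature = reconstruction_errors(sequence, reconstruction, ["cpu", "memory"])

anomaly_threshold_mse = 0.1  # illustrative value only
is_anomaly = overall > anomaly_threshold_mse
top_feature = max(per_feature, key=per_feature.get)
```

In this toy case the "memory" feature is reconstructed poorly, so it dominates the overall error and is the first place to look when the flag is raised.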
Optional Step: Filter Data (filter_anomalous_data.py)
Use the trained model from Step 3 to classify sequences in your dataset as "normal" or "anomalous" for analysis.
- Ensure anomaly_threshold_mse is appropriately set in config.yaml.
- Configure output filenames in data_filtering_settings.
python filter_anomalous_data.py
Outputs: .npy files containing the normal and anomalous sequences.
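Conceptually, filtering cuts the dataset into fixed-length sequences and partitions them by reconstruction error. The sketch below is a simplified illustration: make_sequences and split_by_threshold are hypothetical helpers, a stride of one row between windows is an assumption, and the error values are made up in place of real model output.

```python
def make_sequences(rows, sequence_length):
    """Overlapping windows of `sequence_length` consecutive rows (stride 1 assumed)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return [rows[i:i + sequence_length] for i in range(len(rows) - sequence_length + 1)]


def split_by_threshold(sequences, errors, threshold):
    """Partition sequences into (normal, anomalous) by their reconstruction error."""
    normal = [s for s, e in zip(sequences, errors) if e <= threshold]
    anomalous = [s for s, e in zip(sequences, errors) if e > threshold]
    return normal, anomalous


rows = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.9], [0.2, 0.4]]
sequences = make_sequences(rows, sequence_length=3)

# Made-up errors: in the real pipeline these come from the trained model.
errors = [0.01, 0.20]
normal, anomalous = split_by_threshold(sequences, errors, threshold=0.05)
```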
Monitoring (Prometheus & Grafana)
Configure Prometheus to scrape the metrics endpoint from realtime_detector.py. Visualize metrics like:
- anomaly_detector_latest_reconstruction_error_mse
- anomaly_detector_is_anomaly_detected
- anomaly_detector_total_anomalies_count_total
- anomaly_detector_feature_reconstruction_error_mse{feature_name="your_alias"}
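Before wiring up Prometheus, it can help to confirm that the exporter endpoint returns these metrics. The sketch below parses Prometheus text exposition output with a deliberately simple regular expression (it does not handle label values containing a closing brace). parse_metrics is a hypothetical helper, and the sample text is hand-written for illustration, not captured from the detector.

```python
import re

# metric name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text):
    """Map 'name{labels}' to float value, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[name + (labels or "")] = float(value)
    return samples


# Hand-written sample; real output would be fetched from the exporter, e.g. with
# urllib.request.urlopen("http://localhost:8901/metrics").read().decode()
sample_text = """
# comment lines such as HELP and TYPE are skipped
anomaly_detector_is_anomaly_detected 1.0
anomaly_detector_latest_reconstruction_error_mse 0.0421
anomaly_detector_feature_reconstruction_error_mse{feature_name="your_alias"} 0.08
"""
metrics = parse_metrics(sample_text)
```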
Interpreting Results
- Monitoring Metrics: Observe the is_anomaly_detected and latest_reconstruction_error_mse metrics in real time to evaluate detection behavior.
- Per-Feature Errors: When an anomaly is flagged, check the corresponding feature_reconstruction_error_mse metrics (and the logs of realtime_detector.py) to see which specific time series (features) are contributing most to the anomaly.
Customization & Extending
- Monitoring New Metrics: Add new PromQL queries to config.yaml, then re-collect the data and retrain the model (run steps 1-3) so that the new features are included.
- Tuning Anomaly Threshold: The anomaly_threshold_mse value is critical. Adjust it based on the training error histogram and desired sensitivity.
- Model Architecture: Modify LSTM parameters in the training_settings section of config.yaml.
Troubleshooting
- Python Dependencies: Ensure Pipfile/Pipfile.lock are in sync and run pipenv install if packages change.
- Prometheus Connection: Verify prometheus_url and query validity in config.yaml.
- Data Issues: Check for "No data found" errors; inspect PromQL queries and Prometheus scrape targets. Review nan_fill_strategy if NaNs persist.
- Model Training: If loss doesn't decrease, adjust learning rate, batch size, or architecture. EarlyStopping is configured to prevent overfitting.
- File Not Found: Double-check filenames in config.yaml against actual generated files (models, scalers, datasets).
- Port in Use: If realtime_detector.py fails, the exporter_port might be occupied by another process.
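For the port-in-use case, a quick standard-library check can tell you whether the configured exporter_port is available before starting the detector. port_is_free is a hypothetical helper, and the check binds on 127.0.0.1 as an assumption about where the exporter listens.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


# Demonstration: occupy an OS-assigned port, check it, then release it.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
busy_port = holder.getsockname()[1]
free_while_held = port_is_free(busy_port)
holder.close()
free_after_release = port_is_free(busy_port)
```

If the check fails for your exporter_port, either stop the process holding the port or choose a different port in config.yaml and update the Prometheus scrape target to match.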
Contributing
Contributions are welcome! Please feel free to open an issue or submit a pull request.
License
This project is licensed under the MIT License.