BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection

🕵️ BARO: Root Cause Analysis for Microservices

BARO is an end-to-end anomaly detection and root cause analysis approach for microservice failures. This repository includes artifacts for reuse and for reproducing the experimental results presented in our FSE'24 paper, "BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection".

Installation

Install from PyPI

pip install fse-baro

Or, build from source

git clone https://github.com/phamquiluan/baro.git && cd baro
pip install -e .

BARO has been tested on Linux and Windows, with different Python versions. More details are in INSTALL.md.

How-to-use

Data format

The data must be a pandas.DataFrame containing multivariate time series metrics. It must have a column named time that stores the timestep. Every other column stores the time series of one metric and is named <service>_<metric>. For example, the column cart_cpu stores the CPU utilization of the service cart. A sample of valid data can be downloaded with the download_data() function, as demonstrated in the sample commands in this README.
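
The naming convention above can be checked before running BARO. The following is a minimal sketch, not part of the BARO package: the helper name group_columns_by_service is hypothetical, and it assumes that the metric name is the part of the column name after the last underscore.

```python
from collections import defaultdict


def group_columns_by_service(columns, time_column="time"):
    """Group metric columns named <service>_<metric> by service.

    Hypothetical helper, not part of BARO. Assumption: the metric name
    is everything after the last underscore in the column name.
    """
    if time_column not in columns:
        raise ValueError(f"missing required column: {time_column!r}")

    grouped = defaultdict(list)
    for name in columns:
        if name == time_column:
            continue
        # rpartition splits on the last underscore only
        service, sep, metric = name.rpartition("_")
        if not sep or not service or not metric:
            raise ValueError(f"column {name!r} does not match <service>_<metric>")
        grouped[service].append(metric)
    return dict(grouped)


columns = ["time", "cart_cpu", "cart_mem", "checkout_latency"]
print(group_columns_by_service(columns))
# {'cart': ['cpu', 'mem'], 'checkout': ['latency']}
```

With a DataFrame named data, the same check could be run as group_columns_by_service(list(data.columns)).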

Sample Python commands to use BARO

BARO consists of two modules: MultivariateBOCPD (implemented in baro.anomaly_detection.bocpd) and RobustScorer (implemented in baro.root_cause_analysis.robust_scorer). We expose both as functions so that users and researchers can reuse them conveniently. The following sample commands run BARO:

# Save this code to a file named test.py
from baro.anomaly_detection import bocpd
from baro.root_cause_analysis import robust_scorer
from baro.utility import download_data, read_data

# download a sample dataset to data.csv
download_data()

# read data from data.csv
data = read_data("data.csv")

# perform anomaly detection 
anomalies = bocpd(data) 
print("Anomalies are detected at timestep:", anomalies[0])

# perform root cause analysis
root_causes = robust_scorer(data, anomalies=anomalies)["ranks"]

# print the top 5 root causes
print("Top 5 root causes:", root_causes[:5])

Expected output after running the above code (it takes around 1 minute):

$ python test.py
Downloading data.csv..: 100%|████████████████████████████████| 570k/570k [00:00<00:00, 17.1MiB/s]
Anomalies are detected at timestep: 243
Top 5 root causes: ['checkoutservice_latency', 'cartservice_mem', 'cartservice_latency', 'cartservice_cpu', 'main_mem']
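
The ranked list above contains metric names, so the same service can appear more than once. As a rough sketch, the ranking can be collapsed to services by keeping each service at the position of its first metric. The helper rank_services is hypothetical and not part of BARO; it assumes the service name is the part before the last underscore.

```python
def rank_services(ranked_metrics):
    """Collapse a ranked list of <service>_<metric> names into ranked services.

    Hypothetical helper, not part of BARO. Each service keeps the position
    of its highest-ranked metric.
    """
    services = []
    for name in ranked_metrics:
        service = name.rpartition("_")[0]
        if service and service not in services:
            services.append(service)
    return services


top5 = [
    "checkoutservice_latency",
    "cartservice_mem",
    "cartservice_latency",
    "cartservice_cpu",
    "main_mem",
]
print(rank_services(top5))
# ['checkoutservice', 'cartservice', 'main']
```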

Reproducibility

As presented in Table 3 of our paper, BARO achieves Avg@5 of 0.91, 0.96, 0.95, 0.62, and 0.86 for the CPU, MEM, DELAY, LOSS, and ALL fault types on the Online Boutique dataset. To reproduce the RCA performance of BARO presented in Table 3, run the following commands:

Reproduce RCA performance on the Online Boutique dataset, fault type CPU

$ python main.py --dataset OnlineBoutique --fault-type cpu

Expected output

====== Reproduce BARO =====
Dataset   : fse-ob
Fault type: cpu
Avg@5 Acc : 0.91
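
For readers unfamiliar with the metric, the sketch below shows how an Avg@5 score of this kind is commonly computed in root cause analysis evaluations: AC@k is the fraction of failure cases whose true root cause appears in the top k results, and Avg@k is the mean of AC@1 through AC@k. This definition is stated here as an assumption; the exact evaluation code used by main.py may differ. The function names and the toy data are illustrative only.

```python
def ac_at_k(rankings, truths, k):
    """Fraction of cases whose true root cause is within the top-k ranking."""
    hits = sum(truth in ranking[:k] for ranking, truth in zip(rankings, truths))
    return hits / len(truths)


def avg_at_k(rankings, truths, k=5):
    """Mean of AC@1 .. AC@k (assumed definition of Avg@k)."""
    return sum(ac_at_k(rankings, truths, j) for j in range(1, k + 1)) / k


# Two toy failure cases: the true root cause is ranked 1st in the first
# case and 3rd in the second case.
rankings = [
    ["cart", "checkout", "main", "email", "ads"],
    ["main", "email", "cart", "checkout", "ads"],
]
truths = ["cart", "cart"]

print(avg_at_k(rankings, truths, k=5))
# 0.8  (AC@1..AC@5 = 0.5, 0.5, 1.0, 1.0, 1.0)
```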

Reproduce RCA performance on the Online Boutique dataset, fault type MEM

$ python main.py --dataset OnlineBoutique --fault-type mem

Expected output

====== Reproduce BARO =====
Dataset   : fse-ob
Fault type: mem
Avg@5 Acc : 0.96

Reproduce RCA performance on the Online Boutique dataset, fault type DELAY

$ python main.py --dataset OnlineBoutique --fault-type delay

Expected output

====== Reproduce BARO =====
Dataset   : fse-ob
Fault type: delay
Avg@5 Acc : 0.95

Reproduce RCA performance on the Online Boutique dataset, fault type LOSS

$ python main.py --dataset OnlineBoutique --fault-type loss

Expected output

====== Reproduce BARO =====
Dataset   : fse-ob
Fault type: loss
Avg@5 Acc : 0.62

Reproduce RCA performance on the Online Boutique dataset, fault type ALL

$ python main.py --dataset OnlineBoutique --fault-type all

Expected output

====== Reproduce BARO =====
Dataset   : fse-ob
Fault type: all
Avg@5 Acc : 0.86
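
The five commands above can also be run in one go. The script below is a small convenience sketch, not part of the repository: it only uses the --dataset and --fault-type flags shown above, and it assumes it is executed from the repository root, next to main.py. The names FAULT_TYPES, build_command, and run_all are hypothetical.

```python
import subprocess
import sys

# Fault types listed in this README for the Online Boutique dataset
FAULT_TYPES = ["cpu", "mem", "delay", "loss", "all"]


def build_command(fault_type, dataset="OnlineBoutique"):
    """Build the argument list for one reproduction run."""
    return [sys.executable, "main.py", "--dataset", dataset, "--fault-type", fault_type]


def run_all():
    """Run main.py once per fault type and stop at the first failure."""
    for fault_type in FAULT_TYPES:
        subprocess.run(build_command(fault_type), check=True)


# Show the commands that run_all() would execute
for fault_type in FAULT_TYPES:
    print("python", " ".join(build_command(fault_type)[1:]))
```

Calling run_all() from the repository root then executes the five runs in sequence.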

We have prepared two Google Colab notebooks:

  1. Open In Colab: This notebook reproduces the RCA performance of BARO (also available at tutorials/reproducibility.ipynb).
  2. Open In Colab: This notebook reproduces the output of the Multivariate BOCPD module.

Download Paper

TBD

Download Datasets

Our datasets are publicly available in a Zenodo repository with the following information:

Running Time & Instrumentation Cost

Please refer to our docs/running_time_and_instrumentation_cost.md document.

Citation

@inproceedings{pham2024baro,
  title={BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection},
  author={Luan Pham and Huong Ha and Hongyu Zhang},
  booktitle={Proceedings of the ACM on Software Engineering, Vol 1},
  year={2024},
  organization={ACM}
}

Contact

luan.pham@rmit.edu.au
