Skip to main content

Pulsar-metrics is an open-source Python library for evaluating and monitoring data and concept drift with an extensive set of metrics. It also offers the possibility to use custom metrics defined by the user.

Project description

pulsar-metrics

Pulsar-metrics is an open-source Python library for evaluating and monitoring data and concept drift with an extensive set of metrics. It also offers the possibility to use custom metrics defined by the user.

Getting started

Pulsar-metrics components

There are two core components in pulsar-metrics: metrics and analyzers

Metrics

An API to calculate single metrics for data and concept drift. Metrics results are unified in a single data structure MetricsResults storing not only the metrics value, but also a couple of metadata related to the model and the data used for its calculation

MetricResults(metric_name=None, type='performance', model_id='model_1', model_version='1', data_id=None, feature=None, value=None, status=None, threshold=None, period_start=None, period_end=datetime.datetime(2022, 7, 1, 0, 0), eval_timestamp=datetime.datetime(2022, 9, 26, 10, 28, 27, 846122), conf_int=None)

There are three types of metrics:

- Data drift metrics for the calculation of ditributional changes of the features used in the model. The metrics included so far are:
Kullback-Leibler (KL) divergence: This statistics measures how different is a probability distribution $P$ with respect to a reference probability distributiuon $Q$ (typically the probability distribution of the treaining features). More precisely, the KL divergence $D_{KL}(P||Q)$ is given by the fllowing formula $$D_{KL}(P||Q) = \sum_x P(x) \log \left ( \frac{P(x)}{Q(x)} \right )$$ $D_{KL}(P||Q)$ is always non-negative et is zero when the distributions are identical. Hence, a drift would be detected if its value is larger than a given threshold decided by the use
Wasserstein distance is a distance measure between two probability measures $Q$ and $P$. More precisely, the (first) Wassersetin distance $W_1(P, Q)$ is given by the formula $$W_1(P, Q) = \int_{-\infty}^{+\infty}|F_Q(x) - F_P(x)|dx$$ where $F_Q$ is the cumulative distribution function of $Q$. The metric is strctly non negative and a drift would be detected if its value is larger than a given threshold decided by the user.
T-test is a 2 samples paremetric statistical test to detect a difference in the means of the distributions of the two samples. More precisely, the test used is the Welch test in which the 2 samples do not necessarily have the same variance or size. Since it is a statistical test, a location drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
Mann-Whitney U test for location shift is a 2 sample non-parametric statistical test to detect a difference in the medians of the distrbutions of two samples. Since it is a statistical test, a location drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
Levene's test is a 2 samples parametric statistical test to detect a difference in the variances of the distributions of the two samples. Since it is a statistical test, a dispersion drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
Kolmogorov-Smirnov test is a 2 samples nonparametric statistical test to check whether two samples come from the same distribution. The test statistics is given by $$D_{n, m} = \sup_x |F_{1, n}(x) - F_{2, m}(x)|$$ where $F_{1, m}$ is the empirical cumulative distrbutin functin of sample 1 with size $n$. Since it is a statistical test, a dispersion drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
Cramer von Mises test is a 2 samples nonparametric statistical test to check whether two samples come from the same distribution. The test statistics is given by $$T_{n, m} = \frac{nm}{n+m} \int_{-\infty}^{+\infty} |F_{1, n}(x) - F_{2, m}(x)|^2 dF_{n+m}$$ where $F_{1, m}$ is the empirical cumulative distrbutin functin of sample 1 with size $n$ and $F_{n+m}$ is the emprirical distribution function of the two samples together. Since it is a statistical test, a distributios drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
Chi-square test to compare the distribution of a categorical feature in 2 samples by comparing the frequencies of unique modalities. Since it is a statistical test, a distribution drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).

Data drift metrics are implemented either in the DriftMetric (For the KL divergence and the Wasserstein distance) or DriftTestMetric classes. The choice of the metric is specified with the name parameter in the init method according to the following table

Metric Name
Kullback-Leibler divergence 'kl'
Wasserstein distance 'wasserstein'
T-test 'ttest'
Mann Whitney U test 'manwu'
Leven's test 'levene'
Kolmgorv-Smirnov 'ks_2samp'
Cramer von Mises test 'CvM'
Chi square test 'chi2'
- Performance metrics for te calculation of the performance of classification and regression models. In particular, the following metrics are implemented:
  • Accuracy
  • Precision
  • Recall
  • f1-score
  • Log loss
  • AUC
  • AUCPR
  • Brier Score
  • Mean squarred error (MSE)
  • Mean absolute error (MAE)
  • Mean absolute percentage error (MAPE)
  • R-square score

Performance metrics are implemented in the PerformanceMetric class.

- Custom metrics. The user has the ability to define his own metric through the @CustomMetric decorator (see below for an example)

All three types of metrics inherit the AbstractMetrics class.

Analyzers

An analyzer groups multiple metrics calculations in a single run. It allows to use which metrics to use and for which features.

Example usage

To use the library, you need a reference dataset, typically the training dataset, and an analysis dataset which we want to compare with former.

Calculating a single metric

For a single metric, we first start by instantiating the appropriate metrics class by specifying the name of the metric ("ttest" in the example below)

from pulsar_metrics.metrics.drift import DriftTestMetric
driftTest = DriftTestMetric(name = 'ttest', data = data_new, feature_name = feature_name)

Then we run the .evaluate() method to calculate the metric

driftTest.evaluate(alpha = 0.05, reference = data_ref[feature_name])

The result is returned through the .get_result() method of te metric object

driftTest.get_result()

Using the analyzer

When multiple metrics are required for different features, the analyzer allows one to calculate all the metrics at once.

First, instantiate an analyzer object

from pulsar_metrics.analyzers.base import Analyzer
analysis = Analyzer(name = 'First Analyzer', description='My first Analyzer', data = data_new)

Then add the metrics of interest

analysis.add_drift_metrics(metrics_list=['wasserstein', 'ttest', 'ks_2samp'], features_list=['Population', 'MedInc']);
analysis.add_performance_metrics(metrics_list=['accuracy'], y_name = 'clf_target');

Then, you can run the analyzer while optionnally specfyings options for each metrics as a dictionnary for the options keywords

analysis.run(data_ref = data_ref, options = {'ttest': {'alpha': 0.01, 'equal_var': False}})

It then possible to get the results of the analysis as a pandas dataFrame

analysis.results_to_pandas()

image

Creating a custom metric

The @CustomMetric decorator allows to transform any function to the AbstractMetrics class

from pulsar_metrics.metrics.base import CustomMetric
@CustomMetric
def test_custom(a, b, **kwargs):
    return np.max(a - b)

About PulsarML

PulsarML is a project helping with monitoring your models and gain powerful insights into its performance.

We released two Open Source packages :

  • pulsar-data-collection : lightweight python SDK enabling data collection of features, predictions and metadata from an ML model serving code/micro-service
  • pulsar-metrics : library for evaluating and monitoring data and concept drift with an extensive set of metrics. It also offers the possibility to use custom metrics defined by the user.

We also created pulsar demo to display an example use-case showing how to leverage both packages to implement model monitoring and performance management.

Want to interact with the community? join our slack channel

Powered by Rocket Science Development

Contributing

  1. Fork this repository, develop, and test your changes
  2. open an issue
  3. Submit a pull request with a reference to the issue

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pulsar_metrics-0.1.3.tar.gz (18.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pulsar_metrics-0.1.3-py3-none-any.whl (19.3 kB view details)

Uploaded Python 3

File details

Details for the file pulsar_metrics-0.1.3.tar.gz.

File metadata

  • Download URL: pulsar_metrics-0.1.3.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.10.6 Linux/5.15.0-1033-azure

File hashes

Hashes for pulsar_metrics-0.1.3.tar.gz
Algorithm Hash digest
SHA256 a83bce8e0f36b995519a7b1461697aba229ac0fdf647710ca817a5f495971e6d
MD5 de4334be8f6f300fbe4dc5e3b51c6239
BLAKE2b-256 edef4238f433cca30bde3aab3f830d72450fd0cb396ea920d4b86b6d6bba4031

See more details on using hashes here.

File details

Details for the file pulsar_metrics-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pulsar_metrics-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.10.6 Linux/5.15.0-1033-azure

File hashes

Hashes for pulsar_metrics-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 91c49ef6e6fed8fdbde39eaad3f78b81a10e41a42f20b4654d381f79dc937946
MD5 20528499cd7fff1ec3e154130b18fc54
BLAKE2b-256 a84836a09a2333a99fb6f05b8c50b6d3ead05577371f641fa3c325282c792373

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page