Pulsar-metrics is an open-source Python library for evaluating and monitoring data and concept drift with an extensive set of metrics. It also offers the possibility to use custom metrics defined by the user.

Project description

pulsar-metrics

Getting started

Pulsar-metrics components

There are two core components in pulsar-metrics: metrics and analyzers.

Metrics

An API to calculate single metrics for data and concept drift. Metric results are unified in a single data structure, MetricResults, which stores not only the metric value but also metadata about the model and the data used for its calculation:

MetricResults(metric_name=None, type='performance', model_id='model_1', model_version='1', data_id=None, feature=None, value=None, status=None, threshold=None, period_start=None, period_end=datetime.datetime(2022, 7, 1, 0, 0), eval_timestamp=datetime.datetime(2022, 9, 26, 10, 28, 27, 846122), conf_int=None)
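To make the shape of this record concrete, here is a minimal sketch of a comparable container written as a standard-library dataclass. The class name `MetricResultSketch`, the all-`None` defaults, the field types, and the example values are assumptions made for illustration. Only the field names are taken from the output above, and the library's own MetricResults class may be defined differently.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class MetricResultSketch:
    """Hypothetical stand-in mirroring the fields printed by MetricResults."""

    metric_name: Optional[str] = None
    type: Optional[str] = None            # 'performance' in the output above
    model_id: Optional[str] = None
    model_version: Optional[str] = None
    data_id: Optional[str] = None
    feature: Optional[str] = None
    value: Optional[float] = None
    status: Optional[bool] = None         # assumed: whether drift was flagged
    threshold: Optional[float] = None
    period_start: Optional[datetime] = None
    period_end: Optional[datetime] = None
    eval_timestamp: Optional[datetime] = None
    conf_int: Optional[Tuple[float, float]] = None  # assumed: (low, high)


# Made-up values, only to show how one result carries value plus metadata.
result = MetricResultSketch(
    metric_name="ttest",
    type="drift",                         # assumed label for a drift metric
    model_id="model_1",
    model_version="1",
    feature="MedInc",
    value=0.003,
    threshold=0.05,
    status=True,
    period_end=datetime(2022, 7, 1),
)

row = asdict(result)  # flat dict, convenient for logging or tabulating
```

Keeping the value and its metadata in one flat record is what makes it straightforward to stack many results into a table later on.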

There are three types of metrics:

- Data drift metrics for the calculation of distributional changes in the features used by the model. The metrics included so far are:
  • Kullback-Leibler (KL) divergence: this statistic measures how different a probability distribution $P$ is from a reference probability distribution $Q$ (typically the probability distribution of the training features). More precisely, the KL divergence $D_{KL}(P||Q)$ is given by the following formula $$D_{KL}(P||Q) = \sum_x P(x) \log \left ( \frac{P(x)}{Q(x)} \right )$$ $D_{KL}(P||Q)$ is always non-negative and is zero when the distributions are identical. Hence, a drift is detected if its value is larger than a threshold chosen by the user.
  • Wasserstein distance: a distance measure between two probability measures $Q$ and $P$. More precisely, the (first) Wasserstein distance $W_1(P, Q)$ is given by the formula $$W_1(P, Q) = \int_{-\infty}^{+\infty}|F_Q(x) - F_P(x)|dx$$ where $F_Q$ and $F_P$ are the cumulative distribution functions of $Q$ and $P$. The metric is non-negative, and a drift is detected if its value is larger than a threshold chosen by the user.
  • T-test: a two-sample parametric statistical test to detect a difference in the means of the distributions of the two samples. More precisely, the test used is the Welch test, in which the two samples do not necessarily have the same variance or size. Since it is a statistical test, a location drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
  • Mann-Whitney U test for location shift: a two-sample non-parametric statistical test to detect a difference in the medians of the distributions of the two samples. Since it is a statistical test, a location drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
  • Levene's test: a two-sample parametric statistical test to detect a difference in the variances of the distributions of the two samples. Since it is a statistical test, a dispersion drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
  • Kolmogorov-Smirnov test: a two-sample non-parametric statistical test to check whether two samples come from the same distribution. The test statistic is given by $$D_{n, m} = \sup_x |F_{1, n}(x) - F_{2, m}(x)|$$ where $F_{1, n}$ and $F_{2, m}$ are the empirical cumulative distribution functions of sample 1 (of size $n$) and sample 2 (of size $m$). Since it is a statistical test, a distribution drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
  • Cramer von Mises test: a two-sample non-parametric statistical test to check whether two samples come from the same distribution. The test statistic is given by $$T_{n, m} = \frac{nm}{n+m} \int_{-\infty}^{+\infty} |F_{1, n}(x) - F_{2, m}(x)|^2 dF_{n+m}(x)$$ where $F_{1, n}$ and $F_{2, m}$ are the empirical cumulative distribution functions of sample 1 (of size $n$) and sample 2 (of size $m$), and $F_{n+m}$ is the empirical distribution function of the two samples pooled together. Since it is a statistical test, a distribution drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).
  • Chi-square test: compares the distribution of a categorical feature in two samples by comparing the frequencies of its unique modalities. Since it is a statistical test, a distribution drift is detected when the p-value is smaller than a significance level chosen by the user (default is 0.05).

Data drift metrics are implemented either in the DriftMetric class (for the KL divergence and the Wasserstein distance) or in the DriftTestMetric class. The metric is selected with the name parameter of the init method, according to the following table:

| Metric | Name |
|---|---|
| Kullback-Leibler divergence | 'kl' |
| Wasserstein distance | 'wasserstein' |
| T-test | 'ttest' |
| Mann-Whitney U test | 'manwu' |
| Levene's test | 'levene' |
| Kolmogorov-Smirnov test | 'ks_2samp' |
| Cramer von Mises test | 'CvM' |
| Chi-square test | 'chi2' |
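
The formulas given for the data drift metrics can be made concrete with a small, dependency-free sketch. It evaluates the KL divergence on discrete distributions, and the Wasserstein distance and the Kolmogorov-Smirnov statistic on empirical samples. The function names, the sample data, and the 0.5 threshold are assumptions chosen for illustration; this is not the library's implementation, and for the test-based metrics it stops at the statistic rather than computing the p-value that is compared with the significance level.

```python
import math
from bisect import bisect_right


def kl_divergence(p, q):
    """D_KL(P||Q) for discrete distributions given as aligned probability lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def wasserstein_1(sample_p, sample_q):
    """W_1(P, Q): integral of |F_Q(x) - F_P(x)|, exact for empirical CDFs."""
    sp, sq = sorted(sample_p), sorted(sample_q)
    grid = sorted(set(sp) | set(sq))
    # Both CDFs are constant between consecutive pooled sample points.
    return sum(
        abs(ecdf(sq, left) - ecdf(sp, left)) * (right - left)
        for left, right in zip(grid, grid[1:])
    )


def ks_statistic(sample_1, sample_2):
    """D_{n,m}: largest gap between the two empirical CDFs."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    # The supremum of a difference of step functions is reached at a data point.
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in set(s1) | set(s2))


reference = [0.0, 1.0, 2.0, 3.0]
analysis = [1.0, 2.0, 3.0, 4.0]    # reference shifted by +1 (made-up data)

w1 = wasserstein_1(analysis, reference)
drift_detected = w1 > 0.5           # 0.5 is an arbitrary illustrative threshold
```

For a pure shift, the Wasserstein distance equals the size of the shift, which is why it is convenient to threshold in the units of the feature itself.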
- Performance metrics for the calculation of the performance of classification and regression models. In particular, the following metrics are implemented:
  • Accuracy
  • Precision
  • Recall
  • F1-score
  • Log loss
  • AUC
  • AUCPR
  • Brier Score
  • Mean squared error (MSE)
  • Mean absolute error (MAE)
  • Mean absolute percentage error (MAPE)
  • R-squared score

Performance metrics are implemented in the PerformanceMetric class.
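
As a reminder of what a few of these performance metrics measure, the sketch below computes accuracy, precision, recall, and F1-score for a binary classifier, and MSE and MAE for a regressor, using only the standard library. The function names and the sample labels are assumptions made for illustration; they are not part of the PerformanceMetric class.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_errors(y_true, y_pred):
    """Mean squared error and mean absolute error."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"mse": mse, "mae": mae}


# Made-up labels and predictions.
clf = classification_scores([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
reg = regression_errors([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```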

- Custom metrics. Users can define their own metrics through the @CustomMetric decorator (see below for an example).

All three types of metrics inherit the AbstractMetrics class.

Analyzers

An analyzer groups multiple metric calculations in a single run. It lets you choose which metrics to compute and for which features.

Example usage

To use the library, you need a reference dataset, typically the training dataset, and an analysis dataset that you want to compare with the former.

Calculating a single metric

For a single metric, we start by instantiating the appropriate metric class and specifying the name of the metric ("ttest" in the example below):

from pulsar_metrics.metrics.drift import DriftTestMetric
driftTest = DriftTestMetric(name = 'ttest', data = data_new, feature_name = feature_name)

Then we run the .evaluate() method to calculate the metric:

driftTest.evaluate(alpha = 0.05, reference = data_ref[feature_name])

The result is returned through the .get_result() method of the metric object:

driftTest.get_result()

Using the analyzer

When multiple metrics are required for different features, the analyzer allows one to calculate all the metrics at once.

First, instantiate an analyzer object:

from pulsar_metrics.analyzers.base import Analyzer
analysis = Analyzer(name = 'First Analyzer', description='My first Analyzer', data = data_new)

Then add the metrics of interest:

analysis.add_drift_metrics(metrics_list=['wasserstein', 'ttest', 'ks_2samp'], features_list=['Population', 'MedInc'])
analysis.add_performance_metrics(metrics_list=['accuracy'], y_name='clf_target')

Then you can run the analyzer, optionally specifying options for each metric as a dictionary passed through the options keyword:

analysis.run(data_ref = data_ref, options = {'ttest': {'alpha': 0.01, 'equal_var': False}})

It is then possible to get the results of the analysis as a pandas DataFrame:

analysis.results_to_pandas()
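
To illustrate the idea behind an analyzer, namely looping over metrics and features and routing per-metric options from a dictionary keyed by metric name, here is a minimal standalone sketch. Everything in it is hypothetical: the `run_all` function, the two toy metrics, the `scale` option, and the data are assumptions for illustration and do not reflect how the Analyzer class is implemented.

```python
from statistics import mean, median


def mean_shift(new, ref, scale=1.0):
    """Toy metric: difference in means, optionally rescaled."""
    return (mean(new) - mean(ref)) / scale


def median_shift(new, ref):
    """Toy metric: difference in medians."""
    return median(new) - median(ref)


# Hypothetical registry mapping metric names to functions.
METRICS = {"mean_shift": mean_shift, "median_shift": median_shift}


def run_all(data_new, data_ref, metrics_list, features_list, options=None):
    """Evaluate every requested metric on every requested feature."""
    options = options or {}
    rows = []
    for feature in features_list:
        for name in metrics_list:
            kwargs = options.get(name, {})   # per-metric keyword options
            value = METRICS[name](data_new[feature], data_ref[feature], **kwargs)
            rows.append({"feature": feature, "metric_name": name, "value": value})
    return rows


# Made-up data, stored as plain dicts of columns.
data_ref = {"MedInc": [2.0, 3.0, 4.0], "Population": [100.0, 200.0, 300.0]}
data_new = {"MedInc": [3.0, 4.0, 5.0], "Population": [100.0, 200.0, 600.0]}

rows = run_all(
    data_new,
    data_ref,
    metrics_list=["mean_shift", "median_shift"],
    features_list=["MedInc", "Population"],
    options={"mean_shift": {"scale": 100.0}},
)
```

Each row is a flat record with one feature, one metric, and one value, which is the kind of layout that converts naturally into a table.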

Creating a custom metric

The @CustomMetric decorator lets you turn any function into a metric based on the AbstractMetrics class:

import numpy as np
from pulsar_metrics.metrics.base import CustomMetric

# Custom metric: the largest element-wise gap between a and b
@CustomMetric
def test_custom(a, b, **kwargs):
    return np.max(a - b)
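
For readers unfamiliar with the pattern, the sketch below shows one generic way a decorator can turn a plain function into a metric-like object. The `FunctionMetric` class is hypothetical and is not the CustomMetric implementation; it only borrows the evaluate and get_result method names used earlier in this document, and the result layout is an assumption.

```python
from functools import update_wrapper


class FunctionMetric:
    """Hypothetical wrapper; not pulsar-metrics' CustomMetric implementation."""

    def __init__(self, func):
        self._func = func
        self._result = None
        update_wrapper(self, func)   # keep the wrapped function's name and docstring

    def evaluate(self, *args, **kwargs):
        """Call the wrapped function and store its value with the metric name."""
        self._result = {
            "metric_name": self._func.__name__,
            "value": self._func(*args, **kwargs),
        }
        return self

    def get_result(self):
        return self._result


@FunctionMetric                      # a class can be used directly as a decorator
def max_gap(a, b, **kwargs):
    """Largest element-wise gap between a and b."""
    return max(x - y for x, y in zip(a, b))


max_gap.evaluate([3.0, 5.0, 9.0], [1.0, 6.0, 4.0])
result = max_gap.get_result()
```

After decoration, `max_gap` is no longer a function but an object that remembers its last result, which is what allows a custom metric to be handled like the built-in ones.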

About PulsarML

PulsarML is a project that helps you monitor your models and gain insights into their performance.

We released two open-source packages:

  • pulsar-data-collection: a lightweight Python SDK enabling data collection of features, predictions, and metadata from an ML model's serving code or micro-service
  • pulsar-metrics: a library for evaluating and monitoring data and concept drift with an extensive set of metrics. It also offers the possibility to use custom metrics defined by the user.

We also created pulsar demo to display an example use-case showing how to leverage both packages to implement model monitoring and performance management.

Want to interact with the community? Join our Slack channel.

Powered by Rocket Science Development

Contributing

  1. Fork this repository, develop, and test your changes
  2. Open an issue
  3. Submit a pull request with a reference to the issue
