Classifier two-sample tests and related tests.

samesame

License: LGPLv3 | UAI 2022

Same, same but different ...

samesame implements classifier two-sample tests (CTSTs) and, as a bonus extension, a noninferiority test (NIT).

These were either missing or implemented with some tradeoffs (looking at you, sample-splitting) in existing libraries. And so, samesame fills in the gaps :)

Motivation

What is samesame good for? Data (model) validation, performance monitoring, drift detection (dataset shift), statistical process control, covariate balance, and similar tasks.

For a worked illustration, see the motivating example from the related R package dsos.

Installation

To install, run the following command:

python -m pip install samesame
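
To confirm that the install succeeded, one option is to query the installed version through the standard library. This is a minimal sketch, not part of samesame: `installed_version` is a hypothetical helper, and it only relies on `importlib.metadata` (Python 3.8+).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if samesame is installed, otherwise None.
print(installed_version("samesame"))
```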

Quick Start

Simulate outlier scores to test for no adverse shift when the null (no shift) holds.

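import numpy as np
from sklearn.metrics import roc_auc_score

from samesame.ctst import CTST
from samesame.nit import DSOS

# Draw training and test outlier scores from the same distribution,
# so the null hypothesis of no shift holds by construction.
n_size = 600
rng = np.random.default_rng(123_456)
os_train = rng.normal(size=n_size)
os_test = rng.normal(size=n_size)

# CTST: test of equal distribution, with AUC as the metric.
null_ctst = CTST.from_samples(os_train, os_test, metric=roc_auc_score)
# DSOS: noninferiority test of no adverse shift.
null_dsos = DSOS.from_samples(os_train, os_test)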

In this example, the CTST rejects the null of equal distribution at the conventional 5% level, even though both samples were drawn from the same distribution.

print(f"{null_ctst.pvalue=:.4f}")
# null_ctst.pvalue=0.0358

However, we fail to reject the null of no adverse shift (i.e. DSOS), meaning that the test sample (os_test) does not seem to contain disproportionately more outliers than the training sample (os_train).

print(f"{null_dsos.pvalue=:.4f}")
# null_dsos.pvalue=0.9500

This is the type of false alarm that samesame can highlight by comparing tests of equal distribution to noninferiority tests.
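
To see why a test of equal distribution and a one-sided test can disagree, here is a rough standard-library sketch. It is not how samesame computes either test, and it does not reproduce DSOS. It runs a plain permutation test with AUC as the statistic and reports both a two-sided p-value (any difference) and a one-sided p-value (only higher scores in the new sample count). The helpers `auc` and `permutation_pvalues`, the sample sizes, and the convention that higher scores mean more outlying are all assumptions made for illustration. Unlike the Quick Start, the new sample here is deliberately shifted toward lower scores.

```python
import random


def auc(scores_ref, scores_new):
    """Probability that a new score exceeds a reference score (ties count 1/2)."""
    wins = 0.0
    for x in scores_new:
        for y in scores_ref:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(scores_ref) * len(scores_new))


def permutation_pvalues(scores_ref, scores_new, n_perm=200, seed=0):
    """Return (observed AUC, two-sided p-value, one-sided p-value)."""
    rng = random.Random(seed)
    observed = auc(scores_ref, scores_new)
    pooled = list(scores_ref) + list(scores_new)
    n_ref = len(scores_ref)
    null = []
    for _ in range(n_perm):
        # Shuffling the pooled scores breaks any link between score and sample.
        rng.shuffle(pooled)
        null.append(auc(pooled[:n_ref], pooled[n_ref:]))
    # Two-sided: any departure from AUC = 0.5 counts as evidence.
    extreme = sum(abs(a - 0.5) >= abs(observed - 0.5) for a in null)
    two_sided = (1 + extreme) / (1 + n_perm)
    # One-sided: only an AUC above the null draws (higher new scores) counts.
    higher = sum(a >= observed for a in null)
    one_sided = (1 + higher) / (1 + n_perm)
    return observed, two_sided, one_sided


# The new sample has lower scores on average: different, but not worse.
rng = random.Random(42)
scores_ref = [rng.gauss(0.0, 1.0) for _ in range(60)]
scores_new = [rng.gauss(-1.0, 1.0) for _ in range(60)]
observed, two_sided, one_sided = permutation_pvalues(scores_ref, scores_new)
print(f"{observed=:.3f} {two_sided=:.4f} {one_sided=:.4f}")
```

The two-sided test flags the difference, while the one-sided test does not, because the change runs in the harmless direction.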

Usage

Functionality

Below, you will find an overview of common modules in samesame.

| Function | Module |
| --- | --- |
| Bayesian inference | samesame.bayes |
| Classifier two-sample tests (CTSTs) | samesame.ctst |
| Noninferiority tests (NITs) | samesame.nit |

Attributes

When the method is a statistical test, samesame stores the results of potentially expensive computations in attributes. When available, these attributes can be accessed as follows.

| Attribute | Description |
| --- | --- |
| .statistic | The test statistic for the hypothesis. |
| .null | The null distribution for the hypothesis. |
| .pvalue | The p-value for the hypothesis. |
| .posterior | The posterior distribution for the hypothesis. |
| .bayes_factor | The Bayes factor for the hypothesis. |
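
As a rough illustration of how a test statistic, a null distribution, and a p-value relate, the sketch below computes a one-sided p-value from a list of simulated null draws. It assumes that larger statistics are more extreme and uses the common add-one correction. Whether samesame uses this exact formula is not stated here, and `pvalue_from_null` is a hypothetical helper, not part of the library.

```python
def pvalue_from_null(statistic, null):
    """Share of null draws at least as large as the statistic (add-one corrected)."""
    exceed = sum(1 for draw in null if draw >= statistic)
    return (1 + exceed) / (1 + len(null))


# Made-up null draws, for illustration only.
null = [0.48, 0.51, 0.50, 0.53, 0.47, 0.49, 0.52, 0.55, 0.46]
# One draw (0.55) is >= 0.54, so the p-value is (1 + 1) / (1 + 9).
print(pvalue_from_null(0.54, null))  # 0.2
```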

Examples

To get started, please see the examples in the docs.

Dependencies

samesame has few dependencies beyond the standard library: it is built on top of, and is compatible with, scikit-learn and NumPy. It will probably work with some older Python versions. In short, it is a lightweight dependency for most machine learning projects.
