Package to work with te Unstable Population Indicator
Project description
Population stability
A package to calculate the Unstable Population Indicator (see below for definition) or the Population Stability Index, see e.g. this paper or this paper, with a (slightly flawed, as explained in the accompanying paper) Python implementation here.
Installation instructions
Clone the repository, then
pip install unstable-populations
The Unstable Population Indicator
The UPI is explained in much more detail in an accompanying paper, but the main idea is that it is an indicator that can be used when comparing two populations, to see if they are similar. Dissimilarity can be interpreted as data drift, for example, and may indicate a difference between a population some algorithm was trained on and a new population, for which the algorithm will be used.
The UPI works for continuous, ordinal and nominal data (continuous data will be binned) and is formally defined as (LaTeX probably not rendered on Github, see your local version or the accompanying paper):
$$
\textrm{UPI} =
\sum_{\textrm{bins, }i} (f_{1,i} - f_{0,i}) \cdot \log \Big( \frac{f_{1,i} + 1/n_\textrm{tot} }{f_{0,i} + 1/n_\textrm{tot} } \Big)
$$
where $f_{0,i}$ and $f_{1,i}$ are the fraction of the entities in bin or category $i$ in the original and new population, respectively, $n_\textrm{tot}$ is the number of entities in the original and new populations together and $N_\textrm{bins}$ is the number of bins or categories (over which the sum is taken).
Usage
The UPI and PSI can be calculated for two populations as follows
from unstable_populations import upi, psi
UPI = upi(pop1, pop2)
PSI = psi(pop1, pop2)
There are optional keyword parameters that can let you do binning of the populations. The data can be provided in lists, numpy arrays, dictionaries, pandas Series or pandas DataFrames, see the documentation for more details.
As a rule of thumb (but test for yourself!), values as low as 0.1 for UPI or PSI indicate a similar population, and the larger the values, the more dissimilar the populations are.
Misc
Developed and written by Marcel Haas, Joris Huese and Lisette Sibbald at the Data Science Center of the University of Amsterdam. Hosted at https://github.com/harcel/unstable_populations/
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Hashes for unstable_populations-0.6.0.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | f89215a38e316db9ac3fc2ea5c2c2cd322c6ab64b74f756002e587bef05c5be4 |
|
MD5 | 2d725ebdab2860ed1c95d8534fee1559 |
|
BLAKE2b-256 | 8ae5b25a64ca8b6c7b3ceb0db8e2ded2624708d2d50847919fc1ef88c524c825 |