# insurance-monitoring

Model drift detection and monitoring for insurance pricing models: PSI, CSI, Gini drift, A/E ratios, and calibration checks.
Deployed insurance pricing models go stale. The portfolio ages, the claims environment shifts, regulators change the rules. Without systematic monitoring you find out about it when the loss ratio deteriorates — typically 12 to 18 months after the model started misfiring.
This library gives UK pricing teams the specific tools to catch that drift early: exposure-weighted PSI for feature distributions, A/E ratios with Poisson confidence intervals for calibration, and the Gini drift z-test from arXiv 2510.04556, a statistically rigorous monitoring framework developed for non-life insurance pricing.
It produces traffic-light outputs (green/amber/red) that match how a Head of Pricing actually reads a monitoring pack, and a decision recommendation based on the Murphy score decomposition: recalibrate (update the intercept, one hour of work) or refit (rebuild the model, weeks of work).
No scikit-learn. No pandas. Polars-native throughout.
## Installation

```shell
uv add insurance-monitoring
```
## Quick example

```python
import numpy as np

from insurance_monitoring import MonitoringReport, ae_ratio_ci, gini_coefficient, psi

rng = np.random.default_rng(42)

# Reference period (model training window)
pred_ref = rng.uniform(0.05, 0.20, 50_000)
act_ref = rng.poisson(pred_ref).astype(float)

# Current monitoring period (18 months later).
# The portfolio has aged, young drivers are more numerous, the claim rate is up.
pred_cur = rng.uniform(0.05, 0.20, 15_000)
act_cur = rng.poisson(pred_cur * 1.08).astype(float)  # model is 8% optimistic

# Quick check: feature drift on the model score
score_psi = psi(pred_ref, pred_cur)
print(f"Score PSI: {score_psi:.3f}")  # < 0.10 = stable, > 0.25 = investigate

# A/E ratio (aggregate)
ae_result = ae_ratio_ci(act_cur, pred_cur)
print(f"A/E: {ae_result['ae']:.3f} (95% CI: {ae_result['lower']:.3f}–{ae_result['upper']:.3f})")

# Gini coefficient (discrimination)
gini = gini_coefficient(act_cur, pred_cur)
print(f"Gini: {gini:.3f}")

# Combined monitoring report with traffic lights
report = MonitoringReport(
    reference_actual=act_ref,
    reference_predicted=pred_ref,
    current_actual=act_cur,
    current_predicted=pred_cur,
)
print(report.recommendation)  # 'NO_ACTION' | 'RECALIBRATE' | 'REFIT' | 'INVESTIGATE'
print(report.to_polars())     # flat DataFrame with metric / value / band columns
```
## Modules

### `drift` - Feature distribution monitoring
```python
import polars as pl

from insurance_monitoring.drift import csi, ks_test, psi, wasserstein_distance

# PSI with exposure weighting (insurance-correct)
score_psi = psi(
    reference=score_train,
    current=score_q1_2025,
    n_bins=10,
    exposure_weights=earned_exposure,  # car-years, not policy count
)

# CSI heatmap across all rating factors
feature_ref = pl.DataFrame({"driver_age": [...], "vehicle_age": [...], "ncd_years": [...]})
feature_cur = pl.DataFrame({"driver_age": [...], "vehicle_age": [...], "ncd_years": [...]})
csi_table = csi(feature_ref, feature_cur, features=["driver_age", "vehicle_age", "ncd_years"])
# Returns: feature | csi | band

# Wasserstein-1 distance: drift reported in the feature's original units (years)
d = wasserstein_distance(driver_ages_train, driver_ages_q1_2025)
print(f"Driver age distribution moved by {d:.1f} years on average")
```
On exposure-weighted PSI: standard PSI treats every policy equally regardless of how long it was on risk. If your book renews quarterly and mixes 1-month and 12-month policies, unweighted PSI misstates the true mix shift. The `exposure_weights` parameter weights bin proportions by earned exposure, which is the correct basis for an insurance portfolio.
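The calculation itself is straightforward and can be sketched in plain NumPy. The `weighted_psi` helper below is illustrative only, not the library's implementation, and uses unweighted reference quantiles for the bin edges:

```python
import numpy as np

def weighted_psi(reference, current, ref_weights, cur_weights, n_bins=10):
    """Illustrative exposure-weighted PSI: bin proportions are shares of
    earned exposure rather than shares of policy count."""
    # Bin edges from reference quantiles (roughly equal-count reference bins)
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    def exposure_shares(values, weights):
        idx = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, n_bins - 1)
        share = np.bincount(idx, weights=weights, minlength=n_bins)
        return share / share.sum()

    p = exposure_shares(reference, ref_weights)
    q = exposure_shares(current, cur_weights)
    eps = 1e-6  # guard against empty bins before taking logs
    p, q = np.clip(p, eps, None), np.clip(q, eps, None)
    return float(np.sum((q - p) * np.log(q / p)))
```

Identical distributions give a PSI of zero regardless of the weights; a location shift in the current book shows up immediately.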
### `calibration` - A/E ratio and calibration checks

```python
from insurance_monitoring.calibration import ae_ratio, ae_ratio_ci

# Aggregate A/E with an exact Poisson (Garwood) confidence interval
result = ae_ratio_ci(actual, predicted, exposure=exposure)
# {'ae': 1.08, 'lower': 0.97, 'upper': 1.20, 'n_claims': 342, 'n_expected': 317}

# Segmented A/E: where is the model misfiring?
seg_ae = ae_ratio(
    actual, predicted, exposure=exposure,
    segments=driver_age_bands,  # np.array(['17-24', '25-39', ...])
)
# Returns a Polars DataFrame: segment | actual | expected | ae_ratio | n_policies
```
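For reference, the exact (Garwood) Poisson interval can be reproduced from chi-square quantiles. This standalone `garwood_ae_ci` is a sketch of the standard construction, not the library's function:

```python
from scipy.stats import chi2

def garwood_ae_ci(n_claims, n_expected, alpha=0.05):
    """Exact (Garwood) Poisson CI for an A/E ratio: bound the Poisson mean
    of the observed claim count via chi-square quantiles, then divide by
    the expected claim count."""
    lower = 0.0 if n_claims == 0 else chi2.ppf(alpha / 2, 2 * n_claims) / 2
    upper = chi2.ppf(1 - alpha / 2, 2 * (n_claims + 1)) / 2
    return n_claims / n_expected, lower / n_expected, upper / n_expected

ae, lo, hi = garwood_ae_ci(342, 317)  # ae ≈ 1.08
```

With a few hundred claims the interval is wide — roughly ±10% at 342 claims — which is why aggregate A/E alone rarely justifies action on a small book.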
On the IBNR problem: the A/E ratio is only reliable on mature accident periods. For motor, that means at least 12 months of claims development. For liability, 24+ months. If you run monthly monitoring on recent accident months, apply chain-ladder development factors first — otherwise you will see artificially low A/E ratios that recover as claims develop.
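The adjustment can be sketched as follows; the development factors here are invented for illustration and would come from your own chain-ladder triangles in practice:

```python
import numpy as np

# Illustrative age-to-ultimate development factors (months developed -> LDF).
# Invented numbers: real factors come from the reserving triangles.
dev_to_ultimate = {3: 1.45, 6: 1.20, 9: 1.08, 12: 1.02, 15: 1.00}

months_developed = np.array([3, 6, 12, 15])      # maturity of each accident cohort
reported_claims = np.array([40.0, 55.0, 61.0, 60.0])
expected_claims = np.array([60.0, 63.0, 62.0, 61.0])

# Gross reported claims up to ultimate before computing A/E
ldf = np.array([dev_to_ultimate[m] for m in months_developed])
ultimate_claims = reported_claims * ldf

ae_reported = reported_claims.sum() / expected_claims.sum()  # biased low by IBNR
ae_ultimate = ultimate_claims.sum() / expected_claims.sum()  # development-adjusted
```

The reported A/E understates the true ratio on immature cohorts; after development the ratio recovers to roughly its ultimate level.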
### `discrimination` - Gini drift test

```python
from insurance_monitoring.discrimination import gini_coefficient, gini_drift_test

gini_ref = gini_coefficient(act_ref, pred_ref, exposure=exp_ref)
gini_cur = gini_coefficient(act_cur, pred_cur, exposure=exp_cur)

# Statistical test: has Gini degraded significantly?
# Implements arXiv 2510.04556 Theorem 1
result = gini_drift_test(
    reference_gini=gini_ref,
    current_gini=gini_cur,
    n_reference=50_000,
    n_current=15_000,
    reference_actual=act_ref, reference_predicted=pred_ref,
    current_actual=act_cur, current_predicted=pred_cur,
)
# {'z_statistic': -1.93, 'p_value': 0.054, 'gini_change': -0.03, 'significant': False}
```
The Gini drift test is the distinguishing feature of this library. Most monitoring tools will tell you whether A/E has moved. This tells you whether the model's ranking has degraded — the difference between a cheap recalibration and a full refit.
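For intuition, a common actuarial Gini construction ranks policies by predicted risk and measures how strongly actual claims concentrate in the riskiest segments. The sketch below illustrates that construction; the library's own definition and normalisation may differ:

```python
import numpy as np

def lorenz_gini(actual, predicted, exposure=None):
    """Sketch of an exposure-weighted actuarial Gini: rank policies by
    predicted risk (riskiest first), accumulate the share of actual claims
    against the share of exposure, and take twice the area between that
    concentration curve and the diagonal."""
    actual = np.asarray(actual, dtype=float)
    w = np.ones_like(actual) if exposure is None else np.asarray(exposure, dtype=float)
    order = np.argsort(-np.asarray(predicted, dtype=float))  # riskiest first
    cum_claims = np.insert(np.cumsum(actual[order]) / actual.sum(), 0, 0.0)
    cum_expo = np.insert(np.cumsum(w[order]) / w.sum(), 0, 0.0)
    # Trapezoidal area under the concentration curve
    area = np.sum(np.diff(cum_expo) * (cum_claims[1:] + cum_claims[:-1]) / 2)
    return float(2.0 * area - 1.0)
```

A model that ranks perfectly scores close to the maximum attainable Gini for the book; a model ranking at random scores near zero.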
### `report` - Combined monitoring in one call

```python
from insurance_monitoring import MonitoringReport

report = MonitoringReport(
    reference_actual=act_ref,
    reference_predicted=pred_ref,
    current_actual=act_cur,
    current_predicted=pred_cur,
    exposure=exposure_cur,
    reference_exposure=exposure_ref,
    feature_df_reference=feat_ref,  # Polars DataFrame
    feature_df_current=feat_cur,
    features=["driver_age", "vehicle_age", "ncd_years"],
)

print(report.recommendation)
# 'REFIT' | 'RECALIBRATE' | 'NO_ACTION' | 'INVESTIGATE' | 'MONITOR_CLOSELY'

df = report.to_polars()
# metric          | value | band
# ae_ratio        | 1.08  | amber
# gini_current    | 0.39  | amber
# gini_p_value    | 0.054 | amber
# csi_driver_age  | 0.14  | amber
# csi_vehicle_age | 0.03  | green
# recommendation  | nan   | REFIT
```
### `thresholds` - Configurable traffic lights

```python
from insurance_monitoring.thresholds import MonitoringThresholds, PSIThresholds

# Tighten PSI thresholds for a large motor book with monthly monitoring
custom = MonitoringThresholds(
    psi=PSIThresholds(green_max=0.05, amber_max=0.15),
)
report = MonitoringReport(..., thresholds=custom)
```

Default thresholds follow industry convention (PSI: 0.1/0.25 from FICO/credit scoring; A/E: 0.95–1.05 green, 0.90–1.10 amber; Gini: p < 0.10 amber, p < 0.05 red).
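The banding logic reduces to a simple comparison against the two cut-offs. The sketch below uses a hypothetical `PsiBands` dataclass in the spirit of the library's `PSIThresholds`, with the default 0.1/0.25 convention:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PsiBands:
    """Hypothetical stand-in for the library's PSI thresholds."""
    green_max: float = 0.10
    amber_max: float = 0.25

def band_psi(value: float, bands: PsiBands = PsiBands()) -> str:
    """Map a PSI value to a traffic-light band."""
    if value <= bands.green_max:
        return "green"
    if value <= bands.amber_max:
        return "amber"
    return "red"

band_psi(0.14)                                          # 'amber' under defaults
band_psi(0.14, PsiBands(green_max=0.05, amber_max=0.15))  # 'amber', close to red
```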
## Decision framework

The `recommendation` property implements the three-stage decision tree from arXiv 2510.04556, mapped to actuarial practice:
| Signal | Recommendation | Action |
|---|---|---|
| No drift in any test | NO_ACTION | Continue, schedule next review |
| A/E red, Gini stable | RECALIBRATE | Update intercept/offset (hours of work) |
| Gini red | REFIT | Rebuild model on recent data (weeks of work) |
| Both red | INVESTIGATE | Manual review — check data quality first |
| Any amber | MONITOR_CLOSELY | Increase monitoring frequency |
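The table above can be sketched as a small function, assuming each test has already been reduced to a green/amber/red band. The names are illustrative, not the library's internals:

```python
def recommend(ae_band: str, gini_band: str, drift_bands: list[str]) -> str:
    """Illustrative decision tree: map test bands to a recommendation."""
    if ae_band == "red" and gini_band == "red":
        return "INVESTIGATE"   # both failing: check data quality first
    if gini_band == "red":
        return "REFIT"         # ranking degraded: rebuild the model
    if ae_band == "red":
        return "RECALIBRATE"   # level wrong, ranking intact: update the offset
    if "amber" in (ae_band, gini_band, *drift_bands):
        return "MONITOR_CLOSELY"
    return "NO_ACTION"

recommend("red", "green", ["green"])  # 'RECALIBRATE'
```

The ordering matters: a red Gini is checked before a red A/E, because a degraded ranking cannot be fixed by recalibration alone.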
## Databricks integration
The demo notebook at notebooks/demo_monitoring.py shows the full workflow on synthetic motor data and runs on Databricks serverless. Upload it to your workspace and schedule it as a monthly job against your MLflow inference table.
## Background
The Gini drift test implements the framework from:
"Model Monitoring: A General Framework with an Application to Non-life Insurance Pricing", arXiv 2510.04556 (October 2025)
## Read more
Your Pricing Model is Drifting (and You Probably Can't Tell) — why PSI alone is insufficient, and what it means when A/E is stable but the Gini is falling.
## Related libraries
| Library | Why it's relevant |
|---|---|
| shap-relativities | Extract rating relativities from GBMs — when monitoring flags REFIT, use SHAP to diagnose which factors have drifted most |
| insurance-interactions | GLM interaction detection — a refit triggered by Gini degradation may need new interactions added |
| insurance-causal-policy | SDID causal evaluation — if monitoring shows deterioration after a rate change, use this to isolate cause |
| insurance-cv | Walk-forward cross-validation — use monitoring outputs to decide when to retrain and validate the retrained model |
| rate-optimiser | Constrained rate change optimisation — monitoring informs when a rate adjustment is needed; rate-optimiser determines the right one |
## Licence
MIT