Unified model governance for UK insurance pricing: SS1/23-aligned validation and model risk management
insurance-governance
Questions or feedback? Start a Discussion. Found it useful? A star helps others find it.
Unified model governance for UK insurance pricing teams. Combines model validation and model risk management into one package, with tests and outputs structured to align with the principles of PRA SS1/23 (as adapted for insurance).
Merged from: insurance-validation (model validation reports) and insurance-mrm (model risk management).
Blog post: One Package, One Install: PRA SS1/23 Validation and MRM Governance Unified
The problem this solves: validation tests and MRM governance packs were built separately and had separate installs, separate version pinning, and separate import paths. Pricing teams either installed both and managed the coupling themselves, or skipped one. This package resolves that by providing a single install.
Regulatory note: PRA SS1/23 is a supervisory statement directed at banks and building societies, not insurers. Insurance model risk management is governed directly by PS12/22, Solvency II internal model requirements, and EIOPA validation guidelines. In practice, many UK insurance MRM frameworks reference SS1/23 by analogy — it articulates sound MRM principles regardless of firm type — and the PRA has encouraged insurers to take note. This library uses SS1/23 as a reference framework in that spirit: the validation tests and governance structure reflect its principles, but you should map your own obligations to your actual regulatory basis (PS12/22 or equivalent).
Why use this?
- UK pricing teams managing 10+ production models have no consistent way to produce validation and governance artefacts — every model gets a bespoke analyst notebook, and the outputs are incomparable. One install, one framework.
- Runs a five-test statistical validation suite (Gini with bootstrap CI, A/E with Poisson CI, Hosmer-Lemeshow, lift chart, PSI) and catches segment-level miscalibration that a single aggregate A/E number misses. On synthetic motor data, a manual checklist flags only Model B's aggregate A/E deviation, while Hosmer-Lemeshow also exposes the age-band bias behind it (p < 0.0001).
- Scores model risk tier objectively across six dimensions (GWP, complexity, deployment status, regulatory use, external data, customer-facing) mapped to a 0–100 composite, replacing subjective judgement in Model Risk Committee (MRC) presentations with documented scoring rules.
- Generates self-contained HTML validation reports and executive governance packs (model purpose, risk tier rationale, assumptions register, approval conditions) in under one second — print-to-PDF ready.
- Structured around PRA SS1/23 principles, as applied to insurance under PS12/22 and EIOPA guidelines: the audit trail is suitable for a PRA supervisory visit or internal model risk committee.
Subpackages
insurance_governance.validation
Model validation report generator, aligned with the principles of PRA SS1/23 (as adapted for insurance). Runs statistical tests (Gini, PSI, discrimination checks, Hosmer-Lemeshow, lift charts) and produces self-contained HTML reports.
insurance_governance.mrm
Model risk management framework. ModelCard metadata container, RiskTierScorer (objective 0-100 composite score mapping to Tier 1/2/3), ModelInventory (JSON file registry), GovernanceReport (executive committee pack).
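To make the idea of a 0-100 composite concrete, here is a minimal sketch of how six dimensions could be combined into a score and mapped to a tier. Every weight, band and threshold below is a hypothetical placeholder chosen for illustration, and the function names are not part of the package; the rules RiskTierScorer actually applies are documented with the library and may differ. The sketch also assumes Tier 1 is the highest-risk tier.

```python
# Illustrative only: weights, bands and tier thresholds are hypothetical,
# not the documented rules used by RiskTierScorer.
WEIGHTS = {
    "gwp": 30,
    "complexity": 20,
    "deployment": 15,
    "regulatory": 15,
    "external_data": 10,
    "customer_facing": 10,
}  # sums to 100
COMPLEXITY = {"low": 0.0, "medium": 0.5, "high": 1.0}
DEPLOYMENT = {"development": 0.0, "challenger": 0.5, "champion": 1.0}


def gwp_factor(gwp_impacted: float) -> float:
    """Map gross written premium (GBP) to a 0-1 factor using assumed bands."""
    if gwp_impacted >= 100_000_000:
        return 1.0
    if gwp_impacted >= 10_000_000:
        return 0.6
    return 0.2


def composite_score(
    gwp_impacted: float,
    model_complexity: str,
    deployment_status: str,
    regulatory_use: bool,
    external_data: bool,
    customer_facing: bool,
) -> float:
    """Weighted sum of the six dimension factors, on a 0-100 scale."""
    return (
        WEIGHTS["gwp"] * gwp_factor(gwp_impacted)
        + WEIGHTS["complexity"] * COMPLEXITY[model_complexity]
        + WEIGHTS["deployment"] * DEPLOYMENT[deployment_status]
        + WEIGHTS["regulatory"] * float(regulatory_use)
        + WEIGHTS["external_data"] * float(external_data)
        + WEIGHTS["customer_facing"] * float(customer_facing)
    )


def tier_from_score(score: float) -> int:
    """Assumed cut-offs: 60+ is Tier 1 (highest risk), 30+ is Tier 2, else Tier 3."""
    if score >= 60:
        return 1
    if score >= 30:
        return 2
    return 3


score = composite_score(125_000_000, "high", "champion", False, False, True)
print(score, tier_from_score(score))
```

The point of a rule table like this is that two reviewers given the same inputs arrive at the same tier, and the rationale can be printed alongside the score.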
Install
uv add insurance-governance
# or
pip install insurance-governance
Quick start
import numpy as np
from insurance_governance import (
    ModelValidationReport,
    ValidationModelCard,
    MRMModelCard,
    RiskTierScorer,
    ModelInventory,
    GovernanceReport,
)

# --- Synthetic model outputs (replace with your real model predictions) ---
rng = np.random.default_rng(42)
n_val = 5_000
y_val = rng.poisson(0.08, n_val).astype(float)  # observed claim counts
y_pred_val = np.clip(rng.normal(0.08, 0.02, n_val), 0.001, None)  # model predictions
exposure_val = rng.uniform(0.5, 1.0, n_val)  # policy years (required for A/E)

# --- Run statistical validation ---
card = ValidationModelCard(
    name="Motor Frequency v3.2",
    version="3.2.0",
    purpose="Predict claim frequency for UK motor portfolio",
    methodology="CatBoost gradient boosting with Poisson objective",
    target="claim_count",
    features=["age", "vehicle_age", "area", "vehicle_group"],
    limitations=["No telematics data"],
    owner="Pricing Team",
)
report = ModelValidationReport(
    model_card=card,
    y_val=y_val,
    y_pred_val=y_pred_val,
    exposure_val=exposure_val,
)
report.generate("validation_report.html")

# --- MRM governance pack ---
mrm_card = MRMModelCard(
    model_id="motor-freq-v3",
    model_name="Motor TPPD Frequency",
    version="3.2.0",
    model_class="pricing",
    intended_use="Frequency pricing for private motor.",
)
scorer = RiskTierScorer()
tier = scorer.score(
    gwp_impacted=125_000_000,
    model_complexity="high",
    deployment_status="champion",
    regulatory_use=False,
    external_data=False,
    customer_facing=True,
)
GovernanceReport(card=mrm_card, tier=tier).save_html("mrm_pack.html")
Or import from subpackages directly:
from insurance_governance.validation import ModelValidationReport, ModelCard as ValidationModelCard
from insurance_governance.mrm import ModelCard as MRMModelCard, RiskTierScorer, ModelInventory, GovernanceReport
Note on ModelCard
Both subpackages define a ModelCard class, but they serve different purposes:
- insurance_governance.validation.ModelCard (ValidationModelCard at top level): Pydantic schema, anchors the statistical validation report, captures features, methodology, limitations.
- insurance_governance.mrm.ModelCard (MRMModelCard at top level): dataclass, anchors the MRM governance pack, captures assumptions, risk tier, Model Risk Committee metadata.
At the top level they are re-exported as ValidationModelCard and MRMModelCard to avoid ambiguity.
Capabilities Demo
Demonstrated on synthetic motor data: 50,000 UK motor policies, CatBoost Poisson frequency model, 60/20/20 temporal train/validation/test split. Full script: benchmarks/benchmark_insurance_governance.py.
- Runs a full validation suite in a single ModelValidationReport call: Gini coefficient with bootstrap 95% CI, 10-band lift chart, A/E by predicted decile with Poisson CI, Hosmer-Lemeshow goodness-of-fit, PSI on score distribution (train vs validation), and a monitoring plan completeness check. All return TestResult objects with a pass/fail flag and human-readable detail
- Computes an overall RAG status (Green/Amber/Red) from the worst-severity failure across all tests
- Produces a self-contained HTML validation report and JSON sidecar, print-to-PDF ready, in under one second
- Scores model risk tier via RiskTierScorer: 6 dimensions (GWP, model complexity, deployment status, regulatory use, external data, customer-facing) mapped to a 0-100 composite with documented rules per point, so no subjective judgement is required at the MRC presentation
- Registers models in ModelInventory (JSON file, check into git alongside your code); records validation run history linked by run_id; lists overdue reviews
- Generates a GovernanceReport executive committee pack (HTML + JSON) covering model purpose, risk tier rationale, last validation RAG, assumptions register with risk ratings, outstanding issues, approval conditions, and next review date
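The "worst-severity failure" rule for the overall RAG status can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: CheckResult is a stand-in for the library's TestResult (only the passed, severity and detail fields are mentioned in this README), and the three-level Severity scale and its mapping to Amber and Red are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; a higher value means a worse failure."""
    INFO = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class CheckResult:
    """Stand-in for the library's TestResult; the name field is an assumption."""
    name: str
    passed: bool
    severity: Severity
    detail: str = ""


def overall_rag(results: list[CheckResult]) -> str:
    """Return Green, Amber or Red based on the worst severity among failed checks."""
    failed = [r.severity for r in results if not r.passed]
    if not failed:
        return "Green"
    # Assumed mapping: any critical failure is Red, any lesser failure is Amber.
    return "Red" if max(failed) >= Severity.CRITICAL else "Amber"


results = [
    CheckResult("gini", True, Severity.CRITICAL),
    CheckResult("psi", False, Severity.WARNING, "score distribution has shifted"),
    CheckResult("hosmer_lemeshow", False, Severity.CRITICAL, "p < 0.0001"),
]
print(overall_rag(results))
```

Taking the maximum rather than an average means a single critical failure cannot be diluted by several passing tests.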
When to use: You have 10+ production pricing models and want consistent, auditable validation and governance output rather than bespoke analyst notebooks that vary by model. The framework is structured around the principles of PRA SS1/23 — insurers should map those principles to their own regulatory basis (PS12/22, EIOPA guidelines). Particularly useful before a PRA supervisory visit.
When NOT to use: You need reserving or capital model governance — this package is scoped to pricing models. It also does not replace independent human review of validation results; it automates the tests, not the judgement.
Databricks Notebook
A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in burning-cost-examples.
Performance
Benchmarked on Databricks (2026-03-16) using synthetic UK motor data: 20,000 training + 8,000 validation policies, three model scenarios — well-specified (Model A), miscalibrated (Model B, A/E=1.18 with age-band bias), and drifted (Model C, trained on a shifted population). The comparison is the library's automated 5-test suite against a manual 4-check checklist. See benchmarks/benchmark_insurance_governance.py for the full script.
Runtime. On an 8,000-row validation set:
| Approach | Time |
|---|---|
| Manual 4-check checklist | 0.09s |
| Automated 5-test suite (Gini + bootstrap CI, A/E + Poisson CI, Hosmer-Lemeshow, lift chart, PSI) | 1.17s |
The automated suite is ~13× slower in wall-clock time (1.17 s vs 0.09 s); the roughly one-second overhead is entirely the 500-resample bootstrap for the Gini confidence interval.
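To show where the bootstrap time goes, here is a minimal sketch of a percentile-bootstrap confidence interval around a Gini coefficient. It assumes an exposure-weighted Lorenz-curve definition of Gini, which is one common choice for frequency models; the library's exact definition and resampling scheme are not shown in this README, and the function names lorenz_gini and bootstrap_gini_ci are hypothetical.

```python
import numpy as np


def lorenz_gini(y_true, y_pred, exposure):
    """Gini from the Lorenz curve of claims against exposure, ordered by prediction."""
    order = np.argsort(y_pred)
    cum_exposure = np.concatenate([[0.0], np.cumsum(exposure[order])])
    cum_claims = np.concatenate([[0.0], np.cumsum(y_true[order])])
    x = cum_exposure / cum_exposure[-1]
    y = cum_claims / cum_claims[-1]
    # Trapezoid rule for the area under the Lorenz curve.
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area


def bootstrap_gini_ci(y_true, y_pred, exposure, n_resamples=500, alpha=0.05, seed=0):
    """Point estimate plus a percentile interval from resampling policies with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, size=n)
        stats[b] = lorenz_gini(y_true[idx], y_pred[idx], exposure[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lorenz_gini(y_true, y_pred, exposure), float(lower), float(upper)


# Synthetic data: the prediction equals the true frequency, so the model discriminates.
rng = np.random.default_rng(42)
n = 5_000
exposure = rng.uniform(0.5, 1.0, n)
true_freq = rng.gamma(shape=2.0, scale=0.04, size=n)
y = rng.poisson(true_freq * exposure).astype(float)

point, lower, upper = bootstrap_gini_ci(y, true_freq, exposure)
print(f"Gini = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Each resample repeats the sort and the cumulative sums, so the cost scales linearly with n_resamples.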
What the automated suite catches that the manual checklist misses.
The key test is Model B (miscalibrated). Both methods flag the A/E deviation. But only the automated suite runs Hosmer-Lemeshow, which detects the age-band-level miscalibration that averages out in the global A/E: HL p < 0.0001 (reject calibration by group). The manual checklist, which computes one aggregate A/E number, cannot surface this pattern without additional code.
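The effect described above, group-level miscalibration hidden inside an aggregate, can be reproduced with a small sketch. It assumes a Hosmer-Lemeshow-style statistic for counts (sum of (O - E)^2 / E over predicted-risk deciles) compared against a chi-square distribution with n_groups - 2 degrees of freedom; that convention, the grouping by decile rather than age band, and the helper names are assumptions for illustration, not the library's implementation. The p-value uses a closed form that is valid only for even degrees of freedom, which avoids a SciPy dependency.

```python
import math

import numpy as np


def chi2_sf_even_dof(x: float, dof: int) -> float:
    """Chi-square survival function; closed form valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow_poisson(y_true, y_pred, exposure, n_groups=10):
    """HL-style statistic over groups of equal size, ordered by predicted frequency."""
    expected = y_pred * exposure
    groups = np.array_split(np.argsort(y_pred), n_groups)
    stat = 0.0
    for idx in groups:
        observed_g = y_true[idx].sum()
        expected_g = expected[idx].sum()
        stat += (observed_g - expected_g) ** 2 / expected_g
    return float(stat), chi2_sf_even_dof(float(stat), n_groups - 2)


rng = np.random.default_rng(0)
n = 50_000
exposure = rng.uniform(0.5, 1.0, n)
true_freq = rng.gamma(shape=2.0, scale=0.04, size=n)
y = rng.poisson(true_freq * exposure).astype(float)

# A model that compresses the risk spread: too high for low risks, too low for high risks.
pred = 0.04 + 0.5 * true_freq
pred *= y.sum() / (pred * exposure).sum()  # rescale so the aggregate A/E is exactly 1

aggregate_ae = y.sum() / (pred * exposure).sum()
stat, p_value = hosmer_lemeshow_poisson(y, pred, exposure)
print(f"aggregate A/E = {aggregate_ae:.3f}, HL statistic = {stat:.1f}, p = {p_value:.2e}")
```

Here the aggregate A/E is 1 by construction, yet the grouped statistic rejects calibration, because over- and under-prediction cancel only in the total.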
For Model C (drifted population), PSI on the score distribution is 0.189, below the 0.25 threshold, so the manual checklist passes on PSI. Only the automated suite flags the model, because it attaches a Poisson confidence interval to the A/E ratio: the CI excludes 1.0, indicating genuine miscalibration that the manual aggregate A/E check misses. PSI alone is not sufficient to detect this type of drift.
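A confidence interval turns an A/E ratio from a number into a test: if the interval excludes 1.0, the deviation is unlikely to be noise. The sketch below uses Byar's approximation to the Poisson interval so that it needs only the standard library; the library may well use an exact interval instead, and the function name and the claim counts are hypothetical.

```python
import math
from statistics import NormalDist


def ae_with_poisson_ci(observed: float, expected: float, level: float = 0.95):
    """A/E ratio with an approximate Poisson interval (Byar's approximation)."""
    if expected <= 0:
        raise ValueError("expected claims must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    if observed > 0:
        base = 1.0 - 1.0 / (9.0 * observed) - z / (3.0 * math.sqrt(observed))
        lower = max(0.0, observed * base**3)
    else:
        lower = 0.0
    k = observed + 1.0
    upper = k * (1.0 - 1.0 / (9.0 * k) + z / (3.0 * math.sqrt(k))) ** 3
    return observed / expected, lower / expected, upper / expected


def excludes_one(ci_lower: float, ci_upper: float) -> bool:
    """True when the interval does not contain an A/E of exactly 1.0."""
    return ci_lower > 1.0 or ci_upper < 1.0


# Hypothetical claim counts against 400 expected claims.
drifted = ae_with_poisson_ci(observed=460, expected=400)
stable = ae_with_poisson_ci(observed=410, expected=400)

print("drifted:", drifted, "flag:", excludes_one(drifted[1], drifted[2]))
print("stable: ", stable, "flag:", excludes_one(stable[1], stable[2]))
```

The interval narrows as the claim count grows, so the same A/E ratio can pass on a small book and fail on a large one.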
| Scenario | Manual verdict | Automated verdict | Key diagnostic |
|---|---|---|---|
| Model A (well-specified) | 4/4 pass | 5/5 pass | Gini CI, A/E CI both tight |
| Model B (miscalibrated) | Flags A/E | Flags A/E + HL | HL p<0.0001 — age-band bias |
| Model C (drifted) | Passes PSI | Flags A/E CI | PSI=0.189 (below 0.25 threshold — manual checklist passes); A/E CI excludes 1.0 |
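For readers unfamiliar with PSI, the following sketch shows one common way to compute it: bin the reference scores by their own quantiles, then compare bin shares between reference and current samples. The binning scheme, the epsilon floor and the function name are assumptions for illustration and may differ from what the library does; the 0.25 threshold is the one quoted in the table.

```python
import numpy as np


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index using quantile bins taken from the reference sample."""
    # Interior cut points only, so values outside the reference range fall in the end bins.
    cuts = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(cuts, reference, side="right"), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(cuts, current, side="right"), minlength=n_bins)
    ref_share = np.clip(ref_counts / len(reference), eps, None)
    cur_share = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


rng = np.random.default_rng(7)
train_scores = rng.normal(0.08, 0.02, 20_000)
mild_shift = rng.normal(0.08 + 0.3 * 0.02, 0.02, 8_000)   # mean moved by 0.3 standard deviations
large_shift = rng.normal(0.08 + 1.0 * 0.02, 0.02, 8_000)  # mean moved by 1 standard deviation

print(f"mild shift PSI  = {psi(train_scores, mild_shift):.3f}")
print(f"large shift PSI = {psi(train_scores, large_shift):.3f}")
```

PSI only measures movement in the score distribution, which is why a model can sit below the threshold and still be miscalibrated against observed claims.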
The runtime difference does not matter in practice — governance validation runs once per model release, not in a hot loop. The return is consistent, audit-ready output for all three scenarios: every test produces a TestResult with passed, severity, and a detail string ready for a validation pack.
Related Libraries
| Library | Description |
|---|---|
| insurance-monitoring | Model drift detection — ongoing monitoring evidence feeds into governance review cycles |
| insurance-fairness | Proxy discrimination auditing — fairness audit outputs are a required input to the governance sign-off pack |
| insurance-deploy | Champion/challenger deployment with ENBP audit logging — governance documents the model; deploy manages its lifecycle |
Licence
MIT