Unified model governance for UK insurance pricing: SS1/23-aligned validation and model risk management

insurance-governance

Questions or feedback? Start a Discussion. Found it useful? A star helps others find it.

Unified model governance for UK insurance pricing teams. Combines model validation and model risk management into one package, with tests and outputs structured to align with the principles of PRA SS1/23 (as adapted for insurance).

Merged from: insurance-validation (model validation reports) and insurance-mrm (model risk management).

Blog post: One Package, One Install: PRA SS1/23 Validation and MRM Governance Unified

The problem this solves: validation tests and MRM governance packs were built as separate packages, with separate installs, version pins, and import paths. Pricing teams either installed both and managed the coupling themselves, or skipped one. This package replaces both with a single install.

Regulatory note: PRA SS1/23 is a supervisory statement directed at banks and building societies, not insurers. Insurance model risk management is governed directly by PS12/22, Solvency II internal model requirements, and EIOPA validation guidelines. In practice, many UK insurance MRM frameworks reference SS1/23 by analogy — it articulates sound MRM principles regardless of firm type — and the PRA has encouraged insurers to take note. This library uses SS1/23 as a reference framework in that spirit: the validation tests and governance structure reflect its principles, but you should map your own obligations to your actual regulatory basis (PS12/22 or equivalent).

Why use this?

  • UK pricing teams managing 10+ production models often have no consistent way to produce validation and governance artefacts: every model gets a bespoke analyst notebook, and the outputs cannot be compared across models. This package gives one install and one framework.
  • Runs a five-test statistical validation suite (Gini with bootstrap CI, A/E with Poisson CI, Hosmer-Lemeshow, lift chart, PSI) and catches segment-level miscalibration that a single aggregate A/E number misses — as demonstrated on synthetic motor data where Model B passes a manual A/E check but fails HL at p < 0.0001.
  • Scores model risk tier across six dimensions (GWP, complexity, deployment status, regulatory use, external data, customer-facing) mapped to a 0–100 composite with documented rules, so the tier presented to the Model Risk Committee (MRC) rests on stated rules rather than case-by-case judgement.
  • Generates self-contained HTML validation reports and executive governance packs (model purpose, risk tier rationale, assumptions register, approval conditions) in under one second — print-to-PDF ready.
  • Structured around PRA SS1/23 principles, as applied to insurance under PS12/22 and EIOPA guidelines: the audit trail is suitable for a PRA supervisory visit or internal model risk committee.
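To make "A/E with Poisson CI" concrete, here is a minimal NumPy sketch. It is not the library's implementation: the function name ae_with_ci is hypothetical, the interval uses a normal approximation to the Poisson count rather than an exact interval, and it assumes predictions are claim frequencies per policy year, so that expected claims are prediction × exposure.

```python
import numpy as np

def ae_with_ci(y, y_pred, exposure, z=1.96):
    """Aggregate actual/expected ratio with an approximate 95% CI.

    Hypothetical helper, not part of insurance-governance.
    Assumes y_pred is a frequency per policy year.
    """
    actual = float(np.sum(y))
    expected = float(np.sum(y_pred * exposure))
    ratio = actual / expected
    # For a Poisson count Var(actual) ~= actual, so se(A/E) ~= sqrt(actual) / expected
    half_width = z * np.sqrt(actual) / expected
    return ratio, ratio - half_width, ratio + half_width

# Synthetic portfolio in which the model under-predicts frequency by a factor of 1.18
rng = np.random.default_rng(0)
n = 50_000
exposure = rng.uniform(0.5, 1.0, n)
true_rate = np.full(n, 0.08)
y = rng.poisson(true_rate * exposure)
y_pred = true_rate / 1.18

ratio, lo, hi = ae_with_ci(y, y_pred, exposure)
print(f"A/E = {ratio:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
print("CI excludes 1.0:", not (lo <= 1.0 <= hi))
```

The point of attaching an interval is that the pass/fail decision depends on claim volume as well as the ratio itself: the same A/E is weaker evidence of miscalibration on a small book than on a large one.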

Subpackages

insurance_governance.validation

Model validation report generator, aligned with the principles of PRA SS1/23 (as adapted for insurance). Runs statistical tests (Gini, PSI, discrimination checks, Hosmer-Lemeshow, lift charts) and produces self-contained HTML reports.

insurance_governance.mrm

Model risk management framework. Provides ModelCard (metadata container), RiskTierScorer (objective 0–100 composite score mapping to Tier 1/2/3), ModelInventory (JSON file registry), and GovernanceReport (executive committee pack).
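The idea of a rule-based composite can be sketched in a few lines. Everything numeric below is hypothetical: the point allocations, the GWP bands, the tier cut-offs, the allowed category names, and the convention that Tier 1 is the highest-risk tier are illustrative assumptions, not the rules RiskTierScorer actually applies. Only the six input dimensions come from this README.

```python
from dataclasses import dataclass

# Hypothetical point allocations (maxima sum to 100). NOT the library's rules.
GWP_BANDS = [(100_000_000, 30), (25_000_000, 20), (5_000_000, 10)]  # (floor in GBP, points)
COMPLEXITY_POINTS = {"low": 5, "medium": 12, "high": 20}
DEPLOYMENT_POINTS = {"development": 5, "challenger": 10, "champion": 20}
FLAG_POINTS = {"regulatory_use": 10, "external_data": 10, "customer_facing": 10}

@dataclass(frozen=True)
class TierResult:
    score: int
    tier: int
    rationale: dict  # points awarded per dimension, for the audit trail

def score_tier(gwp_impacted, model_complexity, deployment_status,
               regulatory_use, external_data, customer_facing):
    """Map six dimensions to a 0-100 composite and a tier (illustrative only)."""
    gwp_points = next((pts for floor, pts in GWP_BANDS if gwp_impacted >= floor), 0)
    rationale = {
        "gwp": gwp_points,
        "complexity": COMPLEXITY_POINTS[model_complexity],
        "deployment": DEPLOYMENT_POINTS[deployment_status],
    }
    flags = {"regulatory_use": regulatory_use,
             "external_data": external_data,
             "customer_facing": customer_facing}
    for name, is_set in flags.items():
        rationale[name] = FLAG_POINTS[name] if is_set else 0
    score = sum(rationale.values())
    # Assumed cut-offs: Tier 1 is treated as the highest-risk tier
    tier = 1 if score >= 60 else 2 if score >= 30 else 3
    return TierResult(score=score, tier=tier, rationale=rationale)

result = score_tier(
    gwp_impacted=125_000_000,
    model_complexity="high",
    deployment_status="champion",
    regulatory_use=False,
    external_data=False,
    customer_facing=True,
)
print(result.score, result.tier, result.rationale)
```

Returning the per-dimension points alongside the score is what makes the tier explainable: the rationale can be pasted into a committee pack without anyone re-deriving it.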

Install

uv add insurance-governance
# or
pip install insurance-governance

Quick start

import numpy as np
from insurance_governance import (
    ModelValidationReport,
    ValidationModelCard,
    MRMModelCard,
    RiskTierScorer,
    ModelInventory,
    GovernanceReport,
)

# --- Synthetic model outputs (replace with your real model predictions) ---
rng = np.random.default_rng(42)
n_val = 5_000
y_val        = rng.poisson(0.08, n_val).astype(float)          # observed claim counts
y_pred_val   = np.clip(rng.normal(0.08, 0.02, n_val), 0.001, None)  # model predictions
exposure_val = rng.uniform(0.5, 1.0, n_val)                    # policy years (required for A/E)

# --- Run statistical validation ---
card = ValidationModelCard(
    name="Motor Frequency v3.2",
    version="3.2.0",
    purpose="Predict claim frequency for UK motor portfolio",
    methodology="CatBoost gradient boosting with Poisson objective",
    target="claim_count",
    features=["age", "vehicle_age", "area", "vehicle_group"],
    limitations=["No telematics data"],
    owner="Pricing Team",
)
report = ModelValidationReport(
    model_card=card,
    y_val=y_val,
    y_pred_val=y_pred_val,
    exposure_val=exposure_val,
)
report.generate("validation_report.html")

# --- MRM governance pack ---
mrm_card = MRMModelCard(
    model_id="motor-freq-v3",
    model_name="Motor TPPD Frequency",
    version="3.2.0",
    model_class="pricing",
    intended_use="Frequency pricing for private motor.",
)
scorer = RiskTierScorer()
tier = scorer.score(
    gwp_impacted=125_000_000,
    model_complexity="high",
    deployment_status="champion",
    regulatory_use=False,
    external_data=False,
    customer_facing=True,
)
GovernanceReport(card=mrm_card, tier=tier).save_html("mrm_pack.html")

Or import from subpackages directly:

from insurance_governance.validation import ModelValidationReport, ModelCard as ValidationModelCard
from insurance_governance.mrm import ModelCard as MRMModelCard, RiskTierScorer, ModelInventory, GovernanceReport

Note on ModelCard

Both subpackages define a ModelCard class, but they serve different purposes:

  • insurance_governance.validation.ModelCard (ValidationModelCard at top level) — Pydantic schema, anchors the statistical validation report, captures features, methodology, limitations.
  • insurance_governance.mrm.ModelCard (MRMModelCard at top level) — dataclass, anchors the MRM governance pack, captures assumptions, risk tier, Model Risk Committee metadata.

At the top level they are re-exported as ValidationModelCard and MRMModelCard to avoid ambiguity.

Capabilities Demo

Demonstrated on synthetic motor data: 50,000 UK motor policies, CatBoost Poisson frequency model, 60/20/20 temporal train/validation/test split. Full script: benchmarks/benchmark_insurance_governance.py.

  • Runs a full validation suite in a single ModelValidationReport call: Gini coefficient with bootstrap 95% CI, 10-band lift chart, A/E by predicted decile with Poisson CI, Hosmer-Lemeshow goodness-of-fit, PSI on score distribution (train vs validation), monitoring plan completeness check — all returning TestResult objects with a pass/fail flag and human-readable detail
  • Computes an overall RAG status (Green/Amber/Red) from the worst-severity failure across all tests
  • Produces a self-contained HTML validation report and JSON sidecar, print-to-PDF ready, in under one second
  • Scores model risk tier via RiskTierScorer: 6 dimensions (GWP, model complexity, deployment status, regulatory use, external data, customer-facing) mapped to a 0-100 composite with documented rules per point — no subjective judgement required at the MRC presentation
  • Registers models in ModelInventory (JSON file, check into git alongside your code); records validation run history linked by run_id; lists overdue reviews
  • Generates a GovernanceReport executive committee pack (HTML + JSON) covering model purpose, risk tier rationale, last validation RAG, assumptions register with risk ratings, outstanding issues, approval conditions, and next review date
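The "worst-severity failure" rule for the overall RAG status can be sketched as follows. CheckResult is a hypothetical stand-in for the library's TestResult (which, per this README, carries passed, severity, and a detail string); the severity level names and their mapping to Green/Amber/Red are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CheckResult:
    """Hypothetical stand-in for the library's TestResult."""
    name: str
    passed: bool
    severity: str  # assumed levels: "info", "warning", "critical"
    detail: str

# Assumed mapping from the severity of a failed test to a RAG colour
SEVERITY_TO_RAG = {"info": "Green", "warning": "Amber", "critical": "Red"}
RAG_ORDER = ["Green", "Amber", "Red"]  # least to most severe

def overall_rag(results):
    """Return the worst RAG colour among failed tests; Green if nothing failed."""
    failed = [SEVERITY_TO_RAG[r.severity] for r in results if not r.passed]
    return max(failed, key=RAG_ORDER.index, default="Green")

results = [
    CheckResult("gini", True, "critical", "Gini CI lower bound above floor"),
    CheckResult("ae_ratio", False, "warning", "A/E CI excludes 1.0"),
    CheckResult("hosmer_lemeshow", False, "critical", "HL p below threshold"),
]
print(overall_rag(results))
```

A passed test contributes nothing regardless of its severity label, so a single critical failure is enough to turn the whole report Red.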

When to use: You have 10+ production pricing models and want consistent, auditable validation and governance output rather than bespoke analyst notebooks that vary by model. The framework is structured around the principles of PRA SS1/23 — insurers should map those principles to their own regulatory basis (PS12/22, EIOPA guidelines). Particularly useful before a PRA supervisory visit.

When NOT to use: You need reserving or capital model governance — this package is scoped to pricing models. It also does not replace independent human review of validation results; it automates the tests, not the judgement.

Databricks Notebook

A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in burning-cost-examples.

Performance

Benchmarked on Databricks (2026-03-16) using synthetic UK motor data: 20,000 training + 8,000 validation policies, three model scenarios — well-specified (Model A), miscalibrated (Model B, A/E=1.18 with age-band bias), and drifted (Model C, trained on a shifted population). The comparison is the library's automated 5-test suite against a manual 4-check checklist. See benchmarks/benchmark_insurance_governance.py for the full script.

Runtime. On an 8,000-row validation set:

Approach | Time
Manual 4-check checklist | 0.09 s
Automated 5-test suite (Gini + bootstrap CI, A/E + Poisson CI, Hosmer-Lemeshow, lift chart, PSI) | 1.17 s

The automated suite is ~13× slower in wall-clock time (1.17 s vs 0.09 s); that roughly one-second overhead is entirely the 500-resample bootstrap for the Gini confidence interval.
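To show where that bootstrap cost comes from, here is a minimal sketch of a Gini coefficient with a percentile bootstrap interval. It is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the Gini is computed from an exposure-weighted Lorenz curve ordered by prediction, and the interval is a plain percentile bootstrap over policies.

```python
import numpy as np

def gini(y, y_pred, exposure):
    """Gini from the Lorenz curve of claims vs exposure, ordered by prediction."""
    order = np.argsort(y_pred)
    cum_exp = np.concatenate([[0.0], np.cumsum(exposure[order])]) / exposure.sum()
    cum_clm = np.concatenate([[0.0], np.cumsum(y[order])]) / y.sum()
    # Trapezoid-rule area under the Lorenz curve
    area = np.sum((cum_exp[1:] - cum_exp[:-1]) * (cum_clm[1:] + cum_clm[:-1]) / 2.0)
    return 1.0 - 2.0 * area

def bootstrap_gini_ci(y, y_pred, exposure, n_resamples=500, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap CI (resampling policies with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, n)
        stats[b] = gini(y[idx], y_pred[idx], exposure[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return gini(y, y_pred, exposure), float(lo), float(hi)

# Synthetic validation set: 8,000 policies with heterogeneous true frequency
rng = np.random.default_rng(42)
n = 8_000
exposure = rng.uniform(0.5, 1.0, n)
rate = 0.08 * np.exp(0.5 * rng.normal(size=n))
y = rng.poisson(rate * exposure)
y_pred = rate  # a well-specified model predicts the true rate

point, lo, hi = bootstrap_gini_ci(y, y_pred, exposure)
print(f"Gini = {point:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Each resample repeats the sort and the two cumulative sums, so the cost scales with the number of resamples; that is the trade-off against a single point estimate with no interval.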

What the automated suite catches that the manual checklist misses.

The key test is Model B (miscalibrated). Both methods flag the A/E deviation. But only the automated suite runs Hosmer-Lemeshow, which detects the age-band-level miscalibration that averages out in the global A/E: HL p < 0.0001 (reject calibration by group). The manual checklist, which computes one aggregate A/E number, cannot surface this pattern without additional code.
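The mechanism can be reproduced on synthetic data. The sketch below is an assumption-laden illustration, not the library's test: it uses a Poisson-variance variant of the Hosmer-Lemeshow statistic (sum of (O - E)² / E over groups), groups by predicted decile rather than by age band, takes groups − 2 degrees of freedom, and computes the p-value with a closed-form chi-square tail that is valid only for even degrees of freedom. To isolate the segment effect, the predictions are rescaled so the aggregate A/E is exactly 1.0.

```python
import math
import numpy as np

def chi2_sf_even_df(x, df):
    """Chi-square survival function; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow_poisson(y, y_pred, exposure, n_groups=10):
    """Poisson-variance HL statistic over groups of equal size ordered by prediction."""
    expected = y_pred * exposure
    order = np.argsort(y_pred)
    stat = 0.0
    for idx in np.array_split(order, n_groups):
        observed_g, expected_g = y[idx].sum(), expected[idx].sum()
        stat += (observed_g - expected_g) ** 2 / expected_g
    return stat, chi2_sf_even_df(stat, n_groups - 2)

rng = np.random.default_rng(1)
n = 8_000
x = rng.normal(size=n)                      # a single risk factor
exposure = rng.uniform(0.5, 1.0, n)
y = rng.poisson(0.08 * np.exp(0.8 * x) * exposure)

# The model understates the effect of x, then is rescaled to balance in aggregate
y_pred = 0.08 * np.exp(0.3 * x)
y_pred *= y.sum() / np.sum(y_pred * exposure)

aggregate_ae = y.sum() / np.sum(y_pred * exposure)
hl_stat, hl_p = hosmer_lemeshow_poisson(y, y_pred, exposure)
print(f"aggregate A/E = {aggregate_ae:.3f}")
print(f"HL statistic = {hl_stat:.1f}, p = {hl_p:.2e}")
```

The aggregate check sees a balanced book, while the grouped statistic picks up over-prediction at the low-risk end and under-prediction at the high-risk end.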

For Model C (drifted population), PSI on the score distribution = 0.189 — below the 0.25 threshold, so the manual checklist passes on PSI. Only the automated suite catches the drift, because it attaches a Poisson confidence interval to the A/E ratio: the CI excludes 1.0, flagging genuine miscalibration that the manual aggregate A/E misses. PSI alone is not sufficient to detect this type of drift; the confidence-interval-based A/E test is what surfaces it.
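For reference, PSI on a score distribution can be computed as below. This is a generic sketch, not the library's implementation: the function name, the choice of ten bins cut at the reference deciles, and the small floor applied to empty bins are all assumptions.

```python
import numpy as np

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index with bins cut at the reference quantiles."""
    # Interior bin edges taken from the reference (e.g. training) scores
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    ref_share = np.bincount(np.searchsorted(edges, reference), minlength=n_bins) / len(reference)
    cur_share = np.bincount(np.searchsorted(edges, current), minlength=n_bins) / len(current)
    # Floor the shares so empty bins do not produce log(0)
    ref_share = np.clip(ref_share, eps, None)
    cur_share = np.clip(cur_share, eps, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))

rng = np.random.default_rng(7)
train_scores = rng.normal(0.0, 1.0, 20_000)
val_same = rng.normal(0.0, 1.0, 8_000)      # same population
val_shifted = rng.normal(0.5, 1.0, 8_000)   # population shifted by half a standard deviation

psi_same = psi(train_scores, val_same)
psi_shifted = psi(train_scores, val_shifted)
print(f"PSI (same population)    = {psi_same:.4f}")
print(f"PSI (shifted population) = {psi_shifted:.4f}")
```

PSI summarises how far the score distribution has moved, not whether predictions are still calibrated on the new population, which is why a value under a fixed threshold can coexist with an A/E interval that excludes 1.0.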

Scenario | Manual verdict | Automated verdict | Key diagnostic
Model A (well-specified) | 4/4 pass | 5/5 pass | Gini CI and A/E CI both tight
Model B (miscalibrated) | Flags A/E | Flags A/E + HL | HL p < 0.0001 (age-band bias)
Model C (drifted) | Passes PSI | Flags A/E CI | PSI = 0.189 (below the 0.25 threshold, so the manual checklist passes); A/E CI excludes 1.0

The runtime difference does not matter in practice — governance validation runs once per model release, not in a hot loop. The return is consistent, audit-ready output for all three scenarios: every test produces a TestResult with passed, severity, and a detail string ready for a validation pack.

Related Libraries

Library | Description
insurance-monitoring | Model drift detection: ongoing monitoring evidence feeds into governance review cycles
insurance-fairness | Proxy discrimination auditing: fairness audit outputs are a required input to the governance sign-off pack
insurance-deploy | Champion/challenger deployment with ENBP audit logging: governance documents the model; deploy manages its lifecycle

Licence

MIT
