Temporal cross-validation for insurance pricing models

insurance-cv

Python License: MIT

Temporal cross-validation for insurance pricing models. Walk-forward splits that respect policy year, accident year, and IBNR development structure - because standard k-fold gives you overoptimistic CV scores that don't survive contact with a live rating year.

uv add insurance-cv

The problem with standard k-fold in insurance

K-fold cross-validation randomly partitions data into folds. For insurance pricing, this is wrong in at least three ways.

Temporal leakage. Insurance claims develop over time. A motor claim reported 18 months after the accident may still be open. If you train on 2022 data and test on 2020 data, your model sees future development patterns that wouldn't have been available at the 2020 pricing date. K-fold does this routinely.
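To make the leakage concrete, here is a minimal sketch using only the standard library. The dates are synthetic and the fold assignment only mimics a shuffled 5-fold partition (it is not sklearn's implementation). It counts the folds in which some training row is dated after the earliest test row.

```python
import random
from datetime import date, timedelta

random.seed(0)

# Synthetic inception dates spread over three years (illustrative data only).
start = date(2020, 1, 1)
dates = [start + timedelta(days=random.randrange(3 * 365)) for _ in range(500)]

# Shuffle the row indices, then deal them into 5 folds of equal size.
indices = list(range(len(dates)))
random.shuffle(indices)
folds = [indices[k::5] for k in range(5)]

leaky_folds = 0
for test_idx in folds:
    test_set = set(test_idx)
    train_dates = [dates[i] for i in indices if i not in test_set]
    test_dates = [dates[i] for i in test_idx]
    # Leakage: a training row is dated later than the earliest test row.
    if max(train_dates) > min(test_dates):
        leaky_folds += 1

print(f"{leaky_folds} of 5 folds train on rows dated after the test period begins")
```

With random assignment, a fold avoids this only if its test rows happen to be the latest rows in the data, which is vanishingly unlikely.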

IBNR contamination. For any accident date near your training cutoff, some claims will not yet be reported or fully developed (Incurred But Not Reported). If those claims appear in your training set, the model learns from targets that are systematically understated. The fix is a development buffer - exclude claims with accident dates in the N months before your test window from both training and test sets.
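A development buffer can be sketched as a simple date filter. The helper names below (`shift_months`, `buffered_train_mask`) are hypothetical and not part of insurance-cv, and the sketch assumes the test window starts on the first day of a month.

```python
from datetime import date

def shift_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` months away from d."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def buffered_train_mask(accident_dates, test_start: date, buffer_months: int):
    """True for rows that fall before the buffer and may be used for training."""
    buffer_start = shift_months(test_start, -buffer_months)
    return [d < buffer_start for d in accident_dates]

accident_dates = [
    date(2021, 12, 31),  # before the buffer: usable for training
    date(2022, 1, 1),    # inside the 3-month buffer: dropped
    date(2022, 3, 31),   # inside the 3-month buffer: dropped
    date(2022, 4, 1),    # first day of the test window: not training
]
mask = buffered_train_mask(accident_dates, date(2022, 4, 1), buffer_months=3)
print(mask)
```

Rows inside the buffer are used for neither training nor testing, which is the price paid for cleaner targets.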

Seasonal confounding. Motor claims peak in winter. Property claims follow weather cycles. If a randomly-selected test fold contains a disproportionate share of December policies, the test loss will look different to what you'd see prospectively. A prospective evaluation should test on a contiguous future period with the same seasonal mix the model will face in deployment.

The result of using k-fold on insurance data is a model that looks better in CV than it performs in the rating year. Prospective monitoring then shows a gap between modelled and actual loss ratios that is partly attributable to the leaky evaluation methodology.

Blog post

Why Your Cross-Validation is Lying to You

How this library fixes it

All splits in insurance-cv are walk-forward (or boundary-aligned): training data always precedes test data in calendar time, with a configurable gap for IBNR development.

Three split generators cover the main use cases:

Function	When to use it
`walk_forward_split`	General-purpose. Expanding training window, rolling test. Standard choice for motor, home, commercial.
`policy_year_split`	When rate changes align to policy year boundaries and you want clean PY-aligned folds.
`accident_year_split`	Long-tail lines (liability, PI) where accident year development varies across the triangle.

All generators return a list of TemporalSplit objects. Calling .get_indices(df) on a split gives the (train_idx, test_idx) arrays that index into your DataFrame. InsuranceCV wraps such a list and implements the sklearn BaseCrossValidator interface, so you can pass it directly to GridSearchCV, cross_val_score, etc.

Quickstart

This example is self-contained — no external files needed.

import polars as pl
import numpy as np
from datetime import date, timedelta
from insurance_cv import walk_forward_split
from insurance_cv.diagnostics import temporal_leakage_check, split_summary
from insurance_cv.splits import InsuranceCV

# Generate a synthetic UK motor portfolio: 1000 policies over 3 years
rng = np.random.default_rng(42)
n = 1_000
start = date(2021, 1, 1)
inception_dates = [start + timedelta(days=int(d)) for d in rng.integers(0, 365 * 3, n)]

df = pl.DataFrame({
    "policy_id": [f"POL{i:05d}" for i in range(n)],
    "inception_date": inception_dates,
    "exposure": rng.uniform(0.1, 1.0, n).round(4).tolist(),
    "claim_count": rng.poisson(0.08, n).tolist(),
    "claim_amount": (rng.exponential(1500, n) * rng.binomial(1, 0.08, n)).round(2).tolist(),
    "vehicle_age": rng.integers(0, 15, n).tolist(),
    "driver_age": rng.integers(18, 80, n).tolist(),
    "ncd_years": rng.integers(0, 9, n).tolist(),
}).with_columns(pl.col("inception_date").cast(pl.Date))

splits = walk_forward_split(
    df,
    date_col="inception_date",
    min_train_months=12,    # need at least a year to cover seasonality
    test_months=6,          # evaluate on 6-month windows
    step_months=6,          # non-overlapping test periods
    ibnr_buffer_months=3,   # exclude rows dated in the 3 months before each test window
)

# Always validate before running the model
check = temporal_leakage_check(splits, df, date_col="inception_date")
if check["errors"]:
    raise RuntimeError("\n".join(check["errors"]))

print(split_summary(splits, df, date_col="inception_date"))
# fold  train_n  test_n  train_end   test_start  gap_days
#    1      312     167  2021-12-31  2022-04-01        91
#    2      479     161  2022-06-30  2022-10-01        93
#  ...

# sklearn-compatible: pass to cross_val_score or GridSearchCV
# Define X and y from the full dataframe (numeric features only for this example)
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import cross_val_score

X = df.select(["vehicle_age", "driver_age", "ncd_years"]).to_numpy()
y = df["claim_count"].to_numpy().astype(float)
model = PoissonRegressor()

cv = InsuranceCV(splits, df)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_poisson_deviance")

If you already have policy data in a parquet file, replace the synthetic DataFrame block with:

df = pl.read_parquet("policies.parquet")

API

`walk_forward_split`

walk_forward_split(
    df,
    date_col: str,
    min_train_months: int = 12,
    test_months: int = 3,
    step_months: int = 3,
    ibnr_buffer_months: int = 3,
) -> list[TemporalSplit]

Generates an expanding-window walk-forward split. The earliest data is always included in training. Each fold advances the test window by step_months. The IBNR buffer excludes rows in the ibnr_buffer_months months before test_start from both train and test.

Setting step_months == test_months gives non-overlapping test windows (the usual choice for insurance). Smaller values increase fold count but introduce correlation between adjacent test periods.

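For motor own damage, ibnr_buffer_months is typically 3-6 months. For long-tail lines it should be at least 12-24 months, and longer for liability classes (see the guidelines by line under "IBNR buffer: choosing the right value").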
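The window arithmetic described above can be sketched with the standard library. This is an illustration under stated assumptions, not the library's implementation: `walk_forward_boundaries` and `shift_months` are hypothetical names, the data is assumed to start on the first day of a month, and test windows that run past the end of the data are assumed to be dropped.

```python
from datetime import date, timedelta

def shift_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` months away from d."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def walk_forward_boundaries(data_start, data_end, min_train_months=12,
                            test_months=3, step_months=3, ibnr_buffer_months=3):
    """Return (train_end, test_start, test_end) per fold, expanding training window."""
    one_day = timedelta(days=1)
    folds = []
    # The first test window opens after the minimum training period plus the buffer.
    test_start = shift_months(data_start, min_train_months + ibnr_buffer_months)
    while True:
        test_end = shift_months(test_start, test_months) - one_day
        if test_end > data_end:
            break  # assumption: incomplete test windows are discarded
        # Training always starts at data_start and stops where the buffer begins.
        train_end = shift_months(test_start, -ibnr_buffer_months) - one_day
        folds.append((train_end, test_start, test_end))
        test_start = shift_months(test_start, step_months)
    return folds

folds = walk_forward_boundaries(date(2021, 1, 1), date(2023, 12, 31),
                                min_train_months=12, test_months=6,
                                step_months=6, ibnr_buffer_months=3)
for train_end, test_start, test_end in folds:
    print(train_end, test_start, test_end, (test_start - train_end).days)
```

With the Quickstart parameters, the first two folds produced by this sketch have the same train_end and test_start dates and the same gap days (91 and 93) as the summary shown there.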

`policy_year_split`

policy_year_split(
    df,
    date_col: str,
    n_years_train: int,
    n_years_test: int = 1,
    step_years: int = 1,
) -> list[TemporalSplit]

Splits aligned to 1 Jan - 31 Dec policy year boundaries. Use this when your rate changes are annual and you want clean year-aligned train/test boundaries. There is no IBNR buffer because the year boundary is treated as a natural development cutoff - if you need one, adjust n_years_train to leave a gap year.
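As a rough illustration of year-aligned folds, the sketch below builds 1 Jan - 31 Dec boundaries from the three documented parameters. It assumes a fixed-length training window that rolls forward by step_years. Whether the library rolls or expands the window is not stated here, so treat that as an assumption. `policy_year_folds` is a hypothetical helper.

```python
from datetime import date

def policy_year_folds(first_year, last_year, n_years_train,
                      n_years_test=1, step_years=1):
    """Return a list of folds with calendar-year-aligned train/test boundaries."""
    folds = []
    train_first = first_year
    while True:
        test_first = train_first + n_years_train
        test_last = test_first + n_years_test - 1
        if test_last > last_year:
            break  # not enough data left for a full test period
        folds.append({
            "train": (date(train_first, 1, 1), date(test_first - 1, 12, 31)),
            "test": (date(test_first, 1, 1), date(test_last, 12, 31)),
        })
        train_first += step_years
    return folds

folds = policy_year_folds(2019, 2023, n_years_train=3)
for fold in folds:
    print(fold["train"], fold["test"])
```

For data covering 2019 to 2023 with three training years, this gives two folds: train 2019-2021 with test 2022, and train 2020-2022 with test 2023.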

`accident_year_split`

accident_year_split(
    df,
    date_col: str,
    development_col: str,
    min_development_months: int = 12,
) -> list[TemporalSplit]

Generates one fold per accident year, filtering out years where median claim development is below min_development_months. The development_col should contain months from accident date to valuation date. This is the right approach for liability and professional indemnity where the development triangle matters.
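The development filter can be sketched as a median per accident year compared against the threshold. `eligible_accident_years` is a hypothetical helper and the figures are made up for illustration. The real function works on a DataFrame and returns TemporalSplit objects.

```python
from collections import defaultdict
from statistics import median

def eligible_accident_years(accident_years, development_months,
                            min_development_months=12):
    """Return accident years whose median development meets the threshold."""
    by_year = defaultdict(list)
    for year, dev in zip(accident_years, development_months):
        by_year[year].append(dev)
    return sorted(
        year for year, devs in by_year.items()
        if median(devs) >= min_development_months
    )

# Illustrative claims: accident year and months from accident to valuation.
years = [2020, 2020, 2021, 2021, 2022, 2022]
development = [40, 36, 24, 20, 6, 10]
print(eligible_accident_years(years, development))
```

In this toy data the 2022 accident year has a median development of 8 months, so it is excluded with the default threshold of 12.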

`TemporalSplit`

TemporalSplit(
    date_col: str,
    train_start,
    train_end,
    test_start,
    test_end,
    ibnr_buffer_months: int = 0,
    label: str = "",
)

A single split definition. Call .get_indices(df) to get (train_idx, test_idx) as numpy integer arrays.

`InsuranceCV`

InsuranceCV(splits: list[TemporalSplit], df)

Wraps a list of TemporalSplit objects as a sklearn-compatible CV splitter. Implements split() and get_n_splits(). Pass to cross_val_score, GridSearchCV, or any other sklearn utility that accepts a CV splitter.
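For readers unfamiliar with the splitter protocol, the sketch below shows the two methods in isolation. `PrecomputedSplitCV` is a hypothetical stand-in written for illustration and is not how InsuranceCV is implemented. It takes index pairs directly rather than TemporalSplit objects.

```python
class PrecomputedSplitCV:
    """Minimal CV splitter over precomputed (train_idx, test_idx) pairs."""

    def __init__(self, index_pairs):
        self.index_pairs = list(index_pairs)

    def split(self, X=None, y=None, groups=None):
        # Yield each pair in order; X, y and groups are accepted but ignored.
        for train_idx, test_idx in self.index_pairs:
            yield train_idx, test_idx

    def get_n_splits(self, X=None, y=None, groups=None):
        return len(self.index_pairs)

# Two walk-forward folds over ten rows sorted by date (illustrative indices).
cv = PrecomputedSplitCV([
    ([0, 1, 2, 3], [5, 6]),
    ([0, 1, 2, 3, 4, 5], [7, 8, 9]),
])
for train_idx, test_idx in cv.split():
    print(len(train_idx), len(test_idx))
```

An object with this shape should be accepted wherever sklearn takes a `cv` argument, for example `cross_val_score(model, X, y, cv=cv)`. Subclassing BaseCrossValidator, as InsuranceCV does, is the more conventional route.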

`temporal_leakage_check`

temporal_leakage_check(
    splits: list[TemporalSplit],
    df,
    date_col: str,
) -> dict[str, list[str]]

Returns {"errors": [...], "warnings": [...]}. Run this before any model fitting. An empty errors list means no temporal leakage was detected.
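The core of such a check is an ordering test on dates. The sketch below returns the same dictionary shape, but `check_split_order` and its `min_gap_days` parameter are hypothetical and the library's own rules may differ.

```python
from datetime import date

def check_split_order(train_dates, test_dates, min_gap_days=0):
    """Flag folds where training data is not strictly earlier than test data."""
    errors, warnings = [], []
    if not train_dates or not test_dates:
        errors.append("empty train or test set")
        return {"errors": errors, "warnings": warnings}
    latest_train, earliest_test = max(train_dates), min(test_dates)
    if latest_train >= earliest_test:
        errors.append(
            f"training data up to {latest_train} overlaps test data from {earliest_test}"
        )
    elif (earliest_test - latest_train).days < min_gap_days:
        warnings.append(
            f"gap of {(earliest_test - latest_train).days} days is below {min_gap_days}"
        )
    return {"errors": errors, "warnings": warnings}

clean = check_split_order([date(2021, 12, 31)], [date(2022, 4, 1)], min_gap_days=90)
leaky = check_split_order([date(2022, 6, 30)], [date(2022, 4, 1)])
print(clean)
print(leaky)
```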

`split_summary`

split_summary(
    splits: list[TemporalSplit],
    df,
    date_col: str,
) -> pl.DataFrame

Returns a DataFrame with one row per fold: fold number, train/test sizes, actual date boundaries, gap days, and IBNR buffer months. Useful for confirming that your splits look sensible before committing compute to model fitting.

IBNR buffer: choosing the right value

The IBNR buffer is the most consequential parameter in walk_forward_split. A buffer that is too short means partially-developed claims contaminate your test evaluation; too long reduces the amount of usable test data.

Rough guidelines by line:

Line	Typical buffer
Motor own damage	3-6 months
Motor third party property	6-12 months
Motor third party bodily injury	12-24 months
Home buildings	6-12 months
Employers' liability	24-36 months
Professional indemnity	24-48 months

These are starting points. The right value depends on your claims handling speed, the proportion of large/complex claims, and how you define your loss target (paid vs. incurred vs. ultimate).
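The cost of a longer buffer can be estimated with simple arithmetic before generating any splits. The helper below is a hypothetical sketch that assumes month-aligned windows and that incomplete test windows are dropped. It counts how many full test windows fit into a fixed history.

```python
def count_folds(total_months, min_train_months, test_months,
                step_months, ibnr_buffer_months):
    """Number of complete test windows that fit into the available history."""
    spare = total_months - min_train_months - ibnr_buffer_months - test_months
    if spare < 0:
        return 0  # not even one full test window fits
    return spare // step_months + 1

# Three years of data, 12-month minimum training, 6-month test windows and steps.
for buffer in (0, 3, 12, 24):
    print(buffer, count_folds(36, 12, 6, 6, buffer))
```

Under these assumptions, three years of data yield 4 folds with no buffer, 3 folds with a 3-month buffer, 2 folds with a 12-month buffer and none with a 24-month buffer, so long-tail buffers need correspondingly longer histories.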

Performance

Benchmarked against random 5-fold KFold (sklearn, shuffle=True) on synthetic UK motor insurance data — 50,000 policies, temporal split by accident year: CV pool 2019–2022, true out-of-time test 2023. The same model (CatBoost Poisson, or statsmodels Poisson GLM if CatBoost is unavailable) is fitted under both CV strategies and the resulting CV estimate is compared against the true 2023 holdout deviance.

The core question: which CV strategy produces an estimate closer to what the model actually delivers on future data?

Metric	Random KFold CV	Temporal walk-forward CV	True OOT (2023)
Mean Poisson deviance	measured at runtime	measured at runtime	measured at runtime
Gap to true OOT deviance	measured at runtime	measured at runtime	0.00000
Optimism bias (random − temporal)	measured at runtime	—	—
Temporal leakage	Yes (future years in training folds)	No (verified by leakage check)	—
Structured audit trail	No	Yes (split_summary output)	—

Expected results on this dataset: random KFold produces a CV deviance estimate that is 0.002–0.015 deviance units more optimistic than the true OOT performance. The temporal CV estimate is expected to be 50–80% closer to the true OOT deviance. The optimism bias is driven by the mild frequency trend in the DGP: knowing future years helps predict past years, inflating apparent CV performance under random splitting.
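For reference, the metric in the table can be computed as follows. This is a standard-library sketch of the usual mean Poisson deviance formula with a helper for the gap. The function names are hypothetical, the inputs are toy values and no benchmark results are implied.

```python
import math

def mean_poisson_deviance(y_true, y_pred):
    """Mean of 2 * (y * log(y / mu) - (y - mu)), taking y * log(y / mu) = 0 when y = 0."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if mu <= 0:
            raise ValueError("predictions must be strictly positive")
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total / len(y_true)

def gap_to_oot(cv_estimate, oot_deviance):
    """Negative values mean the CV estimate looked better than the out-of-time result."""
    return cv_estimate - oot_deviance

# Toy check: a perfect prediction contributes zero deviance.
print(mean_poisson_deviance([1, 2], [1.0, 2.0]))
# A zero-claim row predicted at 0.5 contributes 2 * 0.5 = 1.0.
print(mean_poisson_deviance([0], [0.5]))
```

Lower deviance is better, so an optimistic CV strategy shows up as a negative gap to the out-of-time figure.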

The temporal CV fold-level variance is typically 2–5x higher than random KFold, because each fold genuinely tests on a different time period rather than averaging across all periods. This higher variance is informative — it shows whether model performance degrades as the validation period moves further from the training window.

temporal_leakage_check catches 100% of forward-looking splits. split_summary produces the fold structure documentation that model governance reviewers will ask for.

Run notebooks/benchmark.py on Databricks to reproduce.

Databricks Notebook

A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in burning-cost-examples.

Related Burning Cost libraries

insurance-monitoring - Once you have a properly evaluated model and deploy it, use insurance-monitoring to track Gini drift, PSI, and A/E ratios prospectively. The walk-forward splits here produce the baseline metrics; monitoring tracks how the model holds up after deployment.
insurance-conformal - Prediction intervals for your GBM. Uses temporal splits (same logic as this library) to calibrate the conformal quantile on recent data.

Model building

Library	Description
shap-relativities	Extract rating relativities from GBMs using SHAP
insurance-interactions	Automated GLM interaction detection via CANN and NID scores

Uncertainty quantification

Library	Description
insurance-conformal	Distribution-free prediction intervals for Tweedie models
insurance-distributional	Full conditional distribution per risk: mean, variance, CoV

Deployment and optimisation

Library	Description
insurance-optimise	Constrained rate change optimisation with FCA PS21/5 compliance
insurance-demand	Conversion, retention, and price elasticity modelling

Governance

Library	Description
insurance-fairness	Proxy discrimination auditing for UK insurance models
insurance-monitoring	Model monitoring: PSI, A/E ratios, Gini drift test

All libraries

Capabilities

The notebook at notebooks/demo_insurance_cv.py runs a complete demonstration on a 5-year synthetic UK motor portfolio with known seasonal and trend structure, and shows:

Walk-forward vs random k-fold gap: Poisson deviance is measurably lower (better-looking) under random k-fold because future data leaks into training. Walk-forward gives the honest prospective estimate.
Per-fold trajectory: Walk-forward fold scores trend with the data's temporal structure; random k-fold averages across all periods and hides this signal.
IBNR buffer effect: Test set size and cleanliness trade off against each other. The notebook shows how buffer length from 0 to 12 months changes both.
Policy-year splits: Clean 1 Jan boundaries keep training and test sets on opposite sides of rate changes, demonstrated on 5 policy years.
sklearn drop-in compatibility: InsuranceCV passes directly to cross_val_score with no code changes beyond swapping the CV object.

Development

git clone https://github.com/burning-cost/insurance-cv
cd insurance-cv
uv sync --dev
uv run pytest -v

Tests are designed to run on Databricks (serverless) for the compute-heavy cases. On a local machine uv run pytest -v covers the full test suite in seconds since the fixtures use synthetic data.

Related Libraries

Library	What it does
shap-relativities	Extract rating relativities from GBMs — combine with walk-forward CV to evaluate GBM-derived factor tables
insurance-conformal	Conformal prediction intervals — uses temporal splits (same logic as this library) to calibrate coverage guarantees
insurance-monitoring	Model monitoring — walk-forward splits here produce the baseline metrics; monitoring tracks performance after deployment
insurance-datasets	Synthetic UK insurance datasets with known DGPs — use to benchmark CV strategies against a controlled ground truth

Licence

MIT. See LICENSE.



