Transfer learning for thin-segment insurance pricing

insurance-transfer

Transfer learning for thin-segment insurance pricing.

The problem

Pricing actuaries routinely face the thin-data problem: you want to price young drivers, a new business class, or a pet breed, but you have fewer than 200 claims in the target segment. A model fitted on that data alone will overfit. Credibility blending helps, but it is a blunt instrument that does not respect covariate structure.

Transfer learning is a better answer. You have a large portfolio — say 50,000 motor policies. Some of that information is relevant to your thin segment. The question is how much to borrow, and how to correct for the fact that young drivers are not just a small random sample of all drivers.

This library implements three transfer methods adapted for insurance pricing, plus diagnostics to detect when the transfer is helping versus hurting.

What it does

Covariate shift detection (CovariateShiftTest): Before you transfer anything, test whether the source and target distributions are meaningfully different. Uses Maximum Mean Discrepancy with a mixed kernel — RBF for continuous features (age, vehicle value), indicator for categorical ones (fuel type, body style). Returns a permutation-based p-value and per-feature drift scores so you can see which features are driving the divergence.
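
To make the idea concrete, here is a minimal NumPy sketch of an MMD permutation test with a mixed kernel. It is not the library's implementation: the function names, the fixed bandwidth, and the choice to average the per-column indicator matches are all assumptions made for illustration.

```python
import numpy as np

def mixed_kernel(A, B, cont_idx, cat_idx, bandwidth=1.0):
    """RBF kernel on continuous columns times a match-fraction kernel on categorical ones."""
    a_c, b_c = A[:, cont_idx], B[:, cont_idx]
    sq_dist = ((a_c[:, None, :] - b_c[None, :, :]) ** 2).sum(axis=2)
    k_cont = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Indicator kernel per categorical column, averaged over columns (an assumption).
    matches = A[:, cat_idx][:, None, :] == B[:, cat_idx][None, :, :]
    k_cat = matches.mean(axis=2)
    return k_cont * k_cat

def mmd2_from_gram(K, idx_s, idx_t):
    """Biased MMD^2 estimate from a precomputed Gram matrix."""
    return (K[np.ix_(idx_s, idx_s)].mean()
            + K[np.ix_(idx_t, idx_t)].mean()
            - 2.0 * K[np.ix_(idx_s, idx_t)].mean())

def mmd_permutation_test(X_s, X_t, cont_idx, cat_idx, n_permutations=200, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.vstack([X_s, X_t])
    n_s = len(X_s)
    K = mixed_kernel(Z, Z, cont_idx, cat_idx)   # computed once, reused for every permutation
    idx = np.arange(len(Z))
    observed = mmd2_from_gram(K, idx[:n_s], idx[n_s:])
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(idx)             # shuffle the source/target labels
        if mmd2_from_gram(K, perm[:n_s], perm[n_s:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)

# Synthetic data: column 0 is driver age, column 1 is a fuel-type code.
rng = np.random.default_rng(42)
X_source = np.column_stack([rng.normal(45, 12, 300), rng.integers(0, 3, 300)])
X_target = np.column_stack([rng.normal(22, 3, 60), rng.integers(0, 3, 60)])

# Standardise the continuous column on the pooled data so a unit bandwidth is sensible.
pooled_age = np.concatenate([X_source[:, 0], X_target[:, 0]])
for X in (X_source, X_target):
    X[:, 0] = (X[:, 0] - pooled_age.mean()) / pooled_age.std()

mmd2, p_value = mmd_permutation_test(X_source, X_target, cont_idx=[0], cat_idx=[1])
print(f"MMD2={mmd2:.4f}, p={p_value:.3f}")
```

Computing the Gram matrix once on the pooled data and permuting only the index split keeps the permutation loop cheap, which matters when the source portfolio is large.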

Penalised GLM transfer (GLMTransfer): Implements the two-step algorithm from Tian and Feng (JASA 2023). Step 1 pools target and source data and fits an l1-penalised GLM. Step 2 refines that estimate on target data only, with an l1 penalty on the adjustment to prevent overfitting. Supports Poisson (frequency), Gamma (severity), and Gaussian families. When several candidate sources are supplied, source auto-detection drops those whose required adjustment is too large for the transfer to help.
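
The two steps can be sketched in plain NumPy for the Poisson case. This is an illustration under stated assumptions, not the GLMTransfer code: the solver is a simple fixed-step proximal gradient method chosen for brevity, the intercept (column 0) is left unpenalised, and every function name below is hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 norm: shrink towards zero, clip at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_poisson(X, y, offset, lam, n_iter=3000, step=0.05):
    """Fixed-step proximal-gradient fit of a log-link Poisson GLM with an l1 penalty.

    Column 0 of X is assumed to be the intercept and is not penalised.
    """
    beta = np.zeros(X.shape[1])
    thresh = np.full(X.shape[1], step * lam)
    thresh[0] = 0.0
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)
        grad = X.T @ (mu - y) / len(y)   # gradient of the mean negative log-likelihood
        beta = soft_threshold(beta - step * grad, thresh)
    return beta

def two_step_transfer(X_t, y_t, e_t, X_s, y_s, e_s, lam_pool, lam_debias):
    # Step 1: pool target and source, fit one penalised GLM with log-exposure offset.
    X_pool = np.vstack([X_t, X_s])
    y_pool = np.concatenate([y_t, y_s])
    e_pool = np.concatenate([e_t, e_s])
    w = l1_poisson(X_pool, y_pool, np.log(e_pool), lam_pool)
    # Step 2: target data only. The pooled fit enters as a fixed offset and only
    # the penalised adjustment delta is estimated.
    delta = l1_poisson(X_t, y_t, np.log(e_t) + X_t @ w, lam_debias)
    return w, delta, w + delta

def simulate(n, beta, rng):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    e = rng.uniform(0.2, 1.0, n)
    y = rng.poisson(e * np.exp(X @ beta))
    return X, y, e

rng = np.random.default_rng(0)
beta_source = np.array([np.log(0.1), 0.3, -0.2, 0.0, 0.0])
beta_target = beta_source + np.array([0.4, 0.0, 0.5, 0.0, 0.0])  # shifted segment
X_s, y_s, e_s = simulate(5000, beta_source, rng)
X_t, y_t, e_t = simulate(300, beta_target, rng)

w, delta, beta_hat = two_step_transfer(
    X_t, y_t, e_t, X_s, y_s, e_s, lam_pool=0.01, lam_debias=0.05
)
print("pooled  :", w.round(3))
print("delta   :", delta.round(3))
print("combined:", beta_hat.round(3))
```

Because the pooled coefficients are held fixed in step 2, the thin target sample only has to support the adjustment, which is where the borrowing of strength comes from.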

GBM transfer (GBMTransfer): Uses a fitted source CatBoost model as an offset. It generates log-predictions from the source model and uses them as a fixed baseline offset when training a residual GBM on target data. It works in two modes: offset (explicit offset, more interpretable) or init_model (CatBoost warm-start, fewer parameters to tune). Only CatBoost is supported.
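
The arithmetic behind the source-as-offset idea can be shown without CatBoost. In the sketch below the residual model is the simplest one possible, a single intercept, whose Poisson maximum-likelihood estimate has a closed form. The numbers are made up, and a real residual GBM would learn a feature-dependent correction instead of one constant.

```python
import numpy as np

# Made-up inputs: source-model claim frequency per unit exposure on target policies.
source_rate = np.array([0.08, 0.10, 0.12, 0.09])
exposure = np.array([1.0, 0.5, 1.0, 0.75])
claims = np.array([0, 1, 0, 1])

# Fixed baseline on the log scale: source log-prediction plus log-exposure.
offset = np.log(source_rate) + np.log(exposure)

# Intercept-only residual model with offset:
# the Poisson MLE is exp(b) = sum(claims) / sum(exp(offset)).
b = np.log(claims.sum() / np.exp(offset).sum())

# Final prediction = source baseline scaled by the learned correction.
pred = np.exp(offset + b)
print("correction factor:", np.exp(b).round(3))
print("predicted claims :", pred.round(3))
```

The residual model only has to explain what the source model gets wrong on the target segment, which is a much smaller task than fitting the segment from scratch.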

CANN transfer (CANNTransfer, requires PyTorch): Pre-trains a Combined Actuarial Neural Network on source data, then fine-tunes it on the target segment. There are three fine-tuning strategies: head_only (safe default for very thin segments), all (full fine-tune), and progressive (head-only, then full). PyTorch is an optional dependency.
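
For orientation, the CANN structure described by Schelldorfer and Wüthrich (2019) adds a neural network term to a GLM linear predictor on the log scale. The notation below is introduced here for illustration (v is exposure, x the feature vector) and is not taken from the library:

```latex
\log \mu(x) \;=\; \log v
  \;+\; \underbrace{\beta_0 + x^\top \beta}_{\text{GLM part}}
  \;+\; \underbrace{\mathrm{NN}_\theta(x)}_{\text{network part}}
```

How CANNTransfer splits its parameters into a "head" and the rest is not stated in this README. One plausible reading is that head_only updates only the final layer of the network part and keeps everything learned on the source data frozen.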

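Negative transfer diagnostics (NegativeTransferDiagnostic): Compares the transfer model against a target-only baseline and optionally against the source model applied directly. Reports Poisson deviance, the Negative Transfer Gap (NTG = deviance_transfer - deviance_target_only), and per-feature residual patterns. Lower deviance is better, so a positive NTG means the transfer model fits worse than the target-only baseline, and a negative NTG means the transfer is helping.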
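
A minimal sketch of the NTG calculation, assuming the deviance is averaged over policies and that predictions are expected claim counts (exposure already applied). The function name and the numbers are illustrative only.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Mean Poisson deviance. The y * log(y / mu) term is zero when y == 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)   # log(1) = 0 handles the y == 0 case
    return 2.0 * np.mean(y * np.log(ratio) - (y - mu))

# Made-up holdout data for a thin target segment.
y_holdout = np.array([0, 1, 0, 2, 0])
mu_transfer = np.array([0.2, 0.7, 0.1, 1.4, 0.3])      # transfer model predictions
mu_target_only = np.array([0.6, 0.6, 0.6, 0.6, 0.6])   # target-only baseline predictions

dev_transfer = poisson_deviance(y_holdout, mu_transfer)
dev_target_only = poisson_deviance(y_holdout, mu_target_only)
ntg = dev_transfer - dev_target_only

print(f"transfer={dev_transfer:.4f}, target-only={dev_target_only:.4f}, NTG={ntg:.4f}")
```

Both deviances should be computed on the same held-out target data, otherwise the gap mixes model quality with sample differences.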

Pipeline (TransferPipeline): Orchestrates the full workflow: shift test, method selection, fit, diagnostics. Use it when you want sensible defaults without chaining components manually.

Install

pip install insurance-transfer

With CatBoost support:

pip install "insurance-transfer[catboost]"

With PyTorch (CANN):

pip install "insurance-transfer[torch]"

Quick start

import numpy as np
from insurance_transfer import (
    CovariateShiftTest,
    GLMTransfer,
    NegativeTransferDiagnostic,
    TransferPipeline,
)

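# Inputs are assumed to exist already: X_source / X_target are 2-D feature
# arrays, y_* are claim counts, exposure_* are per-policy exposures.

# Shift test (columns 3 and 4 are treated as categorical)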
tester = CovariateShiftTest(categorical_cols=[3, 4], n_permutations=500)
result = tester.test(X_source, X_target)
print(result)
# ShiftTestResult(MMD2=0.0312, p=0.004 [significant], n_source=8000, n_target=150)

# See which features drift most
tester.most_drifted_features(result, top_n=3)

# GLM transfer
model = GLMTransfer(family='poisson', lambda_pool=0.01, lambda_debias=0.05)
model.fit(
    X_target, y_target, exposure_target,
    X_source=X_source, y_source=y_source, exposure_source=exposure_source,
)
predictions = model.predict(X_target, exposure_target)

# Full pipeline
pipeline = TransferPipeline(
    method='glm', shift_test=True, run_diagnostic=True,
    glm_params={'family': 'poisson', 'lambda_pool': 0.01},
)
result = pipeline.run(
    X_target, y_target, exposure_target,
    X_source=X_source, y_source=y_source,
)
print(result)

GBM transfer (CatBoost)

from catboost import CatBoostRegressor
from insurance_transfer import GBMTransfer

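# Source model fitted on claim counts. Exposure is not passed here, so this
# assumes source policies have comparable exposure.
source_model = CatBoostRegressor(loss_function='Poisson', iterations=500)
source_model.fit(X_source, y_source)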

transfer = GBMTransfer(
    source_model=source_model,
    mode='offset',
    catboost_params={'iterations': 100, 'depth': 4},
)
transfer.fit(X_target, y_target, exposure=exposure_target)
predictions = transfer.predict(X_target, exposure=exposure_target)

CANN transfer (PyTorch)

from insurance_transfer import CANNTransfer

model = CANNTransfer(
    hidden_sizes=[32, 16],
    finetune_strategy='head_only',
    pretrain_epochs=50,
    finetune_epochs=30,
)
model.fit_source(X_source, y_source, exposure_source)
model.fit(X_target, y_target, exposure_target)
predictions = model.predict(X_target, exposure_target)

Design choices

Poisson deviance as primary metric. Mean squared error is a poor fit for claim counts, which are non-negative, skewed, and have variance that grows with the mean. We use Poisson deviance throughout, including in the NTG calculation.

Exposure as first-class parameter. Every method takes exposure as a dedicated argument, not as sample_weight. The two are not equivalent: exposure enters the model as a log-offset, whereas sample_weight scales each observation's gradient contribution.
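
A three-policy example with an intercept-only Poisson model shows the difference. The data are made up, and the closed-form estimates below are the standard maximum-likelihood solutions for this special case.

```python
import numpy as np

claims = np.array([1.0, 0.0, 2.0])
exposure = np.array([0.25, 1.0, 1.0])   # the first policy was on risk for a quarter-year

# Exposure as log-offset: E[y_i] = exposure_i * rate.
# The MLE is total claims over total exposure.
rate_offset = claims.sum() / exposure.sum()

# Exposure misused as a sample weight on raw counts: E[y_i] = mu for every policy.
# The weighted MLE is a weighted mean of the counts, which is not a claim rate.
mu_weighted = (exposure * claims).sum() / exposure.sum()

# For comparison: weighting the observed frequency y / exposure by exposure
# reproduces the offset estimate, but that also changes the response variable.
rate_from_freq = (exposure * (claims / exposure)).sum() / exposure.sum()

print(rate_offset, mu_weighted, rate_from_freq)
```

The short-exposure policy with one claim is strong evidence of a high rate under the offset formulation, while the weighted-count formulation simply down-weights it.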

Mixed kernel for MMD. Insurance data is always mixed: continuous (driver age, vehicle value) and categorical (fuel type, body style, NCD band). A pure RBF kernel on label-encoded categoricals would be meaningless. The mixed kernel treats each type correctly.
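
The problem with an RBF kernel on label codes is easy to see on a single categorical feature. The fuel-type codes below are arbitrary, which is exactly the point.

```python
import numpy as np

# Label codes for petrol, diesel, electric. The ordering carries no meaning.
fuel = np.array([0.0, 1.0, 2.0])

# RBF kernel on the codes: similarity depends on the arbitrary numeric distance.
rbf = np.exp(-0.5 * (fuel[:, None] - fuel[None, :]) ** 2)

# Indicator kernel: 1 if the categories match, 0 otherwise.
indicator = (fuel[:, None] == fuel[None, :]).astype(float)

print("RBF petrol-diesel   :", rbf[0, 1].round(3))
print("RBF petrol-electric :", rbf[0, 2].round(3))
print("indicator off-diagonal:", indicator[0, 1], indicator[0, 2])
```

Under the RBF kernel, petrol looks more similar to diesel than to electric purely because of how the labels were numbered. Relabelling the categories would change the MMD statistic, while the indicator kernel is invariant to it.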

l1 penalty, not l2. The debiasing step in GLMTransfer uses an l1 penalty so that zero-correction is exact. If a feature transfers perfectly, its delta coefficient is set exactly to zero rather than merely shrunk towards it.
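
The one-dimensional case makes the contrast explicit. For a quadratic loss around an unpenalised estimate z, the l1 solution is soft-thresholding and the l2 solution is proportional shrinkage. The values of z and the penalty are made up for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Minimiser of 0.5 * (d - z)**2 + lam * |d|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """Minimiser of 0.5 * (d - z)**2 + 0.5 * lam * d**2."""
    return z / (1.0 + lam)

# Unpenalised delta estimates: two are pure noise, one is a real shift.
z = np.array([0.02, -0.03, 0.40])
lam = 0.05

delta_l1 = soft_threshold(z, lam)
delta_l2 = ridge_shrink(z, lam)

print("l1:", delta_l1)
print("l2:", delta_l2)
```

With l1, the two small adjustments vanish and the corresponding source coefficients are used unchanged. With l2, every coefficient keeps a small non-zero correction.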

Auto-detection is greedy, not exhaustive. Checking all 2^k subsets of sources is infeasible for large source sets. The implementation checks each source individually and keeps those where the delta norm is below threshold.
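
The per-source check can be sketched as follows. The README does not say which norm is used, so the l1 norm here is an assumption, and the function name, source names, and delta values are all hypothetical.

```python
import numpy as np

def select_sources(delta_by_source, threshold):
    """Keep each source whose debiasing adjustment has an l1 norm below the threshold.

    This needs k checks for k sources instead of 2**k subset evaluations.
    """
    return [
        name
        for name, delta in delta_by_source.items()
        if np.abs(delta).sum() < threshold
    ]

# Hypothetical adjustments obtained by running the two-step fit once per source.
delta_by_source = {
    "motor_main": np.array([0.05, 0.00, 0.10]),
    "motor_fleet": np.array([0.60, -0.45, 0.30]),
    "motor_telematics": np.array([0.00, 0.08, 0.00]),
}

kept = select_sources(delta_by_source, threshold=0.5)
print(kept)
```

The trade-off is that sources are judged in isolation, so two sources that would only help in combination are not detected.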

References

Tian, Y. and Feng, Y. (2023). Transfer Learning under High-Dimensional Generalized Linear Models. Journal of the American Statistical Association, 118(544), 2684-2697.

Loke, S.-H. and Bauer, D. (2025). Transfer Learning in the Actuarial Domain: Foundations and Applications. North American Actuarial Journal. DOI: 10.1080/10920277.2025.2489637.

Schelldorfer, J. and Wüthrich, M. V. (2019). Nesting Classical Actuarial Models into Neural Networks. SSRN 3325285.
