Transfer learning for thin-segment insurance pricing

insurance-transfer

The problem

Pricing actuaries routinely face the thin-data problem: you want to price young drivers, a new business class, or a pet breed, but you have fewer than 200 claims in the target segment. A model fitted on that data alone will overfit. Credibility blending helps, but it is a blunt instrument that does not respect covariate structure.

Transfer learning is a better answer. You have a large portfolio — say 50,000 motor policies. Some of that information is relevant to your thin segment. The question is how much to borrow, and how to correct for the fact that young drivers are not just a small random sample of all drivers.

This library implements three transfer methods adapted for insurance pricing, plus diagnostics to detect when the transfer is helping versus hurting.

What it does

Covariate shift detection (CovariateShiftTest): Before you transfer anything, test whether the source and target distributions are meaningfully different. Uses Maximum Mean Discrepancy with a mixed kernel — RBF for continuous features (age, vehicle value), indicator for categorical ones (fuel type, body style). Returns a permutation-based p-value and per-feature drift scores so you can see which features are driving the divergence.

Penalised GLM transfer (GLMTransfer): Implements the two-step algorithm from Tian and Feng (JASA 2023). Step 1 pools target and source data and fits an l1-penalised GLM. Step 2 refines that estimate on target data only, with an l1 penalty on the adjustment to prevent overfitting. Supports Poisson (frequency), Gamma (severity), and Gaussian families. Source auto-detection checks each candidate source individually and drops those whose adjustment norm exceeds a threshold, since borrowing from them is likely to hurt.
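
The two steps described above can be sketched in plain NumPy. This is a minimal illustration for the Poisson family, not the code inside GLMTransfer: the solver (proximal gradient with backtracking), the unpenalised intercept in column 0, the function names, and the toy penalty values are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def poisson_loss(beta, X, y, offset):
    eta = X @ beta + offset
    return np.mean(np.exp(eta) - y * eta)

def l1_poisson(X, y, offset, lam, n_iter=500):
    """Proximal gradient with backtracking for mean Poisson loss + lam * ||beta[1:]||_1.
    Assumption: column 0 of X is an intercept and is left unpenalised."""
    penalty = lam * np.r_[0.0, np.ones(X.shape[1] - 1)]
    beta, step = np.zeros(X.shape[1]), 1.0
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ beta + offset) - y) / len(y)
        loss = poisson_loss(beta, X, y, offset)
        while True:
            candidate = soft_threshold(beta - step * grad, step * penalty)
            diff = candidate - beta
            bound = loss + grad @ diff + diff @ diff / (2.0 * step) + 1e-12
            if poisson_loss(candidate, X, y, offset) <= bound:
                break
            step *= 0.5  # shrink the step until the quadratic bound holds
        beta = candidate
    return beta

def two_step_transfer(X_t, y_t, e_t, X_s, y_s, e_s, lam_pool, lam_debias):
    # Step 1: pool target and source, fit one l1-penalised Poisson GLM.
    X_pool = np.vstack([X_t, X_s])
    y_pool = np.concatenate([y_t, y_s])
    w = l1_poisson(X_pool, y_pool, np.log(np.concatenate([e_t, e_s])), lam_pool)
    # Step 2: target data only; X_t @ w is held fixed as an offset and
    # a sparse correction delta is fitted on top of it.
    delta = l1_poisson(X_t, y_t, np.log(e_t) + X_t @ w, lam_debias)
    return w + delta, w, delta

rng = np.random.default_rng(0)

def simulate(n, beta):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
    exposure = rng.uniform(0.5, 1.0, n)
    y = rng.poisson(exposure * np.exp(X @ beta))
    return X, y, exposure

# Source and target share every coefficient except the one on feature 2.
X_s, y_s, e_s = simulate(5000, np.array([-1.0, 0.5, -0.3, 0.0, 0.0]))
X_t, y_t, e_t = simulate(300, np.array([-1.0, 0.5, 0.7, 0.0, 0.0]))
beta_hat, w, delta = two_step_transfer(
    X_t, y_t, e_t, X_s, y_s, e_s, lam_pool=0.01, lam_debias=0.2
)
print("pooled:", w.round(3))
print("correction:", delta.round(3))
```

The correction vector is where the target segment is allowed to disagree with the pooled fit. In this toy setup the l1 penalty leaves the correction at exactly zero for features whose effect is shared, and the final coefficients are simply w + delta.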

GBM transfer (GBMTransfer): Uses a fitted source CatBoost model as an offset. It generates log-scale predictions from the source model and uses them as a fixed baseline offset when training a residual GBM on target data. Two modes are available: offset (explicit offset, more interpretable) and init_model (CatBoost warm-start, fewer parameters to tune). Only CatBoost is supported.

CANN transfer (CANNTransfer, requires PyTorch): Pre-trains a Combined Actuarial Neural Network on source data, then fine-tunes it on the target segment. Three fine-tuning strategies are available: head_only (safe default for very thin segments), all (full fine-tune), and progressive (head-only, then full). PyTorch is an optional dependency.

Negative transfer diagnostics (NegativeTransferDiagnostic): Compares the transfer model against a target-only baseline and optionally against the source model applied directly. Reports Poisson deviance, the Negative Transfer Gap (NTG = deviance_transfer - deviance_target_only), and per-feature residual patterns. Lower deviance is better, so a positive NTG means the transfer model is doing worse than the target-only baseline.

Pipeline (TransferPipeline): Orchestrates the full workflow: shift test, method selection, fit, diagnostics. Use it when you want sensible defaults without chaining components manually.

Install

pip install insurance-transfer

With CatBoost support:

pip install insurance-transfer[catboost]

With PyTorch (CANN):

pip install insurance-transfer[torch]

Quick start

import numpy as np
from insurance_transfer import (
    CovariateShiftTest,
    GLMTransfer,
    NegativeTransferDiagnostic,
    TransferPipeline,
)

# X_*, y_* and exposure_* are assumed to be arrays you have already loaded.

# Shift test
tester = CovariateShiftTest(categorical_cols=[3, 4], n_permutations=500)
result = tester.test(X_source, X_target)
print(result)
# ShiftTestResult(MMD2=0.0312, p=0.004 [significant], n_source=8000, n_target=150)

# See which features drift most
tester.most_drifted_features(result, top_n=3)

# GLM transfer
model = GLMTransfer(family='poisson', lambda_pool=0.01, lambda_debias=0.05)
model.fit(
    X_target, y_target, exposure_target,
    X_source=X_source, y_source=y_source, exposure_source=exposure_source,
)
predictions = model.predict(X_target, exposure_target)

# Full pipeline
pipeline = TransferPipeline(
    method='glm', shift_test=True, run_diagnostic=True,
    glm_params={'family': 'poisson', 'lambda_pool': 0.01},
)
result = pipeline.run(
    X_target, y_target, exposure_target,
    X_source=X_source, y_source=y_source,
)
print(result)

GBM transfer (CatBoost)

from catboost import CatBoostRegressor
from insurance_transfer import GBMTransfer

source_model = CatBoostRegressor(loss_function='Poisson', iterations=500)
source_model.fit(X_source, y_source)

transfer = GBMTransfer(
    source_model=source_model,
    mode='offset',
    catboost_params={'iterations': 100, 'depth': 4},
)
transfer.fit(X_target, y_target, exposure=exposure_target)
predictions = transfer.predict(X_target, exposure=exposure_target)

CANN transfer (PyTorch)

from insurance_transfer import CANNTransfer

model = CANNTransfer(
    hidden_sizes=[32, 16],
    finetune_strategy='head_only',
    pretrain_epochs=50,
    finetune_epochs=30,
)
model.fit_source(X_source, y_source, exposure_source)
model.fit(X_target, y_target, exposure_target)
predictions = model.predict(X_target, exposure_target)

Design choices

Poisson deviance as primary metric. Mean squared error is a poor fit for claim counts, which are non-negative, skewed, and have variance that grows with the mean. We use Poisson deviance throughout, including in the NTG calculation.
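
As a rough illustration of the metric and of the NTG formula given earlier, here is a small NumPy sketch. The function names are hypothetical, and the choice of an unweighted mean deviance is an assumption; the library may weight or aggregate differently.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Mean Poisson deviance. The term y * log(y / mu) is taken as 0 when y == 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.mean(y * np.log(ratio) - (y - mu))

def negative_transfer_gap(y, mu_transfer, mu_target_only):
    """NTG = deviance_transfer - deviance_target_only; positive means transfer hurts."""
    return poisson_deviance(y, mu_transfer) - poisson_deviance(y, mu_target_only)

y = np.array([0, 1, 0, 2, 0])
mu_transfer = np.array([0.2, 0.8, 0.3, 1.5, 0.2])   # hypothetical transfer-model predictions
mu_target_only = np.full(5, 0.6)                     # hypothetical flat target-only predictions

ntg = negative_transfer_gap(y, mu_transfer, mu_target_only)
print(f"NTG = {ntg:.3f}")
```

Here the transfer predictions track the observed counts more closely than the flat baseline, so the NTG comes out negative, which is the case where borrowing from the source is helping.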

Exposure as first-class parameter. Every method takes exposure as a dedicated argument, not as sample_weight. The two are not equivalent: exposure enters the linear predictor as a log-offset, while sample_weight scales each observation's gradient contribution.
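
The difference is easy to see in the simplest possible case, an intercept-only Poisson model on raw claim counts, where both estimates have a closed form. The numbers below are made up for illustration.

```python
import numpy as np

claims = np.array([0.0, 1.0, 0.0, 2.0])
exposure = np.array([0.25, 0.5, 1.0, 1.0])  # policy-years

# Exposure as offset: log(mu_i) = b0 + log(exposure_i).
# Setting the score sum(y_i - exposure_i * exp(b0)) to zero gives
# a claim rate per policy-year.
rate_offset = claims.sum() / exposure.sum()

# Exposure as sample_weight on the same counts: log(mu_i) = b0.
# Setting the weighted score sum(w_i * (y_i - exp(b0))) to zero gives
# a weighted mean claim count per policy, which ignores time on risk.
rate_weighted = np.sum(exposure * claims) / exposure.sum()

print(rate_offset, rate_weighted)
```

The offset version answers "how many claims per unit of exposure", which is the quantity a frequency model is meant to estimate. The weighted version only changes how much each policy counts towards the fit.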

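Mixed kernel for MMD. Insurance data is nearly always mixed: continuous (driver age, vehicle value) and categorical (fuel type, body style, NCD band). A pure RBF kernel on label-encoded categoricals would be meaningless, because the distance between two category codes carries no information. The mixed kernel applies an RBF to the continuous features and an exact-match indicator to the categorical ones.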
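
A minimal sketch of the idea, assuming the per-feature kernels are combined as a product and the continuous features are already standardised. The function names, the fixed bandwidth, and the use of the biased MMD estimate are choices made for this example and may differ from what CovariateShiftTest does internally.

```python
import numpy as np

def mixed_kernel(A, B, cont_cols, cat_cols, bandwidth=1.0):
    """RBF on continuous columns times an exact-match indicator per categorical column."""
    Ac, Bc = A[:, cont_cols], B[:, cont_cols]
    sq_dist = ((Ac[:, None, :] - Bc[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / (2.0 * bandwidth**2))
    for j in cat_cols:
        K = K * (A[:, j][:, None] == B[:, j][None, :])
    return K

def mmd2_permutation_test(X, Y, cont_cols, cat_cols, n_permutations=200, seed=0):
    """Biased MMD^2 estimate with a permutation p-value."""
    pooled = np.vstack([X, Y])
    # The kernel matrix is computed once and re-indexed for every permutation.
    K = mixed_kernel(pooled, pooled, cont_cols, cat_cols)
    n = len(X)

    def stat(idx):
        ix, iy = idx[:n], idx[n:]
        return (K[np.ix_(ix, ix)].mean() + K[np.ix_(iy, iy)].mean()
                - 2.0 * K[np.ix_(ix, iy)].mean())

    observed = stat(np.arange(len(pooled)))
    rng = np.random.default_rng(seed)
    exceed = sum(stat(rng.permutation(len(pooled))) >= observed
                 for _ in range(n_permutations))
    return observed, (exceed + 1) / (n_permutations + 1)

# Toy data: column 0 is a standardised continuous feature, column 1 a category code.
rng = np.random.default_rng(42)
X_source = np.column_stack([rng.normal(0.0, 1.0, 150), rng.integers(0, 3, 150)])
X_target = np.column_stack([rng.normal(1.5, 1.0, 60), rng.integers(0, 3, 60)])

stat, p_value = mmd2_permutation_test(X_source, X_target, cont_cols=[0], cat_cols=[1])
print(f"MMD2={stat:.4f}, p={p_value:.3f}")
```

With the product form, two policies only count as similar when every categorical feature matches, so category codes are never treated as distances.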

l1 penalty not l2. The debiasing step in GLMTransfer uses l1 so that zero-correction is exact. If a feature transfers perfectly, its delta coefficient goes exactly to zero rather than shrinking towards it.
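
The contrast can be shown on a one-dimensional quadratic stand-in for the debiasing objective, where both penalties have closed-form solutions. This is an illustration of the principle only, not the solver used in GLMTransfer; the values are made up.

```python
import numpy as np

def l1_solution(z, lam):
    """argmin_d 0.5 * (d - z)**2 + lam * |d|  (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def l2_solution(z, lam):
    """argmin_d 0.5 * (d - z)**2 + 0.5 * lam * d**2  (ridge shrinkage)."""
    return z / (1.0 + lam)

# Unpenalised corrections: two that are pure noise, one that is a real difference.
z = np.array([0.02, -0.03, 0.40])
lam = 0.05

l1_delta = l1_solution(z, lam)
l2_delta = l2_solution(z, lam)
print("l1:", l1_delta)
print("l2:", l2_delta)
```

Under l1, any unpenalised correction no larger than lam in absolute value is set to exactly zero, so the feature keeps its pooled coefficient unchanged. Under l2, every correction is scaled down but none disappears.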

Auto-detection is greedy, not exhaustive. Checking all 2^k subsets of sources is infeasible for large source sets. The implementation checks each source individually and keeps those where the delta norm is below threshold.
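
A sketch of the per-source check, assuming rough coefficient estimates are already available for the target and for each source, and assuming the l1 norm is the one being compared. The function names, source names, coefficients, and threshold are all hypothetical.

```python
import numpy as np

def delta_l1_norm(beta_source, beta_target):
    """l1 distance between a source coefficient vector and the target's."""
    return float(np.abs(np.asarray(beta_source) - np.asarray(beta_target)).sum())

def select_sources(source_betas, beta_target, threshold):
    """One check per source (k checks in total), rather than one per subset (2^k)."""
    return [name for name, beta in source_betas.items()
            if delta_l1_norm(beta, beta_target) < threshold]

beta_target = [-1.95, 0.50, -0.20]
source_betas = {
    "motor_private": [-2.00, 0.50, -0.30],
    "motor_fleet": [-1.20, 0.10, 0.60],
    "van": [-1.90, 0.45, -0.25],
}

kept = select_sources(source_betas, beta_target, threshold=0.5)
print(kept)
```

Because each source is judged on its own, the check cannot discover that two individually marginal sources are useful together, which is the price paid for avoiding the exhaustive search.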

References

Tian, Y. and Feng, Y. (2023). Transfer Learning under High-Dimensional Generalized Linear Models. Journal of the American Statistical Association, 118(544), 2684-2697.

Loke, S.-H. and Bauer, D. (2025). Transfer Learning in the Actuarial Domain: Foundations and Applications. North American Actuarial Journal. DOI: 10.1080/10920277.2025.2489637.

Schelldorfer, J. and Wüthrich, M. V. (2019). Nesting Classical Actuarial Models into Neural Networks. SSRN 3325285.

Release history

0.1.1: Mar 15, 2026
0.1.0: Mar 12, 2026 (this version)
