
insurance-credibility-transformer

PyTorch implementation of the Credibility Transformer from Richman, Scognamiglio & Wüthrich (2024) with the ICL extension from Padayachy et al. (2026).

The problem

Classical Bühlmann-Straub credibility assigns a fixed weight to individual experience versus the portfolio mean. That weight is a function of claim volume only — it doesn't care what the covariates look like. A policy with 5 years of claim-free history in a low-risk segment gets the same credibility as 5 years in a high-volatility segment.
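For concreteness, the classical weight can be sketched in a few lines (illustrative only, not part of this package). `sigma2` and `tau2` stand in for the within-risk and between-risk variance components; note that the covariates never enter the formula:

```python
def buhlmann_straub_weight(n: float, sigma2: float, tau2: float) -> float:
    """Classical credibility weight Z = n / (n + k), with k = sigma2 / tau2.

    Depends only on the exposure volume n, never on the covariates.
    """
    k = sigma2 / tau2
    return n / (n + k)

# Two policies with 5 years of experience get the same Z = 5/(5+4) ≈ 0.556,
# no matter how different their risk segments look:
z = buhlmann_straub_weight(5, sigma2=0.4, tau2=0.1)
```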

GLMs ignore this problem entirely. Every prediction is treated as equally reliable.

Transformers don't have a natural answer either. The FT-Transformer (Gorishniy et al. 2021) is excellent at tabular feature interaction, but it has no mechanism for expressing "I'm uncertain about this risk, default to the portfolio mean".

The solution

The Credibility Transformer solves this by repurposing the [CLS] token. In standard Transformers, the CLS token is a learned summary of all features. In the CT, it plays a dual role:

  • c_trans: CLS after full self-attention (sees all covariates). This is the individual risk estimate.
  • c_prior: CLS through the FNN only, without attention. This is the portfolio mean.

The CLS self-attention weight P = a_{T+1,T+1} is the Bühlmann-Straub prior weight. (1-P) is the individual credibility weight. Both are learned and policy-specific. A policy with unusual feature combinations gets low P (high individual credibility). A policy that looks like the average gets high P.

During training, a Bernoulli(alpha) switch alternates between the two paths, forcing c_prior to encode the portfolio mean and c_trans to encode covariate signal.
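At prediction time the two embeddings are combined with the learned weight. A numeric sketch with made-up vectors (the real model computes all three quantities inside the Transformer):

```python
import numpy as np

# Made-up embeddings standing in for the real CLS outputs.
c_trans = np.array([0.8, -0.2, 0.5])   # CLS after full self-attention
c_prior = np.array([0.1,  0.0, 0.1])   # CLS through the FNN only

P = 0.3  # CLS self-attention weight a_{T+1,T+1}, learned and policy-specific

# Credibility-weighted embedding: high P pulls toward the portfolio prior,
# low P (unusual feature combinations) trusts the individual estimate.
c_hat = P * c_prior + (1 - P) * c_trans
```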

The result: on French MTPL (610K policies), the base CT with 1,746 parameters outperforms a GLM (Poisson deviance 23.711 vs 24.102, both × 10^-2), and beats models with 15x more parameters.
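For reference, the metric is the standard average Poisson deviance (the paper reports it scaled by 10^-2). A plain NumPy version, not the package's internal loss:

```python
import numpy as np

def poisson_deviance(y, mu):
    """Average Poisson deviance: 2/n * sum(mu - y + y*log(y/mu)),
    with the y*log(y/mu) term taken as 0 when y == 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    term = np.zeros_like(y)
    pos = y > 0
    term[pos] = y[pos] * np.log(y[pos] / mu[pos])
    return 2.0 * np.mean(mu - y + term)

# Example: three policies with expected counts mu and observed counts y.
poisson_deviance(y=[0, 1, 2], mu=[0.5, 1.0, 1.5])
```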

Installation

pip install insurance-credibility-transformer

Optional FAISS for ICL retrieval:

pip install "insurance-credibility-transformer[faiss]"

Quick start

from insurance_credibility_transformer import (
    CredibilityTransformer,
    CredibilityTransformerTrainer,
    AttentionExplainer,
)

# Base CT (1,746 params, trains on CPU in minutes)
ct = CredibilityTransformer(
    cat_cardinalities=[6, 2, 11, 22],  # levels per categorical feature
    n_num_features=5,
    embed_dim=5,           # b in paper
    n_heads=1,             # M (base CT = 1)
    n_layers=1,            # L (base CT = 1)
    alpha=0.90,            # credibility parameter
    dropout=0.01,
    link="log",            # frequency model
)

# Training
trainer = CredibilityTransformerTrainer(
    model=ct,
    loss="poisson",
    lr=1e-3,
    batch_size=1024,
    early_stopping_patience=20,
    n_ensemble=20,         # paper uses 20 runs averaged
)
trainer.fit(X_cat, X_num, y, exposure)
preds = trainer.predict(X_cat_test, X_num_test, exposure_test)

# Explainability: who gets individual credibility?
explainer = AttentionExplainer(ct)
P = explainer.cls_attention(X_cat, X_num)           # prior weights
z = explainer.individual_credibility(X_cat, X_num)  # individual weights (1-P)

Deep CT

The deep CT uses multi-head attention (M=2), three layers (L=3), SwiGLU gating, and differentiable Piecewise Linear Encoding for continuous features. ~320K parameters. GPU recommended.

deep_ct = CredibilityTransformer(
    cat_cardinalities=[6, 2, 11, 22],
    n_num_features=5,
    embed_dim=40,          # b=40 (paper)
    n_heads=2,
    n_layers=3,
    alpha=0.98,            # paper: alpha=98% for deep CT
    use_ple=True,          # PLE for continuous features
    n_ple_bins=16,
    use_swiglu=True,       # SwiGLU gating
)
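A minimal sketch of what piecewise linear encoding does to a single value, assuming the standard bin-edge formulation (the package's actual `use_ple` implementation may differ in detail): each continuous feature becomes a vector with one component per bin, and the `clip` makes the encoding piecewise linear in the input.

```python
import numpy as np

def ple_encode(x: float, edges: np.ndarray) -> np.ndarray:
    """Piecewise Linear Encoding of a scalar: 1 for bins entirely below x,
    a linear fraction for the bin containing x, 0 for bins above."""
    lo, hi = edges[:-1], edges[1:]
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # 4 bins
ple_encode(2.5, edges)  # → [1.0, 1.0, 0.5, 0.0]
```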

ICL extension

The ICL-CT augments inference with a context batch of similar policies whose claim history is known. The ICL layer attends over this context with causal masking (target policies cannot attend to each other).

from insurance_credibility_transformer import ICLCredibilityTransformer, ICLTrainer

# Phase 1: train base CT first
trainer.fit(X_cat_train, X_num_train, y_train, exposure_train)

# Phase 2 + 3: ICL training
icl_ct = ICLCredibilityTransformer(base_ct=ct, icl_layers=2)
icl_trainer = ICLTrainer(icl_ct, lr_phase2=3e-4, lr_phase3=3e-5)
icl_trainer.fit(X_cat_train, X_num_train, y_train, exposure_train, run_phase3=True)

# Predict with context
preds = icl_trainer.predict(
    X_cat_target, X_num_target, exposure_target,
    X_cat_context, X_num_context, y_context, exposure_context,
)
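The masking described above can be sketched as a boolean attention mask. The layout here (context rows first, then target rows) is an assumption for illustration; the package's internal ordering may differ:

```python
import numpy as np

def icl_mask(n_ctx: int, n_tgt: int) -> np.ndarray:
    """Attention mask for n_ctx context policies followed by n_tgt targets.
    True = attention allowed: everyone sees the context, each target sees
    itself, but targets never see each other."""
    n = n_ctx + n_tgt
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_ctx] = True              # all rows may attend to context
    idx = np.arange(n_ctx, n)
    mask[idx, idx] = True               # each target attends to itself
    return mask

m = icl_mask(n_ctx=3, n_tgt=2)
# Row 3 (first target): context columns allowed, column 4 (other target) blocked.
```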

Data format

X_cat:    (n, n_cat)   integer-encoded categorical features (0-indexed)
X_num:    (n, n_num)   float32 continuous features
y:        (n,)         claim counts (integer or float)
exposure: (n,)         policy years (v_i in paper)

Pass None for X_cat or X_num if there are no features of that type.
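A toy batch in this format (shapes and dtypes only; the values are made up):

```python
import numpy as np

n = 4  # tiny illustrative batch
X_cat = np.array([[0, 1, 3, 10],
                  [2, 0, 5,  0],
                  [5, 1, 0, 21],
                  [1, 0, 8,  7]], dtype=np.int64)   # 0-indexed category codes
X_num = np.random.rand(n, 5).astype(np.float32)     # continuous features
y = np.array([0, 1, 0, 2], dtype=np.float32)        # claim counts
exposure = np.array([1.0, 0.5, 1.0, 0.25])          # policy years v_i

# Codes must stay within the declared cardinalities [6, 2, 11, 22]:
assert (X_cat.max(axis=0) < np.array([6, 2, 11, 22])).all()
```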

Results (French MTPL)

From arXiv:2409.16653, out-of-sample Poisson deviance × 10^-2:

Model                 Deviance   Parameters
Null model            25.445     -
GLM                   24.102     -
FNN ensemble          23.783     -
CT nadam ensemble     23.711     1,746
Deep CT ensemble      23.577     ~320K
CAFTT (Brauer 2024)   23.726     27,133

The base CT comes within 0.134 deviance units of the best deep model while using about 0.5% as many parameters.

Architecture decisions

Why a separate c_prior FNN, not the Transformer FNN: The credibility mechanism requires c_prior to be independent of attention. Reusing the Transformer FNN would contaminate c_prior with attention-processed information in multi-layer models. A dedicated FNN keeps the computation graphs clean.

Why Bernoulli sampling, not mixing: The paper is explicit — Z is binary, not a soft interpolation. This forces the decoder to work with either the individual or portfolio embedding in each gradient step, preventing the model from learning to average them.
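A sketch of the distinction with toy list embeddings; `training_embedding` and `soft_mixing` are illustrative names, not package API:

```python
import random

def training_embedding(c_trans, c_prior, alpha, rng=random):
    """Hard switch: Z ~ Bernoulli(alpha) picks one embedding per step."""
    z = 1 if rng.random() < alpha else 0
    return c_prior if z == 1 else c_trans   # never a blend of the two

def soft_mixing(c_trans, c_prior, alpha):
    """The soft interpolation the paper deliberately avoids during training."""
    return [alpha * p + (1 - alpha) * t for t, p in zip(c_trans, c_prior)]
```

With the hard switch, each gradient step sees exactly one of the two embeddings, so the decoder cannot learn to rely on their average.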

Why NormFormer by default: The CT training instability reported in the paper (Section 4) comes from gradient magnitude mismatch between attention heads. Per-head scaling coefficients (NormFormer, Shleifer et al. 2021) are applied inside the attention module by default, even for the base CT. The cost is two extra parameters per head.

Why no sklearn dependency: The API is sklearn-compatible (fit/predict) but doesn't depend on sklearn. The library targets actuaries who may not have sklearn installed, and the Transformer training loop doesn't benefit from sklearn's cross-validation infrastructure.

References

  • Richman, Scognamiglio & Wüthrich (2024). The Credibility Transformer. arXiv:2409.16653
  • Padayachy, Richman, Scognamiglio & Wüthrich (2026). ICL-Enhanced Credibility Transformer. arXiv:2509.08122
  • Gorishniy et al. (2021). Revisiting Deep Learning Models for Tabular Data. arXiv:2106.11959
  • Bühlmann & Straub (1970). Credibility for Loss Ratios.
