PyTorch implementation of the Credibility Transformer (arXiv:2409.16653) with the ICL extension (arXiv:2509.08122)

insurance-credibility-transformer

PyTorch implementation of the Credibility Transformer from Richman, Scognamiglio & Wüthrich (2024) with the ICL extension from Padayachy et al. (2026).

The problem

Classical Bühlmann-Straub credibility assigns a fixed weight to individual experience versus the portfolio mean. Given the portfolio-level variance parameters, that weight depends only on the volume of individual experience (for example, years of exposure); it ignores what the covariates look like. A policy with 5 years of claim-free history in a low-risk segment gets the same credibility as a policy with 5 years in a high-volatility segment.
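To make the point concrete: the Bühlmann-Straub credibility weight for a policy with total exposure w is Z = w / (w + σ²/τ²), where σ² (expected within-risk variance) and τ² (variance of the risk means) are estimated once for the whole portfolio. The sketch below assumes σ² and τ² are known; `buhlmann_straub` and the numbers in the example are illustrative, not part of the library.

```python
def buhlmann_straub(claims, exposures, mu, sigma2, tau2):
    """Bühlmann-Straub credibility estimate for a single policy.

    claims:    claim counts per observation period
    exposures: matching exposures (volume measures), e.g. policy years
    mu:        portfolio mean claim frequency
    sigma2:    expected within-risk variance (portfolio-level, assumed known)
    tau2:      variance of the risk means (portfolio-level, assumed known)
    """
    w = sum(exposures)               # total volume of individual experience
    x_bar = sum(claims) / w          # exposure-weighted individual frequency
    kappa = sigma2 / tau2            # credibility coefficient
    z = w / (w + kappa)              # depends on w only, never on covariates
    return z, z * x_bar + (1 - z) * mu


# Illustrative parameters: 5 claim-free policy years.
z, estimate = buhlmann_straub([0, 0, 0, 0, 0], [1.0] * 5, mu=0.1, sigma2=0.1, tau2=0.01)
# z ≈ 0.333, estimate ≈ 0.0667
```

Nothing about the policy's segment enters the calculation, so any two policies with 5 claim-free policy years receive the same Z.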

GLMs ignore this problem entirely. Every prediction is treated as equally reliable.

Transformers don't have a natural answer either. The FT-Transformer (Gorishniy et al. 2021) is excellent at tabular feature interaction, but it has no mechanism for expressing "I'm uncertain about this risk, default to the portfolio mean".

The solution

The Credibility Transformer solves this by repurposing the [CLS] token. In standard Transformers, the CLS token is a learned summary of all features. In the CT, it plays a dual role:

c_trans: the CLS token after full self-attention (it sees all covariates). This is the individual risk estimate.
c_prior: the CLS token passed through a dedicated FNN only, without attention. This encodes the portfolio mean.

The attention weight that the CLS token places on itself, P = a_{T+1,T+1}, plays the role of the Bühlmann-Straub prior weight, and 1 - P is the individual credibility weight. Unlike the classical weight, both are learned and policy-specific. A policy with unusual feature combinations gets a low P (high individual credibility); a policy that looks like the portfolio average gets a high P.
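The algebra behind reading P as a credibility weight can be sketched for a single attention head: the CLS output splits exactly into P times the CLS token's own value plus (1 - P) times a renormalised average over the T covariate tokens. This minimal illustration assumes one head, takes the attention logits as given, and places the CLS token last (position T+1); `cls_credibility_split` is a hypothetical helper, not part of the library, and the real CT processes these quantities further.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_credibility_split(cls_scores, values):
    """Split a single-head CLS attention output into prior and covariate parts.

    cls_scores: attention logits of the CLS query against all T+1 keys,
                with the CLS token itself last (index T).
    values:     the T+1 value vectors, in the same order.
    """
    a = softmax(cls_scores)
    P = a[-1]                        # weight the CLS token puts on itself
    v_prior = values[-1]             # the CLS token's own value vector
    T, dim = len(values) - 1, len(values[0])
    # attention over the T covariate tokens only, renormalised to sum to 1
    c_cov = [sum(a[j] * values[j][k] for j in range(T)) / (1 - P) for k in range(dim)]
    # ordinary attention output over all T+1 tokens
    out = [sum(a[j] * values[j][k] for j in range(T + 1)) for k in range(dim)]
    return P, v_prior, c_cov, out


P, v_prior, c_cov, out = cls_credibility_split(
    [0.2, -1.0, 0.5, 1.3],
    [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5]],
)
# out equals P * v_prior + (1 - P) * c_cov, component by component
```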

During training, a Bernoulli(alpha) switch randomly selects one of the two paths, c_trans with probability alpha and c_prior otherwise. This forces c_prior to encode the portfolio mean and c_trans to encode the covariate signal.
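A minimal sketch of the hard Bernoulli(alpha) switch follows. It assumes the draw is made independently for each sample (the granularity the library uses is not stated here) and that alpha is the probability of keeping c_trans. Embeddings are stand-in Python objects, and `credibility_switch` is a hypothetical helper.

```python
import random


def credibility_switch(c_trans, c_prior, alpha, rng):
    """Training-time Bernoulli(alpha) switch between the two CLS paths.

    c_trans, c_prior: per-sample embeddings (any objects), same length.
    Returns c_trans for a sample with probability alpha, else c_prior.
    The choice is hard (never a blend); drawing per sample is an assumption.
    """
    return [t if rng.random() < alpha else p for t, p in zip(c_trans, c_prior)]


rng = random.Random(0)
mixed = credibility_switch(["trans"] * 8, ["prior"] * 8, alpha=0.9, rng=rng)
```

Because every sample receives exactly one of the two embeddings, the decoder cannot learn to rely on an average of them.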

The result: on French MTPL (610K policies), the base CT with 1,746 parameters outperforms a GLM (out-of-sample Poisson deviance 23.711 vs 24.102, in units of 10^-2) and beats models with roughly 15x more parameters, such as CAFTT (27,133 parameters).

Installation

pip install insurance-credibility-transformer

Optional FAISS for ICL retrieval:

pip install "insurance-credibility-transformer[faiss]"

Quick start

from insurance_credibility_transformer import (
    CredibilityTransformer,
    CredibilityTransformerTrainer,
    AttentionExplainer,
)

# Base CT (1,746 params, trains on CPU in minutes)
ct = CredibilityTransformer(
    cat_cardinalities=[6, 2, 11, 22],  # levels per categorical feature
    n_num_features=5,
    embed_dim=5,           # b in paper
    n_heads=1,             # M (base CT = 1)
    n_layers=1,            # L (base CT = 1)
    alpha=0.90,            # credibility parameter
    dropout=0.01,
    link="log",            # frequency model
)

# Training
trainer = CredibilityTransformerTrainer(
    model=ct,
    loss="poisson",
    lr=1e-3,
    batch_size=1024,
    early_stopping_patience=20,
    n_ensemble=20,         # paper uses 20 runs averaged
)
trainer.fit(X_cat, X_num, y, exposure)
preds = trainer.predict(X_cat_test, X_num_test, exposure_test)

# Explainability: who gets individual credibility?
explainer = AttentionExplainer(ct)
P = explainer.cls_attention(X_cat, X_num)           # prior weights
z = explainer.individual_credibility(X_cat, X_num)  # individual weights (1-P)

Deep CT

The deep CT uses multi-head attention (M=2), three layers (L=3), SwiGLU gating, and differentiable Piecewise Linear Encoding for continuous features. ~320K parameters. GPU recommended.

deep_ct = CredibilityTransformer(
    cat_cardinalities=[6, 2, 11, 22],
    n_num_features=5,
    embed_dim=40,          # b=40 (paper)
    n_heads=2,
    n_layers=3,
    alpha=0.98,            # paper: alpha=98% for deep CT
    use_ple=True,          # PLE for continuous features
    n_ple_bins=16,
    use_swiglu=True,       # SwiGLU gating
)
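The Piecewise Linear Encoding used by the deep CT can be illustrated for one continuous feature. This sketch follows the standard PLE formulation from tabular deep learning and takes the bin edges as given (for example, empirical quantiles of the training data). How the library chooses its `n_ple_bins` edges, makes the encoding differentiable, and projects it to `embed_dim` is not shown; `piecewise_linear_encode` is a hypothetical helper.

```python
def piecewise_linear_encode(x, edges):
    """Piecewise linear encoding of a scalar x given sorted bin edges.

    edges: b_0 < b_1 < ... < b_T, defining T bins.
    Component t is 0 below its bin, 1 above it, and rises linearly inside it.
    The first component is left unclipped below and the last unclipped
    above, so values outside [b_0, b_T] are still distinguished.
    """
    n_bins = len(edges) - 1
    encoding = []
    for t in range(n_bins):
        lo, hi = edges[t], edges[t + 1]
        v = (x - lo) / (hi - lo)     # position of x relative to bin t
        if t > 0:
            v = max(v, 0.0)          # below this bin -> 0
        if t < n_bins - 1:
            v = min(v, 1.0)          # above this bin -> 1
        encoding.append(v)
    return encoding


print(piecewise_linear_encode(1.5, [0.0, 1.0, 2.0, 4.0]))  # [1.0, 0.5, 0.0]
```

Unlike one-hot binning, the encoding changes continuously with x, so nearby values get nearby encodings and the ordering within a bin is preserved.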

ICL extension

The ICL-CT augments inference with a context batch of similar policies whose claim history is known. The ICL layer attends over this context with causal masking: target policies can draw on the context but cannot attend to each other.

from insurance_credibility_transformer import ICLCredibilityTransformer, ICLTrainer

# Phase 1: train base CT first
trainer.fit(x_cat_train, x_num_train, y_train, exposure_train)

# Phase 2 + 3: ICL training
icl_ct = ICLCredibilityTransformer(base_ct=ct, icl_layers=2)
icl_trainer = ICLTrainer(icl_ct, lr_phase2=3e-4, lr_phase3=3e-5)
icl_trainer.fit(x_cat_train, x_num_train, y_train, exposure_train, run_phase3=True)

# Predict with context
preds = icl_trainer.predict(
    x_cat_target, x_num_target, exposure_target,
    x_cat_context, x_num_context, y_context, exposure_context,
)
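The masking rule can be sketched as a boolean matrix over a sequence of context policies followed by target policies. Beyond "targets cannot attend to each other", the details here are assumptions made for illustration: context tokens attend only to the context, and each target attends to the whole context and to itself. `icl_attention_mask` is a hypothetical helper that uses True for "may attend".

```python
def icl_attention_mask(n_context, n_target):
    """Boolean attention mask for context policies followed by targets.

    mask[i][j] is True when token i may attend to token j. Indices
    0..n_context-1 are context policies; the remaining ones are targets.
    Assumed rules: every token sees the full context; a target also sees
    itself but never another target; context never sees targets.
    """
    n = n_context + n_target
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n_context):
            mask[i][j] = True        # everyone attends to the context
        if i >= n_context:
            mask[i][i] = True        # a target attends to itself only
    return mask


mask = icl_attention_mask(n_context=3, n_target=2)
```

Conventions differ between APIs: `torch.nn.MultiheadAttention` treats True in a boolean `attn_mask` as a position that is not allowed to attend, so this matrix would need to be inverted before use there.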

Data format

X_cat:    (n, n_cat)   integer-encoded categorical features (0-indexed)
X_num:    (n, n_num)   float32 continuous features
y:        (n,)         claim counts (integer or float)
exposure: (n,)         policy years (v_i in paper)

Pass None for X_cat or X_num if there are no features of that type.
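One way to produce the 0-indexed integer codes that X_cat expects, together with the per-feature level counts that `cat_cardinalities` takes, is sketched below in plain Python. `encode_categoricals` is a hypothetical helper; it assumes every level is seen at fit time, so the same mapping must be reused (and unseen levels handled) when encoding test data.

```python
def encode_categoricals(columns):
    """Integer-encode categorical columns (0-indexed) and report cardinalities.

    columns: one list of raw values per categorical feature, all of length n.
    Returns (rows, cardinalities): rows is an n x n_cat list of ints, and
    cardinalities[k] is the number of distinct levels of feature k.
    """
    encoded, cardinalities = [], []
    for col in columns:
        levels = sorted(set(col))                     # deterministic level order
        index = {level: i for i, level in enumerate(levels)}
        encoded.append([index[v] for v in col])
        cardinalities.append(len(levels))
    rows = [list(r) for r in zip(*encoded)]           # transpose to (n, n_cat)
    return rows, cardinalities


rows, cards = encode_categoricals([["B12", "B1", "B12"], ["Diesel", "Regular", "Diesel"]])
# rows == [[1, 0], [0, 1], [1, 0]], cards == [2, 2]
```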

Results (French MTPL)

From arXiv:2409.16653, out-of-sample Poisson deviance × 10^-2:

Model	Deviance	Parameters
Null model	25.445	—
GLM	24.102	—
FNN ensemble	23.783	—
CT nadam ensemble	23.711	1,746
Deep CT ensemble	23.577	~320K
CAFTT (Brauer 2024)	23.726	27,133

The base CT comes within 0.134 (× 10^-2) of the best model, the deep CT ensemble, with about 0.5% as many parameters (1,746 vs ~320K).
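The figures above are average out-of-sample Poisson deviances. Below is a minimal sketch of the usual exposure-adjusted definition, D = (2/n) Σ_i [μ_i - N_i + N_i log(N_i/μ_i)] with μ_i = v_i λ_i. It assumes the table uses this per-policy averaging convention, which is not restated here, and `poisson_deviance` is a hypothetical helper. Multiply the result by 100 to compare it with the table's × 10^-2 units.

```python
import math


def poisson_deviance(counts, exposures, freq_pred):
    """Average Poisson deviance for claim counts with exposure.

    counts:    observed claim counts N_i
    exposures: exposures v_i in policy years
    freq_pred: predicted claim frequencies lambda_i, so mu_i = v_i * lambda_i
    Uses the convention 0 * log(0) = 0 for policies without claims.
    """
    total = 0.0
    for n, v, lam in zip(counts, exposures, freq_pred):
        mu = v * lam                 # expected claim count for this policy
        term = mu - n
        if n > 0:
            term += n * math.log(n / mu)
        total += 2.0 * term
    return total / len(counts)
```

A perfect fit (μ_i = N_i for every policy with claims, and μ_i = 0 otherwise) gives a deviance of 0; every misfit adds a positive amount.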

Architecture decisions

Why a separate c_prior FNN, not the Transformer FNN: The credibility mechanism requires c_prior to be independent of attention. Reusing the Transformer FNN would contaminate c_prior with attention-processed information in multi-layer models. A dedicated FNN keeps the computation graph of c_prior entirely free of attention.

Why Bernoulli sampling, not mixing: The paper is explicit — Z is binary, not a soft interpolation. This forces the decoder to work with either the individual or portfolio embedding in each gradient step, preventing the model from learning to average them.

Why NormFormer by default: The CT training instability reported in the paper (Section 4) comes from gradient magnitude mismatch between attention heads. Per-head scaling coefficients (NormFormer, Shleifer et al. 2021) are applied inside the attention module by default, even for the base CT. The cost is two extra parameters per head.
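Head-wise scaling, as proposed in NormFormer, multiplies each head's attention output by its own learnable scalar before the heads are concatenated. The sketch below shows that one-scalar-per-head form with plain lists. The library's variant, which the text above says costs two parameters per head, may differ, and `scale_heads` is a hypothetical name.

```python
def scale_heads(head_outputs, head_scales):
    """NormFormer-style head-wise scaling, then concatenation of heads.

    head_outputs: one output vector (list of floats) per attention head.
    head_scales:  one learnable scalar per head (typically initialised to 1.0).
    """
    if len(head_outputs) != len(head_scales):
        raise ValueError("need exactly one scale per head")
    scaled = [[g * x for x in out] for out, g in zip(head_outputs, head_scales)]
    return [x for out in scaled for x in out]   # concatenate the heads


print(scale_heads([[1.0, 2.0], [3.0]], [0.5, 2.0]))  # [0.5, 1.0, 6.0]
```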

Why no sklearn dependency: The API follows sklearn's fit/predict conventions but does not depend on sklearn. The library targets actuaries who may not have sklearn installed, and the Transformer training loop doesn't benefit from sklearn's cross-validation infrastructure.

References

Richman, Scognamiglio & Wüthrich (2024). The Credibility Transformer. arXiv:2409.16653
Padayachy, Richman, Scognamiglio & Wüthrich (2026). ICL-Enhanced Credibility Transformer. arXiv:2509.08122
Gorishniy et al. (2021). Revisiting Deep Learning Models for Tabular Data. arXiv:2106.11959
Bühlmann & Straub (1970). Credibility for Loss Ratios.

Release history

This version

0.1.0

Mar 11, 2026

Download files

Source Distribution

insurance_credibility_transformer-0.1.0.tar.gz (100.5 kB)

Uploaded Mar 11, 2026 Source

Built Distribution

insurance_credibility_transformer-0.1.0-py3-none-any.whl (32.4 kB)

Uploaded Mar 11, 2026 Python 3

File details

Details for the file insurance_credibility_transformer-0.1.0.tar.gz.

File metadata

Download URL: insurance_credibility_transformer-0.1.0.tar.gz
Upload date: Mar 11, 2026
Size: 100.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 (Ubuntu 24.04)

File hashes

Hashes for insurance_credibility_transformer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`93f20ac0000d0cacfb69feb67b93314b37dad06820ed62d7647a343cc75a4579`
MD5	`58d7e697b2663210d54710a3a874bad9`
BLAKE2b-256	`8e42f02914fbf42642792083611646bec157dd598dfdd15c89a4a9de75924152`

File details

Details for the file insurance_credibility_transformer-0.1.0-py3-none-any.whl.

File metadata

Download URL: insurance_credibility_transformer-0.1.0-py3-none-any.whl
Upload date: Mar 11, 2026
Size: 32.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 (Ubuntu 24.04)

File hashes

Hashes for insurance_credibility_transformer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d8882f1d5acff272528905893629dfaf579e2a8f6ce935ad69118a59eebaa81a`
MD5	`7e8c7ef370ff57d6bcef7579303baf70`
BLAKE2b-256	`8e3f475fa1bf226b0c4085d1356925f4eedf566480bc16566357ab94aeff4392`
