Neural Poisson mixture model separating structural and stochastic zero-claimers for UK insurance pricing
insurance-poisson-mixture-nn
Neural Poisson mixture model that separates structural zero-claimers from stochastic zero-claimers in UK personal lines insurance.
The problem
Your telematics motor book has a lot of zero-claim policies. Some of those zeros are structural: the driver installed the black box for the discount but barely drives. These policyholders will never claim regardless of how long you cover them. Others are stochastic: active drivers who happened not to have an accident this year. A longer policy period or worse luck and they would have claimed.
Standard frequency models — Poisson GLM, Poisson GBM — treat all zeros the same. Zero-inflated Poisson (ZIP) separates them using a single inflation parameter, but ZIP cannot identify which zero-claim policyholders are structural vs stochastic at the individual level.
This matters for pricing. Charging a structural zero the same frequency load as a stochastic zero means you are systematically overcharging low-risk policyholders. Under FCA Consumer Duty, that is a problem.
The solution
A two-component Poisson mixture estimated end-to-end with gradient descent:
P(Y = k | x) = (1 - pi(x)) * Poisson(k; lambda_0(x) * t)
               + pi(x) * Poisson(k; lambda_1(x) * t)
- pi(x): probability the policyholder is in the risky (at-risk) group, estimated by a neural network
- lambda_0(x): claim rate for the safe / structural-zero group, kept near zero by the data
- lambda_1(x): claim rate for the risky group, always constrained above lambda_0
- t: exposure in policy years
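The per-policy training objective implied by this pmf can be sketched numerically. The following is a NumPy illustration, not the package's training code; `poisson_logpmf` and `mixture_nll` are names invented here:

```python
import math

import numpy as np

def poisson_logpmf(k, mu):
    # log Poisson pmf: k*log(mu) - mu - log(k!)
    return k * np.log(mu) - mu - math.lgamma(k + 1)

def mixture_nll(k, pi, lam0, lam1, t):
    # Negative log of the two-component mixture, combined stably
    # via log-sum-exp rather than summing raw probabilities.
    a = np.log1p(-pi) + poisson_logpmf(k, lam0 * t)  # safe component
    b = np.log(pi) + poisson_logpmf(k, lam1 * t)     # risky component
    m = max(a, b)
    return -(m + np.log(np.exp(a - m) + np.exp(b - m)))
```

Summing exp(-NLL) over k recovers a proper probability mass function, which is a useful sanity check when wiring up a loss like this.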
The ordering constraint lambda_1 > lambda_0 is enforced via reparameterisation:
lambda_0 = softplus(a)
lambda_1 = lambda_0 + softplus(b)
This eliminates label-switching without any hard clipping.
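The reparameterisation can be sketched in a few lines of plain Python; the names `a`, `b`, and `rates` are illustrative, standing in for the two unconstrained head outputs:

```python
import math

def softplus(x):
    # Smooth, strictly positive map from the reals: log(1 + e^x)
    return math.log1p(math.exp(x))

def rates(a, b):
    lam0 = softplus(a)           # lambda_0 > 0
    lam1 = lam0 + softplus(b)    # lambda_1 > lambda_0 by construction
    return lam0, lam1
```

Because softplus is strictly positive, the gap lambda_1 - lambda_0 can never reach zero, so the components cannot swap roles during training.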
The output pi(x) is a continuous structural zero score. A telematics driver with pi = 0.05 and zero claims is almost certainly a structural zero. A driver with pi = 0.8 and zero claims is a stochastic zero who was lucky this year.
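For a policy that actually observed zero claims, the prior pi(x) can be sharpened into a Bayes posterior over the two components given Y = 0. This helper is a sketch of that calculation, not a package API:

```python
import math

def posterior_structural_given_zero(pi, lam0, lam1, t):
    # P(safe | Y=0) = (1-pi) e^{-lam0 t} / ((1-pi) e^{-lam0 t} + pi e^{-lam1 t})
    num = (1 - pi) * math.exp(-lam0 * t)
    den = num + pi * math.exp(-lam1 * t)
    return num / den
```

Observing a zero always shifts mass toward the structural component, and the shift grows with exposure t: a risky driver who stays claim-free for many policy years looks progressively more structural.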
Based on: Poisson Mixture Deep Learning Neural Network Models for the Prediction of Drivers' Claims with Excessive Zero Claims Using Telematics Data, North American Actuarial Journal (NAAJ), 2025.
Installation
pip install insurance-poisson-mixture-nn
With optional comparison baselines (requires statsmodels):
pip install insurance-poisson-mixture-nn[comparison]
Quick start
from insurance_poisson_mixture_nn import PoissonMixtureNN, PoissonMixtureTrainer, PoissonMixturePredictor
from insurance_poisson_mixture_nn.synthetic import SyntheticMixtureData
# Generate synthetic telematics data with known mixture structure
data = SyntheticMixtureData(n_policies=10_000, seed=42)
X_train, y_train, exp_train = data.training_split()
X_val, y_val, exp_val = data.validation_split()
X_test, y_test, exp_test = data.test_split()
# Build the model
model = PoissonMixtureNN(
n_features=X_train.shape[1],
hidden_sizes=[64, 64, 32, 32, 16], # paper architecture
dropout=0.1,
batch_norm=True,
activation='elu',
)
# Train
trainer = PoissonMixtureTrainer(
model,
lr=1e-3,
batch_size=512,
max_epochs=200,
patience=15,
)
history = trainer.fit(X_train, y_train, exp_train, X_val, y_val, exp_val)
print(f"Best val NLL: {history.best_val_nll:.4f} at epoch {history.best_epoch}")
# Predict
predictor = PoissonMixturePredictor(model)
# Expected claim frequency per policy (the pricing output)
expected_freq = predictor.predict_expected(X_test, exp_test)
# At-risk probability (pi score — high = risky group)
pi_scores = predictor.predict_pi(X_test)
# Structural zero score (1 - pi — high = likely never-claimer)
sz_scores = predictor.predict_structural_zero_score(X_test)
# Hard classification: structural vs stochastic
labels = predictor.classify_zero(X_test, threshold=0.5)
Diagnostics
from insurance_poisson_mixture_nn.diagnostics import MixtureDiagnostics
diag = MixtureDiagnostics(predictor)
# Component separation: distributions of lambda_0, lambda_1, pi
fig = diag.component_separation(X_test, exp_test)
# Pi calibration by decile
fig = diag.pi_calibration(X_test, y_test, exp_test)
# For zero-claim policies: structural vs stochastic attribution
fig = diag.zero_decomposition(X_test, y_test, exp_test)
# Training curves
fig = diag.training_curves(history)
Model comparison
from insurance_poisson_mixture_nn.comparison import ModelComparison
comp = ModelComparison(model, verbose=True)
results = comp.compare(X_train, y_train, exp_train, X_test, y_test, exp_test)
df = comp.results_dataframe()
print(df)
# Compares: Poisson GLM, ZIP GLM, Poisson DNN, PM-DNN
Architecture
The shared trunk is a deliberate design choice. Three separate sub-networks for pi, lambda_0, and lambda_1 would triple the parameter count and could not share feature representations across the three outputs. The shared trunk learns a single latent representation of the policy; the three heads then specialise it.
Default architecture:
- 5 hidden layers: [64, 64, 32, 32, 16]
- ELU activation (paper default — avoids dead neurons better than ReLU)
- BatchNorm + Dropout (0.1)
- Adam with ReduceLROnPlateau
- Gradient clipping (max_norm=1.0) for stability in early epochs
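Under these defaults, the trunk-plus-heads layout might look like the following PyTorch sketch. It is an illustration consistent with the description above, not the library's actual `PoissonMixtureNN` source; class and attribute names are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureTrunk(nn.Module):
    def __init__(self, n_features, hidden=(64, 64, 32, 32, 16), p_drop=0.1):
        super().__init__()
        layers, d = [], n_features
        for h in hidden:
            # Linear -> BatchNorm -> ELU -> Dropout, repeated per hidden layer
            layers += [nn.Linear(d, h), nn.BatchNorm1d(h), nn.ELU(), nn.Dropout(p_drop)]
            d = h
        self.trunk = nn.Sequential(*layers)
        self.head_pi = nn.Linear(d, 1)  # logit of pi(x)
        self.head_a = nn.Linear(d, 1)   # unconstrained -> lambda_0
        self.head_b = nn.Linear(d, 1)   # unconstrained gap -> lambda_1

    def forward(self, x):
        z = self.trunk(x)
        pi = torch.sigmoid(self.head_pi(z)).squeeze(-1)
        lam0 = F.softplus(self.head_a(z)).squeeze(-1)
        lam1 = lam0 + F.softplus(self.head_b(z)).squeeze(-1)  # lambda_1 > lambda_0
        return pi, lam0, lam1
```

All three heads backpropagate into the same trunk, so gradient signal from the claim counts shapes one representation rather than three.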
When to use this
Use it when:
- You have telematics data and believe some policies are near-zero-exposure structural zeros
- Your zero-claim fraction is high and you suspect a genuine mixture (not just overdispersion)
- You want a per-policy structural zero score for pricing or NCD ladder adjustment
Do not use it when:
- You have no telematics or occupancy data — the model cannot identify structural zeros without informative covariates
- Your excess zeros are driven by overdispersion rather than a genuine two-group structure (use Negative Binomial instead)
- You want an interpretable GLM-style model — this is a black-box neural network
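As a quick screen for the mixture-vs-overdispersion question, one can compare the observed zero fraction with what a single pooled Poisson rate would predict. A large excess flags zero inflation, though it does not by itself distinguish a two-group mixture from a Negative Binomial, so pair it with a variance-to-mean check. A minimal sketch (not a package function):

```python
import numpy as np

def zero_excess(counts, exposure):
    counts = np.asarray(counts)
    exposure = np.asarray(exposure)
    rate = counts.sum() / exposure.sum()               # pooled claim rate
    expected_zero = np.mean(np.exp(-rate * exposure))  # Poisson P(Y=0) per policy
    observed_zero = np.mean(counts == 0)
    return observed_zero - expected_zero
```

A value near zero is consistent with a plain Poisson; a clearly positive value suggests excess zeros worth modelling explicitly.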
Requirements
- Python >= 3.10
- PyTorch >= 2.0
- Polars >= 0.20
- NumPy >= 1.24
- scikit-learn >= 1.3
Licence
Apache 2.0
File details

Details for the file insurance_poisson_mixture_nn-0.1.0.tar.gz.

File metadata

- Download URL: insurance_poisson_mixture_nn-0.1.0.tar.gz
- Size: 38.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 (Ubuntu 24.04)

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4baf35beae947d1c38f743d7b4cb8f76d763075b9c5e3c921a9d92be86912f4d |
| MD5 | 82967c325f4da44ad37bdded3ad992f7 |
| BLAKE2b-256 | a071ba5dc53d25f6ce435c98bf0d5331c86dd93fd16641eb045366dff9a5344b |
File details

Details for the file insurance_poisson_mixture_nn-0.1.0-py3-none-any.whl.

File metadata

- Download URL: insurance_poisson_mixture_nn-0.1.0-py3-none-any.whl
- Size: 27.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 (Ubuntu 24.04)

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | e43390ccc8d33a9a792b2494ae339c2a1c91acfbf496ca7e5cdbb778e9d0984b |
| MD5 | 9be8233121a7a8d67f43339b52af4018 |
| BLAKE2b-256 | e453b73a9e6b6a318c8957a8c679522d4564705ea9b162c72bddcd8bca43993f |