Deprecated. Use insurance-glm-tools instead.
insurance-glm-cluster
DEPRECATED. This repository is archived. All functionality has been reconciled into insurance-glm-tools, which is the canonical home for GLM factor clustering going forward.
Migrate by replacing:
    # Old
    from insurance_glm_cluster import FactorClusterer
    from insurance_glm_cluster.constraints import enforce_min_claims, enforce_monotonicity, check_monotonicity
    from insurance_glm_cluster.utils import build_split_coding_matrix, apply_split_coding

    # New
    from insurance_glm_tools.cluster import FactorClusterer
    from insurance_glm_tools.cluster import enforce_min_claims, enforce_monotonicity, check_monotonicity
    from insurance_glm_tools.cluster import build_split_coding_matrix, apply_split_coding

The insurance-glm-tools cluster subpackage is a superset of this library: it has a better DiagnosticPath, exposure-weighted coefficient averaging in the constraint enforcement, and the full R2VF FactorClusterer API. Everything unique to this repo (enforce_min_claims, enforce_monotonicity, check_monotonicity, build_split_coding_matrix, apply_split_coding) was ported in full on 2026-03-14.
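For codebases that have to run against either package during the transition, a small import shim is one option. This is only a sketch: `load_factor_clusterer` is a hypothetical helper name, and it assumes both packages expose `FactorClusterer` at the import paths listed in the migration snippet.

```python
import importlib

# Candidate modules, preferred first. Paths are taken from the migration
# snippet; adjust them if your installed versions differ.
CANDIDATE_MODULES = ("insurance_glm_tools.cluster", "insurance_glm_cluster")


def load_factor_clusterer():
    """Return FactorClusterer from the first importable package, or None.

    Hypothetical helper for illustration only; it is not part of either library.
    """
    for module_name in CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # package not installed, try the next candidate
        clusterer = getattr(module, "FactorClusterer", None)
        if clusterer is not None:
            return clusterer
    return None


FactorClusterer = load_factor_clusterer()
```

Once every environment has insurance-glm-tools installed, the shim can be replaced by the direct import.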
Automated GLM factor level clustering for insurance pricing.
The problem
You've got 500 vehicle makes in your motor book. Your pricing GLM needs to handle them. You can't fit 500 dummies — the data is too thin, the model will overfit, and you'll end up with nonsense relativities for rare makes.
The traditional fix is manual grouping: spend a week in Excel, consult a book of makes and models, build a lookup table, argue with underwriters. This works but doesn't scale, introduces analyst bias, and has to be redone every model cycle.
insurance-glm-cluster automates this. It collapses high-cardinality categorical factors into pricing bands using regularised regression, with proper statistical underpinning and no arbitrary decisions.
How it works
The library implements the R2VF algorithm (Ben Dror, arXiv:2503.01521, 2025). The key insight is that the standard fused lasso approach — penalising differences between adjacent factor level coefficients — requires a natural ordering. Ordinal factors (vehicle age, NCD years) have one; nominal factors (vehicle make, occupation) don't.
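For reference, the fused lasso objective for an ordered factor with K levels is usually written as below. This is the generic textbook form, not necessarily the notation of the R2VF paper: the symbols are assumptions chosen for illustration, with \ell the GLM log-likelihood, \beta_j the coefficient of level j, and \lambda the penalty strength.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta)
  + \lambda \sum_{j=2}^{K} \left| \beta_j - \beta_{j-1} \right| \right\}
```

The penalty only makes sense when levels j-1 and j are genuine neighbours, which is why a nominal factor needs an ordering before it can be fused.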
R2VF solves this in three steps:
Step 1 — Ranking. Fit a Ridge GLM on all factor dummies simultaneously. The resulting coefficients give a data-driven ordering for each nominal factor: levels with similar risk profiles end up adjacent, levels with different profiles end up far apart.
Step 2 — Fusion. Re-encode each nominal factor as ordinal using the Step 1 ranking. Apply a standard fused lasso (via the split-coding trick) to all factors. Where the fused lasso penalty drives adjacent-level differences to zero, those levels are merged.
Step 3 — Refit. Fit an unpenalised GLM on the merged groupings to remove shrinkage bias from Step 2.
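The three steps can be sketched end to end on synthetic data. This is a minimal illustration, not the library's implementation: it uses a Gaussian response with scikit-learn's `Ridge`, `Lasso` and `LinearRegression` as stand-ins for the penalised and unpenalised GLM fits, and the data, penalty strengths and variable names are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic nominal factor: 12 levels that truly fall into 3 risk groups.
n_levels, n_obs = 12, 6000
true_effect = np.repeat([-0.5, 0.0, 0.5], 4)
rng.shuffle(true_effect)  # level index carries no information about risk
level = rng.integers(0, n_levels, size=n_obs)
y = true_effect[level] + rng.normal(scale=0.5, size=n_obs)

# Step 1 - Ranking: ridge fit on all dummies, then order levels by coefficient.
dummies = np.eye(n_levels)[level]
ridge = Ridge(alpha=1.0).fit(dummies, y)
order = np.argsort(ridge.coef_)       # levels sorted by estimated effect
rank = np.empty(n_levels, dtype=int)
rank[order] = np.arange(n_levels)     # rank[l] = position of level l

# Step 2 - Fusion: split-code the ranked factor and apply a plain L1 penalty.
# Column j-1 is 1 when the observation's rank is >= j, so each coefficient
# is the difference between adjacent ranked levels.
ordinal = rank[level]
split = (ordinal[:, None] >= np.arange(1, n_levels)[None, :]).astype(float)
lasso = Lasso(alpha=0.01).fit(split, y)

# A non-zero difference marks a boundary between two pricing bands.
boundaries = np.abs(lasso.coef_) > 1e-8
group_of_rank = np.concatenate([[0], np.cumsum(boundaries)])
group_of_level = group_of_rank[rank]
n_groups = int(group_of_rank[-1]) + 1

# Step 3 - Refit: unpenalised fit on the merged groups removes shrinkage.
group_dummies = np.eye(n_groups)[group_of_level[level]]
refit = LinearRegression(fit_intercept=False).fit(group_dummies, y)

print("levels:", n_levels, "-> groups:", n_groups)
print("refit group effects:", np.round(refit.coef_, 3))
```

In a pricing setting the Gaussian fits would be replaced by Poisson or Gamma GLMs with exposure weights, and the penalty strength would normally be chosen by cross-validation or an information criterion rather than fixed by hand.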
The split-coding trick is what makes this practical without cvxpy or specialised solvers: transform the design matrix so that a standard L1 solver (sklearn Lasso) optimises the fused lasso objective. No quadratic programming required.
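A small numerical check shows why the reparameterisation works. The sketch below assumes the usual cumulative ("is the rank at least j?") form of split coding; `split_coding` is a hypothetical helper written for this example and is not the library's `build_split_coding_matrix`.

```python
import numpy as np


def split_coding(ordinal_codes, n_levels):
    """Column j-1 is 1 when the level's rank is >= j, for j = 1..n_levels-1.

    Hypothetical helper for illustration only.
    """
    codes = np.asarray(ordinal_codes)
    thresholds = np.arange(1, n_levels)
    return (codes[:, None] >= thresholds[None, :]).astype(float)


# Per-level effects for an ordered factor, with level 0 as the base level.
beta = np.array([0.0, 0.1, 0.1, 0.4])
# Split-coded coefficients are the differences between adjacent levels.
delta = np.diff(beta)

codes = np.array([0, 1, 2, 3, 2, 0])
dummy_prediction = beta[codes]                    # ordinary dummy coding
split_prediction = split_coding(codes, 4) @ delta  # split coding

# Both parameterisations give the same linear predictor ...
same_predictions = np.allclose(dummy_prediction, split_prediction)
# ... and the plain L1 norm of delta equals the fused lasso penalty on beta.
l1_of_delta = np.abs(delta).sum()
fused_penalty = np.abs(beta[1:] - beta[:-1]).sum()
```

Because the L1 norm of the split-coded coefficients is exactly the sum of absolute adjacent differences, an ordinary lasso on the transformed matrix penalises the same quantity as a fused lasso on the original coding. A zero entry in `delta` (here between levels 1 and 2) means those two levels share a coefficient and can be merged.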
Installation
Use insurance-glm-tools instead. This package is no longer maintained.
pip install insurance-glm-tools
References
- Ben Dror, I. (2025). Variable Fusion for Insurance Pricing: R2VF Algorithm. arXiv:2503.01521.
- Tibshirani, R. J., & Taylor, J. (2011). The solution path of the generalized lasso. Annals of Statistics, 39(3), 1335–1371.