
Deprecated. Use insurance-glm-tools instead.


insurance-glm-cluster

DEPRECATED. This repository is archived. All functionality has been merged into insurance-glm-tools, which is the canonical home for GLM factor clustering going forward.

Migrate by replacing:

# Old
from insurance_glm_cluster import FactorClusterer
from insurance_glm_cluster.constraints import enforce_min_claims, enforce_monotonicity, check_monotonicity
from insurance_glm_cluster.utils import build_split_coding_matrix, apply_split_coding

# New
from insurance_glm_tools.cluster import FactorClusterer
from insurance_glm_tools.cluster import enforce_min_claims, enforce_monotonicity, check_monotonicity
from insurance_glm_tools.cluster import build_split_coding_matrix, apply_split_coding
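For a codebase with many call sites, the import swap above can be applied mechanically. The following is a minimal sketch using only the standard library. The helper name rewrite_imports is hypothetical and is not shipped by either package. It only handles the three "from ... import" forms listed above and leaves any other module path untouched.

```python
import re

# Old module path -> new module path, taken from the migration listing above.
MODULE_MAP = {
    "insurance_glm_cluster.constraints": "insurance_glm_tools.cluster",
    "insurance_glm_cluster.utils": "insurance_glm_tools.cluster",
    "insurance_glm_cluster": "insurance_glm_tools.cluster",
}

# Try the longest module names first so submodules are not cut short.
_ALTERNATIVES = "|".join(
    re.escape(name) for name in sorted(MODULE_MAP, key=len, reverse=True)
)
_PATTERN = re.compile(
    r"^([ \t]*from[ \t]+)(" + _ALTERNATIVES + r")([ \t]+import\b)",
    flags=re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Return source with the old 'from ... import' lines pointed at the new package."""
    return _PATTERN.sub(
        lambda m: m.group(1) + MODULE_MAP[m.group(2)] + m.group(3), source
    )


old = (
    "from insurance_glm_cluster import FactorClusterer\n"
    "from insurance_glm_cluster.utils import apply_split_coding\n"
)
new = rewrite_imports(old)
```

Plain "import insurance_glm_cluster" statements and attribute access through the module name are not covered by this sketch and would need a manual check.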

The insurance-glm-tools cluster subpackage is a superset of this library: it has a better DiagnosticPath, exposure-weighted coefficient averaging in the constraint enforcement, and the full R2VF FactorClusterer API. Everything unique to this repo (enforce_min_claims, enforce_monotonicity, check_monotonicity, build_split_coding_matrix, apply_split_coding) was ported in full on 2026-03-14.


Automated GLM factor level clustering for insurance pricing.

The problem

You've got 500 vehicle makes in your motor book. Your pricing GLM needs to handle them. You can't fit 500 dummies — the data is too thin, the model will overfit, and you'll end up with nonsense relativities for rare makes.

The traditional fix is manual grouping: spend a week in Excel, consult a book of makes and models, build a lookup table, argue with underwriters. This works but doesn't scale, introduces analyst bias, and has to be redone every model cycle.

insurance-glm-cluster automates this. It collapses high-cardinality categorical factors into pricing bands using regularised regression, so the groupings are driven by the data rather than by analyst judgement.

How it works

The library implements the R2VF algorithm (Ben Dror, arXiv:2503.01521, 2025). The starting point is that the standard fused lasso approach, which penalises differences between adjacent factor level coefficients, requires a natural ordering of the levels. Ordinal factors (vehicle age, NCD years) have one; nominal factors (vehicle make, occupation) don't.
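To see why the ordering matters, it helps to compute the adjacent-difference penalty for the same coefficients under two orderings. This is an illustrative sketch only: the function name fused_penalty and the coefficient values are made up for the example and do not come from the library or the paper.

```python
def fused_penalty(coefs_in_order):
    """Sum of absolute differences between neighbouring coefficients."""
    return sum(abs(b - a) for a, b in zip(coefs_in_order, coefs_in_order[1:]))


# Hypothetical effects for three vehicle makes (illustrative numbers).
effects = {"A": -0.04, "B": 0.08, "C": -0.05}

# Alphabetical order puts the outlier B between two similar makes.
alphabetical = [effects[k] for k in ("A", "B", "C")]

# Ordering by effect size puts the similar makes C and A next to each other.
by_effect = sorted(effects.values())

penalty_alphabetical = fused_penalty(alphabetical)
penalty_by_effect = fused_penalty(by_effect)
```

The penalty is smaller when similar levels are neighbours, so an arbitrary ordering of a nominal factor makes the fused lasso pay for differences that carry no meaning.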

R2VF solves this in three steps:

Step 1 — Ranking. Fit a Ridge GLM on all factor dummies simultaneously. The resulting coefficients give a data-driven ordering for each nominal factor: levels with similar risk profiles end up adjacent, levels with different profiles end up far apart.
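As a rough illustration of the ranking idea in Step 1, the sketch below orders the levels of a single nominal factor by their ridge coefficients. It makes two simplifying assumptions that are not in the description above: it uses a weighted least-squares ridge on observed frequencies instead of a Ridge GLM, and it handles one factor instead of all factors at once. The function name ridge_rank_levels and the data are hypothetical.

```python
import numpy as np


def ridge_rank_levels(levels, y, weights, alpha=1.0):
    """Order the levels of one nominal factor by ridge coefficient, low to high."""
    names = sorted(set(levels))
    index = {name: j for j, name in enumerate(names)}

    # One dummy column per level; ridge keeps this solvable without a base level.
    X = np.zeros((len(levels), len(names)))
    X[np.arange(len(levels)), [index[v] for v in levels]] = 1.0

    w = np.asarray(weights, dtype=float)
    y = np.asarray(y, dtype=float)

    # Centre on the weighted mean so coefficients are deviations from it.
    y_centred = y - np.average(y, weights=w)

    # Weighted ridge normal equations: (X'WX + alpha * I) beta = X'W y
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X + alpha * np.eye(len(names)), XtW @ y_centred)

    order = np.argsort(beta, kind="stable")
    return [names[j] for j in order], beta[order]


# Illustrative data: observed claim frequency per policy, equal exposure.
makes = ["A", "A", "B", "B", "C", "C"]
frequency = [0.10, 0.12, 0.30, 0.28, 0.11, 0.09]
ranked, coefs = ridge_rank_levels(makes, frequency, weights=[1.0] * 6)
# ranked is ['C', 'A', 'B']: the two similar makes end up adjacent.
```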

Step 2 — Fusion. Re-encode each nominal factor as ordinal using the Step 1 ranking. Apply a standard fused lasso (via the split-coding trick) to all factors. Where the fused lasso penalty drives adjacent-level differences to zero, those levels are merged.
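The merging rule at the end of Step 2 can be sketched without any solver: given the ordered levels and the fitted adjacent-level differences, consecutive levels with a zero difference fall into the same band. The function name merge_levels, the tolerance and the example values are assumptions made for illustration, not the library's API.

```python
def merge_levels(ordered_levels, deltas, tol=1e-8):
    """Group consecutive levels whose adjacent-level difference is numerically zero.

    deltas[k] is the fitted difference between ordered_levels[k + 1]
    and ordered_levels[k].
    """
    if len(deltas) != len(ordered_levels) - 1:
        raise ValueError("need exactly one difference per adjacent pair")

    groups = [[ordered_levels[0]]]
    for level, delta in zip(ordered_levels[1:], deltas):
        if abs(delta) <= tol:
            groups[-1].append(level)  # difference shrunk to zero: same band
        else:
            groups.append([level])    # non-zero jump: start a new band
    return groups


# Four makes in Step 1 order, with two of the three differences shrunk to zero.
bands = merge_levels(["C", "A", "D", "B"], [0.0, 0.15, 0.0])
# bands is [['C', 'A'], ['D', 'B']]
```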

Step 3 — Refit. Fit an unpenalised GLM on the merged groupings to remove shrinkage bias from Step 2.
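For intuition on the refit in Step 3, consider the special case of a Poisson frequency model with a log link, an exposure offset and a single banded factor. There the unpenalised maximum-likelihood estimate for each band is simply total claims over total exposure, so no iterative fit is needed. The sketch below covers only that special case; with several factors a full GLM fit is required. The function name and the numbers are hypothetical.

```python
from collections import defaultdict


def refit_band_relativities(bands, claims, exposure, base_band):
    """Unpenalised Poisson frequencies per band, expressed relative to base_band."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for band, n_claims, expo in zip(bands, claims, exposure):
        totals[band][0] += n_claims
        totals[band][1] += expo

    # Single-factor Poisson MLE: frequency = total claims / total exposure.
    frequency = {band: c / e for band, (c, e) in totals.items()}
    return {band: f / frequency[base_band] for band, f in frequency.items()}


# Two bands produced by the fusion step (illustrative data).
relativities = refit_band_relativities(
    bands=[1, 1, 2, 2],
    claims=[10, 20, 30, 60],
    exposure=[100.0, 200.0, 100.0, 200.0],
    base_band=1,
)
# Band 2 has about three times the claim frequency of band 1.
```

Because the estimate uses raw totals per band, it carries none of the shrinkage that the penalty introduced in Step 2.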

The split-coding trick is what makes this practical without cvxpy or specialised solvers: transform the design matrix so that standard L1 (sklearn Lasso) achieves the fused lasso objective. No quadratic programming required.
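The reparameterisation behind split coding can be checked numerically. In the sketch below, column k of the split-coded matrix is 1 when an observation's ordered level index exceeds k, so each coefficient is the jump between two adjacent levels and a plain L1 penalty on those coefficients equals the fused lasso penalty on the level effects. The function name split_code is hypothetical and is not the library's build_split_coding_matrix, whose exact signature is not shown here.

```python
import numpy as np


def split_code(level_idx, n_levels):
    """Split-coded design matrix with n_levels - 1 columns of 0/1 indicators."""
    level_idx = np.asarray(level_idx)
    thresholds = np.arange(n_levels - 1)
    return (level_idx[:, None] > thresholds[None, :]).astype(float)


# Per-level effects on the ordered scale (level 0 is the base); illustrative values.
beta = np.array([0.0, 0.0, 0.4, 0.4, 0.9])
delta = np.diff(beta)  # jumps between adjacent levels

level_idx = np.array([0, 1, 2, 3, 4, 2])
Z = split_code(level_idx, n_levels=5)

# Both parameterisations give the same linear predictor (up to the base level).
assert np.allclose(Z @ delta, beta[level_idx] - beta[0])

# The L1 norm of delta is the sum of absolute adjacent differences of beta.
l1_on_delta = np.abs(delta).sum()
fused_on_beta = sum(abs(beta[k + 1] - beta[k]) for k in range(len(beta) - 1))
assert np.isclose(l1_on_delta, fused_on_beta)
```

A zero entry in delta means two neighbouring levels share the same effect, which is exactly the sparsity an L1 solver encourages.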

Installation

Use insurance-glm-tools instead. This package is no longer maintained.

pip install insurance-glm-tools

References

  • Ben Dror, I. (2025). Variable Fusion for Insurance Pricing: R2VF Algorithm. arXiv:2503.01521.
  • Tibshirani, R. J., & Taylor, J. (2011). The solution path of the generalized lasso. Annals of Statistics, 39(3), 1335–1371.
