Skip to main content

Automate insurance pricing with LASSO-regularised GLMs – blueprint generation, preprocessing, model fitting, rate table extraction & plotting. Built on glum.

Project description

easy_glm

Python package to automate building insurance ratetables using (fused) LASSO regularised GLMs. Internally it leverages glum for fitting, providing a higher-level interface tailored to insurance pricing workflows (blueprints, preprocessing, model fitting, rate table extraction & plotting). Inspired by the R package aglm. Packaged with a modern src/ layout.

Installation & Setup

This project uses uv for fast dependency management and venv for virtual environments to ensure reproducibility.

Prerequisites

  1. Python 3.10–3.13 - CI tests these versions.
  2. uv - Fast Python package installer and resolver

Install uv:

# On Unix/Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Installing from Git (Single Command)

You can install the package directly from Git using a single command:

uv pip install git+https://github.com/serband/easy_glm.git

This is the fastest way to get started with easy_glm without cloning the repository.

Quick Setup

Choose one of the following methods to set up your development environment:

Option 1: Cross-platform Python script (Recommended)

python setup_dev.py

Option 2: Direct installation from Git

uv pip install git+https://github.com/serband/easy_glm.git

Option 3: Platform-specific scripts

On Windows (PowerShell):

.\setup_dev.ps1

On Unix/Linux/macOS:

chmod +x setup_dev.sh
./setup_dev.sh

Manual Setup

If you prefer to set up manually:

  1. Create virtual environment:

    python -m venv venv
    
  2. Activate virtual environment:

    # On Windows
    venv\Scripts\activate
    
    # On Unix/Linux/macOS
    source venv/bin/activate
    
  3. Install dependencies:

    uv pip install -r requirements-dev.txt
    uv pip install -e .
    

Usage Example

Here's a complete example of how to use easy_glm to build and visualize insurance rate tables.

For a minimal runnable script, see examples/basic_usage.py.

1. Import Libraries and Load Data

First, import the necessary libraries and load the sample dataset. The package includes a function to load a sample French motor insurance dataset.

import easy_glm 
import polars as pl
import numpy as np

# Load the sample dataset
df = easy_glm.load_external_dataframe()

# Create a train-test split for validation
df = df.with_columns(
    pl.when(pl.lit(np.random.rand(df.height) < 0.7))
    .then(1)
    .otherwise(0)
    .alias("traintest")
)

2. Generate a Preprocessing Blueprint

The generate_blueprint function analyzes the dataframe and creates a "blueprint" that defines how each variable should be preprocessed for modeling.

  • Numeric columns: It computes quantile breakpoints.
  • Categorical columns: It identifies the levels to keep, lumping rare ones into an 'Other' category.
# Generate the blueprint for the dataset 
blueprint = easy_glm.generate_blueprint(df)

3. Prepare Data for Modeling

Using the blueprint, the prepare_data function transforms the raw data into a feature matrix suitable for the GLM. It applies the transformations defined in the blueprint (binning for numerics, lumping for categoricals).

# Define predictor variables
predictor_variables = ['VehAge', 'Region', 'VehGas', 'DrivAge', 'BonusMalus', 'Density']

# Prepare the dataset for modelling
prepped_data = easy_glm.prepare_data(
    df=df, 
    modelling_variables=predictor_variables, 
    additional_columns=['Exposure', 'ClaimNb'], 
    formats=blueprint, 
    traintest_column='traintest', 
    table_name='cars'
)

4. Fit the LASSO GLM

Fit a LASSO-regularized Generalized Linear Model (GLM) using the prepared data. The fit_lasso_glm function uses cross-validation to find the optimal regularization strength.

# Fit the model
model = easy_glm.fit_lasso_glm(
    dataframe=prepped_data, 
    target="ClaimNb", 
    model_type="Poisson", 
    weight_col="Exposure", 
    train_test_col="traintest",
    divide_target_by_weight=True
)

5. Predict on New Data (Optional)

If you have already prepared data (i.e. ran prepare_data with the same blueprint & predictors) you can obtain predictions using the helper:

# Assume `prepped_data` as above and `model` fitted
new_rows_prepped = prepped_data.head(10).select(pl.all().exclude(["ClaimNb", "Exposure", "traintest"]))
preds = easy_glm.predict_with_model(model, new_rows_prepped)

If you start from raw rows, run prepare_data first with the same formats (blueprint) and predictor list.

6. Generate All Rate Tables

With a fitted model, you can now generate the rate tables for all predictor variables. The generate_all_ratetables function loops through each variable and calculates its relativity.

# Generate rate tables for all predictor variables
all_tables = easy_glm.generate_all_ratetables(
    model=model,
    dataset=df,
    predictor_variables=predictor_variables,
    blueprint=blueprint
)

# You can access the rate table for a specific variable like this:
print(all_tables['VehAge'])

7. Plot the Rate Tables

Finally, visualize the relativities using the plot_all_ratetables function. This will generate a plot for each variable, making it easy to interpret the model's results.

  • Numeric variables are shown as line plots.
  • Categorical variables are shown as bar plots.
# Plot all rate tables
easy_glm.plot_all_ratetables(all_tables, blueprint)

This will produce a series of plots, one for each variable.

Development

Activating the Environment

After initial setup, activate your environment:

# On Windows
venv\Scripts\activate

# On Unix/Linux/macOS
source venv/bin/activate

Code Quality

The project includes code quality tools:

# Format code
black .

# Lint code
ruff check .

# Run tests
pytest

Project Structure

easy_glm/
├── src/easy_glm/        # Library code (packaged)
│   └── core/            # Core implementation modules
├── tests/               # Pytest test suite
├── examples/            # Usage examples
├── test.py              # Lightweight smoke script
├── pyproject.toml       # Packaging configuration
├── requirements*.txt    # Dependency constraint files
├── setup_dev.*          # Dev environment helpers
└── README.md

Dependencies

Core Dependencies

  • duckdb: Fast analytical database for data processing (v1.3+)
  • polars: Fast dataframes library (v1.17+)
  • numpy: Numerical computing
  • pyarrow: Columnar data format
  • glum: GLM implementation (v3.0+)
  • pandas: Data manipulation and analysis
  • matplotlib: Plotting library
  • seaborn: Statistical data visualization
  • scikit-learn: Machine learning utilities

Development Dependencies

  • pytest: Testing framework
  • black: Code formatter
  • ruff: Fast Python linter
  • jupyter: Notebook environment

Additional Usage Ideas

Roadmap ideas:

  • Export all ratetables to CSV / Parquet bundle
  • Inverse transform scoring for new raw data (auto-prepare + predict)
  • Automated monotonic binning / isotonic smoothing option
  • CLI entry point (python -m easy_glm build ...)
  • Optional caching of downloaded demo dataset
  • Configurable blueprint strategies (equal-frequency vs fixed breaks)

Test Performance Tuning

CI sets EASY_GLM_MAX_ROWS=500 to limit dataset size for quicker tests. You can mimic locally:

export EASY_GLM_MAX_ROWS=500
pytest -q

Contributing

See CONTRIBUTING.md for the full guide. Quick checklist:

ruff check .
black .
pytest

License

MIT – see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

easy_glm-0.1.0.tar.gz (17.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

easy_glm-0.1.0-py3-none-any.whl (14.6 kB view details)

Uploaded Python 3

File details

Details for the file easy_glm-0.1.0.tar.gz.

File metadata

  • Download URL: easy_glm-0.1.0.tar.gz
  • Upload date:
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for easy_glm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3568467f49b400d43f8e39fdeb9d38d88b4611d69fe470c9578221da4812dcf4
MD5 d6f409aaf5a324f352493c3a585be38c
BLAKE2b-256 bd853c7e549991819268ea0f2cb615dd5d41e0bb2413b48a93b4c70c8d276da9

See more details on using hashes here.

File details

Details for the file easy_glm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: easy_glm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for easy_glm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f25bf75bdea584ee101b72817b303259463ed6d6ebe6fb203dbade7aca88bd2d
MD5 91100778577b214cde47219273829d09
BLAKE2b-256 cf9a6c90d3028231d92b7832b56ec7b425a1af139a0c2cb7176f1d77f68440c7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page