High performance Python GLMs with all the features!
glum
Generalized linear models (GLMs) are a core statistical tool that includes many common methods, such as least-squares regression, Poisson regression and logistic regression, as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed glum, a fast Python-first GLM library. The development was based on a fork of scikit-learn, so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
The goal of glum is to be at least as feature-complete as existing GLM libraries like glmnet or h2o. It supports:
- Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
- L1 regularization, which produces sparse and easily interpretable solutions
- L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
- Elastic net regularization
- Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
- Box constraints, linear inequality constraints, sample weights, offsets
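The regularization options listed above can be pictured as one combined penalty. The sketch below is a NumPy-only illustration, not glum's internal code: the helper name `elastic_net_penalty` and the matrix `P2` are hypothetical, and the scaling (a factor of 0.5 on the quadratic term) is an assumption that follows the usual elastic net convention.

```python
import numpy as np


def elastic_net_penalty(coef, alpha, l1_ratio, P2=None):
    """Hypothetical helper illustrating an elastic net penalty.

    Computes alpha * (l1_ratio * ||coef||_1
                      + (1 - l1_ratio) * 0.5 * coef' P2 coef).
    With P2 = identity this is the plain ridge term; a general
    positive semi-definite P2 gives a matrix-valued (Tikhonov) penalty.
    """
    coef = np.asarray(coef, dtype=float)
    if P2 is None:
        P2 = np.eye(coef.size)
    l1_term = np.abs(coef).sum()
    l2_term = 0.5 * coef @ P2 @ coef
    return alpha * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term)


coef = np.array([1.0, -2.0, 0.0])

# Pure L1 (lasso) and pure L2 (ridge) are the two ends of l1_ratio.
lasso = elastic_net_penalty(coef, alpha=0.1, l1_ratio=1.0)
ridge = elastic_net_penalty(coef, alpha=0.1, l1_ratio=0.0)

# A Tikhonov penalty that only punishes the difference between the first
# two coefficients, which is one way to tie correlated effects together.
D = np.array([[1.0, -1.0, 0.0]])
tikhonov = elastic_net_penalty(coef, alpha=0.1, l1_ratio=0.0, P2=D.T @ D)
```

Here the L1 term depends only on absolute coefficient sizes, which is what pushes some coefficients exactly to zero, while the Tikhonov variant penalizes only the gap between the first two coefficients and leaves the third unpenalized.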
This repo also includes tools for benchmarking GLM implementations in the glum_benchmarks module. For details on the benchmarking, see here. Although the performance of glum relative to glmnet and h2o depends on the specific problem, we find that when N >> K (there are many more observations than predictors), it is consistently much faster for a wide range of problems.
For more information on glum, including tutorials and API reference, please see the documentation.
Why did we choose the name glum? We wanted a name that had the letters GLM and wasn't easily confused with any existing implementation. And we thought glum sounded like a funny name (and not glum at all!). If you need a more professional sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for "Generalized linear... ummm... modeling?"
A classic example predicting housing prices
>>> from sklearn.datasets import fetch_openml
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # This dataset contains house sale prices for King County, which includes
>>> # Seattle. It includes homes sold between May 2014 and May 2015.
>>> house_data = fetch_openml(name="house_sales", version=3, as_frame=True)
>>>
>>> # Use only select features
>>> X = house_data.data[
...     [
...         "bedrooms",
...         "bathrooms",
...         "sqft_living",
...         "floors",
...         "waterfront",
...         "view",
...         "condition",
...         "grade",
...         "yr_built",
...         "yr_renovated",
...     ]
... ].copy()
>>>
>>>
>>> # Model whether a house had an above or below median price via a Binomial
>>> # distribution. We'll be doing L1-regularized logistic regression.
>>> price = house_data.target
>>> y = (price < price.median()).values.astype(int)
>>> model = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001
... )
>>>
>>> _ = model.fit(X=X, y=y)
>>>
>>> # get_formatted_diagnostics returns details about the steps taken by the iterative solver.
>>> diags = model.get_formatted_diagnostics(full_report=True)
>>> diags[['objective_fct']]
        objective_fct
n_iter
0            0.693091
1            0.489500
2            0.449585
3            0.443681
4            0.443498
5            0.443497
>>>
>>> # Models can also be built with formulas from formulaic.
>>> model_formula = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001,
...     formula="bedrooms + np.log(bathrooms + 1) + bs(sqft_living, 3) + C(waterfront)"
... )
>>> _ = model_formula.fit(X=house_data.data, y=y)
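As a rough sanity check on the diagnostics shown above, the starting objective of about 0.6931 is close to log(2). That is what one would expect if the reported objective is the mean binomial negative log-likelihood (plus penalty) and the solver starts from an intercept-only model on a nearly balanced target. Both of those points are assumptions here rather than statements about glum's internals. The sketch below uses only NumPy, and `intercept_only_log_loss` is a hypothetical helper name.

```python
import numpy as np


def intercept_only_log_loss(y):
    """Mean binomial negative log-likelihood of an intercept-only model.

    The fitted probability of such a model is simply the mean of y.
    Assumes y contains both classes, so that 0 < mean(y) < 1.
    """
    y = np.asarray(y, dtype=float)
    p = y.mean()
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).mean())


# A perfectly balanced 0/1 target, similar in spirit to splitting at the median.
y_balanced = np.array([0, 1] * 500)
start = intercept_only_log_loss(y_balanced)
print(round(start, 6))  # 0.693147, i.e. log(2)
```

A target that is slightly off balance gives a value just below log(2), which would be consistent with the 0.693091 reported at iteration 0.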
Installation
Please install the package through conda-forge:
conda install glum -c conda-forge