High performance Python GLMs with all the features!
glum
Generalized linear models (GLMs) are a core statistical tool that includes many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed glum, a fast Python-first GLM library. The development was based on a fork of scikit-learn, so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
The goal of glum is to be at least as feature-complete as existing GLM libraries like glmnet or h2o. It supports:
- Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
- L1 regularization, which produces sparse and easily interpretable solutions
- L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
- Elastic net regularization
- Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
- Box constraints, linear inequality constraints, sample weights, offsets
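To make the regularization options above concrete, here is a minimal pure-Python sketch of an elastic net penalty and of soft-thresholding, the textbook operation that explains why L1 penalties yield exactly-zero coefficients. It assumes the scikit-learn-style convention `alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)`; the function names are hypothetical, and nothing here is taken from glum's actual solver.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Elastic net penalty under the assumed scikit-learn-style convention."""
    l1 = sum(abs(c) for c in coefs)   # L1 norm
    l2 = sum(c * c for c in coefs)    # squared L2 norm
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


def soft_threshold(z, threshold):
    """Shrink z toward zero; anything within the threshold becomes exactly 0."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


# Illustrative coefficients only (not fitted values from any model).
coefs = [2.0, -0.05, 0.3, 0.0]

pure_l1 = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=1.0)  # lasso
pure_l2 = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=0.0)  # ridge
mixed = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=0.5)    # elastic net

# Small coefficients are zeroed out, large ones are only shrunk.
shrunk = [soft_threshold(c, 0.1) for c in coefs]
n_zero = sum(1 for c in shrunk if c == 0.0)
```

With `l1_ratio=1.0` the penalty is a pure L1 term and with `l1_ratio=0.0` a pure L2 term; intermediate values mix the two. In this sketch the coefficient `-0.05` is set to exactly zero, which is the sparsity effect the L1 bullet describes.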
This repo also includes tools for benchmarking GLM implementations in the glum_benchmarks module. For details on the benchmarking, see here. Although the performance of glum relative to glmnet and h2o depends on the specific problem, we find that when N >> K (there are far more observations than predictors), glum is consistently much faster across a wide range of problems.
For more information on glum, including tutorials and API reference, please see the documentation.
Why did we choose the name glum? We wanted a name that had the letters GLM and wasn't easily confused with any existing implementation. And we thought glum sounded like a funny name (and not glum at all!). If you need a more professional sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for "Generalized linear... ummm... modeling?"
A classic example predicting housing prices
>>> from sklearn.datasets import fetch_openml
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # This dataset contains house sale prices for King County, which includes
>>> # Seattle. It includes homes sold between May 2014 and May 2015.
>>> house_data = fetch_openml(name="house_sales", version=3, as_frame=True)
>>>
>>> # Use only select features
>>> X = house_data.data[
...     [
...         "bedrooms",
...         "bathrooms",
...         "sqft_living",
...         "floors",
...         "waterfront",
...         "view",
...         "condition",
...         "grade",
...         "yr_built",
...         "yr_renovated",
...     ]
... ].copy()
>>>
>>>
>>> # Model whether a house had an above or below median price via a Binomial
>>> # distribution. We'll be doing L1-regularized logistic regression.
>>> price = house_data.target
>>> y = (price < price.median()).values.astype(int)
>>> model = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001
... )
>>>
>>> _ = model.fit(X=X, y=y)
>>>
>>> # .get_formatted_diagnostics returns details about the steps taken by the iterative solver.
>>> diags = model.get_formatted_diagnostics(full_report=True)
>>> diags[['objective_fct']]
        objective_fct
n_iter
0            0.693091
1            0.489500
2            0.449585
3            0.443681
4            0.443498
5            0.443497
>>>
>>> # Models can also be built with formulas from formulaic.
>>> model_formula = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001,
...     formula="bedrooms + np.log(bathrooms + 1) + bs(sqft_living, 3) + C(waterfront)"
... )
>>> _ = model_formula.fit(X=house_data.data, y=y)
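A note on reading the diagnostics above: the first objective value, 0.693091, is close to log(2) ≈ 0.693147, which is the mean binomial log-loss when every predicted probability is 0.5. The sketch below illustrates that relationship in plain Python. It assumes, without confirmation from the text above, that `objective_fct` is a mean negative log-likelihood plus the penalty term; the helper name and the toy labels are hypothetical.

```python
import math


def mean_log_loss(y, p):
    """Mean binomial negative log-likelihood for labels y and probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -total / len(y)


# Toy, balanced labels (not the housing data).
y = [0, 1, 1, 0, 1, 0]

# An uninformative model predicts 0.5 for every observation.
start_loss = mean_log_loss(y, [0.5] * len(y))

# A model that leans the right way for each label lowers the loss.
better_probs = [0.8 if yi == 1 else 0.2 for yi in y]
better_loss = mean_log_loss(y, better_probs)
```

Here `start_loss` equals log(2) and `better_loss` equals -log(0.8), so a falling objective across iterations, as in the diagnostics table, is what one would expect as the fitted probabilities move away from 0.5 toward the observed labels.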
Installation
Please install the package through conda-forge:
conda install glum -c conda-forge