High performance Python GLMs with all the features!
glum
Generalized linear models (GLMs) are a core statistical tool that includes many common methods, such as least-squares regression, Poisson regression, and logistic regression, as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction, and more. We have developed glum, a fast Python-first GLM library. The development was based on a fork of scikit-learn, so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
The goal of glum is to be at least as feature-complete as existing GLM libraries like glmnet or h2o. It supports:
- Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
- L1 regularization, which produces sparse and easily interpretable solutions
- L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
- Elastic net regularization
- Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
- Box constraints, linear inequality constraints, sample weights, offsets
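To make the regularization options above concrete, here is a minimal sketch of the elastic net penalty in plain Python. It assumes the scikit-learn-style parameterization, in which alpha scales the whole penalty and l1_ratio mixes the L1 and L2 terms. The function name elastic_net_penalty is hypothetical and is not part of glum.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Return alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2).

    Illustrative helper only (not a glum function); it assumes the
    scikit-learn-style meaning of alpha and l1_ratio.
    """
    l1 = sum(abs(w) for w in coefs)   # L1 norm of the coefficients
    l2 = sum(w * w for w in coefs)    # squared L2 norm of the coefficients
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


coefs = [0.5, -2.0, 0.0]
pure_l1 = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=1.0)  # lasso-style penalty
pure_l2 = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=0.0)  # ridge-style penalty
mixed = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=0.5)    # elastic net mix
```

With l1_ratio=1.0 only the absolute-value term remains, which is the setting that tends to push some coefficients exactly to zero. With l1_ratio=0.0 only the squared term remains, which shrinks coefficients without zeroing them.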
This repo also includes tools for benchmarking GLM implementations in the glum_benchmarks module. For details on the benchmarking, see here. Although the performance of glum relative to glmnet and h2o depends on the specific problem, we find that when N >> K (there are many more observations than predictors), it is consistently much faster for a wide range of problems.
For more information on glum, including tutorials and API reference, please see the documentation.
Why did we choose the name glum? We wanted a name that had the letters GLM and wasn't easily confused with any existing implementation. And we thought glum sounded like a funny name (and not glum at all!). If you need a more professional sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for "Generalized linear... ummm... modeling?"
A classic example predicting housing prices
>>> from sklearn.datasets import fetch_openml
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # This dataset contains house sale prices for King County, which includes
>>> # Seattle. It includes homes sold between May 2014 and May 2015.
>>> house_data = fetch_openml(name="house_sales", version=3, as_frame=True)
>>>
>>> # Use only select features
>>> X = house_data.data[
...     [
...         "bedrooms",
...         "bathrooms",
...         "sqft_living",
...         "floors",
...         "waterfront",
...         "view",
...         "condition",
...         "grade",
...         "yr_built",
...         "yr_renovated",
...     ]
... ].copy()
>>>
>>>
>>> # Model whether a house had an above or below median price via a Binomial
>>> # distribution. We'll be doing L1-regularized logistic regression.
>>> price = house_data.target
>>> y = (price < price.median()).values.astype(int)
>>> model = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001
... )
>>>
>>> _ = model.fit(X=X, y=y)
>>>
>>> # get_formatted_diagnostics shows details about the steps taken by the iterative solver
>>> diags = model.get_formatted_diagnostics(full_report=True)
>>> diags[['objective_fct']]
        objective_fct
n_iter
0            0.693091
1            0.489500
2            0.449585
3            0.443681
4            0.443498
5            0.443497
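The first objective value above is close to ln 2 ≈ 0.693. One plausible reading, assuming the reported objective is dominated by the mean binomial log loss, is that the solver starts from a model that predicts a probability near 0.5 for every house, which is what a median split of prices would suggest. The sketch below checks that arithmetic in plain Python on toy labels; mean_log_loss is a hypothetical helper, not a glum function.

```python
import math


def mean_log_loss(y, p):
    """Average binomial negative log-likelihood for labels y and probabilities p."""
    total = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )
    return -total / len(y)


# Balanced toy labels with every prediction fixed at 0.5 (an uninformative model).
y = [0, 1, 0, 1]
baseline = mean_log_loss(y, [0.5] * len(y))

# Predictions that lean toward the correct label give a lower loss.
better = mean_log_loss(y, [0.2, 0.8, 0.2, 0.8])

print(f"baseline={baseline:.6f}, better={better:.6f}")
```

Here baseline equals ln 2, and any informative set of predictions lowers it, which matches the downward trend in the diagnostics table.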
Installation
Please install the package through conda-forge:
conda install glum -c conda-forge
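After installing, a quick way to confirm that the package is visible to the active Python environment is to query its installed metadata. This is a minimal sketch using only the standard library (Python 3.8+); installed_version is a hypothetical helper name, and it assumes the conda package ships standard Python package metadata.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("glum")
if version is None:
    print("glum is not installed in this environment")
else:
    print(f"glum {version} is installed")
```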