Generalized Linear Models in Python

Glimpy

glimpy is a Python module for fitting generalized linear models. It follows the scikit-learn estimator API so that it works with other scikit-learn tools (pipelines, cross-validation, etc.). Models are fit using the statsmodels package.

Installation

pip install git+https://github.com/KSafran/glimpy

Getting Started

Here is an example of a Poisson GLM to help you get started.

We will simulate an experiment where we want to determine how an individual's age and weight influence the number of hospital visits they can expect to have in a given year.

Start with basic imports and setup:

>>> import numpy as np
>>> from scipy.stats import poisson
>>> from glimpy import GLM, Poisson
>>>
>>> np.random.seed(10)
>>> n_samples = 1000

Now we will simulate some data in which the observed individuals have ages uniformly distributed between 30 and 70 and weights normally distributed around a mean of 150 lbs with a standard deviation of 20 lbs:

>>> age = np.random.uniform(30, 70, n_samples)
>>> weight = np.random.normal(150, 20, n_samples)

Next, we let the expected number of hospital visits depend on age and weight through the log-linear equation below, then sample from a Poisson distribution with those means to get the observed hospital visits:

>>> expected_visits = np.exp(-10 + age * 0.05 + weight * 0.08)
>>> observed_visits = poisson.rvs(expected_visits)
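
Because the mean is the exponential of a linear predictor, each coefficient acts multiplicatively on the expected count. The short sketch below illustrates this using only NumPy; the helper name `mean_visits` and the example person (age 50, weight 150 lbs) are illustrative choices and not part of glimpy.

```python
import numpy as np

# Coefficients copied from the simulation equation above.
intercept, b_age, b_weight = -10.0, 0.05, 0.08

def mean_visits(age, weight):
    """Poisson mean under a log link (hypothetical helper for illustration)."""
    return np.exp(intercept + b_age * age + b_weight * weight)

base = mean_visits(50, 150)      # exp(4.5), roughly 90 visits
older = mean_visits(51, 150)     # same person, one year older
heavier = mean_visits(50, 151)   # same person, one pound heavier

# Under a log link, a one-unit increase multiplies the mean by exp(coefficient).
age_ratio = older / base         # exp(0.05), roughly a 5% increase
weight_ratio = heavier / base    # exp(0.08), roughly an 8% increase
print(base, age_ratio, weight_ratio)
```

This is why the fitted coefficients of a Poisson GLM are usually read as log rate ratios rather than additive effects.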

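Now we can fit a GLM object to try to recover the formula we specified above. In the summary, const is the intercept, x1 is the age coefficient, and x2 is the weight coefficient: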

>>> X = np.vstack([age, weight]).T
>>> y = observed_visits
>>> pglm = GLM(fit_intercept=True, family=Poisson())
>>> pglm.fit(X, y)
>>> print(pglm.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 1000
Model:                            GLM   Df Residuals:                      997
Model Family:                 Poisson   Df Model:                            2
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -3619.1
Date:                Thu, 09 Jan 2020   Deviance:                       967.43
Time:                        22:31:35   Pearson chi2:                     961.
No. Iterations:                     6
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -10.0132      0.020   -509.601      0.000     -10.052      -9.975
x1             0.0499      0.000    301.142      0.000       0.050       0.050
x2             0.0801      0.000    800.720      0.000       0.080       0.080
==============================================================================
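
The summary reports IRLS (iteratively reweighted least squares) as the fitting method. For readers curious about what that involves, here is a minimal NumPy-only sketch of IRLS for a log-link Poisson model. It is not the glimpy or statsmodels implementation: the function name `poisson_irls`, the starting values, and the stopping rule are assumptions made for illustration, and the data are re-simulated with NumPy's own generator, so the estimates will not match the table above digit for digit.

```python
import numpy as np

def poisson_irls(X, y, n_iter=25, tol=1e-8):
    """Illustrative IRLS for a log-link Poisson GLM (hypothetical helper)."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    mu = (y + y.mean()) / 2.0                   # strictly positive starting means
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu                 # working response
        w = mu                                  # working weights for Poisson + log link
        XtW = X.T * w                           # scales each observation by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

# Re-simulate data with the same design as the example above.
rng = np.random.default_rng(0)
age = rng.uniform(30, 70, 1000)
weight = rng.normal(150, 20, 1000)
y = rng.poisson(np.exp(-10 + age * 0.05 + weight * 0.08))

beta_hat = poisson_irls(np.column_stack([age, weight]), y)
print(beta_hat)  # intercept, age coefficient, weight coefficient
```

Each iteration solves a weighted least-squares problem on a linearized response, which is why the fit typically converges in a handful of iterations on well-behaved data like this.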

The upshot of glimpy is that you can easily use your favorite scikit-learn tools with glimpy GLMs. For example, you can use scikit-learn's cross_val_score:

>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(pglm, X, y, cv=4))
[263.11969239 288.58713533 205.7032204  220.68304592]
