Very easy Bayesian regression using numpyro.

Shabadoo: very easy Bayesian regression.

"That's the worst name I ever heard."

Shabadoo is the worst kind of machine learning. It automates nothing; your models will not perform well and it will be your own fault.

Shabadoo is for people who want to do Bayesian regression but who do not want to write probabilistic programming code. You only need to assign priors to features and pass your pandas dataframe to a .fit() / .predict() API.

Shabadoo runs on numpyro and is basically a wrapper around the numpyro Bayesian regression tutorial.

Quickstart

Install

pip install shabadoo

or

pip install git+https://github.com/nolanbconaway/shabadoo

Specifying a Shabadoo Bayesian model

Shabadoo was designed to make it as easy as possible to test ideas about features and their priors. Models are defined using a class which contains configuration specifying how the model should behave.

You need to define a new class which inherits from one of the Shabadoo models. Currently, Normal, Poisson, and Bernoulli are implemented.

import pandas as pd
from numpyro import distributions as dist
from shabadoo import Normal

# fake data
df = pd.DataFrame(dict(x=[1, 2, 2, 3, 4, 5], y=[1, 2, 3, 4, 3, 5]))

class Model(Normal):
    dv = "y"
    features = dict(
        const=dict(transformer=1, prior=dist.Normal(0, 1)),
        x=dict(transformer=lambda df: df.x, prior=dist.Normal(0, 1)),
    )

The dv attribute specifies the variable you are predicting. features is a dictionary of dictionaries, with one item per feature. Above, two features are defined (const and x). Each feature needs a transformer and a prior.

The transformer specifies how to obtain the feature given a source dataframe. The prior specifies your beliefs about the model's coefficient for that feature.
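
To make the role of the transformer concrete, here is a minimal sketch of how a feature matrix could be assembled from such a configuration. The helper build_design_matrix is hypothetical and is not part of Shabadoo's API; it only assumes that a transformer is either a constant or a callable that takes the dataframe.

```python
import pandas as pd

df = pd.DataFrame(dict(x=[1, 2, 2, 3, 4, 5], y=[1, 2, 3, 4, 3, 5]))

# Same shape as the `features` config above, with the priors left out.
transformers = dict(const=1, x=lambda df: df.x)


def build_design_matrix(df, transformers):
    """Hypothetical helper: evaluate each transformer against the dataframe."""
    columns = {}
    for name, transformer in transformers.items():
        if callable(transformer):
            # A callable receives the source dataframe and returns one column.
            columns[name] = transformer(df)
        else:
            # A constant such as 1 is repeated for every row (an intercept).
            columns[name] = pd.Series(transformer, index=df.index)
    return pd.DataFrame(columns, index=df.index)


design = build_design_matrix(df, transformers)
print(design)
```

Each column of the result corresponds to one coefficient in the model, which is why every feature also needs its own prior.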

Fitting & predicting the model

Shabadoo models implement the well-known .fit / .predict API pattern.

model = Model().fit(df)
# sample: 100%|██████████| 1500/1500 [00:05<00:00, 282.76it/s, 7 steps of size 4.17e-01. acc. prob=0.88]

model.predict(df)

"""
0    1.309280
1    2.176555
2    2.176555
3    3.043831
4    3.911106
5    4.778381
"""
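
The predictions above are consistent with a plain linear predictor evaluated at the posterior mean coefficients. The sketch below is not Shabadoo's implementation; it assumes the rounded coefficients that model.formula prints for this fit (0.44200 for const, 0.86728 for x) and recomputes the predictions by hand.

```python
# Posterior means as printed by model.formula, rounded to five decimals.
coef = {"const": 0.44200, "x": 0.86728}
xs = [1, 2, 2, 3, 4, 5]

# Linear predictor: the const feature is always 1, the x feature is the raw value.
predictions = [coef["const"] * 1 + coef["x"] * x for x in xs]

for i, p in enumerate(predictions):
    print(i, round(p, 4))
```

Because the coefficients are rounded, the results agree with the model.predict output only to about four decimal places.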

Inspecting the model

Shabadoo's model classes come with a number of model inspection methods. It should be easy to understand your model's composition, and with Shabadoo it is!

Print the model formula

The formula reports each coefficient as the mean of its MCMC samples, with the standard deviation in parentheses, to give a rough sense of its value and uncertainty.

print(model.formula)

"""
y = (
	const * 0.44200(+-0.63186)
	x * 0.86728(+-0.22604)
)
"""
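
As a rough illustration of that summary, the sketch below formats the mean and standard deviation of a few samples in the same layout. The function format_formula is hypothetical, and the use of the population standard deviation is an assumption. The input is only the five draws shown by model.samples_df.head() in this document, so the numbers will not match the full-sample formula above.

```python
from statistics import fmean, pstdev

# Only the first five posterior draws; a real fit has many more.
samples = {
    "const": [0.689248, 0.524834, 1.093962, 1.253354, 1.021025],
    "x": [0.657217, 0.764487, 0.872455, 0.628530, 0.681262],
}


def format_formula(dv, samples):
    """Hypothetical helper: one 'name * mean(+-std)' line per coefficient."""
    lines = [
        f"\t{name} * {fmean(draws):.5f}(+-{pstdev(draws):.5f})"
        for name, draws in samples.items()
    ]
    return f"{dv} = (\n" + "\n".join(lines) + "\n)"


formula_text = format_formula("y", samples)
print(formula_text)
```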

Measure prediction accuracy

The Model.metrics() method is packed with functionality. You should not have to write a lot of code to evaluate your model's prediction accuracy!

Obtaining aggregate statistics is as easy as:

model.metrics(df)

{'r': 0.8646920305474705,
 'rsq': 0.7476923076923075,
 'mae': 0.5663623639121652,
 'mape': 0.20985123644135573}
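
The aggregate values above can be reproduced by hand from the actual and predicted values shown earlier. The sketch below is a reconstruction from the displayed output and not Shabadoo's code: it assumes r is the Pearson correlation between actual and predicted values, rsq is its square, mae is the mean absolute residual, and mape is the mean absolute error as a fraction of the actual value.

```python
from math import sqrt

actual = [1, 2, 3, 4, 3, 5]
predicted = [1.309280, 2.176555, 2.176555, 3.043831, 3.911106, 4.778381]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def aggregate_metrics(actual, predicted):
    """Hypothetical helper mirroring the keys of the metrics dictionary."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    r = pearson_r(actual, predicted)
    return {
        "r": r,
        "rsq": r ** 2,
        "mae": sum(abs(e) for e in residuals) / n,
        "mape": sum(abs(e / a) for e, a in zip(residuals, actual)) / n,
    }


metrics = aggregate_metrics(actual, predicted)
print(metrics)
```

The results agree with the dictionary above up to the rounding of the printed predictions.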

For per-point errors, pass aggerrs=False. The result is a pandas dataframe that shares its index with your source data, so you can join the two.

model.metrics(df, aggerrs=False)

"""
   residual         pe        ape
0 -0.309280 -30.928012  30.928012
1 -0.176555  -8.827769   8.827769
2  0.823445  27.448154  27.448154
3  0.956169  23.904233  23.904233
4 -0.911106 -30.370198  30.370198
5  0.221619   4.432376   4.432376
"""
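
The three columns can be read off the table: residual is the actual value minus the prediction, pe is the residual as a percentage of the actual value, and ape is the absolute value of pe. A minimal sketch under those assumptions, with a hypothetical point_errors helper that is not part of Shabadoo:

```python
actual = [1, 2, 3, 4, 3, 5]
predicted = [1.309280, 2.176555, 2.176555, 3.043831, 3.911106, 4.778381]


def point_errors(actual, predicted):
    """Hypothetical helper: one dict of errors per observation."""
    rows = []
    for a, p in zip(actual, predicted):
        residual = a - p            # signed error
        pe = 100 * residual / a     # percentage error relative to the actual value
        rows.append({"residual": residual, "pe": pe, "ape": abs(pe)})
    return rows


rows = point_errors(actual, predicted)
for i, row in enumerate(rows):
    print(i, row)
```

Note that pe divides by the actual value, so it is undefined for observations where the dv is zero.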

You can use grouped_metrics to understand group-level errors. Under the hood, the predicted and actual dv are groupby-aggregated (default sum) and the metrics are computed on those group-level aggregates.

df["group"] = [1, 1, 1, 2, 2, 2]
model.grouped_metrics(df, 'group')

{'r': 1.0, 'rsq': 1.0, 'mae': 0.30214565559127315, 'mape': 0.03924585080786096}

model.grouped_metrics(df, "group", aggerrs=False)

"""
       residual        pe       ape
group                              
1     -0.337609 -5.626818  5.626818
2     -0.266682 -2.222352  2.222352
"""
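
The grouped figures can be approximated by summing actual and predicted values per group and then computing errors on the sums. The sketch below assumes the default sum aggregation and uses a hypothetical group_sums helper. It computes only the absolute errors (mae and mape) and does not reproduce the sign of the residual column.

```python
from collections import defaultdict

actual = [1, 2, 3, 4, 3, 5]
predicted = [1.309280, 2.176555, 2.176555, 3.043831, 3.911106, 4.778381]
groups = [1, 1, 1, 2, 2, 2]


def group_sums(values, groups):
    """Hypothetical helper: total of `values` for each group label."""
    totals = defaultdict(float)
    for value, group in zip(values, groups):
        totals[group] += value
    return dict(totals)


actual_sums = group_sums(actual, groups)
predicted_sums = group_sums(predicted, groups)

# Absolute error of each group's total, then averaged across groups.
abs_errors = {g: abs(actual_sums[g] - predicted_sums[g]) for g in actual_sums}
grouped_mae = sum(abs_errors.values()) / len(abs_errors)
grouped_mape = sum(abs_errors[g] / actual_sums[g] for g in abs_errors) / len(abs_errors)

print(grouped_mae, grouped_mape)
```

With only two groups the correlation is trivially 1.0, which is why r and rsq carry no information in the output above.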

Saving and recovering a saved model

Shabadoo models have a from_samples method which allows a model to be saved and recovered exactly.

Samples from fitted models can be accessed using model.samples and model.samples_df.

model.samples['x']
"""
DeviceArray([0.65721655, 0.7644873 , 0.8724553 , 0.6285299 , 0.681262  ,
...
"""

model.samples_df.head()
"""
      const         x
0  0.689248  0.657217
1  0.524834  0.764487
2  1.093962  0.872455
3  1.253354  0.628530
4  1.021025  0.681262
"""

Use the samples to recover your model:

model_recovered = Model.from_samples(model.samples)

model_recovered.predict(df).equals(model.predict(df))
True

Model samples can be saved as JSON using model.samples_json:

import json

with open('model.json', 'w') as f:
    f.write(model.samples_json)

with open('model.json', 'r') as f:
    model_recovered = Model.from_samples(json.load(f))
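
The reason a JSON file is enough to recover the model is that the samples survive a round trip through JSON unchanged. The sketch below illustrates this without Shabadoo. It assumes the JSON is an object mapping each coefficient name to a list of floats, which is a guess at the layout of model.samples_json, and it uses the five draws shown earlier as stand-in data.

```python
import json
import os
import tempfile

# Stand-in for model.samples: coefficient name -> list of posterior draws.
samples = {
    "const": [0.689248, 0.524834, 1.093962, 1.253354, 1.021025],
    "x": [0.657217, 0.764487, 0.872455, 0.628530, 0.681262],
}

# Assumed layout of model.samples_json: a JSON object of lists.
samples_json = json.dumps(samples)

path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as f:
    f.write(samples_json)

with open(path, "r") as f:
    recovered = json.load(f)

# Python floats are serialized with enough digits to be read back exactly.
print(recovered == samples)
```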
