Very easy Bayesian regression using numpyro.
Shabadoo: very easy Bayesian regression.
"That's the worst name I ever heard."
Shabadoo is the worst kind of machine learning. It automates nothing; your models will not perform well and it will be your own fault.
Shabadoo is for people who want to do Bayesian regression but who do not want to write probabilistic programming code. You only need to assign priors to features and pass your pandas dataframe to a .fit() / .predict() API.
Shabadoo runs on numpyro and is basically a wrapper around the numpyro Bayesian regression tutorial.
Quickstart
Install
pip install shabadoo
or
pip install git+https://github.com/nolanbconaway/shabadoo
Specifying a Shabadoo Bayesian model
Shabadoo was designed to make it as easy as possible to test ideas about features and their priors. Models are defined using a class which contains configuration specifying how the model should behave.
You need to define a new class which inherits from one of the Shabadoo models. Currently, Normal, Poisson, and Bernoulli are implemented.
import pandas as pd
from numpyro import distributions as dist
from shabadoo import Normal

# fake data
df = pd.DataFrame(dict(x=[1, 2, 2, 3, 4, 5], y=[1, 2, 3, 4, 3, 5]))


class Model(Normal):
    dv = "y"
    features = dict(
        const=dict(transformer=1, prior=dist.Normal(0, 1)),
        x=dict(transformer=lambda df: df.x, prior=dist.Normal(0, 1)),
    )
The dv attribute specifies the variable you are predicting. features is a dictionary of dictionaries, with one item per feature. Above, two features are defined (const and x). Each feature needs a transformer and a prior.
The transformer specifies how to obtain the feature given a source dataframe. The prior specifies your beliefs about the model's coefficient for that feature.
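To make the role of the transformer concrete, here is a minimal sketch of how a features dictionary could be turned into feature columns. The helper build_design_matrix is hypothetical and is not part of Shabadoo's API; it only assumes that a callable transformer is evaluated on the dataframe and that a constant transformer (such as 1) is used as-is for every row.

```python
import pandas as pd

# Same toy data and feature names as in the model above. Priors are left out
# because this sketch only looks at how transformers could yield columns.
df = pd.DataFrame(dict(x=[1, 2, 2, 3, 4, 5], y=[1, 2, 3, 4, 3, 5]))
features = dict(
    const=dict(transformer=1),
    x=dict(transformer=lambda df: df.x),
)


def build_design_matrix(df, features):
    """Hypothetical helper: apply each transformer to the source dataframe."""
    columns = {}
    for name, spec in features.items():
        transformer = spec["transformer"]
        # Callables are evaluated on the dataframe; constants are broadcast.
        columns[name] = transformer(df) if callable(transformer) else transformer
    return pd.DataFrame(columns, index=df.index)


design = build_design_matrix(df, features)
print(design)
```

Each column of the result is what a coefficient (and therefore its prior) would multiply.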
Fitting & predicting the model
Shabadoo models implement the well-known .fit / .predict API pattern.
model = Model().fit(df)
# sample: 100%|██████████| 1500/1500 [00:05<00:00, 282.76it/s, 7 steps of size 4.17e-01. acc. prob=0.88]
model.predict(df)
"""
0 1.309280
1 2.176555
2 2.176555
3 3.043831
4 3.911106
5 4.778381
"""
Inspecting the model
Shabadoo's model classes come with a number of model inspection methods. It should be easy to understand your model's composition, and with Shabadoo it is!
Print the model formula
The formula reports each coefficient as the average of its MCMC samples, with the standard deviation in parentheses, to give a rough sense of the coefficient's value and its uncertainty.
print(model.formula)
"""
y = (
const * 0.44200(+-0.63186)
x * 0.86728(+-0.22604)
)
"""
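As a rough illustration of how such a summary could be produced, the sketch below formats per-coefficient means and standard deviations in the same layout. The helper format_formula and the toy draws are hypothetical, and the use of the population standard deviation is an assumption. The second part plugs the coefficient means printed above into a plain linear predictor; the results agree with the model.predict(df) output shown earlier, which suggests that is how point predictions are formed for this Normal model.

```python
from statistics import mean, pstdev


def format_formula(dv, samples):
    """Hypothetical helper mimicking the layout printed by model.formula."""
    lines = [f"{dv} = ("]
    for name, draws in samples.items():
        # Assumption: population standard deviation (ddof=0).
        lines.append(f"    {name} * {mean(draws):.5f}(+-{pstdev(draws):.5f})")
    lines.append(")")
    return "\n".join(lines)


# Made-up draws for illustration; real draws come from a fitted model.
toy_samples = {"const": [0.0, 0.5, 1.0], "x": [0.8, 0.9, 1.0]}
formula = format_formula("y", toy_samples)
print(formula)

# Plug the coefficient means from the README's formula into a linear predictor.
coef = {"const": 0.44200, "x": 0.86728}
predictions = [coef["const"] * 1 + coef["x"] * x for x in [1, 2, 2, 3, 4, 5]]
print([round(p, 4) for p in predictions])
```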
Measure prediction accuracy
The Model.metrics() method is packed with functionality. You should not have to write a lot of code to evaluate your model's prediction accuracy!
Obtaining aggregate statistics is as easy as:
model.metrics(df)
{'r': 0.8646920305474705,
'rsq': 0.7476923076923075,
'mae': 0.5663623639121652,
'mape': 0.20985123644135573}
For per-point errors, use aggerrs=False. A pandas dataframe will be returned that you can join on your source data using its index.
model.metrics(df, aggerrs=False)
"""
residual pe ape
0 -0.309280 -30.928012 30.928012
1 -0.176555 -8.827769 8.827769
2 0.823445 27.448154 27.448154
3 0.956169 23.904233 23.904233
4 -0.911106 -30.370198 30.370198
5 0.221619 4.432376 4.432376
"""
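The definitions behind these numbers are not spelled out here, but the outputs above are consistent with the following sketch: residual is actual minus predicted, pe is the residual as a percentage of the actual value, ape is its absolute value, and the aggregate metrics are the Pearson correlation, its square, the mean absolute residual, and the mean ape expressed as a fraction. These formulas are inferred from the printed values and are an assumption, not a statement about Shabadoo's source.

```python
from math import sqrt

actual = [1, 2, 3, 4, 3, 5]
# Predictions copied from the model.predict(df) output above.
predicted = [1.309280, 2.176555, 2.176555, 3.043831, 3.911106, 4.778381]

# Per-point errors (the columns returned with aggerrs=False).
residual = [a - p for a, p in zip(actual, predicted)]
pe = [100 * r / a for r, a in zip(residual, actual)]
ape = [abs(v) for v in pe]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Aggregate errors (the dictionary returned by default).
r = pearson_r(actual, predicted)
metrics = {
    "r": r,
    "rsq": r ** 2,
    "mae": sum(abs(v) for v in residual) / len(residual),
    "mape": sum(ape) / len(ape) / 100,
}
print(metrics)
```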
You can use grouped_metrics to understand within-group errors. Under the hood, the predicted and actual dv are groupby-aggregated (default sum) and metrics are computed within each group.
df["group"] = [1, 1, 1, 2, 2, 2]
model.grouped_metrics(df, 'group')
{'r': 1.0, 'rsq': 1.0, 'mae': 0.30214565559127315, 'mape': 0.03924585080786096}
model.grouped_metrics(df, "group", aggerrs=False)
"""
residual pe ape
group
1 -0.337609 -5.626818 5.626818
2 -0.266682 -2.222352 2.222352
"""
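The aggregation step can be sketched in plain Python: sum the actual and predicted values within each group, then compute errors on the group totals. Only the sign-independent quantities (mae and mape) are reproduced here, since this sketch makes no assumption about the sign convention of the grouped residuals. The variable names are illustrative.

```python
from collections import defaultdict

actual = [1, 2, 3, 4, 3, 5]
# Predictions copied from the model.predict(df) output above.
predicted = [1.309280, 2.176555, 2.176555, 3.043831, 3.911106, 4.778381]
groups = [1, 1, 1, 2, 2, 2]

# Group-by sum of the actual and predicted dv.
actual_sum = defaultdict(float)
predicted_sum = defaultdict(float)
for g, a, p in zip(groups, actual, predicted):
    actual_sum[g] += a
    predicted_sum[g] += p

# Absolute error and absolute percentage error per group total.
abs_err = {g: abs(actual_sum[g] - predicted_sum[g]) for g in actual_sum}
ape = {g: 100 * abs_err[g] / actual_sum[g] for g in actual_sum}

mae = sum(abs_err.values()) / len(abs_err)
mape = sum(ape.values()) / len(ape) / 100
print(mae, mape)
```

Because the errors are computed on group totals, over- and under-predictions within a group can cancel, which is why the grouped mape is much smaller than the per-point one.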
Saving and recovering a saved model
Shabadoo models have a from_samples method which allows a model to be saved and recovered exactly.
Samples from fitted models can be accessed using model.samples and model.samples_df.
model.samples['x']
"""
DeviceArray([0.65721655, 0.7644873 , 0.8724553 , 0.6285299 , 0.681262 ,
...
"""
model.samples_df.head()
"""
const x
0 0.689248 0.657217
1 0.524834 0.764487
2 1.093962 0.872455
3 1.253354 0.628530
4 1.021025 0.681262
"""
Use the samples to recover your model:
model_recovered = Model.from_samples(model.samples)
model_recovered.predict(df).equals(model.predict(df))
True
Model samples can be saved as JSON using model.samples_json:
import json

with open('model.json', 'w') as f:
    f.write(model.samples_json)

with open('model.json', 'r') as f:
    model_recovered = Model.from_samples(json.load(f))
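The round trip can be illustrated without a fitted model. The sketch below assumes the JSON payload is an object mapping each coefficient name to a list of floats, which is a guess about the shape of model.samples_json rather than something documented here. Array types such as DeviceArray are not directly JSON-serializable, so the values are converted to plain floats first.

```python
import json
import os
import tempfile

# Stand-in for model.samples, using the first two rows of samples_df above.
samples = {"const": [0.689248, 0.524834], "x": [0.657217, 0.764487]}

# Convert every draw to a plain float so the payload is JSON-serializable.
payload = json.dumps(
    {name: [float(v) for v in draws] for name, draws in samples.items()}
)

# Write to a temporary file and read it back.
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as f:
    f.write(payload)

with open(path, "r") as f:
    restored = json.load(f)

print(restored == samples)
```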