A collection of tools for analysing cmdstanpy output, written in Python

A Python library for analysing cmdstanpy output

This is a collection of functions for analysing the output of the cmdstanpy library. The main idea is to do a quick data analysis by calling a single function that makes:

  • traceplots of samples,

  • text and plots of the summaries of model parameters,

  • histograms and pair plots of posterior distributions of parameters.

Picture of Tarpan

The only known illustration of a tarpan made from life, depicting a five month old colt (Borisov, 1841). Source: Wikimedia Commons.

Setup

First, run:

pip install tarpan

Then install cmdstan by running:

install_cmdstan

Complete analysis: save_analysis

This is the main function of the library. It saves summaries and trace/pair/tree plots in the model_info directory. The function is useful when you want to generate all types of summaries and plots at once.

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.analyse import save_analysis

model = CmdStanModel(stan_file="your_model.stan")
fit = model.sample(data=your_data)
save_analysis(fit, param_names=['mu', 'sigma'])

If you don't need everything, you can call individual functions described below to make just one type of plot or a summary.

Summary: save_summary

Creates a summary of parameter distributions and saves it in text and CSV files.

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.summary import save_summary

model = CmdStanModel(stan_file="your_model.stan")
fit = model.sample(data=your_data)
save_summary(fit, param_names=['mu', 'tau', 'eta.1'])

The text summary is formatted so that it can be pasted into a GitHub, GitLab or Bitbucket Markdown file, like this:

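| Name | Mean | Std | Mode | + | - | 68CI- | 68CI+ | 95CI- | 95CI+ | N_Eff | R_hat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mu | 8.05 | 5.12 | 7.53 | 4.63 | 4.59 | 2.93 | 12.16 | -1.84 | 18.74 | 1540 | 1.00 |
| tau | 6.41 | 5.72 | 2.36 | 5.41 | 2.35 | 0.00 | 7.76 | 0.00 | 17.07 | 1175 | 1.00 |
| eta.1 | 0.39 | 0.92 | 0.60 | 0.71 | 1.13 | -0.53 | 1.31 | -1.48 | 2.19 | 3505 | 1.00 |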

Summary columns

  • Name, Mean, Std are the name of the parameter, its mean and its standard deviation.

  • 68CI-, 68CI+, 95CI-, 95CI+ are the bounds of the 68% and 95% HPDIs (highest posterior density intervals). These values are configurable.

  • Mode, +, - are the mode of the distribution and its upper and lower uncertainties, which are calculated as the distances from the mode to the bounds of the 68% HPDI.

  • N_Eff is Stan's number of effective samples, the higher the better.

  • R_hat is Stan's diagnostic of the quality of the sampling. This value should be close to 1.00; values noticeably larger than 1 (a common rule of thumb is above 1.1) suggest that the chains have not converged. After fitting a model I usually look at the R_hat column first to see if the sampling was good.
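The relation between the Mode, +, - and 68CI columns can be illustrated with a small, library-independent sketch. It estimates an HPDI from samples as the narrowest window that contains the requested fraction of sorted samples, which is a common approach but not necessarily how tarpan computes it. The names `hpdi` and `mode_uncertainties`, the sample values and the mode of 1.2 are made up for illustration.

```python
import math


def hpdi(samples, probability=0.68):
    """Return (low, high) of the narrowest interval holding `probability` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two samples")
    # Number of samples that the interval must contain.
    window = max(2, math.ceil(probability * n))
    # Slide a window over the sorted samples and keep the narrowest one.
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


def mode_uncertainties(mode, interval):
    """Return the (+, -) distances from the mode to the interval bounds."""
    low, high = interval
    return high - mode, mode - low


# Hypothetical posterior samples with a long right tail.
samples = [0.0, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 5.0, 9.0]
interval = hpdi(samples, probability=0.68)
plus, minus = mode_uncertainties(1.2, interval)  # 1.2 is an assumed mode
print(interval, plus, minus)
```

In the summary table the same arithmetic holds: for mu, Mode plus "+" (7.53 + 4.63) gives the 68CI+ value of 12.16.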

Tree plot: save_tree_plot

This function shows exactly the same information as save_summary, but in the form of a plot. The markers are the modes of the distributions, and the two error bars indicate the 68% and 95% HPDIs (highest posterior density intervals).

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.tree_plot import save_tree_plot

model = CmdStanModel(stan_file="your_model.stan")
fit = model.sample(data=your_data)
save_tree_plot([fit], param_names=['mu', 'sigma'])
Tree plot

Comparing multiple models on a tree plot

Supply multiple fits in order to compare parameters from multiple models.

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.tree_plot import save_tree_plot
from tarpan.shared.tree_plot import TreePlotParams

# Sample from two models
model1 = CmdStanModel(stan_file="your_model1.stan")
fit1 = model1.sample(data=your_data)
model2 = CmdStanModel(stan_file="your_model2.stan")
fit2 = model2.sample(data=your_data)

# Supply legend labels (optional)
tree_params = TreePlotParams()
tree_params.labels = ["Model 1", "Model 2", "Exact"]
data = [{ "mu": 2.2, "tau": 1.3 }]  # Add extra markers (optional)

save_tree_plot([fit1, fit2], extra_values=data, param_names=['mu', 'tau'],
               tree_params=tree_params)
Tree plot with multiple models

Trace plot: save_traceplot

The plot shows the values of the parameter samples. Different colors correspond to samples from different chains. Ideally, the lines of different colors on the left plots are well mixed, and the right plot is fairly uniform.

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.traceplot import save_traceplot

model = CmdStanModel(stan_file="your_model.stan")
fit = model.sample(data=your_data)
save_traceplot(fit, param_names=['mu', 'tau', 'eta.1'])
Traceplot
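What "well mixed" means can be made concrete with the classic Gelman-Rubin statistic, which compares the variance between chains with the variance within chains. The sketch below is a simplified, pure-Python version of that idea; Stan reports a refined variant, so the numbers will not match its R_hat column exactly. The function name `r_hat` and the toy chains are assumptions made for this example.

```python
from statistics import mean, variance


def r_hat(chains):
    """Classic (non-split) Gelman-Rubin statistic for equal-length chains."""
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(chain) for chain in chains])
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Two toy chains exploring the same region: the statistic is close to 1.
mixed = [[i % 5 for i in range(100)],
         [(i + 2) % 5 for i in range(100)]]

# Two toy chains stuck in different regions: the statistic is far above 1.
separated = [[i % 5 for i in range(100)],
             [i % 5 + 10 for i in range(100)]]

print(r_hat(mixed), r_hat(separated))
```

On a trace plot the second case would show lines of different colors running at different levels instead of overlapping.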

Pair plot: save_pair_plot

The plot helps to see correlations between parameters and spot funnel shaped distributions that can result in sampling problems.

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.pair_plot import save_pair_plot

model = CmdStanModel(stan_file="your_model.stan")
fit = model.sample(data=your_data)
save_pair_plot(fit, param_names=['mu', 'tau', 'eta.1'])
Pair plot

Histogram: save_histogram

Show histograms of parameter distributions.

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.histogram import save_histogram

model = CmdStanModel(stan_file="your_model.stan")
fit = model.sample(data=your_data)
save_histogram(fit, param_names=['mu', 'tau', 'eta.1', 'theta.1'])
Histogram

Saving cmdstan samples to disk

Sampling can take a long time. To avoid waiting for it on every run, sample the model once and save the results to disk, so they can be reused on the next run. This can be done with the run function:

from cmdstanpy import CmdStanModel
from tarpan.cmdstanpy.cache import run

# Your function that creates CmdStanModel, runs its `sample` method
# and returns the result.
#
# This function must take `output_dir` input parameter and pass it to `sample`.
#
# It may also have any other parameters you wish to pass from `run`.
def run_stan(output_dir, other_param):
    model = CmdStanModel(stan_file="my_model.stan")

    fit = model.sample(
        data=data,  # Input data for your model, defined elsewhere
        output_dir=output_dir  # Pass to make CSVs in correct location
    )

    return fit  # Return the fit

# Will run `run_stan` once, save model to disk and read it on next calls
fit = run(func=run_stan, other_param="some data")
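The general pattern behind this kind of caching is "compute once, store the result, load it on later calls". The sketch below shows the pattern with the standard library only. It is not tarpan's implementation: `cached_run`, `slow_computation` and the pickle file name are hypothetical, and a plain list stands in for the fit object.

```python
import pickle
import tempfile
from pathlib import Path


def cached_run(func, cache_path, **kwargs):
    """Hypothetical helper: call `func` once, then reuse its pickled result."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # A previous run saved the result: load it instead of recomputing.
        with cache_path.open("rb") as file:
            return pickle.load(file)
    result = func(**kwargs)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as file:
        pickle.dump(result, file)
    return result


calls = []  # Records every real invocation of the slow function


def slow_computation(scale):
    """Stand-in for an expensive sampling step."""
    calls.append(scale)
    return [scale * x for x in range(3)]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result.pkl"
    first = cached_run(slow_computation, path, scale=2)   # Computes and saves
    second = cached_run(slow_computation, path, scale=2)  # Loads from disk

print(first, second, calls)
```

Note that this simple version does not notice when the arguments or the model change, so the cache file has to be deleted by hand to force a fresh run.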

Scatter and KDE plot

The save_scatter_and_kde function saves a scatter plot and the corresponding KDE (kernel density estimate) plot. The KDE plot takes into account the uncertainties of the individual values:

from tarpan.plot.kde import save_scatter_and_kde

save_scatter_and_kde(values=[1, 1.3, 1.5, 7, 4.9],
                     uncertainties=[0.1, 0.6, 0.35, 0.41, 0.03])

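There is also a gaussian_kde function that returns the values for a KDE plot:

from tarpan.plot.kde import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Example measurements and their uncertainties
values = np.array([1, 1.3, 1.5, 7, 4.9])
uncert = np.array([0.1, 0.6, 0.35, 0.41, 0.03])

x = np.linspace(0, 8, 100)  # Grid that covers the range of the values
y = gaussian_kde(x, values, uncert)
plt.fill_between(x, y)
plt.show()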
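One common way to build a KDE that respects individual uncertainties is to place a normal density at each value, using that value's uncertainty as the standard deviation, and average the densities. The sketch below implements this idea in pure Python. It is an illustration under that assumption, not a copy of tarpan's gaussian_kde, which may be normalised differently; the name `kde_with_uncertainties` is made up.

```python
import math


def kde_with_uncertainties(x, values, uncertainties):
    """Average of normal densities, one per value, with sigma set to its uncertainty."""
    density = []
    for xi in x:
        total = 0.0
        for value, sigma in zip(values, uncertainties):
            z = (xi - value) / sigma
            total += math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        density.append(total / len(values))
    return density


values = [1, 1.3, 1.5, 7, 4.9]
uncertainties = [0.1, 0.6, 0.35, 0.41, 0.03]

step = 0.01
grid = [i * step for i in range(-200, 1001)]  # From -2 to 10
density = kde_with_uncertainties(grid, values, uncertainties)

# The curve is a probability density, so its area should be close to 1.
area = sum(density) * step
# The most precise measurement (4.9 with uncertainty 0.03) gives the sharpest peak.
peak = grid[density.index(max(density))]
print(area, peak)
```

Values with small uncertainties produce narrow, tall peaks, while uncertain values are smeared into broad, low bumps.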

Common questions

Run unit tests

pytest

The unlicense

This work is in the public domain.

🐴🐴🐴

This work is dedicated to Tarpan, an extinct subspecies of wild horse.
