Skip to main content

A collection of tools for analysing cmdspanpy output, written in Python

Project description

A Python library for analysing cmdstanpy output

This is a collection of function for analysing output of cmdstanpy library. The main idea is to do a quick data analysis by calling a single function that makes:

  • traceplots of samples,

  • text and plots of the summaries of model parameters,

  • histograms and pair plots of posterior distributions of parameters.

Picture of Tarpan

The only known illustration of a tarpan made from life, depicting a five month old colt (Borisov, 1841). Source: Wikimedia Commons.

Setup

First, install cmdstanpy library, and then do:

pip install tarpan

Usage

save_analysis

This is the main function of the library that analyses Stan's output samples and saves results in model_info directory.

from tarpan.cmdstanpy.analyse import save_analysis

model = CmdStanModel(stan_file="your_model.stan")
fit = model.sample(data=your_data)

# Create plots and summaries in `model_info` directory
save_analysis(fit, param_names=['mu', 'sigma'])

Sample code

See the example code of using save_analysis function. To run the example, download analyse.py and eight_schools.stan files into the same directory and run

python analyse.py

The function generates the following files in model_info/analyse directory:

1. Model's diagnostic info diagnostic.txt

Stan's diagnostic output. Usually, this is the first thing I look at, to see if there were any problems with sampling.

2. Text summary summary.txt

A table showing summaries of distributions for all parameters. The table's format is such that the text can be pasted in Github's Markdown file, like this:

Name Mean Std Mode + - 68CI- 68CI+ 95CI- 95CI+ N_Eff R_hat
mu 7.88 4.90 7.09 5.46 4.23 2.85 12.54 -1.46 18.19 2438 1.00
tau 6.58 5.64 2.16 5.72 2.16 0.00 7.88 0.00 17.44 1394 1.00
eta.1 0.40 0.94 0.46 0.89 0.94 -0.49 1.35 -1.46 2.32 3811 1.00
eta.2 -0.01 0.88 -0.05 0.84 0.86 -0.91 0.79 -1.82 1.76 4484 1.00

The summary columns are:

  • Name, Mean, Std are the name of the parameter, its mean and standard deviation.

  • 68CI-, 68CI+, 95CI-, 95CI+ are the 68% and 95% HPDIs (highest probability density intervals). These values are configurable, see below.

  • Mode, +, - is a mode of distribution with upper and lower uncertainties, which are calculated as distances to 68% HPDI.

  • N_Eff is Stan's number of effective samples, the higher the better.

  • R_hat is a Stan's parameter representing the quality of the sampling. This value needs to be smaller than 1.00. After generating a model I usually immediately look at this R_hat column to see if the sampling was good.

3. summary.csv

Same as summary.txt but in CSV format.

4. Summary tree plot summary.pdf

The plot of the summary. The error bars correspond to the 68% and 95% HPDIs:

Summary plot

5. Traceplots: traceplot_01.pdf, traceplot_02.pdf, traceplot_03.pdf

The traceplots of the samples for all parameters, the colors show samples from different chains. The first plot for "lp__" is the most important one, this is the traceplot of the log probability values. Usually, I look "lp__" traceplot to make sure the lines of different colors are well-mixed and are mostly concentrated at some value of the parameter, which means that the sampling converged.

Traceplot

6. Histograms: posterior_01.pdf, posterior_02.pdf, posterior_03.pdf

These are histograms that show distributions of values for all parameters:

Histograms of posterior distributions

7. Pair plot: pair_plot.pdf

The pair plot of parameter distributions. It help to see if there are correlations between parameters.

Pair plot of posterior distributions

Run unit tests

pytest

The unlicense

This work is in public domain.

🐴🐴🐴

This work is dedicated to Tarpan, an extinct subspecies of wild horse.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tarpan-0.1.8.tar.gz (15.4 kB view hashes)

Uploaded Source

Built Distribution

tarpan-0.1.8-py3-none-any.whl (19.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page