
A fast implementation of bootstrapping that supports multi-column data.


Strapping

Strapping is a library containing a fast implementation of the bootstrap sampling algorithm. Alongside the sampling algorithms, you will find a set of helper functions for computing basic statistics that are useful in bootstrap-based analysis.

The library supports:

  • single variable sampling
  • multi-column variable sampling
  • A/B test difference sampling
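The core idea behind bootstrap sampling fits in a few lines of NumPy: resample the data with replacement many times and apply an aggregation function to each resample. This is a minimal sketch for illustration only, not strapping's implementation; the function name bootstrap_statistic and its parameters are hypothetical.

```python
import numpy as np


def bootstrap_statistic(x, iterations=1000, aggrfunc=np.mean, seed=None):
    """Hypothetical helper: bootstrap replicates of aggrfunc over 1-D data x.

    aggrfunc must accept an `axis` keyword (np.mean and np.std both do).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    # Draw `iterations` sets of n indices, each sampled with replacement.
    idx = rng.integers(0, n, size=(iterations, n))
    # Row i of x[idx] is one bootstrap resample; aggregate each row.
    return aggrfunc(x[idx], axis=1)


data = np.random.default_rng(0).normal(0, 1, size=100)
means = bootstrap_statistic(data, iterations=1000, aggrfunc=np.mean, seed=1)
stds = bootstrap_statistic(data, iterations=1000, aggrfunc=np.std, seed=1)
print(means.shape, stds.shape)  # (1000,) (1000,)
```

Each element of means is the mean of one resample, so the array approximates the sampling distribution of the mean.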

Installing

Strapping can be installed via pip from PyPI.

pip install strapping

Testing

To run the package's tests, use tox:

tox

Example

Sample single variable

In this example we use bootstrapping to sample the distributions of the mean and the standard deviation of a dataset.

Sample means using bootstrapping

Import the bootstrap and stats modules:

  • bootstrap contains bootstrapping algorithms,
  • stats contains helpers for computing basic statistics (e.g. confidence intervals).
import numpy as np
from strapping import bootstrap, stats

Generate sample data from a standard normal distribution:

X = np.random.normal(0, 1, size=100).reshape(-1, 1)

Sample vectors of possible means and standard deviations for the dataset:

mu_sampled = bootstrap.sample(X, iterations=1000, aggrfunc=np.mean)
std_sampled = bootstrap.sample(X, iterations=1000, aggrfunc=np.std)

We can check the output values:

>>> np.mean(mu_sampled), np.mean(std_sampled)
(-0.028259915654785906, 1.0099170040429664)

Compute confidence intervals

Now we will compute confidence intervals from the sampled values. This works for both single-column and multi-column variables. By default, stats.confidence_intervals returns three values: (5th percentile, mean, 95th percentile).

q05, mean, q95 = stats.confidence_intervals(mu_sampled)

We can check the output values:

>>> q05
array([-0.15844911])

>>> mean
array([-0.01509199])

>>> q95
array([0.12659994])
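If stats.confidence_intervals uses plain percentile bootstrap intervals (an assumption; the exact method is not documented here), the returned triple corresponds roughly to the following NumPy sketch. The name percentile_interval is hypothetical.

```python
import numpy as np


def percentile_interval(samples, lower=5, upper=95):
    """Hypothetical sketch: (lower percentile, mean, upper percentile) per column."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        # Treat a flat vector of bootstrap replicates as a single column.
        samples = samples.reshape(-1, 1)
    q_low = np.percentile(samples, lower, axis=0)
    mean = samples.mean(axis=0)
    q_high = np.percentile(samples, upper, axis=0)
    return q_low, mean, q_high


replicates = np.arange(101, dtype=float)  # 0, 1, ..., 100
q05, mean, q95 = percentile_interval(replicates)
print(q05, mean, q95)  # [5.] [50.] [95.]
```

Computing along axis=0 gives one value per column, which matches the one-element arrays shown above for single-column input.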

Sample multi-column variables

In this example we apply bootstrapping to data containing multiple columns.

Generate data containing multiple columns:

X = np.array([
    np.random.normal(0, 1, size=100),
    np.random.normal(10, 5, size=100),
    np.random.normal(-20, 5, size=100),
]).T

Import bootstrap module:

from strapping import bootstrap 

Sample the mean of each column:

mu_sampled = bootstrap.sample(X, iterations=1000, aggrfunc=np.mean)

We can check the output values:

>>> mu_sampled.mean(axis=0)
array([ -0.06588892,   9.97571153, -19.187514  ])
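A natural way to bootstrap multi-column data is to resample whole rows, so that all values belonging to one observation stay together, and then aggregate each column separately. The following is a minimal NumPy sketch of that scheme; bootstrap_columns is a hypothetical name and strapping's internals may differ.

```python
import numpy as np


def bootstrap_columns(X, iterations=1000, aggrfunc=np.mean, seed=None):
    """Hypothetical sketch: bootstrap a per-column statistic of a 2-D array."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    out = np.empty((iterations, n_cols))
    for i in range(iterations):
        # Resample row indices so each observation keeps all of its columns.
        rows = rng.integers(0, n_rows, size=n_rows)
        out[i] = aggrfunc(X[rows], axis=0)
    return out


rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 1, size=100),
    rng.normal(10, 5, size=100),
    rng.normal(-20, 5, size=100),
])
mu = bootstrap_columns(X, iterations=1000, seed=1)
print(mu.shape)  # (1000, 3)
```

Averaging mu over axis=0 gives one estimate per column, analogous to mu_sampled.mean(axis=0) above.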

A/B test difference between two variables

In this example we use bootstrapping to sample the difference between two datasets. We then use the sampled values to compute percentage confidence intervals for the difference.

Sample differences using bootstrapping

Generate two single-column datasets:

X1 = np.random.normal(5, 2, size=100).reshape(-1, 1)
X2 = np.random.normal(6, 2, size=100).reshape(-1, 1)

Import bootstrap and stats modules:

from strapping import bootstrap, stats 

Sample the difference in means between the two datasets:

mu_sampled = bootstrap.sample_diffs(X1, X2, iterations=1000, aggrfunc=np.mean)

We can check the output values:

>>> mu_sampled.mean()
-1.2875678613575356
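One common way to bootstrap the difference between two independent groups is to resample each group separately and subtract the aggregated values. The sketch below follows that scheme. The name bootstrap_diff is hypothetical, and the sign convention (first dataset minus second) is an assumption chosen to match the negative differences shown above, where X1 has the smaller mean.

```python
import numpy as np


def bootstrap_diff(x1, x2, iterations=1000, aggrfunc=np.mean, seed=None):
    """Hypothetical sketch: bootstrap replicates of aggrfunc(x1) - aggrfunc(x2)."""
    rng = np.random.default_rng(seed)
    x1, x2 = np.ravel(x1), np.ravel(x2)
    # Resample each group independently, with replacement, at its own size.
    idx1 = rng.integers(0, len(x1), size=(iterations, len(x1)))
    idx2 = rng.integers(0, len(x2), size=(iterations, len(x2)))
    return aggrfunc(x1[idx1], axis=1) - aggrfunc(x2[idx2], axis=1)


rng = np.random.default_rng(0)
a = rng.normal(5, 2, size=100)
b = rng.normal(6, 2, size=100)
diffs = bootstrap_diff(a, b, seed=1)
print(diffs.shape)  # (1000,)
```

Resampling the groups independently, rather than as pairs, reflects the assumption that the two datasets are unrelated samples, as in a typical A/B test.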

Compute confidence intervals

Now we will compute both confidence intervals and percentage confidence intervals based on sampled values.

>>> stats.confidence_intervals(mu_sampled)
(array([-1.77019123]), array([-1.28756786]), array([-0.79820009]))

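Percentage confidence intervals express the sampled differences relative to the mean of a provided reference (the control dataset). The results are fractions rather than percentages: in the output below, -0.264 corresponds to a difference of about -26.4% of the X1 mean.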

>>> stats.percentage_confidence_intervals(mu_sampled, X1.mean())
(array([-0.36300107]), array([-0.26403278]), array([-0.16368146]))
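The numbers above are consistent with dividing each bound of the absolute interval by the reference mean (every bound is the absolute bound divided by roughly 4.88). Assuming that is what stats.percentage_confidence_intervals does, which is not confirmed here, a sketch looks like this; relative_interval is a hypothetical name.

```python
import numpy as np


def relative_interval(samples, reference_mean, lower=5, upper=95):
    """Hypothetical sketch: percentile interval of samples divided by reference_mean.

    Assumes reference_mean > 0, so dividing does not swap the lower and upper bounds.
    """
    rel = np.asarray(samples, dtype=float) / reference_mean
    return np.percentile(rel, lower), rel.mean(), np.percentile(rel, upper)


# Toy bootstrap differences spread evenly between -2.0 and 0.0.
diffs = np.linspace(-2.0, 0.0, 101)
lo, mid, hi = relative_interval(diffs, reference_mean=4.0)
print(round(lo, 3), round(mid, 3), round(hi, 3))  # -0.475 -0.25 -0.025
```

Multiply the results by 100 to read them as percentages.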

Other

Compute Cohen's d

Using strapping you can easily compute a bootstrapped estimate of Cohen's d, a commonly used measure of effect size.

To do so, first sample the difference in means between the two datasets:

diff_sampled = bootstrap.sample_diffs(X1, X2, iterations=1000, aggrfunc=np.mean)

Then compute the pooled standard deviation with the pooled_std helper and divide the sampled differences by it to obtain Cohen's d:

from strapping.stats import pooled_std
pstd = pooled_std(X1, X2)

cohensd = diff_sampled / pstd
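For reference, the usual pooled standard deviation of two samples is s_p = sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)), and Cohen's d is the difference of the means divided by s_p. The sketch below computes both with NumPy. The names pooled_std_sketch and cohens_d are hypothetical, and whether strapping's pooled_std uses exactly this formula (including the ddof=1 correction) is an assumption.

```python
import numpy as np


def pooled_std_sketch(x1, x2):
    """Hypothetical sketch: pooled standard deviation of two samples (ddof=1)."""
    x1, x2 = np.ravel(x1), np.ravel(x2)
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    # Weight each sample variance by its degrees of freedom.
    return np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))


def cohens_d(x1, x2):
    """Difference of means divided by the pooled standard deviation."""
    return (np.mean(x1) - np.mean(x2)) / pooled_std_sketch(x1, x2)


a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
print(cohens_d(a, b))  # approximately -1.265
```

Dividing each bootstrapped difference by the pooled standard deviation, as in the snippet above, yields a bootstrap distribution of d rather than a single value.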



Download files


Source Distribution

strapping-0.1.5.tar.gz (18.1 kB)

Uploaded Source

Built Distribution

strapping-0.1.5-py3-none-any.whl (17.7 kB)

Uploaded Python 3

File details

Details for the file strapping-0.1.5.tar.gz.

File metadata

  • Download URL: strapping-0.1.5.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.5

File hashes

Hashes for strapping-0.1.5.tar.gz
Algorithm Hash digest
SHA256 87777f4cc6d5a64f784578aa350db96ea6ab078da1a6e17e1c0275c0d0d86978
MD5 76356a62bca7f334059db0257948b9a0
BLAKE2b-256 b221169791292b1f0ac9195fc6c5c12e8141f1c5ff791afc8e2040431181baa0


File details

Details for the file strapping-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: strapping-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 17.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.5

File hashes

Hashes for strapping-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 563f23070d69fcd2df44fef143b24eacc09113f1a577459520886651c29cc21a
MD5 b6d90a1c2fbfab8adb07c4d6f1bdad0f
BLAKE2b-256 57e564e93e2fe7827e531d46f8dcac009748cbc85667672f60834571f92efe0a

