Package for hypothesis testing in A/B-experiments

# abito

Python package for hypothesis testing, suitable for use in A/B-testing software. Tested on Python >= 3.5. Built on numpy and scipy.

## Features
1. Convenient interface to run significance tests.
2. Support for ratio samples. Linearization (delta method) is included.
3. Bootstrapping: can measure significance of any statistic, even quantiles. Multiprocessing is supported.
4. Ntile-bucketing: compress samples to get better performance.
5. Trim: get rid of heavy tails.
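
To illustrate the delta method mentioned in item 2, the sketch below estimates the variance of a ratio metric and builds the equivalent per-unit linearized values. It is plain Python written for illustration and does not use or reproduce abito's implementation; the function names `ratio_delta_variance` and `linearize` and the click/view data are hypothetical.

```python
from math import isclose
from statistics import fmean, variance


def ratio_delta_variance(num, den):
    """Delta-method estimate of the variance of sum(num) / sum(den)."""
    n = len(num)
    mean_x, mean_y = fmean(num), fmean(den)
    ratio = mean_x / mean_y
    # Sample covariance with the n - 1 denominator, matching statistics.variance.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(num, den)) / (n - 1)
    return (variance(num) - 2 * ratio * cov_xy + ratio ** 2 * variance(den)) / (
        mean_y ** 2 * n
    )


def linearize(num, den):
    """Per-unit values whose mean has the same variance as the delta-method estimate."""
    mean_y = fmean(den)
    ratio = fmean(num) / mean_y
    return [(x - ratio * y) / mean_y for x, y in zip(num, den)]


# Hypothetical per-user data: clicks (numerator) and views (denominator).
clicks = [1.0, 0.0, 3.0, 2.0, 5.0, 1.0]
views = [4.0, 2.0, 9.0, 5.0, 12.0, 3.0]

delta_var = ratio_delta_variance(clicks, views)
linearized = linearize(clicks, views)
variance_of_mean = variance(linearized) / len(linearized)

# Both routes give the same number, which is why linearized values
# can be passed to ordinary tests for means.
print(isclose(delta_var, variance_of_mean, rel_tol=1e-9))
```

The point of linearization is that a ratio of sums becomes an ordinary per-unit sample, so tests designed for means apply directly.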

## Installation

```shell
pip install abito
```

## Usage

The most powerful tool in this package is the `Sample` class. Start by importing the package:

```python
import abito as ab
```

Let's draw some observations from a Poisson distribution and create a `Sample` instance from them.

```python
import numpy as np

# One million draws from a Poisson distribution with mean 1
observations = np.random.poisson(1, size=10**6)
sample = ab.sample(observations)
```

Now we can compute statistics on the sample the same way as in numpy.

```python
print(sample.mean())
print(sample.std())
print(sample.quantile(q=[0.05, 0.95]))
```

To compare it with another sample, we can use `t_test` or `mann_whitney_u_test`:

```python
# Control group with a slightly different mean
observations_control = np.random.poisson(1.005, size=10**6)
sample_control = ab.sample(observations_control)

print(sample.t_test(sample_control))
print(sample.mann_whitney_u_test(sample_control))
```
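
For readers who want to see what a two-sample t-test computes, here is a minimal standard-library sketch of Welch's t statistic with a normal approximation for the p-value. This is an assumption-laden illustration: the text above does not say which t-test variant abito uses, and the helper names `welch_t_statistic` and `approx_two_sided_p` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def welch_t_statistic(a, b):
    """Difference in means divided by its standard error (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / standard_error


def approx_two_sided_p(t):
    """Normal approximation to the t distribution; reasonable only for large samples."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


rng = random.Random(0)
treatment = [rng.gauss(0.5, 1.0) for _ in range(2000)]
control = [rng.gauss(0.0, 1.0) for _ in range(2000)]

t_stat = welch_t_statistic(treatment, control)
p_value = approx_two_sided_p(t_stat)
print(t_stat, p_value)
```

With samples of the size used in this README, the normal approximation is close to the exact t distribution; for small samples a proper t distribution should be used instead.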

### Bootstrap

We can also use the bootstrap to compare any statistic:

```python
sample.bootstrap_test(sample_control, stat='mean', n_iters=100)
```
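
The idea behind a bootstrap comparison can be sketched with the standard library alone: resample each group with replacement, recompute the statistic, and look at how the differences are distributed. This is a generic illustration, not abito's algorithm; `bootstrap_diff` and `two_sided_p_value` are hypothetical helpers, and the p-value rule shown is one common convention among several.

```python
import random
from statistics import fmean, median


def bootstrap_diff(a, b, stat=fmean, n_iters=1000, seed=0):
    """Bootstrap distribution of stat(a) - stat(b), resampling with replacement."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_iters):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(stat(resampled_a) - stat(resampled_b))
    return diffs


def two_sided_p_value(diffs):
    """Twice the smaller share of differences on either side of zero, capped at 1."""
    below = sum(d <= 0 for d in diffs) / len(diffs)
    above = sum(d >= 0 for d in diffs) / len(diffs)
    return min(1.0, 2 * min(below, above))


rng = random.Random(42)
treatment = [rng.gauss(1.0, 1.0) for _ in range(500)]
control = [rng.gauss(0.0, 1.0) for _ in range(500)]

mean_diffs = bootstrap_diff(treatment, control, stat=fmean, n_iters=500)
median_diffs = bootstrap_diff(treatment, control, stat=median, n_iters=500)
print(two_sided_p_value(mean_diffs), two_sided_p_value(median_diffs))
```

Because the statistic is passed in as a function, the same loop works for means, medians, or other quantiles, which is what makes the bootstrap suitable for statistics that lack a closed-form test.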

To improve performance, provide observations in weighted form: unique values plus their counts. Alternatively, compress existing samples with the built-in method:

```python
# Convert both samples to weighted form in place
sample.reweigh(inplace=True)
sample_control.reweigh(inplace=True)
sample.bootstrap_test(sample_control, stat='mean', n_iters=10000)
```
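
The weighted form described above (unique values plus counts) can be pictured with a short standard-library sketch. It only shows the representation and that statistics carry over unchanged; it is not how `reweigh` is implemented, and `to_weighted` and `weighted_mean` are hypothetical names.

```python
from collections import Counter
from statistics import fmean


def to_weighted(observations):
    """Collapse raw observations into sorted unique values and their counts."""
    counts = Counter(observations)
    values = sorted(counts)
    weights = [counts[value] for value in values]
    return values, weights


def weighted_mean(values, weights):
    """Mean of the original observations, computed from the weighted form."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


observations = [0, 1, 1, 2, 0, 1, 3, 1, 0, 2]
values, weights = to_weighted(observations)

print(values)   # unique values
print(weights)  # how many times each value occurs
print(weighted_mean(values, weights), fmean(observations))
```

For count-like data such as Poisson draws, a million observations collapse to a handful of unique values, so each computation touches far fewer numbers.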

The bootstrap now runs much faster. To improve performance further, set the parameter `n_threads` to a value greater than 1 to run the bootstrap with multiprocessing.

### Compress

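Ntile-bucketing compresses a sample into a fixed number of buckets, so that later statistics are computed on far fewer values:

```python
# 10**8 float64 observations take roughly 800 MB of memory
observations = np.random.normal(100, size=10**8)
sample = ab.sample(observations)

# Compress into 100 ntile buckets, using the mean as the bucket statistic
compressed = sample.compress(n_buckets=100, stat='mean')

# %timeit is an IPython/Jupyter magic; run these lines in IPython or a notebook
%timeit sample.std()
%timeit compressed.std()
```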
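
As a rough picture of ntile-bucketing, the sketch below sorts the observations, splits them into nearly equal-sized buckets, and keeps one mean per bucket. It is a simplified standard-library illustration under that assumption, not abito's `compress` implementation, and the function name `bucket_means` is hypothetical.

```python
from statistics import fmean


def bucket_means(observations, n_buckets):
    """Sort, split into n_buckets nearly equal chunks, and return each chunk's mean."""
    ordered = sorted(observations)
    n = len(ordered)
    # Bucket i covers ordered[bounds[i]:bounds[i + 1]].
    bounds = [i * n // n_buckets for i in range(n_buckets + 1)]
    return [
        fmean(ordered[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
        if hi > lo  # skip empty buckets when n < n_buckets
    ]


data = list(range(1, 101))
compressed_values = bucket_means(data, n_buckets=10)

print(compressed_values)
print(fmean(compressed_values), fmean(data))
```

When the buckets are equal in size, the mean of the bucket means equals the overall mean, while the spread inside each bucket is discarded.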
