Generates A/B test groups
ABSplit
Split your data into matching A/B/n groups
About the project
ABSplit is a Python package that uses a genetic algorithm to generate A/B, A/B/C, or A/B/n test splits that are as equal as possible.
The project aims to provide a convenient and efficient way of splitting population data into distinct groups (ABSplit), as well as finding matching samples that closely resemble a given original sample (Match).
Whether you have static population data or time series data, this Python package simplifies the process and allows you to analyze and manipulate your population data.
This covers the following use cases:
- ABSplit class: Splitting an entire population into n groups by given proportions
- Match class: Finding a matching group in a population for a given sample
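To make "as equal as possible" concrete, here is a minimal sketch of the kind of objective a genetic algorithm could minimise when assigning individuals to groups. It is an illustration under assumptions, not ABSplit's actual fitness function: the name `imbalance`, and the choice to compare each group's share of every metric against its target proportion, are hypothetical.

```python
import numpy as np

def imbalance(assignment, metrics, proportions):
    """Hypothetical objective: 0.0 means every group holds exactly its
    target share of every metric; larger values mean a worse split."""
    assignment = np.asarray(assignment)
    metrics = np.asarray(metrics, dtype=float)  # shape: (individuals, metrics)
    totals = metrics.sum(axis=0)                # assumed non-zero for every metric
    error = 0.0
    for group, target in enumerate(proportions):
        # Share of each metric's total that ends up in this group
        share = metrics[assignment == group].sum(axis=0) / totals
        error += float(np.abs(share - target).sum())
    return error

# Illustrative per-city totals of (metric1, metric2) for five cities a-e
city_metrics = [[3, 30], [12, 120], [21, 210], [30, 300], [39, 390]]

lopsided = [0, 0, 0, 1, 1]   # cities a, b, c vs d, e
balanced = [0, 1, 0, 0, 1]   # cities a, c, d vs b, e

print(imbalance(lopsided, city_metrics, [0.5, 0.5]))
print(imbalance(balanced, city_metrics, [0.5, 0.5]))  # smaller, so a better split
```

A genetic algorithm would treat each assignment list as a candidate solution and evolve the population of candidates towards lower scores, which is useful when there are too many individuals to try every assignment.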
Getting Started
Use the package manager pip to install ABSplit and its prerequisites.
ABSplit requires pygad==3.0.1
Installation
pip install absplit
Tutorials
Please see this colab for a range of examples on how to use ABSplit and Match.
Do it yourself
See this colab to learn how ABSplit works under the hood, and how to build your own group splitting tool using PyGAD.
Usage
from absplit import ABSplit
import pandas as pd
import datetime
import numpy as np
# Synthetic data
data_dct = {
    'date': [datetime.date(2030,4,1) + datetime.timedelta(days=x) for x in range(3)]*5,
    'country': ['UK'] * 15,
    'region': [item for sublist in [[x]*6 for x in ['z', 'y']] for item in sublist] + ['x']*3,
    'city': [item for sublist in [[x]*3 for x in ['a', 'b', 'c', 'd', 'e']] for item in sublist],
    'metric1': np.arange(0, 15, 1),
    'metric2': np.arange(0, 150, 10)
}
df = pd.DataFrame(data_dct)
# Identify which columns are metrics, which is the time period, and what to split on
kwargs = {
    'metrics': ['metric1', 'metric2'],
    'date_col': 'date',
    'splitting': 'city'
}
# Initialise
ab = ABSplit(
    df=df,
    splits=[.5, .5],  # Split into 2 groups of equal size
    **kwargs,
)
# Generate split
ab.run()
# Visualise generation fitness
ab.fitness()
# Visualise data
ab.visualise()
# Extract results
df = ab.results
API Reference
ABSplit
ABSplit(df, metrics, splitting, date_col=None, ga_params={}, metric_weights={}, splits=[0.5, 0.5], size_penalty=0)
Splits a population into groups (2 equal groups by default). The groups are mutually exclusive and collectively exhaustive.
Arguments:
- df (pd.DataFrame): DataFrame to be split
- metrics (str, list): Name of, or list of names of, metric columns in the DataFrame
- splitting (str): Name of the column that represents the individuals in the population being split
- date_col (str, optional): Name of the column that represents time periods, if applicable. If left empty, a static split is performed, i.e. not across a time series (default: None)
- ga_params (dict, optional): Parameters for the genetic algorithm, given as pygad.GA parameters; see the pygad.GA documentation for the arguments you can pass (default: {})
- splits (list, optional): How many groups to split into, and the relative size of the groups (default: [0.5, 0.5], 2 groups of equal size)
- size_penalty (float, optional): Penalty weighting for differences in the population count between groups (default: 0)
- metric_weights (dict, optional): Weights for each metric in the data. If you want the split to focus on one metric more than another, you can prioritise it here (default: {})
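As a rough intuition for how metric_weights and size_penalty might interact, the sketch below combines per-metric imbalances and a group-size difference into a single score. This is an assumption made for illustration, not ABSplit's internal formula: the function `combined_score`, the default weight of 1 for unlisted metrics, and all the numbers are hypothetical.

```python
def combined_score(metric_errors, count_error, metric_weights=None, size_penalty=0.0):
    """Hypothetical scoring rule: weighted sum of per-metric imbalances plus
    a penalty on the difference in group head counts. Lower is better."""
    metric_weights = metric_weights or {}
    weighted = sum(
        metric_weights.get(name, 1.0) * error  # unlisted metrics get weight 1 (assumption)
        for name, error in metric_errors.items()
    )
    return weighted + size_penalty * count_error

errors = {"metric1": 0.02, "metric2": 0.10}  # made-up imbalance per metric
count_error = 0.2                            # made-up difference in group sizes

plain = combined_score(errors, count_error)
focus_metric2 = combined_score(errors, count_error, metric_weights={"metric2": 3.0})
with_size = combined_score(errors, count_error, size_penalty=0.5)

print(plain, focus_metric2, with_size)
```

Under this reading, raising a metric's weight makes imbalance in that metric more costly, and a non-zero size_penalty makes the search also care about how many individuals land in each group.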
Match
Match(population, sample, metrics, splitting, date_col=None, ga_params={}, metric_weights={})
Takes a DataFrame sample and finds a comparable group in population.
Arguments:
- population (pd.DataFrame): Population to search for a comparable group (must exclude the sample data)
- sample (pd.DataFrame): Sample we are looking to find a match for
- metrics (str, list): Name of, or list of names of, metric columns in the DataFrame
- splitting (str): Name of the column that represents the individuals in the population
- date_col (str, optional): Name of the column that represents time periods, if applicable. If left empty, a static split is performed, i.e. not across a time series (default: None)
- ga_params (dict, optional): Parameters for the genetic algorithm, given as pygad.GA parameters; see the pygad.GA documentation for the arguments you can pass (default: {})
- splits (list, optional): How many groups to split into, and the relative size of the groups (default: [0.5, 0.5], 2 groups of equal size)
- metric_weights (dict, optional): Weights for each metric in the data. If you want the matching to focus on one metric more than another, you can prioritise it here (default: {})
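The idea behind matching can be sketched without a genetic algorithm on a tiny example. The code below uses exhaustive search as a stand-in, which is only feasible for very small populations; it is not how Match is implemented. The function `closest_group` and the data are hypothetical.

```python
from itertools import combinations

def closest_group(population, sample_mean, size):
    """Return the `size` individuals whose mean metric value is closest to
    `sample_mean`. Exhaustive search, used here in place of a genetic algorithm."""
    return min(
        combinations(sorted(population), size),
        key=lambda group: abs(sum(population[i] for i in group) / size - sample_mean),
    )

# One metric value per individual; the population excludes the sample
population = {"a": 3, "b": 12, "c": 21, "d": 33}
sample = {"x": 10, "y": 24}

sample_mean = sum(sample.values()) / len(sample)
match = closest_group(population, sample_mean, size=len(sample))
print(match)
```

The number of candidate groups grows combinatorially with population size, which is why a heuristic search such as a genetic algorithm becomes attractive for realistic data.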
Contributing
I welcome contributions to ABSplit! For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
License