An experimental Python library for advanced A/B testing analysis, leveraging statistical techniques and ML for deeper insights.

Variatio

Variatio is an experimental Python library designed for advanced A/B testing analysis. It leverages statistical techniques and machine learning, including variance reduction through CUPED and integration with CatBoost for predictive insights. Variatio is ideal for data scientists and researchers looking to obtain deeper insights from their A/B testing efforts.

Features

Variatio streamlines A/B testing by calculating the classic metrics for you. Provide your data, request the metrics you need, and it delivers them in an HTML report. Key features:

  • Statistical Significance Testing: Easily compare metrics between control and test groups, with all necessary calculations done for you.
  • CUPED Adjustments: This experimental extension applies CUPED (Controlled-experiment Using Pre-Experiment Data) for variance reduction. It uses user properties as covariates and fits a CatBoost regression instead of the traditional linear adjustment, with the aim of increasing A/B test sensitivity.
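
To make the variance-reduction idea concrete, here is a minimal sketch of the classic linear CUPED adjustment. It is not Variatio's implementation (which, per the description above, uses CatBoost regression on user properties); the function name `cuped_adjust` and the sample numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def cuped_adjust(metric, covariate):
    """Classic linear CUPED: y_adj = y - theta * (x - mean(x)).

    theta = cov(x, y) / var(x), i.e. the OLS slope of the metric on the
    pre-experiment covariate.
    """
    x_bar = mean(covariate)
    y_bar = mean(metric)
    n = len(metric)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, metric)) / (n - 1)
    theta = cov_xy / variance(covariate)
    return [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]


# Hypothetical data: spend before the experiment (covariate) and during it (metric)
pre_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
spend = [12.0, 19.0, 33.0, 41.0, 48.0]

adjusted = cuped_adjust(spend, pre_spend)
print("mean:", mean(spend), "->", mean(adjusted))              # mean is preserved
print("variance:", variance(spend), "->", variance(adjusted))  # variance shrinks
```

The adjusted metric keeps the same mean but has lower variance whenever the covariate is correlated with the metric, which is what makes the subsequent significance test more sensitive. In a real experiment, theta would typically be estimated on the pooled data from all groups.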

Installation

pip install variatio

or install Variatio directly from the source:

git clone https://github.com/dmitry_brazhenko/Variatio.git
cd Variatio
pip install .

Data Preparation

To use Variatio effectively, you need to prepare up to three datasets: two are required and the third is optional.

1. Event Data

The event_data is crucial for analysis in Variatio. It tracks user interactions and should include the following mandatory columns:

  • timestamp: The date and time when the event occurred.
  • userid: A unique identifier for the user who triggered the event.
  • event_name: The name of the event (e.g., 'login', 'purchase').

In addition to these mandatory columns, you can include optional attributes that provide additional details about each event. For example, purchase_value could be used to track the value of purchase events. These optional attributes can vary based on the event type and what you aim to analyze.

Event Data Example:

timestamp            userid  event_name  purchase_value
2023-01-01 00:00:00  7       purchase    274
2023-01-01 01:00:00  4       purchase    175
2023-01-01 02:00:00  5       purchase    179
2023-01-01 03:00:00  7       purchase    102
2023-01-01 04:00:00  3       login       0
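
Before passing event data to the library, it can help to confirm that the mandatory columns are present. The following is a small pandas sketch; `check_event_data` is a hypothetical helper, not part of Variatio, and the datetime check is an assumption based on the example in this README, which uses `pd.to_datetime`.

```python
import pandas as pd

REQUIRED_EVENT_COLUMNS = ["timestamp", "userid", "event_name"]


def check_event_data(event_data: pd.DataFrame) -> None:
    """Raise if a mandatory column is missing, empty in places, or mistyped."""
    missing = [c for c in REQUIRED_EVENT_COLUMNS if c not in event_data.columns]
    if missing:
        raise ValueError(f"event_data is missing columns: {missing}")
    if event_data[REQUIRED_EVENT_COLUMNS].isna().any().any():
        raise ValueError("event_data has null values in a mandatory column")
    # Assumption: timestamps are expected as a datetime column
    if not pd.api.types.is_datetime64_any_dtype(event_data["timestamp"]):
        raise TypeError("event_data['timestamp'] should be a datetime column")


event_data = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01 00:00:00", "2023-01-01 01:00:00"]),
    "userid": [7, 4],
    "event_name": ["purchase", "purchase"],
    "purchase_value": [274, 175],
})
check_event_data(event_data)  # passes silently when the frame is well-formed
```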

2. User Allocations Data

The ab_test_allocations dataset (named user_allocations in the code example below) records which A/B test group each user was assigned to. It includes the following columns:

  • timestamp: The date and time when the user was allocated to a specific A/B test group. This helps track when each user started experiencing the variant they were allocated to.
  • userid: A unique identifier for the user.
  • abgroup: The A/B test group the user was allocated to. This typically represents the control group (e.g., 'A') and one or more test groups (e.g., 'B', 'C').

Example:

timestamp            userid  abgroup
2022-12-15 00:00:00  1       A
2022-12-15 01:00:00  2       B
2022-12-15 02:00:00  3       B
2022-12-15 03:00:00  4       B
2022-12-15 04:00:00  5       A
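
A quick sanity check on the allocation table can catch common problems early. The sketch below assumes that each user should belong to exactly one group and that the control group must appear in the data; these are standard A/B testing conventions rather than documented Variatio requirements, and `check_allocations` is a hypothetical helper.

```python
import pandas as pd


def check_allocations(allocations: pd.DataFrame, control_group: str) -> dict:
    """Validate allocations and return the number of users per group."""
    groups_per_user = allocations.groupby("userid")["abgroup"].nunique()
    conflicted = groups_per_user[groups_per_user > 1].index.tolist()
    if conflicted:
        raise ValueError(f"users allocated to more than one group: {conflicted}")
    if control_group not in set(allocations["abgroup"]):
        raise ValueError(f"control group {control_group!r} not found in abgroup")
    return allocations.groupby("abgroup")["userid"].nunique().to_dict()


allocations = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5],
    "abgroup": ["A", "B", "B", "B", "A"],
})
group_sizes = check_allocations(allocations, control_group="A")
print(group_sizes)
```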

3. User Properties Data (Optional)

This optional user_properties dataset can enhance the analysis with user demographic or behavioral data. The userid values must match those in the event_data and ab_test_allocations datasets.

userid  age  gender  country    device_type  membership_status
1       56   Male    USA        Tablet       Free
2       69   Female  India      Mobile       Free
3       46   Male    Australia  Tablet       Free
4       32   Female  UK         Tablet       Free
5       60   Male    Germany    Tablet       Free
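
Because the three datasets are joined on userid, it is worth checking that the identifiers actually line up. The sketch below uses a hypothetical helper, `unmatched_userids`, which is not part of Variatio. Applied to the sample data in this README, it shows that user 7 has events but no allocation; how the library treats such users is not documented here.

```python
import pandas as pd


def unmatched_userids(left: pd.DataFrame, right: pd.DataFrame) -> list:
    """Return the sorted userids that appear in `left` but not in `right`."""
    return sorted(set(left["userid"].tolist()) - set(right["userid"].tolist()))


events = pd.DataFrame({"userid": [7, 4, 5, 7, 3]})
allocations = pd.DataFrame({"userid": [1, 2, 3, 4, 5]})
properties = pd.DataFrame({"userid": [1, 2, 3, 4, 5]})

print(unmatched_userids(events, allocations))      # users with events but no allocation
print(unmatched_userids(allocations, properties))  # allocated users without properties
```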

Using VariatioAnalyzer

After preparing your datasets, you can use VariatioAnalyzer to perform A/B testing analysis. Initialize the analyzer with your datasets and specify the control group:

from variatio import VariatioAnalyzer
import pandas as pd

# Creating the event_data DataFrame
event_data = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 00:00:00", "2023-01-01 01:00:00", "2023-01-01 02:00:00",
        "2023-01-01 03:00:00", "2023-01-01 04:00:00",
    ]),
    "userid": [7, 4, 5, 7, 3],
    "event_name": ["purchase", "purchase", "purchase", "purchase", "login"],
    "purchase_value": [274, 175, 179, 102, 0]
})

# Creating the user_allocations DataFrame
user_allocations = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-12-15 00:00:00", "2022-12-15 01:00:00", "2022-12-15 02:00:00",
        "2022-12-15 03:00:00", "2022-12-15 04:00:00",
    ]),
    "userid": [1, 2, 3, 4, 5],
    "abgroup": ["A", "B", "B", "B", "A"]
})

# Creating the user_properties DataFrame
user_properties = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5],
    "age": [56, 69, 46, 32, 60],
    "gender": ["Male", "Female", "Male", "Female", "Male"],
    "country": ["USA", "India", "Australia", "UK", "Germany"],
    "device_type": ["Tablet", "Mobile", "Tablet", "Tablet", "Tablet"],
    "membership_status": ["Free", "Free", "Free", "Free", "Free"]
})

# "A" is the control group; user_properties is the optional third dataset
analyzer = VariatioAnalyzer(event_data, user_allocations, "A", user_properties)

You can then calculate various metrics such as event count per user, attribute sum per user (useful for calculating metrics like ARPU), or conversion rates to specific events:

# Calculate the count of 'purchase' events per user
analyzer.calculate_event_count_per_user('purchase')

# Calculate the sum of 'purchase_value' for 'purchase' events per user
analyzer.calculate_event_attribute_sum_per_user('purchase', 'purchase_value')

# Calculate the conversion rate to 'login' events
analyzer.calculate_conversion('login')
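
To give a sense of what these per-user metrics represent, here is a hand-rolled pandas sketch that computes comparable quantities per group. It is not Variatio's implementation, and the definitions are assumptions: users without events count as zero, and conversion is the share of allocated users with at least one matching event. The data is hypothetical.

```python
import pandas as pd

# Hypothetical events and allocations
events = pd.DataFrame({
    "userid": [4, 5, 3, 4],
    "event_name": ["purchase", "purchase", "login", "purchase"],
    "purchase_value": [175, 179, 0, 102],
})
allocations = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5],
    "abgroup": ["A", "B", "B", "B", "A"],
})

# Aggregate 'purchase' events per user: event count and summed attribute
purchases = events[events["event_name"] == "purchase"]
per_user = (
    purchases.groupby("userid")
    .agg(purchase_count=("event_name", "size"), revenue=("purchase_value", "sum"))
    .reset_index()
)

# Left-join onto allocations so users without purchases count as zero
table = allocations.merge(per_user, on="userid", how="left")
table = table.fillna({"purchase_count": 0, "revenue": 0})
table["converted"] = table["purchase_count"] > 0

# Per-group averages: events per user, revenue per user (ARPU), conversion rate
summary = table.groupby("abgroup")[["purchase_count", "revenue", "converted"]].mean()
print(summary)
```

The left join is the important design choice: aggregating only over users who produced events would silently drop non-purchasers and inflate every per-user average.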

Generating Reports

After calculating the desired metrics, you can save them to an HTML file for easy viewing:

analyzer.save_report("abtest_report.html")

The generated report presents the calculated metrics in a table, making it easy to compare the control and test groups.

Contribution Guidelines

We welcome contributions to Variatio! If you'd like to contribute, please follow these guidelines:

  • Fork the repository and create your feature branch.
  • Make sure your code adheres to the project's coding standards.
  • Submit a pull request with a detailed description of your changes.

License

Variatio is licensed under the MIT License. See LICENSE for more details.

Disclaimer

Variatio is an experimental library, shared "as-is" and without any warranties. You are encouraged to explore and test it, but please keep its experimental nature in mind. If you have questions or run into problems, open an issue in our repository. Feedback and contributions are welcome.
