
Fast and customizable framework for Causal Inference


HypEx: Advanced Causal Inference and AB Testing Toolkit


Introduction

HypEx (Hypotheses and Experiments) is a comprehensive library crafted to streamline the causal inference and AB testing processes in data analytics. Developed for efficiency and effectiveness, HypEx employs Rubin's Causal Model (RCM) for matching closely related pairs, ensuring equitable group comparisons when estimating treatment effects.

Boasting a fully automated pipeline, HypEx adeptly calculates the Average Treatment Effect (ATE), Average Treatment Effect on the Treated (ATT), and Average Treatment Effect on the Control (ATC). It offers a standardized interface for executing these estimations, providing insights into the impact of interventions across various population subgroups.
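As a rough illustration of how the three estimands differ, the sketch below computes them from per-unit effects once every unit has a matched counterpart from the opposite group. The function name `treatment_effects` and the toy numbers are hypothetical and are not part of the HypEx API; the HypEx pipeline may include additional steps that are not shown here.

```python
from statistics import mean


def treatment_effects(y, t, y_matched):
    """Compute ATE, ATT and ATC from observed and matched outcomes.

    y         -- observed outcome of each unit
    t         -- 1 if the unit was treated, 0 otherwise
    y_matched -- outcome of the unit's match from the opposite group,
                 used as an estimate of the unobserved counterfactual
    """
    # The individual effect is always (outcome if treated) - (outcome if control).
    effects = [
        (yi - ym) if ti == 1 else (ym - yi)
        for yi, ti, ym in zip(y, t, y_matched)
    ]
    return {
        "ATE": mean(effects),                                    # all units
        "ATT": mean(e for e, ti in zip(effects, t) if ti == 1),  # treated only
        "ATC": mean(e for e, ti in zip(effects, t) if ti == 0),  # control only
    }


# Toy data: two treated units followed by two control units.
y = [12.0, 15.0, 9.0, 10.0]
t = [1, 1, 0, 0]
y_matched = [9.0, 10.0, 12.0, 13.0]

effects = treatment_effects(y, t, y_matched)
print(effects)
```

ATT averages the effect over treated units only, ATC over control units only, and ATE over everyone, so the three can differ when the effect varies between groups.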

Beyond causal inference, HypEx is equipped with robust AB testing tools, including Difference-in-Differences (Diff-in-Diff) and CUPED methods, to rigorously test hypotheses and validate experimental results.
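The two methods named above can be sketched in a few lines of plain Python. This is a minimal textbook version, assuming one pre-period and one post-period value per unit; the helper names `diff_in_diff` and `cuped_adjust` are hypothetical and do not reflect how HypEx implements these methods internally.

```python
from statistics import mean, pvariance


def diff_in_diff(pre_t, post_t, pre_c, post_c):
    """Change in the treated group minus change in the control group."""
    return (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))


def cuped_adjust(pre, post):
    """Remove the part of `post` that is linearly explained by `pre`."""
    pre_mean, post_mean = mean(pre), mean(post)
    cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post))
    var = sum((x - pre_mean) ** 2 for x in pre)
    theta = cov / var  # regression slope of post on pre
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


# Toy data: three control units and three treated units.
pre_c, post_c = [10.0, 20.0, 30.0], [11.0, 21.0, 31.0]
pre_t, post_t = [12.0, 22.0, 32.0], [16.0, 26.0, 36.0]

did = diff_in_diff(pre_t, post_t, pre_c, post_c)

# CUPED: adjust the pooled post-period metric, then compare group means.
adjusted = cuped_adjust(pre_c + pre_t, post_c + post_t)
cuped_effect = mean(adjusted[3:]) - mean(adjusted[:3])

print("Diff-in-Diff estimate:", did)
print("CUPED estimate:", cuped_effect)
print("Variance before/after CUPED:",
      pvariance(post_c + post_t), pvariance(adjusted))
```

The CUPED adjustment leaves the overall mean unchanged but shrinks the variance of the metric, which is what makes the subsequent group comparison more sensitive.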

Features

  • Faiss KNN Matching: Utilizes Faiss for efficient and precise nearest neighbor searches, aligning with RCM for optimal pair matching.
  • Data Filters: Built-in outlier and Spearman filters ensure data quality for matching.
  • Result Validation: Offers multiple validation methods, including random treatment, feature, and subset validations.
  • Data Tests: Incorporates SMD, KS, PSI, and Repeats tests to affirm the robustness of effect estimations.
  • Feature Selection: Employs LGBM and Catboost feature selection to pinpoint the most impactful features for causal analysis.
  • AB Testing Suite: Features a suite of AB testing tools for comprehensive hypothesis evaluation.
  • Stratification support: Stratify groups for nuanced analysis.
  • Weights support: Assign custom weights to features to tune the matching precision to your specific research needs.
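To make the matching and weighting ideas above concrete, here is a minimal brute-force sketch of one-nearest-neighbor matching with optional feature weights. It is a stand-in for illustration only: it does not use Faiss, and the function name `match_nearest` and its `weights` parameter are hypothetical rather than part of the HypEx API.

```python
import math


def match_nearest(treated, controls, weights=None):
    """Return, for each treated row, the index of its closest control row.

    Distances are weighted Euclidean; with no weights every feature
    counts equally.
    """
    if weights is None:
        weights = [1.0] * len(controls[0])

    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

    return [
        min(range(len(controls)), key=lambda j: dist(row, controls[j]))
        for row in treated
    ]


# Toy feature vectors (two features per unit).
treated = [(1.0, 1.0), (5.0, 5.0)]
controls = [(0.0, 0.0), (4.0, 6.0), (1.0, 2.0)]

matches = match_nearest(treated, controls)
print(matches)  # indices into `controls`, one per treated unit
```

A brute-force search like this scales with the product of the group sizes, which is the cost that an approximate or indexed nearest-neighbor library is meant to reduce.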

Warnings

Some functions in HypEx help with specific auxiliary tasks but cannot automate decisions about experiment design. The sections below describe such features: they are implemented in HypEx, but the responsibility for the design of the experiment remains with the user.

Note: For Matching, it's recommended not to use more than 7 features as it might result in the curse of dimensionality, making the results unrepresentative.
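The effect behind this note can be observed with a small simulation: as the number of features grows, the nearest neighbor of a point is barely closer than the farthest one, so a "closest match" stops being meaningfully similar. The sketch below uses uniformly random points and a hypothetical helper `nearest_to_farthest_ratio`; it illustrates the general phenomenon and is not a HypEx routine.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(dists) / max(dists)


# A ratio close to 1 means all points are roughly equally far away.
ratios = {dim: nearest_to_farthest_ratio(dim) for dim in (2, 7, 50)}
for dim, ratio in ratios.items():
    print(f"{dim:>2} features: nearest/farthest = {ratio:.2f}")
```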

Feature Selection

Feature selection models the significance of features for the accuracy of target approximation. However, it does not rule out the possibility of overlooked features, the complex impact of features on target description, or the significance of features from a business logic perspective. The algorithm will not function correctly if there are data leaks.

Points to consider when selecting features:

  • Data leaks - these should not be present.
  • Influence on treatment distribution - features should not affect the treatment distribution.
  • The target should be describable by features.
  • All features significantly affecting the target should be included.
  • The business rationale of features.

The feature selection function can help with these tasks, but it does not solve them, does not justify the resulting selection, and does not relieve the user of responsibility for it.

Link to ReadTheDocs

Random Treatment

The Random Treatment algorithm randomly shuffles the actual treatment assignment. The estimated effect of the shuffled treatment on the target is expected to be close to 0.

This method is not a sufficiently accurate marker of a successful experiment on its own.
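The idea can be sketched as a placebo check: shuffle the treatment labels, re-estimate the effect, and confirm that it collapses towards 0. The example below uses a plain difference in means as a stand-in estimator and synthetic data with a true effect of 5; the helper `diff_in_means` is hypothetical and this is not the HypEx implementation.

```python
import random
from statistics import mean


def diff_in_means(y, t):
    """Naive effect estimate: mean outcome of treated minus mean of control."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return mean(treated) - mean(control)


rng = random.Random(42)

# Synthetic data: alternating treatment flags, true effect of 5.
t = [i % 2 for i in range(200)]
y = [rng.gauss(0.0, 1.0) + 5.0 * ti for ti in t]

real_effect = diff_in_means(y, t)

# Placebo runs: the same outcomes with randomly shuffled treatment labels.
placebo_effects = []
for _ in range(200):
    shuffled = t[:]
    rng.shuffle(shuffled)
    placebo_effects.append(diff_in_means(y, shuffled))
placebo_mean = mean(placebo_effects)

print("Effect with real treatment:", round(real_effect, 2))
print("Mean effect with shuffled treatment:", round(placebo_mean, 2))
```

A placebo effect near 0 is necessary but not sufficient: it shows that the estimator does not invent an effect from noise, not that the original estimate is unbiased.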

Link to ReadTheDocs

Installation

pip install -U hypex

Quick start

Explore usage examples and tutorials here.

Matching example

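from hypex import Matcher
from hypex.utils.tutorial_data_creation import create_test_data

# Generate the tutorial dataset, with missing values in 'age' and 'gender'
df = create_test_data(rs=42, na_step=45, nan_cols=['age', 'gender'])

info_col = ['user_id']     # identifier columns, not used for matching
outcome = 'post_spends'    # target metric
treatment = 'treat'        # treatment indicator column

model = Matcher(input_data=df, outcome=outcome, treatment=treatment, info_col=info_col)
# estimate() returns the effect estimates, the quality checks and the matched dataframe
results, quality_results, df_matched = model.estimate()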

AA-test example

from hypex import AATest
from hypex.utils.tutorial_data_creation import create_test_data

data = create_test_data(rs=52, na_step=10, nan_cols=['age', 'gender'])

info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

experiment = AATest(info_cols=info_cols, target_fields=target)
results = experiment.process(data, iterations=1000)
results.keys()
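For readers new to AA testing, the general idea is to split the same untreated data into two random groups many times and check that a significance test flags a difference only about as often as the chosen significance level. The sketch below is a generic illustration using a two-sided z-test on synthetic data; the helper names are hypothetical, and it does not describe what `AATest.process` does internally.

```python
import random
from statistics import NormalDist, mean, variance


def z_test_pvalue(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_false_positive_rate(values, iterations=300, alpha=0.05, seed=0):
    """Share of random 50/50 splits in which the test reports a difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        if z_test_pvalue(shuffled[:half], shuffled[half:]) < alpha:
            hits += 1
    return hits / iterations


# Synthetic metric with no treatment applied to anyone.
rng = random.Random(1)
values = [rng.gauss(100.0, 15.0) for _ in range(400)]

rate = aa_false_positive_rate(values)
print("Share of splits flagged as different:", rate)
```

A rate far above the significance level would suggest that the splitting procedure or the test itself is miscalibrated for the data.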

AB-test example

from hypex import ABTest
from hypex.utils.tutorial_data_creation import create_test_data

data = create_test_data(rs=52, na_step=10, nan_cols=['age', 'gender'])

model = ABTest()
results = model.execute(
    data=data,
    target_field='post_spends',
    target_field_before='pre_spends',
    group_field='group'
)

model.show_beautiful_result()

Documentation

For more detailed information about the library and its features, visit our documentation on ReadTheDocs.

You'll find comprehensive guides and tutorials that will help you get started with HypEx, as well as detailed API documentation for advanced use cases.

Contributions

Join our vibrant community! For guidelines on contributing, reporting issues, or seeking support, please refer to our Contributing Guidelines.

More Information & Resources

Habr (ru) - discover how HypEx is revolutionizing causal inference in various fields.
A/B testing seminar - Seminar in NoML about matching and A/B testing
Matching with HypEx: Simple Guide - Simple matching guide with explanation
Matching with HypEx: Grouping - Matching with grouping guide
HypEx vs Causal Inference and DoWhy - discover why HypEx is the best solution for causal inference
HypEx vs Causal Inference and DoWhy: part 2 - discover why HypEx is the best solution for causal inference

Testing different libraries for the speed of matching

Visit this notebook on Kaggle and evaluate the results yourself.

Group size               32 768   65 536   131 072   262 144   524 288   1 048 576   2 097 152   4 194 304
Causal Inference         46s      169s     None      None      None      None        None        None
DoWhy                    9s       19s      40s       77s       159s      312s        615s        1 235s
HypEx with grouping      2s       6s       16s       42s       167s      509s        1 932s      7 248s
HypEx without grouping   2s       7s       21s       101s      273s      982s        3 750s      14 720s

Join Our Community

Have questions or want to discuss HypEx? Join our Telegram chat and connect with the community and the developers.

Conclusion

HypEx stands as an indispensable resource for data analysts and researchers delving into causal inference and AB testing. With its automated capabilities, sophisticated matching techniques, and thorough validation procedures, HypEx is poised to unravel causal relationships in complex datasets with unprecedented speed and precision.

