Fast and customizable framework for Causal Inference
HypEx: Advanced Causal Inference and AB Testing Toolkit
Introduction
HypEx (Hypotheses and Experiments) is a comprehensive library designed to streamline causal inference and AB testing in data analytics. Built with efficiency in mind, HypEx applies Rubin's Causal Model (RCM) by matching each unit with its closest counterpart from the other group, so that treatment effects are estimated from comparisons of similar units.
Using a fully automated pipeline, HypEx calculates the Average Treatment Effect (ATE), the Average Treatment Effect on the Treated (ATT), and the Average Treatment Effect on the Control (ATC). It offers a standardized interface for these estimations, showing how an intervention affects different population subgroups.
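To make the three estimands concrete, here is a minimal sketch of one-to-one nearest-neighbour matching with replacement in plain Python. It is not HypEx's implementation (HypEx uses Faiss and, by default, Mahalanobis distance). This sketch uses Euclidean distance via `math.dist`, and the function names are hypothetical. ATT compares each treated unit with its closest control, ATC compares each control with its closest treated unit, and ATE is taken here as the group-size-weighted average of the two.

```python
import math


def nearest(x, pool):
    """Index of the point in `pool` closest to `x` (Euclidean distance)."""
    return min(range(len(pool)), key=lambda j: math.dist(x, pool[j]))


def matching_effects(X_t, y_t, X_c, y_c):
    """Hypothetical 1-NN matching with replacement; returns (ATT, ATC, ATE).

    X_t, X_c: feature tuples for treated / control units.
    y_t, y_c: outcomes for treated / control units.
    """
    # ATT: each treated unit minus the outcome of its closest control.
    att = sum(y - y_c[nearest(x, X_c)] for x, y in zip(X_t, y_t)) / len(y_t)
    # ATC: outcome of the closest treated unit minus each control unit.
    atc = sum(y_t[nearest(x, X_t)] - y for x, y in zip(X_c, y_c)) / len(y_c)
    # ATE: ATT and ATC weighted by the size of each group.
    n_t, n_c = len(y_t), len(y_c)
    ate = (n_t * att + n_c * atc) / (n_t + n_c)
    return att, atc, ate


print(matching_effects([(1.0,), (2.0,)], [11.0, 12.5],
                       [(1.1,), (2.2,), (5.0,)], [10.0, 10.5, 20.0]))
```

With these toy inputs the sketch gives ATT = 1.5, ATC = -1.5 and ATE = -0.3. The poorly matched control at 5.0 drags ATC down, which shows why match quality matters.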
Beyond causal inference, HypEx is equipped with robust AB testing tools, including Difference-in-Differences (Diff-in-Diff) and CUPED methods, to rigorously test hypotheses and validate experimental results.
Features
- Faiss KNN Matching: Utilizes Faiss for efficient and precise nearest neighbor searches, aligning with RCM for optimal pair matching.
- Data Filters: Built-in outlier and Spearman filters ensure data quality for matching.
- Result Validation: Offers multiple validation methods, including random treatment, feature, and subset validations.
- Data Tests: Incorporates SMD, KS, PSI, and Repeats tests to affirm the robustness of effect estimations.
- Feature Selection: Employs LightGBM and CatBoost feature selection to pinpoint the most impactful features for causal analysis.
- AB Testing Suite: Features a suite of AB testing tools for comprehensive hypothesis evaluation, including CUPED and CUPAC variance reduction methods with detailed reports.
- Stratification Support: Stratify groups for more nuanced analysis.
- Weights Support: Assign custom weights to features to tune matching precision to your specific research needs.
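Of the data tests listed above, SMD (standardized mean difference) is the simplest to illustrate. The sketch below uses the common pooled-standard-deviation form. It is an assumption that HypEx computes SMD exactly this way. The |SMD| < 0.1 threshold mentioned in the comment is a widely used rule of thumb for covariate balance, not a HypEx setting.

```python
import math
import statistics


def smd(treated, control):
    """Standardized mean difference with a pooled standard deviation.

    Rule of thumb (not a HypEx default): |SMD| < 0.1 suggests the
    feature is balanced between the groups.
    """
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.fmean(treated) - statistics.fmean(control)) / pooled_sd


print(smd([1, 2, 3], [2, 3, 4]))
```

Running this check on each feature before and after matching shows whether matching actually improved balance.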
Warnings
Some HypEx functions help with specific auxiliary tasks, but they cannot make experiment-design decisions for you. The features discussed below are implemented in HypEx but do not automate the design of experiments.
Note: for Matching, we recommend using no more than 7 features. More features can trigger the curse of dimensionality and make the results unrepresentative.
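The curse of dimensionality mentioned in the note is easy to observe. As the number of features grows, the nearest neighbour of a point is barely closer than the farthest one, so "closest match" loses its meaning. The sketch below assumes uniformly distributed features and Euclidean distance, and prints the ratio of the nearest to the farthest distance for several dimensions. It illustrates the general effect. It does not derive the 7-feature recommendation.

```python
import math
import random


def contrast(dim, n_points=500, seed=0):
    """Ratio of nearest to farthest Euclidean distance from a random query.

    Values near 0 mean neighbours are well separated; values near 1 mean
    all points look almost equally far away.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return min(dists) / max(dists)


for d in (1, 2, 7, 50, 500):
    print(d, round(contrast(d), 3))
```

The ratio climbs toward 1 as `dim` grows. In that regime a "matched pair" is hardly more similar than a random pair.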
Installation
pip install -U hypex
Prefer the old version? You can still use it, but it won't receive updates:
pip install hypex==0.1.10
Quick start
Explore usage examples and tutorials here.
Matching example
from hypex.dataset import Dataset, InfoRole, TreatmentRole, TargetRole, FeatureRole
from hypex import Matching

data = Dataset(
    roles={
        "user_id": InfoRole(int),          # InfoRole: ID column
        "treat": TreatmentRole(int),       # TreatmentRole: identifies the user group (control or treated)
        "post_spends": TargetRole(float),  # TargetRole: the outcome whose effect is estimated
    },
    data="data.csv",
    default_role=FeatureRole(),  # all remaining columns become FeatureRole (used to find similar units)
)

# Pick one of the configurations below:
test = Matching()               # classic Matching (Mahalanobis distance, all metrics)
test = Matching(metric="att")   # compute ATT only
test = Matching(distance="l2")  # choose the distance metric
result = test.execute(data)

result.resume     # summary of results
result.full_data  # wide DataFrame with matched pairs (formerly df_matched)
result.indexes    # matched pair indexes only (handy for joins)
More about Matching here
AA-test example
from hypex.dataset import Dataset, InfoRole, TargetRole, StratificationRole
from hypex import AATest

data = Dataset(
    roles={
        "user_id": InfoRole(int),           # InfoRole: ID column
        "pre_spends": TargetRole(),         # TargetRole: metric checked for homogeneity
        "post_spends": TargetRole(),        # TargetRole: metric checked for homogeneity
        "gender": StratificationRole(str),  # StratificationRole: column used to build strata
    },
    data="data.csv",
)

aa = AATest(n_iterations=10)
res = aa.execute(data)

res.resume                # summary of all tests
res.aa_score              # AA score
res.best_split            # the most homogeneous split
res.best_split_statistic  # statistics for the best split
More about AA test here
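As a rough illustration of what an AA test checks, the sketch below splits a single metric into two random halves many times and counts how often a two-sample test finds no significant difference. On truly homogeneous data this share should be close to 1 - alpha. It is a hypothetical simplification. HypEx's `aa_score`, `best_split`, and stratification logic are not reproduced here. The p-value uses a normal approximation to Welch's t-test, which is only reasonable for large samples.

```python
import math
import random
import statistics


def welch_p_value(a, b):
    """Two-sided p-value for Welch's t statistic, normal approximation."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    # 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))


def aa_pass_rate(values, n_iterations=10, alpha=0.05, seed=0):
    """Hypothetical helper: share of random 50/50 splits with p >= alpha."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_iterations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        if welch_p_value(shuffled[:half], shuffled[half:]) >= alpha:
            passed += 1
    return passed / n_iterations


rng = random.Random(1)
spends = [rng.gauss(100, 15) for _ in range(1000)]
print(aa_pass_rate(spends, n_iterations=200))
```

A pass rate well below 1 - alpha on data that should be homogeneous points to a problem with the splitting or with the data itself.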
AB-test example
from hypex.dataset import Dataset, InfoRole, TreatmentRole, TargetRole
from hypex import ABTest

data = Dataset(
    roles={
        "user_id": InfoRole(int),     # InfoRole: ID column
        "treat": TreatmentRole(),     # TreatmentRole: identifies the user group (control or treated)
        "pre_spends": TargetRole(),   # TargetRole: metric for the A/B(n) tests
        "post_spends": TargetRole(),  # TargetRole: metric for the A/B(n) tests
    },
    data="data.csv",
)

# Pick one of the configurations below:
test = ABTest()                                                            # classic A/B test
test = ABTest(multitest_method="bonferroni")                               # A/Bn test with Bonferroni correction
test = ABTest(additional_tests=["t-test", "u-test", "chi2-test"])          # choose which tests to run
test = ABTest(cuped_features={"post_spends": "pre_spends"})                # CUPED variance reduction
test = ABTest(cupac_features={"post_spends": ["pre_spends", "feature1"]})  # CUPAC variance reduction
result = test.execute(data)

result.resume                     # summary of results
result.variance_reduction_report  # variance reduction report for CUPED/CUPAC
More about A/B test here
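The `multitest_method="bonferroni"` option in the A/B example refers to the Bonferroni correction for testing several hypotheses at once. The textbook procedure multiplies each p-value by the number of tests, caps the result at 1, and compares it with alpha. The sketch below shows that procedure in plain Python. It does not claim to mirror how HypEx formats its report, and the function name is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m simultaneous tests.

    Returns the adjusted p-values and a reject flag for each test.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p <= alpha for p in adjusted]


print(bonferroni([0.01, 0.02, 0.04]))
```

With three tests, only the first result survives the correction, even though all three raw p-values are below 0.05. This is the price of controlling the family-wise error rate.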
Documentation
For more detailed information about the library and its features, visit our documentation on ReadTheDocs. There you'll find guides and tutorials to help you get started with HypEx, as well as detailed API documentation for advanced use cases.
To learn more about the architecture of HypEx, check the schemes folder.
Contributions
Join our vibrant community! For guidelines on contributing, reporting issues, or seeking support, please refer to our Contributing Guidelines.
More Information & Resources
- Habr (ru) - discover how HypEx is used for causal inference in various fields.
- A/B testing seminar - NoML seminar on matching and A/B testing.
- Matching with HypEx: Simple Guide - a simple matching guide with explanations.
- Matching with HypEx: Grouping - a guide to matching with grouping.
- HypEx vs Causal Inference and DoWhy - a comparison explaining why the authors consider HypEx the best solution for causal inference.
- HypEx vs Causal Inference and DoWhy: part 2 - the second part of that comparison.
Testing different libraries for the speed of matching
Visit this notebook on Kaggle and reproduce the results yourself.
| Library / group size | 32 768 | 65 536 | 131 072 | 262 144 | 524 288 | 1 048 576 | 2 097 152 | 4 194 304 |
|---|---|---|---|---|---|---|---|---|
| Causal Inference | 46s | 169s | None | None | None | None | None | None |
| DoWhy | 9s | 19s | 40s | 77s | 159s | 312s | 615s | 1 235s |
| HypEx with grouping | 2s | 6s | 16s | 42s | 167s | 509s | 1 932s | 7 248s |
| HypEx without grouping | 2s | 7s | 21s | 101s | 273s | 982s | 3 750s | 14 720s |
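The table distinguishes matching with and without grouping. Assume that grouping means exact matching on a categorical column (for example, gender) before the nearest-neighbour search. Under that assumption, each treated unit is compared only with control units from its own group, which reduces the brute-force distance evaluations from n_treated × n_control to the sum of those products over groups. This is a hypothetical brute-force illustration. HypEx's own grouped matching may differ in details such as how units without a same-group match are handled.

```python
import math
from collections import defaultdict


def grouped_nn_pairs(treated, control):
    """Hypothetical grouped matching: nearest control within the same group.

    treated, control: lists of (group_key, feature_tuple).
    Returns a list of (treated_index, control_index) pairs.
    """
    by_group = defaultdict(list)
    for j, (g, x) in enumerate(control):
        by_group[g].append((j, x))
    pairs = []
    for i, (g, x) in enumerate(treated):
        candidates = by_group.get(g)  # .get() does not create empty groups
        if not candidates:
            continue  # no control in this group: leave the unit unmatched
        j, _ = min(candidates, key=lambda c: math.dist(x, c[1]))
        pairs.append((i, j))
    return pairs


treated = [("m", (1.0,)), ("f", (1.0,)), ("x", (0.0,))]
control = [("f", (1.2,)), ("m", (5.0,)), ("m", (0.9,))]
print(grouped_nn_pairs(treated, control))
```

The treated unit in group "x" has no same-group control, so it stays unmatched instead of being paired with a dissimilar unit from another group.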
Join Our Community
Have questions or want to discuss HypEx? Join our Telegram chat and connect with the community and the developers.
Conclusion
HypEx is a valuable resource for data analysts and researchers working on causal inference and AB testing. With its automated pipeline, matching techniques, and thorough validation procedures, HypEx helps uncover causal relationships in complex datasets quickly and precisely.
Download files
File details
Details for the file hypex-1.0.5.tar.gz.
File metadata
- Download URL: hypex-1.0.5.tar.gz
- Upload date:
- Size: 86.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.10.12 Linux/6.6.87.2-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45f79ae4eecd64a2abec9f77278b6b4ba1c08f3c05b956e8bad3e2283bec9984 |
| MD5 | 0d83849426a175cd73f198eb1cad3125 |
| BLAKE2b-256 | baf3d6baf82f82bf31ae28b43b2720eb42ebdcd9f458d1eae8ec1d33c04be524 |
File details
Details for the file hypex-1.0.5-py3-none-any.whl.
File metadata
- Download URL: hypex-1.0.5-py3-none-any.whl
- Upload date:
- Size: 117.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.10.12 Linux/6.6.87.2-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e3534a395c145ab8eb5b0eccaa43b07c75d0b462435278eddd6304820da5bc8e |
| MD5 | 505409d1dcfb0723c1ba034f228ad44e |
| BLAKE2b-256 | 4e239967910d204800443c0ecff04f38c617b25d54a720eb06d66b50b64c4c02 |