
Do causal inference more casually


casual_inference

casual_inference is a Python package that provides a simple interface for causal inference. Causal analysis is complicated: doing it properly means paying attention to many details. casual_inference aims to reduce that effort.

Installation

pip install casual-inference

Overview

This package provides several types of evaluators. Each evaluator has an evaluate() method and several summary_xxx() methods. evaluate() estimates the treatment impact by computing a set of statistics, and the summary_xxx() methods summarize those statistics in different forms (e.g., a table or a bar chart).

The evaluate() method expects data with a schema like the following:

unit   variant   metric_A   metric_B   ...
1      1         0          0.01       ...
2      1         1          0.05       ...
3      2         0          0.02       ...
...    ...       ...        ...        ...

The table must already be aggregated by the unit column (i.e., the unit column should be the primary key).

Columns

  • unit: The unit of analysis. In web services, this is typically user_id, session_id, etc.
  • variant: The intervention group. This package always assumes that variant 1 is the control group.
  • metrics: The metrics you want to evaluate, e.g., the number of purchases, conversion rate, etc.
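Raw data is often event-level, with several rows per unit, so it has to be aggregated before it is passed to evaluate(). Below is a minimal pandas sketch of that step. It assumes a hypothetical event log with columns user_id, variant, and purchases; the column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical event-level log: one row per event, possibly several rows per user.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "variant": [1, 1, 1, 2, 2, 2],
    "purchases": [0, 1, 0, 1, 0, 1],
})

# Aggregate to one row per unit so that user_id becomes the primary key.
data = (
    events.groupby("user_id", as_index=False)
    .agg(variant=("variant", "first"), purchases=("purchases", "sum"))
)

# The unit column must be unique after aggregation.
assert data["user_id"].is_unique
# Each unit must belong to exactly one variant; otherwise "first" would hide a conflict.
assert (events.groupby("user_id")["variant"].nunique() == 1).all()
print(data)
```

The second check matters because a unit exposed to more than one variant cannot be assigned a single row without losing information.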

Quick Start

casual_inference supports not only the evaluation of standard A/B and A/A tests, but also advanced causal inference techniques.

A/B test evaluation

from casual_inference.dataset import create_sample_ab_result
from casual_inference.evaluator import ABTestEvaluator

data = create_sample_ab_result(n_variant=3, sample_size=1000000, simulated_lift=[-0.01, 0.01])

evaluator = ABTestEvaluator()
evaluator.evaluate(
    data=data,
    unit_col="rand_unit",
    variant_col="variant",
    metrics=["metric_bin", "metric_cont"]
)

evaluator.summary_plot()


It diagnoses Sample Ratio Mismatch (SRM) automatically. When SRM is detected, a warning is displayed in the output so that the analyst can interpret the result with care.
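For reference, a common way to detect SRM is a chi-squared goodness-of-fit test (see the Fabijan et al. reference below). The test compares the observed number of units per variant with the intended split. The sketch below assumes two variants and an intended 50/50 allocation. srm_p_value is a hypothetical helper, and casual_inference's own check may be implemented differently.

```python
import math


def srm_p_value(n_control: int, n_treatment: int, expected_ratio: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value for SRM between two variants.

    expected_ratio is the share of units the experiment intended to put in control.
    """
    total = n_control + n_treatment
    expected = [total * expected_ratio, total * (1 - expected_ratio)]
    observed = [n_control, n_treatment]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-squared survival function equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: an exact 50/50 split gives no SRM signal.
print(srm_p_value(500_000, 500_000))  # 1.0
# A 50.5% / 49.5% split over 1,000,000 units is a very strong SRM signal.
print(srm_p_value(505_000, 495_000))
```

A very small p-value means the observed split is unlikely under the intended allocation. Practitioners usually apply a strict threshold (e.g., 0.001) before flagging SRM.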

See the example notebook for a more detailed example.
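The relative difference between variants is a ratio of two random means, so its variance has to be approximated. The Deng, Knoblich, and Lu reference below describes the delta method for this purpose. The function below is a hedged sketch, not necessarily what ABTestEvaluator computes. It estimates the relative lift of one metric together with a delta-method confidence interval. It assumes independent groups randomized at the analysis unit. relative_lift_ci and the sample data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def relative_lift_ci(control, treatment, alpha=0.05):
    """Relative lift (mean_t / mean_c - 1) with a delta-method confidence interval.

    Assumes independent groups and randomization at the analysis-unit level.
    """
    m_c, m_t = statistics.fmean(control), statistics.fmean(treatment)
    var_c = statistics.variance(control) / len(control)      # variance of the control mean
    var_t = statistics.variance(treatment) / len(treatment)  # variance of the treatment mean
    lift = m_t / m_c - 1
    # First-order Taylor (delta-method) approximation of Var(m_t / m_c).
    se = math.sqrt(var_t / m_c**2 + m_t**2 * var_c / m_c**4)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return lift, (lift - z * se, lift + z * se)


control = [1] * 100 + [0] * 900    # hypothetical: 10% conversion
treatment = [1] * 120 + [0] * 880  # hypothetical: 12% conversion
lift, (low, high) = relative_lift_ci(control, treatment)
print(f"lift={lift:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

With only 1,000 units per group, the interval in this example is wide enough to include zero, even though the point estimate is a 20% lift.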

A/A test evaluation

from casual_inference.dataset import create_sample_ab_result
from casual_inference.evaluator import AATestEvaluator

data = create_sample_ab_result(n_variant=2, sample_size=1000000, simulated_lift=[0.0])

evaluator = AATestEvaluator()
evaluator.evaluate(
    data=data,
    unit_col="rand_unit",
    metrics=["metric_bin", "metric_cont"]
)

evaluator.summary_plot()


See the example notebook for a more detailed example.
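This README does not describe what AATestEvaluator computes internally. One common way to use an A/A test is to split the same population at random many times. You then check that the share of "significant" differences stays close to the significance level alpha. The sketch below illustrates that idea under this assumption. It uses a large-sample normal approximation for the p-value. The helpers two_sample_z_p_value and aa_false_positive_rate are hypothetical and not part of casual_inference.

```python
import math
import random
import statistics


def two_sample_z_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation, large samples)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(values, n_iter=200, alpha=0.05, seed=0):
    """Randomly split the same population n_iter times and return the share of p-values below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        shuffled = values[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        if two_sample_z_p_value(shuffled[:half], shuffled[half:]) < alpha:
            hits += 1
    return hits / n_iter


data_rng = random.Random(42)
values = [data_rng.gauss(10, 2) for _ in range(2000)]  # hypothetical metric values
rate = aa_false_positive_rate(values)
print(rate)  # expected to be close to alpha = 0.05
```

A rate far above alpha suggests that the test procedure, the variance estimate, or the randomization is flawed.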

Sample Size evaluation

from casual_inference.dataset import create_sample_ab_result
from casual_inference.evaluator import SampleSizeEvaluator

data = create_sample_ab_result(n_variant=2, sample_size=1000000)

evaluator = SampleSizeEvaluator()
evaluator.evaluate(
    data=data,
    unit_col="rand_unit",
    metrics=["metric_bin", "metric_cont"]
)

evaluator.summary_plot()


See the example notebook for a more detailed example.
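SampleSizeEvaluator's exact method is not described here. For orientation, the textbook power-analysis formula covers the common case (see the Lucile Lu reference below). The formula is n = 2 (z_{1−α/2} + z_{1−β})² σ² / δ². It gives the number of units per group needed to detect an absolute difference δ in means, using a two-sided test at level α with power 1 − β. Here σ is the metric's standard deviation. Below is a minimal sketch; required_sample_size_per_group is a hypothetical helper.

```python
import math
from statistics import NormalDist


def required_sample_size_per_group(sigma, mde, alpha=0.05, power=0.8):
    """Units per group to detect an absolute difference `mde` in means (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = NormalDist().inv_cdf(power)           # quantile for the Type II error rate
    n = 2 * (z_alpha + z_beta) ** 2 * sigma**2 / mde**2
    return math.ceil(n)


# Hypothetical binary metric with a 10% baseline rate: sigma = sqrt(p * (1 - p)).
n = required_sample_size_per_group(sigma=math.sqrt(0.1 * 0.9), mde=0.01)
print(n)  # roughly 14,000 units per group
```

Because n scales with 1 / δ², halving the minimum detectable effect roughly quadruples the required sample size.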

Advanced causal inference techniques

casual_inference also supports the following advanced causal inference techniques:

  • Linear Regression

Other evaluation methods, such as Propensity Score Matching, are planned for future releases.
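This README does not document the interface of the linear regression evaluator, so the sketch below does not use casual_inference at all. It is a plain NumPy illustration of regression adjustment, the general idea behind using linear regression for causal effect estimation. You regress the outcome on a treatment indicator plus confounding covariates, then read off the treatment coefficient. The simulated data, the variable names, and the true effect of 2.0 are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder (e.g., pre-period activity) that drives both treatment and outcome.
x = rng.normal(size=n)
# Treatment probability increases with x, so treated units differ systematically from controls.
treat = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(float)
true_effect = 2.0
y = 1.0 + true_effect * treat + 3.0 * x + rng.normal(size=n)

# Naive difference in means: biased upward because treated units have larger x.
naive = y[treat == 1].mean() - y[treat == 0].mean()

# OLS of y on [intercept, treatment, covariate]; the treatment coefficient adjusts for x.
X = np.column_stack([np.ones(n), treat, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = coef[1]
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

The adjustment only removes bias from confounders that are included in the regression and modeled with the correct functional form.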

References

  • Ron Kohavi, Diane Tang, and Ya Xu. 2020. Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press. https://experimentguide.com/
    • A great book covering a comprehensive range of topics in practical A/B testing. I recommend it to everyone who works on A/B testing.
  • Alex Deng, Ulf Knoblich, and Jiannan Lu. 2018. Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). Association for Computing Machinery, New York, NY, USA, 233–242. https://doi.org/10.1145/3219819.3219919
    • Describes how to approximate the variance of a relative difference, and how to handle cases where the analysis unit is more granular than the randomization unit.
  • Lucile Lu. 2016. Power, minimal detectable effect, and bucket size estimation in A/B tests. Twitter Engineering Blog.
    • Describes the concepts of Type I and Type II errors and power analysis (sample size calculation).
  • Aleksander Fabijan, Jayant Gupchup, Somit Gupta, Jeff Omhover, Wen Qin, Lukas Vermeer, and Pavel Dmitriev. 2019. Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19). Association for Computing Machinery, New York, NY, USA, 2156–2164. https://doi.org/10.1145/3292500.3330722
    • Introduces Sample Ratio Mismatch (SRM), describes various real examples of SRM, and provides a taxonomy that helps with debugging when SRM occurs.
  • Shota Yasui. 2020. 効果検証入門 (Introduction to Effect Verification). 技術評論社 (Gijutsu-Hyoron). https://gihyo.jp/book/2020/978-4-297-11117-5
    • A great introductory book on practical causal inference techniques, written in Japanese.


Download files


Source Distribution

casual_inference-0.7.0.tar.gz (13.6 kB)


Built Distribution

casual_inference-0.7.0-py3-none-any.whl (17.4 kB)


File details

Details for the file casual_inference-0.7.0.tar.gz.

File metadata

  • Download URL: casual_inference-0.7.0.tar.gz
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.15 Linux/6.8.0-1014-azure

File hashes

Hashes for casual_inference-0.7.0.tar.gz
Algorithm Hash digest
SHA256 4e8fa84bb8cc27971122a59e4ede7301ce006ca3508e3cc6323ba1922c59f2f1
MD5 e704c02698e97eed94bfe6a8dd4bbd67
BLAKE2b-256 dc41eae9c481ea593f021fc4b5c35b719e20295846a4584f815e9b80cbdd31f0


File details

Details for the file casual_inference-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: casual_inference-0.7.0-py3-none-any.whl
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.15 Linux/6.8.0-1014-azure

File hashes

Hashes for casual_inference-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9e05963a76cdaa7a2a66bd5e3d6288566af93ba8302da2b11307cf3da0a65a94
MD5 0f1c8b93c45716d0a36c1255ecb6c1ef
BLAKE2b-256 07e9f373a4767758ee387d7deb5505e1c85ce7985af25bdd4719058b608142b7

