CausalNLP: A Practical Toolkit for Causal Inference with Text

Welcome to CausalNLP

What is CausalNLP?

CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or “controlled-for” variable.

Features

Low-code causal inference in as little as two commands
Out-of-the-box support for using text as a “controlled-for” variable (e.g., confounder)
Built-in Autocoder that transforms raw text into useful variables for causal analyses (e.g., topics, sentiment, emotion, etc.)
Sensitivity analysis to assess robustness of causal estimates
Quick and simple key driver analysis to yield clues on potential drivers of an outcome based on predictive power, correlations, etc.
Can easily be applied to “traditional” tabular datasets without text (i.e., datasets with only numerical and categorical variables)
Includes an experimental PyTorch implementation of CausalBert by Veitch, Sridhar, and Blei (based on the reference implementation by R. Pryzant)

Install

pip install -U pip
pip install causalnlp

NOTE: On Python 3.6.x, if you get a RuntimeError: Python version >= 3.7 required, try ensuring NumPy is installed before CausalNLP (e.g., pip install numpy==1.18.5).

Usage

Example: What is the causal impact of a positive review on a product click?

import pandas as pd

df = pd.read_csv('sample_data/music_seed50.tsv', sep='\t', on_bad_lines='skip')

The file music_seed50.tsv is a semi-simulated dataset from here. Columns of relevance include:

- Y_sim: outcome, where 1 means the product was clicked and 0 means it was not
- text: raw text of the review
- rating: rating associated with the review (1 through 5)
- T_true: 0 means a rating less than 3, 1 means a rating of 5, where T_true affects the outcome Y_sim
- T_ac: an approximation of the true review sentiment (T_true), created with Autocoder from the raw review text
- C_true: confounding categorical variable (1 = audio CD, 0 = other)
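
The T_true coding described above can be sketched as a small function. This is an illustration only: the description does not say how ratings of 3 or 4 are handled, so the sketch assumes such reviews simply have no T_true value (it returns None for them). The helper rating_to_treatment is hypothetical and is not part of CausalNLP.

```python
from typing import Optional


def rating_to_treatment(rating: int) -> Optional[int]:
    """Hypothetical helper mirroring the T_true coding described in the text."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating < 3:
        return 0  # rating less than 3 -> T_true = 0
    if rating == 5:
        return 1  # rating of 5 -> T_true = 1
    return None  # assumption: ratings of 3 and 4 get no T_true value


# Apply the coding to every possible rating.
coded = {rating: rating_to_treatment(rating) for rating in range(1, 6)}
print(coded)
```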

We’ll pretend the true sentiment (i.e., review rating and T_true) is hidden and only use T_ac as the treatment variable.

Using the text_col parameter, we include the raw review text as another “controlled-for” variable.

from causalnlp import CausalInferenceModel
from lightgbm import LGBMClassifier

cm = CausalInferenceModel(df, 
                         metalearner_type='t-learner', learner=LGBMClassifier(num_leaves=500),
                         treatment_col='T_ac', outcome_col='Y_sim', text_col='text',
                         include_cols=['C_true'])
cm.fit()

outcome column (categorical): Y_sim
treatment column: T_ac
numerical/categorical covariates: ['C_true']
text covariate: text
preprocess time:  1.1179866790771484  sec
start fitting causal inference model
time to fit causal inference model:  10.361494302749634  sec

Estimating Treatment Effects

CausalNLP supports estimation of heterogeneous treatment effects (i.e., how causal impacts vary across observations, which could be documents, emails, posts, individuals, or organizations).

We will first calculate the overall average treatment effect (or ATE), which shows that a positive review increases the probability of a click by 13 percentage points in this dataset.

Average Treatment Effect (or ATE):

print( cm.estimate_ate() )

{'ate': 0.1309311542209525}
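
To make the ATE concrete, the sketch below applies the T-learner idea to a toy dataset using only the standard library. One outcome model is fit on treated rows and another on control rows, each row's effect is the difference between the two predictions, and the ATE is the mean of those differences. The "models" here are just per-stratum outcome means, the data are invented, and none of the names belong to CausalNLP, which instead uses the learner passed to CausalInferenceModel.

```python
from collections import defaultdict
from statistics import mean

# Toy rows of (covariate c, treatment t, outcome y). Invented for illustration.
rows = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 1, 1), (0, 1, 0),
    (1, 0, 1), (1, 0, 0), (1, 0, 1), (1, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0),
]


def fit_group_means(data, treatment):
    """'Fit' an outcome model on one arm: mean outcome per covariate value."""
    outcomes_by_c = defaultdict(list)
    for c, t, y in data:
        if t == treatment:
            outcomes_by_c[c].append(y)
    return {c: mean(ys) for c, ys in outcomes_by_c.items()}


mu0 = fit_group_means(rows, treatment=0)  # control-arm model
mu1 = fit_group_means(rows, treatment=1)  # treated-arm model

# Per-row effect: predicted outcome if treated minus predicted outcome if not.
effects = [mu1[c] - mu0[c] for c, _, _ in rows]
ate = mean(effects)
print({"ate": ate})
```

In this toy data the effect is 0.25 in the c = 0 stratum and 0 in the c = 1 stratum, which is the kind of heterogeneity that subgroup estimates are meant to surface.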

Conditional Average Treatment Effect (or CATE): reviews that mention the word “toddler”:

print( cm.estimate_ate(df['text'].str.contains('toddler')) )

{'ate': 0.15559234254638685}
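
The subgroup estimate above comes from passing a boolean mask to estimate_ate. Conceptually, a CATE is the average of per-observation effects over the rows that the mask selects. A minimal sketch with invented per-review effects follows; the numbers and the subgroup_effect helper are hypothetical and are not CausalNLP output.

```python
from statistics import mean

# Invented reviews with made-up per-review effect estimates.
reviews = [
    {"text": "My toddler dances to this album", "effect": 0.20},
    {"text": "Great CD for long drives", "effect": 0.10},
    {"text": "Bought it for my toddler and she loves it", "effect": 0.16},
    {"text": "Not my favorite record", "effect": 0.06},
]


def subgroup_effect(rows, mask):
    """Average the per-row effects over the rows where mask is True."""
    selected = [row["effect"] for row, keep in zip(rows, mask) if keep]
    if not selected:
        raise ValueError("mask selects no rows")
    return mean(selected)


# Boolean mask analogous to df['text'].str.contains('toddler').
mask = ["toddler" in row["text"] for row in reviews]

cate = subgroup_effect(reviews, mask)                  # subgroup average
ate = subgroup_effect(reviews, [True] * len(reviews))  # overall average
print({"cate": cate, "ate": ate})
```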

Individualized Treatment Effects (or ITE):

test_df = pd.DataFrame({'T_ac' : [1], 'C_true' : [1], 
                        'text' : ['I never bought this album, but I love his music and will soon!']})
effect = cm.predict(test_df)
print(effect)

[[0.80538201]]

Model Interpretability:

print( cm.interpret(plot=False)[1][:10] )

v_music    0.079042
v_cd       0.066838
v_album    0.055168
v_like     0.040784
v_love     0.040635
C_true     0.039949
v_just     0.035671
v_song     0.035362
v_great    0.029918
v_heard    0.028373
dtype: float64

Features with the v_ prefix are word features. C_true is the categorical variable indicating whether or not the product is a CD.
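
Given that naming convention, the importance scores can be split into word features and other covariates by checking the prefix. The sketch below copies the ten scores printed above into a plain dict; the variable names are illustrative and not part of CausalNLP.

```python
# Scores copied from the interpret() output shown above.
importances = {
    "v_music": 0.079042,
    "v_cd": 0.066838,
    "v_album": 0.055168,
    "v_like": 0.040784,
    "v_love": 0.040635,
    "C_true": 0.039949,
    "v_just": 0.035671,
    "v_song": 0.035362,
    "v_great": 0.029918,
    "v_heard": 0.028373,
}

PREFIX = "v_"

# Word features: strip the prefix to recover the word itself.
word_features = {
    name[len(PREFIX):]: score
    for name, score in importances.items()
    if name.startswith(PREFIX)
}

# Everything else is a numerical or categorical covariate.
covariates = {
    name: score
    for name, score in importances.items()
    if not name.startswith(PREFIX)
}

top_word = max(word_features, key=word_features.get)
print(top_word, sorted(covariates))
```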

Text is Optional in CausalNLP

Despite the “NLP” in CausalNLP, the library can be used for causal inference on data without text (e.g., only numerical and categorical variables). See the examples for more info.
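
As a sketch of the tabular case, the function below builds a model with the same constructor arguments used in the review example, but without text_col. The toy rows, the column names T, Y, and age_group, and both helper names are invented for illustration. The third-party imports are deferred into the function so that the data-building part runs without CausalNLP installed. Whether these settings suit a real dataset is not something this sketch establishes.

```python
import random


def make_toy_rows(n=500, seed=0):
    """Generate invented tabular data: one covariate, a treatment, an outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age_group = rng.randint(0, 2)
        # Treatment probability depends on the covariate (confounding).
        t = 1 if rng.random() < 0.3 + 0.2 * age_group else 0
        # Outcome probability depends on both covariate and treatment.
        y = 1 if rng.random() < 0.2 + 0.1 * age_group + 0.15 * t else 0
        rows.append({"age_group": age_group, "T": t, "Y": y})
    return rows


def fit_tabular_model(rows):
    """Fit a model without text_col, reusing the arguments from the review example."""
    # Deferred imports: only needed when this function is actually called.
    import pandas as pd
    from causalnlp import CausalInferenceModel
    from lightgbm import LGBMClassifier

    df = pd.DataFrame(rows)
    cm = CausalInferenceModel(df,
                              metalearner_type='t-learner', learner=LGBMClassifier(),
                              treatment_col='T', outcome_col='Y',
                              include_cols=['age_group'])
    cm.fit()
    return cm


rows = make_toy_rows()
```

With CausalNLP, pandas, and LightGBM installed, calling fit_tabular_model(rows) and then estimate_ate() on the result would follow the same pattern as the review example.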

Documentation

API documentation and additional usage examples are available at: https://amaiya.github.io/causalnlp/

How to Cite

Please cite the following paper when using CausalNLP in your work:

@article{maiya2021causalnlp,
    title={CausalNLP: A Practical Toolkit for Causal Inference with Text},
    author={Arun S. Maiya},
    year={2021},
    eprint={2106.08043},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    journal={arXiv preprint arXiv:2106.08043},
}

Release history

0.8.1 (this version): Feb 6, 2025
0.8.0: Jun 15, 2024
0.8.dev0 (pre-release): Jun 15, 2024
0.7.0: Aug 2, 2022
0.6.0: Oct 20, 2021
0.5.0: Sep 3, 2021
0.4.0: Jul 20, 2021
0.3.1: Jul 19, 2021
0.3.0: Jul 15, 2021
0.2.0: Jun 21, 2021
0.1.3: Jun 17, 2021
0.1.2: Jun 17, 2021
0.1.1: Jun 17, 2021
0.1.0: Jun 16, 2021
0.1.0b1 (pre-release): Jun 15, 2021
0.1.0b0 (pre-release): Jun 15, 2021
0.0.1b0 (pre-release): Jun 14, 2021
0.0.1a0 (pre-release): May 30, 2021

Download files

Source Distribution

causalnlp-0.8.1.tar.gz (63.3 kB), uploaded Feb 6, 2025 (Source)

Built Distribution

causalnlp-0.8.1-py3-none-any.whl (71.8 kB), uploaded Feb 6, 2025 (Python 3)

File details

Details for the file causalnlp-0.8.1.tar.gz.

File metadata

Download URL: causalnlp-0.8.1.tar.gz
Upload date: Feb 6, 2025
Size: 63.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for causalnlp-0.8.1.tar.gz
Algorithm	Hash digest
SHA256	`268e7ef29fc5b8311f017275b076611221a708fdbe369b6f72d21520d7d12cb5`
MD5	`fd7d04fb646b67cd171e35a7a291bf59`
BLAKE2b-256	`e878926b2a4f6836dcda18d3fd4bb3645126bd3639b4117b7bbdb9b5bc656136`

File details

Details for the file causalnlp-0.8.1-py3-none-any.whl.

File metadata

Download URL: causalnlp-0.8.1-py3-none-any.whl
Upload date: Feb 6, 2025
Size: 71.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for causalnlp-0.8.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`75fc68cbf4d74ebbcd472188941e59b6879239601ee5e4f6c1bdb0830e6cb17c`
MD5	`faf7b75696413dbce6a9bf507e773e09`
BLAKE2b-256	`6bdabb2f591ad1667986174407ce9e9a430594798858883b2177408d91704878`
