libuplift

Uplift modeling package based on and integrated with scikit-learn.

Authors: Szymon Jaroszewicz, Krzysztof Rudaś

Design goals

The design goal of libuplift is to seamlessly integrate with scikit-learn and follow its conventions as closely as possible. It is possible to use model evaluation and tuning facilities from scikit-learn either directly or as thin wrappers provided by libuplift.

Features

  • A comprehensive collection of datasets for uplift modeling (to our knowledge, the most complete collection of randomized uplift datasets)
    • marketing and advertising datasets
    • medical RCT (randomized controlled trial) datasets
  • Tight integration with scikit-learn: model evaluation routines can be used just as in scikit-learn
  • Meta-models: T/S/X learners, transformed target learner
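To give a flavor of the transformed target approach, here is a minimal, self-contained sketch of the underlying class-variable transformation idea on synthetic data. This is an illustration of the technique, not libuplift's implementation; all variable names and the synthetic data are ours, and the simple 2*p - 1 formula assumes a roughly 50/50 treatment split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 2))
trt = rng.integers(0, 2, size=n)             # 0 = control, 1 = treatment
# synthetic outcome: treatment helps only when the first feature is positive
p = 0.3 + 0.2 * trt * (X[:, 0] > 0)
y = rng.binomial(1, p)

# class-variable transformation: z = 1 iff (treated and responded)
# or (control and did not respond)
z = (y == trt).astype(int)
clf = LogisticRegression().fit(X, z)
# with a balanced treatment split, uplift(x) ~= 2*P(z=1|x) - 1
uplift = 2 * clf.predict_proba(X)[:, 1] - 1
```

A single standard classifier trained on z thus yields an uplift estimate, which is the appeal of the transformed target family of methods.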

Getting started

To install libuplift, simply use

pip install libuplift

or, to get the latest version, install directly from GitHub

pip install git+https://github.com/jszymon/uplift-sklearn

Let us now build an uplift model on the well-known Hillstrom dataset. Begin with the necessary imports:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression

Now fetch the dataset and do some basic preprocessing:

from libuplift.datasets import fetch_Hillstrom
D = fetch_Hillstrom(as_frame=True)
trt = D.treatment
# encode categorical features, standardize numerical features
ct = ColumnTransformer([("ohe", OneHotEncoder(), list(D.categ_values.keys()))],
                       remainder=StandardScaler())
X = ct.fit_transform(D.data)
# keep only women's campaign
mask = ~(trt == 1)
X = X[mask]
y = D.target_visit[mask]
trt = (trt[mask] == 2)*1

By libuplift convention, treatments are denoted by consecutive integers, with 0 indicating controls. Additionally, the special n_trt argument is passed to all methods to indicate the number of treatments (if n_trt is None, it will be inferred automatically, but this may be unreliable and is discouraged).
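The convention can be illustrated with a small hypothetical treatment vector. Note that inferring n_trt from the data (e.g. as trt.max()) only works when every treatment actually occurs in the sample, which is why passing it explicitly is safer:

```python
import numpy as np

# three arms: 0 = control, 1 = first treatment, 2 = second treatment
trt = np.array([0, 2, 1, 0, 1, 2, 0])
n_trt = int(trt.max())   # inferred number of non-control treatments: 2
# prefer passing n_trt explicitly rather than relying on this inference
```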

Now we're ready to fit an uplift model (a T-learner in our case):

X_train, X_test, y_train, y_test, trt_train, trt_test = \
    train_test_split(X, y, trt, train_size=0.7)
from libuplift.meta import TLearnerUpliftClassifier  # exact import path may differ; see the docs
m = TLearnerUpliftClassifier(base_estimator=LogisticRegression())
m.fit(X_train, y_train, trt_train, n_trt=1)

and draw an uplift curve

import matplotlib.pyplot as plt
from libuplift.metrics import uplift_curve, area_under_uplift_curve

score = m.predict(X_test)[:, 1]
print("AUUC =", area_under_uplift_curve(y_test, score, trt_test, n_trt=1))
cx, cy = uplift_curve(y_test, score, trt_test, n_trt=1)
plt.plot(cx, cy)
plt.plot([0, cx[-1]], [0, cy[-1]], "k-")  # reference line: random targeting
plt.show()

An uplift curve
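For intuition, the quantity behind an uplift curve can be computed by hand: rank subjects by score and, within each top fraction, compare response rates between treated and control subjects. Exact conventions (scaling, normalization) vary between packages, so the sketch below on synthetic data is illustrative rather than libuplift's precise definition:

```python
import numpy as np

def uplift_at_fraction(y, score, trt, frac):
    """Estimated uplift among the top `frac` of subjects ranked by score."""
    order = np.argsort(-score)
    k = max(1, int(frac * len(y)))
    top = order[:k]
    t_mask = trt[top] == 1
    # difference of response rates between treated and control in the top group
    return y[top][t_mask].mean() - y[top][~t_mask].mean()

rng = np.random.default_rng(1)
n = 2000
score = rng.normal(size=n)
trt = rng.integers(0, 2, size=n)
# responders are more likely among treated subjects with high scores
y = rng.binomial(1, 0.2 + 0.3 * trt * (score > 0))
```

At frac=1.0 this reduces to the overall treated-minus-control response rate, which is why every uplift curve ends at the sample's average treatment effect regardless of the model.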

One can use cross_val_score and GridSearchCV to easily evaluate models or tune their parameters, just as one does in scikit-learn. The functions provided by libuplift are thin wrappers around the original scikit-learn functions, so they behave exactly as they would for standard classifiers.

# import those from libuplift instead of sklearn
from libuplift.model_selection import cross_val_score
from libuplift.model_selection import GridSearchCV

m1 = TLearnerUpliftClassifier(base_estimator=LogisticRegression())
m_cv1 = GridSearchCV(m1,
                     {"base_estimator__C":[1e-1,1,1e1,1e2,1e3]},
                     cv=3, n_jobs=-1)
# tune regularization of treatment/control models separately
m2 = TLearnerUpliftClassifier(base_estimator=[("model_c", LogisticRegression()),
                                              ("model_t", LogisticRegression())])
m_cv2 = GridSearchCV(m2,
                     {"model_c__C": [1e-1, 1, 1e1, 1e2, 1e3],
                      "model_t__C": [1e-1, 1, 1e1, 1e2, 1e3]},
                     cv=3, n_jobs=-1)

Now evaluate both models using the cross-validated Area Under the Uplift Curve (AUUC):

auuc_m1 = np.mean(cross_val_score(m_cv1, X, y, trt, n_trt=1, cv=5, scoring="auuc"))
auuc_m2 = np.mean(cross_val_score(m_cv2, X, y, trt, n_trt=1, cv=5, scoring="auuc"))
print("crossval AUUC m1:", auuc_m1)
print("crossval AUUC m2:", auuc_m2)

Finally, do a permutation test and draw a learning curve. Again, the functions below are thin wrappers around the original scikit-learn functions, so they accept the same set of parameters.
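The idea behind a permutation test is simple: repeatedly shuffle the labels to destroy any real association, recompute the statistic, and report the fraction of shuffled runs that score at least as well as the observed one. A minimal numpy sketch of this logic on synthetic data (using a plain treatment-effect statistic rather than libuplift's cross-validated AUUC):

```python
import numpy as np

def ate(y, trt):
    """Difference in response rates between treated and control subjects."""
    return y[trt == 1].mean() - y[trt == 0].mean()

rng = np.random.default_rng(2)
n = 1000
trt = rng.integers(0, 2, size=n)
y = rng.binomial(1, 0.2 + 0.1 * trt)         # real treatment effect of 0.1

observed = ate(y, trt)
# null distribution: shuffle treatment assignment, breaking any real effect
perm = np.array([ate(y, rng.permutation(trt)) for _ in range(200)])
# p-value: fraction of permutations at least as extreme as the observed value
pv = (1 + np.sum(perm >= observed)) / (1 + len(perm))
```

A small p-value indicates the observed effect is unlikely under random treatment assignment, which is exactly what permutation_test_score reports for the chosen scoring metric.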

from libuplift.model_selection import permutation_test_score, learning_curve

score, permutation_scores, pv =\
    permutation_test_score(m, X, y, trt, n_trt=1, cv=3,
                           n_permutations=100, scoring="auuc",
                           verbose=10, n_jobs=-1)

fig, (ax0, ax1) = plt.subplots(ncols=2)
ax0.hist(permutation_scores, density=True, label=f"p-value={pv}")
ax0.axvline(score, color="r")
ax0.set_title("Permutation test")

train_sizes, train_scores, test_scores = learning_curve(m, X, y, trt, n_trt=1, scoring="auuc")

train_scores_mean = train_scores.mean(axis=1)
train_scores_std = train_scores.std(axis=1)
test_scores_mean = test_scores.mean(axis=1)
test_scores_std = test_scores.std(axis=1)
ax1.fill_between(train_sizes,
                 train_scores_mean - train_scores_std,
                 train_scores_mean + train_scores_std,
                 alpha=0.1, color='r')
ax1.plot(train_sizes, train_scores_mean, 'ro-', label="Train score")
ax1.fill_between(train_sizes,
                 test_scores_mean - test_scores_std,
                 test_scores_mean + test_scores_std,
                 alpha=0.1, color='g')
ax1.plot(train_sizes, test_scores_mean, 'go-', label="Test score")
ax1.legend()
ax1.yaxis.tick_right()
ax1.set_title("Learning curve")
plt.show()

Permutation test and learning curve

We can see that the model is significantly better than random guessing, and near-optimal performance seems to be achieved already with about 10,000 training records.

Documentation

The documentation is available on Read the Docs.

Project details

Release history

This version: 0.1