
Active Learning for treatment effect estimation

Project description

Automatic Stopping for Batch-mode Experimentation

Created with nbdev by Zoltan Puha

Install

python -m pip install git+https://github.com/puhazoli/asbe

How to use

ASBE builds on the functional view of modAL, where an AL algorithm is assembled from pieces. You need the following ingredients:

- an ITE estimator (ITEEstimator()),
- an acquisition function,
- an assignment function,
- and, additionally, an optional stopping criterion for your model.

If all of the above are defined, you can construct an ASLearner, which will guide you through the active learning process.

from asbe.base import *
from asbe.models import *
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Simulate a toy dataset: two covariates, a random binary treatment,
# a binary outcome, and the true individual treatment effect (ITE)
N = 1000
X = np.random.normal(size=N * 2).reshape((-1, 2))
t = np.random.binomial(n=1, p=0.5, size=N)
y = np.random.binomial(n=1, p=1 / (1 + np.exp(X[:, 1] * 2 + t * 3)))
ite = 1 / (1 + np.exp(X[:, 1] * 2 + t * 3)) - 1 / (1 + np.exp(X[:, 1] * 2))

# Wrap a scikit-learn classifier in an ITE estimator and fit it
a = BaseITEEstimator(LogisticRegression(solver="lbfgs"))
a.fit(X_training=X, t_training=t, y_training=y)
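
For context, a one-model ("S-learner") ITE estimator fits a single outcome model on the covariates plus the treatment indicator and takes the difference between its predictions under treatment and control. The sketch below, which continues from the variables above, illustrates that idea with plain scikit-learn; it is a conceptual illustration, not asbe's internal code.

# Conceptual one-model ("S-learner") ITE estimate with plain scikit-learn;
# this illustrates the idea only and is not asbe's implementation
Xt = np.column_stack([X, t])                        # covariates plus treatment indicator
m = LogisticRegression(solver="lbfgs").fit(Xt, y)   # single outcome model
p1 = m.predict_proba(np.column_stack([X, np.ones(N)]))[:, 1]   # predicted outcome if treated
p0 = m.predict_proba(np.column_stack([X, np.zeros(N)]))[:, 1]  # predicted outcome if untreated
ite_hat = p1 - p0                                   # estimated individual treatment effect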

Learning actively

Similarly, you can create a BaseActiveLearner, for which you initialize the dataset and set the preferred modeling options. Let's see how it works:

- we will use XBART to model the treatment effect with a one-model approach,
- we will use expected model change maximization (EMCM) as the acquisition strategy (sketched below),
- for that, we need an approximate model; we will use SGDRegressor.
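
To make the acquisition step concrete, here is a conceptual sketch of EMCM with a linear surrogate such as SGDRegressor under squared loss: a candidate's score is the expected norm of the parameter gradient that labeling it would induce, which for a linear model reduces to the absolute residual times the norm of the candidate's features. The helper below is purely illustrative; its name and signature are made up for this example and are not part of asbe's API.

import numpy as np

def emcm_scores(surrogate, X_pool, ite_samples):
    # surrogate: a fitted linear regressor (e.g. SGDRegressor) approximating the ITE
    # ite_samples: (n_bootstrap, n_pool) array of sampled ITE predictions from the main model
    preds = surrogate.predict(X_pool)                                 # surrogate's point estimate
    residuals = ite_samples - preds                                   # disagreement with the main model
    grad_norms = np.abs(residuals) * np.linalg.norm(X_pool, axis=1)   # |residual| * ||x|| per sample
    return grad_norms.mean(axis=0)                                    # expected model change per candidate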

You can call .fit() on the BaseActiveLearner, which by default fits the supplied training data. To select new units from the pool, call the query() method, which returns the selected X and the query_idx of these units. query() takes the no_query argument, which sets how many units are queried at once; for sequential AL, we can set this to 1. Additionally, some query strategies require different treatment effect estimates - EMCM needs uncertainty around the ITE - so we can explicitly tell the BaseITEEstimator to return all the predicted treatment effects. Then we can teach the newly acquired units to the learner by calling the teach method. The score method provides an evaluation of the given learner.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from copy import deepcopy
import pandas as pd

# Split the simulated data into a small labeled training set, a pool to query from,
# and a held-out test set (here the pool and the test set contain the same units);
# the true ITE is unknown for training/pool units, so zeros are used as placeholders
X_train, X_test, t_train, t_test, y_train, y_test, ite_train, ite_test = train_test_split(
    X, t, y, ite, test_size=0.8, random_state=1005)
ds = {"X_training": X_train,
      "y_training": y_train,
      "t_training": t_train,
      "ite_training": np.zeros_like(y_train),
      "X_pool": deepcopy(X_test),
      "y_pool": deepcopy(y_test),
      "t_pool": deepcopy(t_test),
      "ite_pool": np.zeros_like(y_test),
      "X_test": X_test,
      "y_test": y_test,
      "t_test": t_test,
      "ite_test": ite_test
      }
asl = BaseActiveLearner(estimator=BaseITEEstimator(model=RandomForestClassifier(),
                                                   two_model=False),
                        acquisition_function=BaseAcquisitionFunction(),
                        assignment_function=BaseAssignmentFunction(),
                        stopping_function=None,
                        dataset=ds)
asl.fit()                                    # fit the estimator on the training data
X_new, query_idx = asl.query(no_query=10)    # select 10 units from the pool
asl.teach(query_idx)                         # add the selected units to the training data
preds = asl.predict(asl.dataset["X_test"])   # predict ITEs for the test set
asl.score()
0.34842037641629464
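
Putting these pieces together, a sequential AL loop simply repeats query, teach, and score. The sketch below reuses only the methods shown above; it assumes the learner can be re-scored after every teach call.

# Minimal sketch of a sequential AL loop using the BaseActiveLearner methods above;
# no_query=1 queries a single unit per step
scores = []
for step in range(10):
    X_new, query_idx = asl.query(no_query=1)   # pick the most informative unit
    asl.teach(query_idx)                       # add it to the training data
    scores.append(asl.score())                 # track performance after each step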
# Run a full simulation with a two-model estimator, two acquisition functions
# to compare, and 3 AL steps, evaluated with the "decision" metric
asl = BaseActiveLearner(estimator=BaseITEEstimator(model=RandomForestClassifier(),
                                                   two_model=True),
                        acquisition_function=[BaseAcquisitionFunction(),
                                              BaseAcquisitionFunction(no_query=20)],
                        assignment_function=BaseAssignmentFunction(),
                        stopping_function=None,
                        dataset=ds,
                        al_steps=3)
resd = pd.DataFrame(asl.simulate(metric="decision"))
resd.plot()
<AxesSubplot:>
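
If you want to keep the figure, the plot can be labeled and saved with standard pandas/matplotlib calls. The axis labels below are assumptions about the shape of the simulate output (one row per AL step, one column per acquisition function).

import matplotlib.pyplot as plt

ax = resd.plot()                  # one line per acquisition function (assumption)
ax.set_xlabel("AL step")
ax.set_ylabel("decision metric")
plt.savefig("al_simulation.png", dpi=150)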

