AutoPrognosis - A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.

:key: Features

  • :rocket: Automatically learns ensembles of pipelines for classification or survival analysis.
  • :cyclone: Easy-to-extend, pluggable architecture.
  • :fire: Interpretability tools.

:rocket: Installation

Using pip

The library can be installed from PyPI using

$ pip install autoprognosis

or from source, using

$ pip install .

Redis (Optional, but recommended)

AutoPrognosis can use Redis as a backend to improve the performance and quality of the searches.

For that, install the redis-server package following the steps described on the official site.

:boom: Sample Usage

More advanced use cases can be found in our tutorials section.

List the available classifiers

from autoprognosis.plugins.prediction.classifiers import Classifiers
print(Classifiers().list_available())

Create a study for classifiers

from pathlib import Path

from sklearn.datasets import load_breast_cancer

from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator


X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

workspace = Path("workspace")
study_name = "example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=100,  # how many trials to do for each candidate
    timeout=60,  # seconds
    classifiers=["logistic_regression", "lda", "qda"],
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"
model = load_model_from_file(output)

metrics = evaluate_estimator(model, X, Y)

print(f"model {model.name()} -> {metrics['clf']}")

List available survival analysis estimators

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation
print(RiskEstimation().list_available())

Survival analysis study

# stdlib
from pathlib import Path

# third party
import numpy as np
from pycox import datasets

# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns = ["duration"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

workspace = Path("workspace")
study_name = "example_risks"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
    num_iter=10,
    num_study_iter=1,
    timeout=10,
    risk_estimators=["cox_ph", "survival_xgboost"],
    score_threshold=0.5,
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"

model = load_model_from_file(output)
metrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)

print(f"Model {model.name()} score: {metrics['clf']}")

:high_brightness: Tutorials

  • Plugins
  • AutoML

:cyclone: Building a demonstrator

After running a study, a model template will be available in the workspace, in the model.p file. Based on this template, you can create a demonstrator using the scripts/build_demonstrator.py script.

Usage: build_demonstrator.py [OPTIONS]

Options:
  --name TEXT               The title of the demonstrator
  --task_type TEXT          classification/risk_estimation
  --dashboard_type TEXT     streamlit or dash. Default: streamlit
  --dataset_path TEXT       Path to the dataset csv
  --model_path TEXT         Path to the model template, usually model.p
  --time_column TEXT        Only for risk_estimation tasks. Which column in
                            the dataset is used for time-to-event
  --target_column TEXT      Which column in the dataset is the outcome
  --horizons TEXT           Only for risk_estimation tasks. Which time
                            horizons to plot.
  --explainers TEXT         Which explainers to include. There can be multiple
                            explainer names, separated by a comma. Available
                            explainers:
                            kernel_shap,invase,shap_permutation_sampler,lime.
  --imputers TEXT           Which imputer to use. Available imputers:
                            ['sinkhorn', 'EM', 'mice', 'ice', 'hyperimpute',
                            'most_frequent', 'median', 'missforest',
                            'softimpute', 'nop', 'mean', 'gain']
  --plot_alternatives TEXT  Only for risk_estimation. List of categorical
                            columns by which to split the graphs. For example,
                            plot outcome for different treatments available.
  --output TEXT             Where to save the demonstrator files. The content
                            of the folder can be directly used for
                            deployments (for example, to Heroku).
  --help                    Show this message and exit.

Build a demonstrator for a classification task

For this task, the script needs access to the model template workspace/model.p (generated after running a study), the baseline dataset dataset.csv, and the target column target in the dataset, which contains the outcomes. Based on that, the demonstrator can be built using:

python ./scripts/build_demonstrator.py \
  --model_path=workspace/model.p  \
  --dataset_path=dataset.csv \
  --target_column=target \
  --task_type=classification

The result is a folder, output/image_bin, containing all the files necessary for running the demonstrator. You can start the demonstrator using

cd output/image_bin/
pip install -r ./requirements.txt
python ./app.py

The contents of the output/image_bin folder can be used for cloud deployments, for example, to Heroku.

Optionally, you can use the --output option to change where the demonstrator files are stored. The default is output/image_bin.

Build a demonstrator for a survival analysis task

For this task, the script needs access to the model template workspace/model.p (generated after running a study), the baseline dataset dataset.csv, the target column target in the dataset, the time-to-event column time_to_event, and the time horizons to plot. Based on that, the demonstrator can be built using:

# use your own time horizons in --horizons, separated by a comma
python ./scripts/build_demonstrator.py \
  --model_path=workspace/model.p \
  --dataset_path=dataset.csv \
  --time_column=time_to_event \
  --target_column=target \
  --horizons="14,27,41" \
  --task_type=risk_estimation

The result is a folder, output/image_bin, containing all the files necessary for running the demonstrator. You can start the demonstrator using

cd output/image_bin/
pip install -r ./requirements.txt
python ./app.py

The contents of the output/image_bin folder can be used for cloud deployments, for example, to Heroku.

Customizing the demonstrator

You can customize your demonstrator by selecting multiple explainers.

python ./scripts/build_demonstrator.py \
  --model_path=workspace/model.p \
  --dataset_path=dataset.csv \
  --target_column=target \
  --task_type=classification \
  --explainers="invase,kernel_shap"

:zap: Plugins

Imputation methods

from autoprognosis.plugins.imputers import Imputers

imputer = Imputers().get(<NAME>)

| Name | Description |
| --- | --- |
| hyperimpute | Iterative imputer using both regression and classification methods based on linear models, trees, XGBoost, CatBoost and neural nets |
| mean | Replace the missing values using the mean along each column, with SimpleImputer |
| median | Replace the missing values using the median along each column, with SimpleImputer |
| most_frequent | Replace the missing values using the most frequent value along each column, with SimpleImputer |
| missforest | Iterative imputation method based on Random Forests, using IterativeImputer and ExtraTreesRegressor |
| ice | Iterative imputation method based on regularized linear regression, using IterativeImputer and BayesianRidge |
| mice | Multiple imputations based on ICE, using IterativeImputer and BayesianRidge |
| softimpute | Low-rank matrix approximation via nuclear-norm regularization |
| EM | Iterative procedure which uses other variables to impute a value (Expectation), then checks whether that is the most likely value (Maximization) - the EM imputation algorithm |
| gain | GAIN: Missing Data Imputation using Generative Adversarial Nets |
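
For example, here is a minimal sketch of imputing a toy dataset, assuming the imputer plugins follow the fit_transform convention used across the library:

import numpy as np
import pandas as pd

from autoprognosis.plugins.imputers import Imputers

# toy dataset with one missing entry
X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

imputer = Imputers().get("mean")

# the missing value in column "a" is replaced by the column mean (2.0)
print(imputer.fit_transform(X))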

Preprocessing methods

from autoprognosis.plugins.preprocessors import Preprocessors

preprocessor = Preprocessors().get(<NAME>)

| Name | Description |
| --- | --- |
| maxabs_scaler | Scale each feature by its maximum absolute value. MaxAbsScaler |
| scaler | Standardize features by removing the mean and scaling to unit variance. StandardScaler |
| feature_normalizer | Normalize samples individually to unit norm. Normalizer |
| normal_transform | Transform features using quantile information, mapping to a normal distribution. QuantileTransformer |
| uniform_transform | Transform features using quantile information, mapping to a uniform distribution. QuantileTransformer |
| minmax_scaler | Transform features by scaling each feature to a given range. MinMaxScaler |
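
A minimal usage sketch, assuming the preprocessor plugins expose the same fit_transform convention:

import pandas as pd

from autoprognosis.plugins.preprocessors import Preprocessors

X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

preprocessor = Preprocessors().get("minmax_scaler")

# each feature is rescaled to the [0, 1] range
print(preprocessor.fit_transform(X))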

Classification

from autoprognosis.plugins.prediction.classifiers import Classifiers

classifier = Classifiers().get(<NAME>)

| Name | Description |
| --- | --- |
| neural_nets | PyTorch-based neural net classifier |
| logistic_regression | LogisticRegression |
| catboost | Gradient boosting on decision trees - CatBoost |
| random_forest | A random forest classifier. RandomForestClassifier |
| tabnet | TabNet: Attentive Interpretable Tabular Learning |
| xgboost | XGBoostClassifier |
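
A minimal standalone sketch, assuming the classifier plugins expose sklearn-style fit/predict:

from sklearn.datasets import load_breast_cancer

from autoprognosis.plugins.prediction.classifiers import Classifiers

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

classifier = Classifiers().get("logistic_regression")
classifier.fit(X, y)

# in-sample predictions, for illustration only
print(classifier.predict(X))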

Survival Analysis

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation

predictor = RiskEstimation().get(<NAME>)

| Name | Description |
| --- | --- |
| survival_xgboost | XGBoost Survival Embeddings |
| loglogistic_aft | Log-Logistic AFT model |
| deephit | DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks |
| cox_ph | Cox's proportional hazards model |
| weibull_aft | Weibull AFT model |
| lognormal_aft | Log-Normal AFT model |
| coxnet | CoxNet, a Cox proportional hazards model also referred to as DeepSurv |
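
A minimal standalone sketch, assuming (as in the study example above) that risk-estimation plugins are fit with (X, T, Y) and predict risk at a list of time horizons:

from pycox import datasets

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns=["duration", "event"])
T = df["duration"]
Y = df["event"]

predictor = RiskEstimation().get("cox_ph")
predictor.fit(X, T, Y)

# predicted risk at the median follow-up time
print(predictor.predict(X, [T.median()]))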

Regression

from autoprognosis.plugins.prediction.regression import Regression

regressor = Regression().get(<NAME>)

| Name | Description |
| --- | --- |
| tabnet_regressor | TabNet: Attentive Interpretable Tabular Learning |
| catboost_regressor | Gradient boosting on decision trees - CatBoost |
| random_forest_regressor | RandomForestRegressor |
| xgboost_regressor | XGBoost regressor |
| neural_nets_regression | PyTorch-based neural net regressor |
| linear_regression | LinearRegression |
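
The regression plugins mirror the classifier API; a minimal sketch, assuming the same fit/predict convention:

from sklearn.datasets import load_diabetes

from autoprognosis.plugins.prediction.regression import Regression

X, y = load_diabetes(return_X_y=True, as_frame=True)

regressor = Regression().get("linear_regression")
regressor.fit(X, y)

# in-sample predictions, for illustration only
print(regressor.predict(X))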

Explainers

from autoprognosis.plugins.explainers import Explainers

explainer = Explainers().get(<NAME>)

| Name | Description |
| --- | --- |
| risk_effect_size | Feature importance using Cohen's distance between probabilities |
| lime | LIME: Explaining the predictions of any machine learning classifier |
| symbolic_pursuit | Symbolic Pursuit, from "Learning outside the black-box: at the pursuit of interpretable models" |
| shap_permutation_sampler | SHAP Permutation Sampler |
| kernel_shap | SHAP KernelExplainer |
| invase | INVASE: Instance-wise Variable Selection |
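
A sketch of producing explanations. The factory signature below (plugin name, estimator, training data, task_type) is an assumption based on the repo's interpretability tutorials, so consult those tutorials for the authoritative form:

from sklearn.datasets import load_breast_cancer

from autoprognosis.plugins.explainers import Explainers
from autoprognosis.plugins.prediction.classifiers import Classifiers

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

model = Classifiers().get("logistic_regression")

# assumed signature: get(name, estimator, X, y, task_type=...)
explainer = Explainers().get(
    "kernel_shap",
    model,
    X,
    y,
    task_type="classification",
)

# feature-importance scores for a few samples
print(explainer.explain(X.head(3)))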

Uncertainty

from autoprognosis.plugins.uncertainty import UncertaintyQuantification
model = UncertaintyQuantification().get(<NAME>)

Available plugins:

  • cohort_explainer
  • conformal_prediction
  • jackknife

:hammer: Test

After installing the library, the tests can be executed using pytest

$ pip install .[testing]
$ pytest -vxs -m "not slow"

Citing

If you use this code, please cite the associated paper:

TODO

References

  1. AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning
  2. Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning
  3. Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants
