Implementation of Optimal Sparse Survival Trees

OSST Documentation

Implementation of Optimal Sparse Survival Trees (OSST), an optimal decision tree algorithm for survival analysis. It is built on the Generalized Optimal Sparse Decision Tree (GOSDT) framework. If you need classification trees, please use GOSDT. If you need regression trees, please use Optimal Sparse Regression Trees (OSRT).


Installation

You may use the following commands to install OSST along with its dependencies on macOS, Ubuntu and Windows.
You need Python 3.9 or later to use the module osst in your project.

pip3 install attrs packaging editables pandas scikit-learn sortedcontainers gmpy2 matplotlib scikit-survival
pip3 install osst

If you are using Python 3.12, you need to install gmpy2==2.0.a1.

Configuration

The configuration is a JSON object and has the following structure and default values:

{ 
  "regularization": 0.01,
  "depth_budget": 5,
  "minimum_captured_points": 7,
  "bucketize": false,
  "number_of_buckets": 0,
  "warm_LB": false,
  "path_to_labels": "",
  
  "uncertainty_tolerance": 0.0,
  "upperbound": 0.0,
  "worker_limit": 1,
  "precision_limit": 0,
  "model_limit": 1,
  "time_limit": 0,

  "verbose": false,
  "diagnostics": false,
  "look_ahead": true,

  "model": "",
  "timing": "",
  "trace": "",
  "tree": "",
  "profile": ""
}
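
From Python, the Example section passes these settings to `OSST(config)` as a dict. The sketch below overlays a few overrides on the defaults listed above and rejects unknown keys, which guards against misspelled parameter names. The helper `make_config` is hypothetical and not part of the osst package.

```python
import json

# Defaults copied from the JSON object above.
DEFAULTS = json.loads("""
{
  "regularization": 0.01,
  "depth_budget": 5,
  "minimum_captured_points": 7,
  "bucketize": false,
  "number_of_buckets": 0,
  "warm_LB": false,
  "path_to_labels": "",
  "uncertainty_tolerance": 0.0,
  "upperbound": 0.0,
  "worker_limit": 1,
  "precision_limit": 0,
  "model_limit": 1,
  "time_limit": 0,
  "verbose": false,
  "diagnostics": false,
  "look_ahead": true,
  "model": "",
  "timing": "",
  "trace": "",
  "tree": "",
  "profile": ""
}
""")


def make_config(**overrides):
    """Return the defaults with `overrides` applied (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


config = make_config(depth_budget=4, warm_LB=True)
print(json.dumps(config, indent=2))
```

The resulting dict has the same shape as the `config` built by hand in the Example section.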

Key parameters

regularization

  • Values: Decimal within range [0,1]
  • Description: Used to penalize complexity. A complexity penalty is added to the risk in the following way.
    ComplexityPenalty = # Leaves x regularization
    
  • Default: 0.01
  • Note: We highly recommend setting the regularization to a value larger than 1/num_samples. A small regularization could lead to a longer training time and possible overfitting.
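
The penalty formula and the recommendation above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and not part of the osst API. The numbers come from the Output section of the example further down: an IBS loss of 0.118379, 5 leaves, and 1600 training samples (80% of the 2000 rows).

```python
def complexity_penalty(n_leaves: int, regularization: float) -> float:
    """ComplexityPenalty = # Leaves x regularization."""
    return n_leaves * regularization


def objective(loss: float, n_leaves: int, regularization: float) -> float:
    """Risk after the complexity penalty has been added to the loss."""
    return loss + complexity_penalty(n_leaves, regularization)


def regularization_is_recommended(regularization: float, num_samples: int) -> bool:
    """True when regularization is larger than 1/num_samples."""
    return regularization > 1.0 / num_samples


# Matches the "bounds" value reported in the example output.
print(round(objective(0.118379, n_leaves=5, regularization=0.01), 6))
# 1/1600 = 0.000625, so the default 0.01 satisfies the recommendation.
print(regularization_is_recommended(0.01, num_samples=1600))
```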

depth_budget

  • Values: Integers >= 0
  • Description: Sets the maximum tree depth for solutions. A tree consisting of only the root node has depth 1. 0 means unlimited.
  • Default: 5

minimum_captured_points

  • Values: Integers >= 1
  • Description: Minimum number of sample points each leaf node must capture
  • Default: 7

bucketize

  • Values: true or false
  • Description: Enables bucketization of the time thresholds used for training.
  • Default: false

number_of_buckets

  • Values: Integers
  • Description: The number of time thresholds to which the original data are mapped when the bucketize flag is set to true.
  • Default: 0
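
To illustrate what mapping event times onto a smaller set of thresholds can look like, here is a rough sketch. It assumes quantile-based thresholds and rounds each time up to the nearest threshold. This is one plausible scheme chosen for illustration, and OSST's internal bucketization may differ. The function `bucketize_times` is hypothetical.

```python
import numpy as np


def bucketize_times(times, number_of_buckets):
    """Map each time to one of at most `number_of_buckets` thresholds.

    Assumption: thresholds are evenly spaced quantiles of the observed
    times, and each time is rounded up to the nearest threshold.
    """
    times = np.asarray(times, dtype=float)
    quantiles = np.linspace(0.0, 1.0, number_of_buckets + 1)[1:]
    thresholds = np.unique(np.quantile(times, quantiles))
    # Index of the first threshold that is >= each time.
    idx = np.searchsorted(thresholds, times, side="left")
    idx = np.clip(idx, 0, len(thresholds) - 1)
    return thresholds[idx], thresholds


times = np.arange(13)  # toy data: times 0..12
mapped, thresholds = bucketize_times(times, number_of_buckets=4)
print("thresholds:", thresholds)
print("mapped:", mapped)
```

Fewer distinct time thresholds mean fewer points at which the loss has to be evaluated, which is the usual motivation for this kind of coarsening.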

warm_LB

  • Values: true or false
  • Description: Enables the reference lower bound
  • Default: false

path_to_labels

  • Values: string representing a path to a file.
  • Description: Location of the per-sample IBS loss of the reference model, used when warm_LB is enabled.
  • Special Case: When set to empty string, no reference IBS loss is used.
  • Default: Empty string

time_limit

  • Values: Decimal greater than or equal to 0
  • Description: A time limit upon which the algorithm will terminate. If the time limit is reached, the algorithm will terminate with an error.
  • Special Cases: When set to 0, no time limit is imposed.
  • Default: 0

More parameters

Flags

look_ahead

  • Values: true or false
  • Description: Enables the one-step look-ahead bound implemented via scopes
  • Default: true

diagnostics

  • Values: true or false
  • Description: Enables printing of diagnostic trace when an error is encountered to standard output
  • Default: false

verbose

  • Values: true or false
  • Description: Enables printing of configuration, progress, and results to standard output
  • Default: false

Tuners

uncertainty_tolerance

  • Values: Decimal within range [0,1]
  • Description: Used to allow early termination of the algorithm. Any models produced as a result are guaranteed to score between the lower bound and the upper bound at the time of termination. However, the algorithm does not guarantee that the optimal model is among the produced models unless the uncertainty value has reached 0.
  • Default: 0.0
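
The stopping rule described above can be sketched as follows. It assumes that the uncertainty is the gap between the current upper and lower bounds, which is consistent with the `bounds: [0.168379..0.168379] (0.000000)` line in the example output but is not spelled out in this document. The function `should_stop` is hypothetical.

```python
def should_stop(lowerbound: float, upperbound: float,
                uncertainty_tolerance: float = 0.0) -> bool:
    """Return True once the bound gap is within the tolerance.

    Assumption: uncertainty = upperbound - lowerbound.
    """
    uncertainty = upperbound - lowerbound
    return uncertainty <= uncertainty_tolerance


# With the default tolerance of 0.0, the search only stops once the
# bounds meet, i.e. once the model is proven optimal.
print(should_stop(0.168379, 0.168379))
# A looser tolerance accepts a remaining gap of 0.02.
print(should_stop(0.15, 0.17, uncertainty_tolerance=0.05))
```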

upperbound

  • Values: Decimal within range [0,1]
  • Description: Used to limit the risk of models in the search space. This can be used to ensure that no models are produced if even the optimal model exceeds a desired maximum risk. It also accelerates learning if the upperbound is taken from the risk of a nearly optimal model.
  • Special Cases: When set to 0, the bound is not activated.
  • Default: 0.0

Limits

model_limit

  • Values: Decimal greater than or equal to 0
  • Description: The maximum number of models that will be extracted into the output.
  • Special Cases: When set to 0, no output is produced.
  • Default: 1

precision_limit

  • Values: Decimal greater than or equal to 0
  • Description: The maximum number of significant figures considered when converting ordinal features into binary features.
  • Special Cases: When set to 0, no limit is imposed.
  • Default: 0
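
A small sketch of why limiting significant figures matters: rounding feature values before thresholds are derived merges nearly identical values, so fewer binary features are generated. The helpers `round_sig` and `candidate_thresholds` are illustrative only and do not reflect how OSST implements this internally.

```python
import math


def round_sig(value: float, sig: int) -> float:
    """Round `value` to `sig` significant figures; 0 means no limit."""
    if sig == 0 or value == 0:
        return value
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, sig - 1 - magnitude)


def candidate_thresholds(values, precision_limit=0):
    """Distinct values left after rounding (illustrative helper)."""
    return sorted({round_sig(v, precision_limit) for v in values})


values = [1.234, 1.2341, 1.239]  # toy feature values
print(candidate_thresholds(values))                     # no limit: 3 distinct values
print(candidate_thresholds(values, precision_limit=3))  # 3 significant figures: 2 values
```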

worker_limit

  • Values: Integer greater than or equal to 0
  • Description: The maximum number of threads allocated to executing the algorithm.
  • Special Cases: When set to 0, a single thread is created for each core detected on the machine.
  • Default: 1
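
The special case can be pictured with the following sketch, which resolves a `worker_limit` value to a thread count the way the description above reads. It is an illustration in Python using `os.cpu_count()`, not the library's own logic, and `resolve_worker_count` is a hypothetical name.

```python
import os


def resolve_worker_count(worker_limit: int) -> int:
    """Translate worker_limit into a number of threads (illustrative)."""
    if worker_limit < 0:
        raise ValueError("worker_limit must be >= 0")
    if worker_limit == 0:
        # One thread per detected core; fall back to 1 if detection fails.
        return os.cpu_count() or 1
    return int(worker_limit)


print(resolve_worker_count(1))  # the default: a single thread
print(resolve_worker_count(0))  # one thread per core on this machine
```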

Files

model

  • Values: string representing a path to a file.
  • Description: The output models will be written to this file.
  • Special Case: When set to empty string, no model will be stored.
  • Default: Empty string

profile

  • Values: string representing a path to a file.
  • Description: Various analytics will be logged to this file.
  • Special Case: When set to empty string, no analytics will be stored.
  • Default: Empty string

timing

  • Values: string representing a path to a file.
  • Description: The training time will be appended to this file.
  • Special Case: When set to empty string, no training time will be stored.
  • Default: Empty string

trace

  • Values: string representing a path to a directory.
  • Description: Snapshots used for trace visualization will be stored in this directory.
  • Special Case: When set to empty string, no snapshots are stored.
  • Default: Empty string

tree

  • Values: string representing a path to a directory.
  • Description: Snapshots used for trace-tree visualization will be stored in this directory.
  • Special Case: When set to empty string, no snapshots are stored.
  • Default: Empty string

Example

Example code to run OSST with lower-bound guessing and a depth limit. The example Python file is available at osst/example.py.

import pandas as pd
import numpy as np
from osst.model.osst import OSST
from osst.model.metrics import harrell_c_index, uno_c_index, integrated_brier_score, cumulative_dynamic_auc, compute_ibs_per_sample
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest
from sksurv.datasets import get_x_y
import pathlib


dataset_path = "experiments/datasets/churn/churn.csv"

# read the dataset
# Preprocess your data first; otherwise OSST will binarize each continuous feature using all of its threshold values.
df = pd.read_csv(dataset_path)
X, event, y = df.iloc[:,:-2].values, df.iloc[:,-2].values.astype(int), df.iloc[:,-1].values
h = df.columns[:-2]
X = pd.DataFrame(X, columns=h)
event = pd.DataFrame(event)
y = pd.DataFrame(y)
_, y_sksurv = get_x_y(df, df.columns[-2:], 1)
print("X shape: ", X.shape)
# split train and test set
X_train, X_test, event_train, event_test, y_train, y_test, y_sksurv_train, y_sksurv_test \
      = train_test_split(X, event, y, y_sksurv, test_size=0.2, random_state=2024)

times_train = np.unique(y_train.values.reshape(-1))
times_test = np.unique(y_test.values.reshape(-1))
print("Train time thresholds range: ({:.1f}, {:.1f}),  Test time thresholds range: ({:.1f}, {:.1f})".format(\
    times_train[0], times_train[-1], times_test[0], times_test[-1]))

# compute reference lower bounds
ref_model = RandomSurvivalForest(n_estimators=100, max_depth=3, random_state=2024)
ref_model.fit(X_train, y_sksurv_train)
ref_S_hat = ref_model.predict_survival_function(X_train)
ref_estimates = np.array([f(times_train) for f in ref_S_hat])
ibs_loss_per_sample = compute_ibs_per_sample(event_train, y_train, event_train, y_train, ref_estimates, times_train)

labelsdir = pathlib.Path('/tmp/warm_lb_labels')
labelsdir.mkdir(exist_ok=True, parents=True)

labelpath = labelsdir / 'warm_label.tmp'
labelpath = str(labelpath)

pd.DataFrame(ibs_loss_per_sample, columns=['class_labels']).to_csv(labelpath, header='class_labels', index=None)

# fit model

config = {
    "look_ahead": True,
    "diagnostics": True,
    "verbose": False,

    "regularization": 0.01,
    "uncertainty_tolerance": 0.0,
    "upperbound": 0.0,
    "depth_budget": 5,
    "minimum_captured_points": 7,

    "model_limit": 100,
    
    "warm_LB": True,
    "path_to_labels": labelpath,
  }


model = OSST(config)
model.fit(X_train, event_train, y_train)
print("evaluate the model, extracting tree and scores", flush=True)

# evaluation
n_leaves = model.leaves()
n_nodes = model.nodes()
time = model.time
print("Model training time: {}".format(time))
print("# of leaves: {}".format(n_leaves))

print("Train IBS score: {:.6f} , Test IBS score: {:.6f}".format(\
    model.score(X_train, event_train, y_train), model.score(X_test, event_test, y_test)))

S_hat_train = model.predict_survival_function(X_train)
estimates_train = np.array([f(times_train) for f in S_hat_train])

S_hat_test = model.predict_survival_function(X_test)
estimates_test = np.array([f(times_test) for f in S_hat_test])

print("Train Harrell's c-index: {:.6f}, Test Harrell's c-index: {:.6f}".format(\
    harrell_c_index(event_train, y_train, estimates_train, times_train)[0], \
    harrell_c_index(event_test, y_test, estimates_test, times_test)[0]))

print("Train Uno's c-index: {:.6f}, Test Uno's c-index: {:.6f}".format(\
    uno_c_index(event_train, y_train, event_train, y_train, estimates_train, times_train)[0],\
    uno_c_index(event_train, y_train, event_test, y_test, estimates_test, times_test)[0]))

print("Train AUC: {:.6f}, Test AUC: {:.6f}".format(\
    cumulative_dynamic_auc(event_train, y_train, event_train, y_train, estimates_train, times_train)[0],\
    cumulative_dynamic_auc(event_train, y_train, event_test, y_test, estimates_test, times_test)[0]))

print(model.tree)

Output

X shape:  (2000, 42)
Train time thresholds range: (0.0, 12.0),  Test time thresholds range: (0.0, 12.0)
osst reported successful execution
training completed. 4.968 seconds.
bounds: [0.168379..0.168379] (0.000000) IBS loss = 0.118379, iterations=16920
evaluate the model, extracting tree and scores
Model training time: 4.9679999351501465
# of leaves: 5
Train IBS score: 0.118379 , Test IBS score: 0.124289
Train Harrell's c-index: 0.737871, Test Harrell's c-index: 0.734727
Train Uno's c-index: 0.689405, Test Uno's c-index: 0.706680
Train AUC: 0.800940, Test AUC: 0.806016
if product_accounting_No = 1 then:
    predicted time: 4
    normalized loss penalty: 0.0
    complexity penalty: 0.01

else if csat_score_7 = 1 and product_accounting_No != 1 then:
    predicted time: 3
    normalized loss penalty: 0.0
    complexity penalty: 0.01

else if csat_score_7 != 1 and product_accounting_No != 1 and product_payroll_No = 1 then:
    predicted time: 2
    normalized loss penalty: 0.0
    complexity penalty: 0.01

else if csat_score_7 != 1 and csat_score_8 = 1 and product_accounting_No != 1 and product_payroll_No != 1 then:
    predicted time: 1
    normalized loss penalty: 0.0
    complexity penalty: 0.01

else if csat_score_7 != 1 and csat_score_8 != 1 and product_accounting_No != 1 and product_payroll_No != 1 then:
    predicted time: 0
    normalized loss penalty: 0.0
    complexity penalty: 0.01
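
The example's comment about preprocessing deserves a concrete illustration. If a continuous column is left as-is, every distinct value becomes a candidate split, which enlarges the search space. The sketch below binarizes one column at a handful of quantile thresholds before training. The helper `binarize_at_quantiles`, the column name `age`, and the toy values are all hypothetical, and quantiles are just one reasonable choice of thresholds.

```python
import numpy as np
import pandas as pd


def binarize_at_quantiles(series: pd.Series, n_thresholds: int = 3) -> pd.DataFrame:
    """Turn a continuous column into a few 0/1 columns of the form `x <= t`."""
    # Interior quantiles only; the 0% and 100% points carry no information.
    qs = np.linspace(0.0, 1.0, n_thresholds + 2)[1:-1]
    thresholds = np.unique(series.quantile(qs).to_numpy())
    return pd.DataFrame(
        {f"{series.name}<={t:g}": (series <= t).astype(int) for t in thresholds},
        index=series.index,
    )


ages = pd.Series([22, 35, 41, 58, 63, 70], name="age")  # toy data
X_bin = binarize_at_quantiles(ages, n_thresholds=2)
print(X_bin)
```

The resulting binary columns can then take the place of the original continuous column in `X` before calling `model.fit`.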

License

This software is licensed under a 3-clause BSD license (see the LICENSE file for details).

