ROBI: Robust and Optimized Biomarker Identifier

ROBI is a selection pipeline that selects predictive biomarkers from any set of features. The selection is performed with a robust and adjustable control of the number of false positives. ROBI can also control for confounders and already known biomarkers, so that only new and relevant information is selected.

PyPI version License Python 3.7

Key features:

  • :shield: Robust control of the number of false positives by passing permuted datasets through the selection pipeline thousands of times (the more, the better). The proportion of false positives can be adjusted.
  • :heavy_plus_sign: Increased discovery rate via optimised feature selection.
  • :balance_scale: Reliable predictive power estimation through permutation tests instead of fixed thresholds.
  • :tada: Only select new information via control for confounders and correlation with known biomarkers.
  • :zap: Fast parallelized implementation that can leverage both CPU and GPU for extensive tests: you can easily evaluate tens of thousands of potential biomarkers with millions of permutations in a few minutes.

:rocket: Installation

pip install robi

:zap: Although PyTorch is not required to use the package, ROBI runs much faster with its PyTorch implementation. The speed gain is great on CPU, and much greater on GPU. To use the PyTorch implementation, simply install PyTorch (conda is the easiest way), and ROBI will use it automatically. To tell ROBI to use the GPU, set device='cuda' in the robi.make_selection function.
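The device choice can be made at runtime. The sketch below is a minimal helper, not part of ROBI: `pick_device` is a hypothetical name, and passing `'cpu'` as the fallback value for `device` is an assumption (the text above only documents `device='cuda'`).

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch is installed and sees a GPU, else 'cpu'."""
    try:
        import torch  # optional dependency
    except ImportError:
        return "cpu"  # PyTorch is not installed
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Assumed usage, left commented out so the snippet runs without ROBI installed:
# selection, scores = robi.make_selection(df, candidates,
#                                         targets='outcome', device=device)
```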

:sparkles: Utilisation

Basic usage

First, ROBI must be imported:

import robi

Then, a pandas dataframe needs to be defined where each row is a patient and each column a feature (biomarker, outcome, ...), such as:

print(df)
outcome  candidate_1  candidate_2
     10            0          100
     25          0.1           -2

with candidate_1 and candidate_2 the candidate biomarkers that we want to evaluate, and outcome the target (i.e. the feature that we want the selected biomarkers to predict).

Then, the selection can be performed with:

selection, scores = robi.make_selection(df,
                                        candidates = ['candidate_1', 'candidate_2'],
                                        targets = 'outcome')

robi.make_selection will plot the following image:

The x axis is the degree of permissiveness, i.e. how strict the selection is. A low permissiveness means a stricter selection, which reduces the number of false positives at the cost of more false negatives (i.e. missed discoveries). A high permissiveness means a less strict selection, which increases the number of discoveries at the cost of more false positives. The orange line represents the number of selected candidates. The blue line represents the average number of false positives. The blue area is the 95% confidence interval for the number of false positives. If the selection is performed on multiple targets, one plot is generated per target.
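To make the link between permissiveness and false positives concrete, here is a minimal sketch of the permutation idea. It is not ROBI's implementation: it scores candidates with a plain absolute Pearson correlation as a stand-in, uses a score threshold in place of the permissiveness, and all function names are hypothetical. Shuffling the outcome breaks any real association, so every candidate that still passes the threshold is a false positive by construction.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def n_selected(features, outcome, threshold):
    """Number of candidates whose |correlation| reaches the threshold."""
    return sum(abs(pearson(col, outcome)) >= threshold
               for col in features.values())


def expected_false_positives(features, outcome, threshold,
                             n_permutations=200, seed=0):
    """Average number of candidates selected on outcome-shuffled data."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_permutations):
        shuffled = outcome[:]
        rng.shuffle(shuffled)  # destroys any true feature/outcome link
        total += n_selected(features, shuffled, threshold)
    return total / n_permutations


# Synthetic data: one informative candidate and two pure-noise candidates.
data_rng = random.Random(1)
outcome = [data_rng.gauss(0, 1) for _ in range(50)]
features = {
    "signal": [o + data_rng.gauss(0, 0.5) for o in outcome],
    "noise_1": [data_rng.gauss(0, 1) for _ in outcome],
    "noise_2": [data_rng.gauss(0, 1) for _ in outcome],
}

# A lower threshold is more permissive: more selections, more false positives.
for threshold in (0.2, 0.4, 0.6):
    print(threshold,
          n_selected(features, outcome, threshold),
          expected_false_positives(features, outcome, threshold))
```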

robi.make_selection will return two variables:

  • selection: contains the results of the selection for multiple levels of false discovery rate.
  • scores: contains the results of the evaluation of the prognostic value of each candidate.

selection will look like this:

| target | permissiveness | n_selected | n_FP | P_only_FP | selected |
|---|---|---|---|---|---|
| outcome | 0.01 | 1 | 0.1 (0-0) | 1e-3 | ['candidate_1'] |
| outcome | 0.02 | 2 | 0.5 (0-2) | 1e-2 | ['candidate_1', 'candidate_2'] |
| ... | ... | ... | ... | ... | ... |

with:

  • target: on which target was the selection performed
  • permissiveness: degree of permissiveness for this selection
  • n_selected: number of selected candidates
  • n_FP: average number of false positives and 95% confidence interval in parentheses
  • P_only_FP: probability of having only false positives selected
  • selected: list of the selected candidates for the corresponding permissiveness

scores will look like this:

| candidate | target | C_index | p_value |
|---|---|---|---|
| candidate_1 | outcome | 0.65 | 1e-3 |
| candidate_2 | outcome | 0.55 | 2e-2 |
| ... | ... | ... | ... |

with:

  • candidate: candidate to which the row belongs
  • target: on which target was the C-index computed
  • C_index: C-index of the corresponding candidate for the corresponding target
  • p_value: p-value of the corresponding C-index

:rotating_light: :warning: The C-index is anti-concordant with the targets :warning: :rotating_light:

  • a feature with a C-index > 0.5 is negatively correlated with the target. The higher the feature value, the lower the target.
  • a feature with a C-index < 0.5 is positively correlated with the target. The higher the feature value, the higher the target.

So if the target is the Overall Survival or any other survival metric, a C-index > 0.5 means that the corresponding feature is positively correlated to the risk.
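The sign convention can be checked on a toy example. The function below is a minimal sketch of an anti-concordant C-index for an uncensored target, written from the definition above; it is not ROBI's code, the name `anti_concordant_c_index` is hypothetical, and censoring is ignored for simplicity.

```python
from itertools import combinations


def anti_concordant_c_index(feature, target):
    """Fraction of comparable pairs where the higher feature has the LOWER target.

    Pairs with equal targets are skipped; ties in the feature count as 0.5.
    """
    score = 0.0
    comparable = 0
    for (f_i, t_i), (f_j, t_j) in combinations(zip(feature, target), 2):
        if t_i == t_j:
            continue  # no ordering to predict for this pair
        comparable += 1
        if f_i == f_j:
            score += 0.5
        elif (f_i > f_j) == (t_i < t_j):
            score += 1.0  # higher feature goes with lower target
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return score / comparable


target = [10, 25, 30, 42]
print(anti_concordant_c_index([1, 2, 3, 4], target))  # 0.0: rises with the target
print(anti_concordant_c_index([4, 3, 2, 1], target))  # 1.0: falls as the target rises
```

With a survival time as target, the second feature (C-index of 1.0) is the one whose high values go with short survival, i.e. with high risk.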

Control for confounders

If confounders are present in the dataset, they can be listed in the confounders parameter. ROBI will discard any candidate that is sensitive to these confounders, making sure that any selected biomarker is relevant and worth studying further.

selection, scores = robi.make_selection(df,
                                        candidates,
                                        targets = 'outcome',
                                        confounders = ['age', 'sex'])

This way, any candidate whose hazard ratio changes by more than 10% when the confounders are introduced in a Cox model will be discarded.
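The 10% rule amounts to a simple comparison of two hazard ratios. The sketch below only illustrates the arithmetic; the function name is hypothetical, and measuring the change relative to the hazard ratio of the candidate alone is an assumption, since the exact reference value is not stated here.

```python
def is_sensitive_to_confounders(hr_alone, hr_adjusted, max_relative_change=0.10):
    """True if the hazard ratio moves by more than the allowed fraction.

    hr_alone:    hazard ratio of the candidate in a Cox model on its own
    hr_adjusted: hazard ratio of the candidate once confounders are added
    """
    if hr_alone <= 0 or hr_adjusted <= 0:
        raise ValueError("hazard ratios must be positive")
    # Assumption: the change is measured relative to the unadjusted value.
    relative_change = abs(hr_adjusted - hr_alone) / hr_alone
    return relative_change > max_relative_change


print(is_sensitive_to_confounders(1.50, 1.45))  # False: about 3% change, kept
print(is_sensitive_to_confounders(1.50, 1.20))  # True: 20% change, discarded
```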

Control for known biomarkers

If some biomarkers are already known and used, we can avoid selecting candidates that simply replicate this known information. For instance, if we know that tumor volume affects the outcome of patients, we can specify known = ['tumor_volume'], such as:

selection, scores = robi.make_selection(df,
                                        candidates,
                                        targets = 'outcome',
                                        known = ['tumor_volume'],
                                        confounders = ['age', 'sex'])

This way, any candidate that is simply a proxy of the tumor volume will be discarded. Multiple known biomarkers can be listed. Collinearity and multicollinearity will be tested.

Censored target

ROBI can handle censored targets (e.g. we know that a patient was alive until a certain date, but we do not know whether or when they died after that). For instance, to use the Overall Survival (OS), one must specify:

selection, scores = robi.make_selection(df,
                                        candidates,
                                        targets = {
                                            'OS': ('OS_time', 'OS_happened')
                                        })

with OS_time being the time between diagnosis and death or end of study, and OS_happened a boolean feature stating if a patient died (True or 1) or not (False or 0) during the study.
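If the dataset stores dates rather than durations, the two columns can be derived beforehand. This is a minimal sketch using only the standard library; the function name and the choice of days as the time unit are assumptions, not requirements stated by ROBI.

```python
from datetime import date


def survival_columns(diagnosis, death, study_end):
    """Return (OS_time in days, OS_happened) for one patient.

    death is None when no death was recorded; the patient is then censored
    at the end of the study.
    """
    if death is not None and death <= study_end:
        return (death - diagnosis).days, True
    return (study_end - diagnosis).days, False


study_end = date(2023, 12, 31)

# Patient who died during the study: the event is observed.
print(survival_columns(date(2020, 1, 1), date(2021, 1, 1), study_end))

# Patient still alive at the end of the study: censored observation.
print(survival_columns(date(2020, 1, 1), None, study_end))
```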

Multiple targets

ROBI can perform the biomarker selection for multiple targets at the same time. For instance, the candidates could be evaluated for OS and Progression Free Survival (PFS). Simply pass them to the targets parameter as a dictionary:

selection, scores = robi.make_selection(df,
                                        candidates,
                                        targets = {
                                            'PFS': ('PFS_time', 'PFS_happened'),
                                            'OS': ('OS_time', 'OS_happened')
                                        })

The key of the dictionary is the name of the target. The first element of the tuple is the time, the second says if the event happened or not.

When giving multiple targets, some can be censored while others are uncensored. An uncensored target is given as a single column name, a censored one as a (time, event) tuple, as in the following example:

selection, scores = robi.make_selection(df,
                                        candidates,
                                        targets = {
                                            'uncensored_target': ('uncensored_target'),
                                            'censored_target': ('censored_target_time', 'censored_target_happened')
                                        })
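The accepted shapes of the targets parameter can be summarized with a small normalizer. This helper is hypothetical and only mirrors the forms shown in this README (a single name, a dictionary of names, a dictionary of (time, event) tuples); it is not how ROBI parses its input. Note that in Python a single name wrapped in parentheses without a trailing comma is a plain string, not a tuple.

```python
def normalize_targets(targets):
    """Map every target name to a (value_or_time_column, event_column) pair.

    event_column is None for uncensored targets.
    """
    if isinstance(targets, str):
        return {targets: (targets, None)}
    normalized = {}
    for name, spec in targets.items():
        if isinstance(spec, str):
            normalized[name] = (spec, None)  # uncensored: one column
        elif len(spec) == 1:
            normalized[name] = (spec[0], None)  # one-element tuple or list
        elif len(spec) == 2:
            normalized[name] = (spec[0], spec[1])  # censored: time and event
        else:
            raise ValueError(f"unexpected target specification for {name!r}")
    return normalized


print(normalize_targets("outcome"))
print(normalize_targets({
    "uncensored_target": ("uncensored_target"),  # parentheses only: a str
    "censored_target": ("censored_target_time", "censored_target_happened"),
}))
```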

:memo: Examples

You can find example notebooks in the notebooks folder of this repository.

radiomic DLBCL TCGA synthetic data

:mag: Pipeline diagram

:technologist: Author

Louis Rebaud: louis.rebaud@gmail.com

:page_facing_up: License

This project is licensed under the Apache License 2.0. See the LICENSE.md file for details.
