GHOST

This repository is part of the Supporting Information to

GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning

Carmen Esposito,1 Gregory A. Landrum,1,2 Nadine Schneider,3 Nikolaus Stiefl,3 and Sereina Riniker1

1 Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
2 T5 Informatics GmbH, Spalenring 11, 4055 Basel, Switzerland
3 Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland

Installing GHOST

You can install the most recent release of GHOST from PyPI:

python -m pip install ghostml

or, if you want to install the development version directly from GitHub:

python -m pip install git+https://github.com/rinikerlab/GHOST

Content

Notebooks:

  • library_example.ipynb
    Example of how to use the ghostml library.

  • example_oob_threshold_optimization.ipynb
    Example of how to use the out-of-bag (oob) thresholding method to optimize the decision threshold of a random forest classifier.

  • example_GHOST.ipynb
    Example of how to use GHOST (Generalized tHreshOld ShifTing) to optimize the decision threshold of classification models.

  • Tutorial_Threshold_Optimization_RF.ipynb
    Notebook explaining step by step how to reproduce the results reported in our work. Here, the code is only executed for 6 public datasets and the random forest model.

  • Reproduce_Results_Public_Datasets.ipynb
    Notebook to reproduce the results reported in our work. Here, results are produced for all 138 public datasets. The user can choose between four different machine learning methods, namely random forest (RF), gradient boosting (GB), XGBoost (XGB), and logistic regression (LR). The user can also choose between two different molecular descriptors, ECFP4 and RDKit2D.

  • DeepChem_PubChem.ipynb
    Notebook to reproduce the results of the multi-task classification models for the PubChem datasets.

  • DeepChem_MoleculeNet.ipynb
    Notebook to reproduce the results of the multi-task classification models for the MoleculeNet datasets.
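
The threshold optimization demonstrated in the example notebooks can be illustrated with a minimal sketch. It is not the ghostml implementation: it assumes that Cohen's kappa is the metric being maximized, that a sample is predicted positive when its probability is at or above the threshold, and that the subset-based variant picks the threshold with the best median score over stratified random subsets of the training predictions. All function names, parameter values, and the toy data are hypothetical and use only the Python standard library.

```python
import random
from statistics import median


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two binary (0/1) label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement computed from the marginal class frequencies.
    pos_true = sum(y_true) / n
    pos_pred = sum(y_pred) / n
    expected = pos_true * pos_pred + (1 - pos_true) * (1 - pos_pred)
    return 0.0 if expected == 1 else (observed - expected) / (1 - expected)


def kappa_at(labels, probs, threshold):
    """Kappa obtained when probabilities >= threshold are called positive."""
    preds = [int(p >= threshold) for p in probs]
    return cohen_kappa(labels, preds)


def scan_thresholds(labels, probs, thresholds):
    """Single pass: return the threshold with the highest kappa."""
    return max(thresholds, key=lambda th: kappa_at(labels, probs, th))


def subsampled_threshold(labels, probs, thresholds,
                         n_subsets=100, subset_fraction=0.2, seed=0):
    """Subset-based variant: best median kappa over stratified subsets."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    scores = {th: [] for th in thresholds}
    for _ in range(n_subsets):
        # Stratified draw without replacement keeps the class ratio.
        idx = (rng.sample(pos, max(1, round(len(pos) * subset_fraction)))
               + rng.sample(neg, max(1, round(len(neg) * subset_fraction))))
        sub_labels = [labels[i] for i in idx]
        sub_probs = [probs[i] for i in idx]
        for th in thresholds:
            scores[th].append(kappa_at(sub_labels, sub_probs, th))
    return max(thresholds, key=lambda th: median(scores[th]))


# Toy imbalanced data: 16 inactive and 4 active samples (made up).
labels = [0] * 16 + [1] * 4
probs = [0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.10, 0.12,
         0.12, 0.15, 0.15, 0.18, 0.18, 0.20, 0.22, 0.24,
         0.30, 0.35, 0.42, 0.60]
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]

print("kappa at the default 0.5:", round(kappa_at(labels, probs, 0.5), 3))
print("single-pass threshold:", scan_thresholds(labels, probs, thresholds))
print("subset-based threshold:",
      subsampled_threshold(labels, probs, thresholds, subset_fraction=0.5))
```

On imbalanced data such as this toy set, most predicted probabilities for the minority class fall below 0.5, so a lower threshold recovers more of the minority class. Scoring each threshold on many subsets, rather than once, is meant to make the choice less sensitive to individual samples.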

Validation Data:

The threshold optimization methods have been validated against 138 public datasets, all of which are provided in the folder notebooks/data.

Dependencies:

If you only want to use ghostml in your own code or notebooks, you need just these packages:

  • numpy
  • pandas
  • scikit-learn (imported as sklearn)

A list of dependencies to run the example notebooks is available in the file notebooks/ghost_env.yml. This conda environment was used to obtain the results reported in our work.
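
To check whether the three packages listed above are available in the current environment, a short standard-library sketch like the following can be used. The helper name missing_packages is hypothetical and not part of ghostml; the check only looks up import names and does not verify versions.

```python
from importlib.util import find_spec

# Import names of the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ("numpy", "pandas", "sklearn")


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```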

Authors

Carmen Esposito (GHOST procedure) and Greg Landrum (oob-based threshold optimization approach, data collection, initial code).

Acknowledgements

Conformal prediction (CP) experiments were adapted from the CP functions provided by the Volkamer Lab.

License

This package is licensed under the terms of the MIT license.

Citation

https://doi.org/10.1021/acs.jcim.1c00160
