Interpretable and reliable multivariate random forest for simultaneous classification and regression
Project description
MORGOTH
This is the implementation of our novel random forest (RF)-based approach for Multivariate classificatiOn and Regression increasinG trustwOrTHiness (MORGOTH). A detailed description and application of the model can be found in our pre-print 'Increasing trustworthiness of machine learning-based drug sensitivity prediction with a multivariate random forest approach'. MORGOTH performs classification and regression simultaneously by using a novel objective function during training, which is a linear combination of the classification error and the regression error. Moreover, it offers conformal prediction (CP), which can be used to obtain reliable classification and regression results. A more detailed explanation of CP and the framework we use can be found in our article 'Reliable anti-cancer drug sensitivity prediction and prioritization'. Additionally, MORGOTH provides a graph representation of the random forest to address model interpretability, and a cluster analysis of the leaves that measures how dissimilar new inputs are from the training data, in order to assess the reliability of their predictions.
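To illustrate what an objective formed as a linear combination of classification and regression error can look like, here is a minimal sketch. It is not MORGOTH's actual implementation: the choice of Gini impurity and variance as error terms, the `weight` parameter, and all function names are assumptions made for illustration. The exact definition is given in the pre-print.

```python
from collections import Counter
from statistics import pvariance


def gini(labels):
    """Gini impurity of a list of class labels (assumed classification error)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def combined_error(labels, values, weight):
    """Hypothetical node error: weight * Gini + (1 - weight) * variance."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    regression_error = pvariance(values) if values else 0.0
    return weight * gini(labels) + (1.0 - weight) * regression_error


def split_error(labels, values, feature, threshold, weight):
    """Size-weighted combined error of the two children of a candidate split."""
    left = [i for i, x in enumerate(feature) if x <= threshold]
    right = [i for i, x in enumerate(feature) if x > threshold]
    n = len(feature)
    total = 0.0
    for idx in (left, right):
        if idx:  # skip empty children
            child_labels = [labels[i] for i in idx]
            child_values = [values[i] for i in idx]
            total += len(idx) / n * combined_error(child_labels, child_values, weight)
    return total


# Toy data: two classes with matching continuous responses.
labels = ["sensitive", "sensitive", "resistant", "resistant"]
values = [1.0, 1.0, 3.0, 3.0]
feature = [0.1, 0.2, 0.8, 0.9]

parent = combined_error(labels, values, weight=0.5)
child = split_error(labels, values, feature, threshold=0.5, weight=0.5)
print(parent, child)
```

In this toy setting, a split that separates both the classes and the continuous responses drives the combined error of the children to zero, so a single criterion rewards splits that help both tasks.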
For issues and questions, please contact Lisa-Marie Rolli (lisa-marie.rolli[at]uni-saarland.de) or Kerstin Lenhof (research[at]klenhof.de).
Installation
You can install our morgoth package using pip:
pip install morgoth
MORGOTH uses the following Python 3 libraries. Third-party: fireducks, pandas, numpy, sklearn, scipy. Standard library: typing, math, bisect, operator, copy, time, collections, multiprocessing, functools, re.
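If you want to check that the third-party libraries are importable in your environment before running MORGOTH, a small sketch like the following can help. The helper name `missing_modules` is hypothetical and not part of the morgoth package. It only uses `importlib.util.find_spec` from the standard library and does not import the modules themselves.

```python
from importlib.util import find_spec

# Import names of the third-party libraries listed above.
THIRD_PARTY = ["fireducks", "pandas", "numpy", "sklearn", "scipy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All third-party libraries were found.")
```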
Usage
As an example, you can run our provided main as a module after downloading the Example_Data folder from our GitHub.
python3 -m morgoth Example_Data/example_Json_config.json
Note that the directory tree should be kept and the path to the output folder should be edited in the file Example_Data/example_JSON_config.json. The prediction results for classification will be found in <output_dir><analysis_name>_ClassificationResultsFile1.txt and the regression results are stored in <output_dir><analysis_name>_<1-error_rate>_RegressionResultsFile1.txt. If the field swap_test_calibration in the config file is set to 'True', there will be one additional file per task, where the '1' in the file name is replaced by a '2'. If a distance measure is given in the config, <output_dir><analysis_name>_SilhouetteScoresTrainSamples_<distance>.txt and <output_dir><analysis_name>_SilhouetteScoresTestSamples_<distance>.txt will contain the silhouette scores for the training and test samples, respectively. If draw_graph is set to True, the files <output_dir>/<analysis_name>_<sample_name>.dot contain the sample-specific graphs, while <output_dir><analysis_name>__graph_whole_forest.dot and <output_dir><analysis_name>__graph_average_whole_forest.dot contain the graph for the whole test set, with the edge weight being either the raw count across all samples or that count averaged over the number of test samples, respectively.
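The naming patterns above can be turned into a small helper that lists the result files to expect after a run. This is a sketch, not part of the morgoth package: the function name and its parameters are hypothetical, and since the exact rendering of <1-error_rate> in the file name is not specified here, the caller passes it as a ready-made string. Following the patterns, output_dir is concatenated directly with analysis_name, so it should end with a path separator.

```python
def expected_result_files(output_dir, analysis_name, confidence_label,
                          swap_test_calibration=False, distance=None):
    """List result file paths following the naming patterns described above.

    confidence_label stands for the <1-error_rate> part of the file name;
    its exact formatting is an assumption left to the caller.
    """
    prefix = f"{output_dir}{analysis_name}"
    # A second set of files is written when test and calibration sets are swapped.
    runs = ["1", "2"] if swap_test_calibration else ["1"]
    files = []
    for run in runs:
        files.append(f"{prefix}_ClassificationResultsFile{run}.txt")
        files.append(f"{prefix}_{confidence_label}_RegressionResultsFile{run}.txt")
    if distance is not None:
        files.append(f"{prefix}_SilhouetteScoresTrainSamples_{distance}.txt")
        files.append(f"{prefix}_SilhouetteScoresTestSamples_{distance}.txt")
    return files


# Example with made-up values for the output directory and analysis name.
for path in expected_result_files("results/", "demo", "0.9",
                                  swap_test_calibration=True,
                                  distance="euclidean"):
    print(path)
```

Comparing this list against the contents of the output folder is a quick way to confirm that a run finished and wrote everything the config asked for.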
Download files
File details
Details for the file morgoth-1.3.tar.gz.
File metadata
- Download URL: morgoth-1.3.tar.gz
- Upload date:
- Size: 31.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3290e8f3b3b8773a8a5e135d01daa87e5563e1c94039d25f526d9eb579d02fa1 |
| MD5 | b7e21a10842f74042cff860d2a5d57bc |
| BLAKE2b-256 | 3d01286f6c679d89ca746fbd00e38fc0942dafb2840e34e4f18194dc8f83877b |
File details
Details for the file morgoth-1.3-py3-none-any.whl.
File metadata
- Download URL: morgoth-1.3-py3-none-any.whl
- Upload date:
- Size: 32.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c09186eb7d3ccd08823d88f455b33fe7a72ff192f451e8ff76ae6819b0b47fb |
| MD5 | 8c2628fb22f6e97215be411f8ca69028 |
| BLAKE2b-256 | 7596a57e216df0f404964955c9e564e20790200c90317ad5a59f424b912f7713 |