Step Forward Cross Validation for Bioactivity Prediction
This repo contains code to reproduce the results of the SFCV paper, including model predictions, tables, and figures. Every effort has been made to keep this project reproducible. If you run into undefined behaviour or errors while installing or benchmarking, please open an issue.
Install via PyPI
pip install sfcv
or
pip install git+https://github.com/Manas02/sfcv.git@main
Environment Setup
This project uses Python's built-in venv module to manage a Python 3.11 environment. The following command creates a virtual environment in the .venv directory.
Create Venv
python3.11 -m venv .venv
Activate Venv
source .venv/bin/activate
On Windows, run .venv\Scripts\activate instead.
Install Requirements
pip install -r requirements.txt
Dataset
Landrum & Riniker [Paper | Data]
1. Download Datasets and Standardize SMILES
Please open and run 00_Data_source_and_standardize.ipynb to download the above-mentioned dataset and to standardize the SMILES in those files.
2. Predicting LogP, LogD and Computing MCE-18
Next, run 01_Data_add_LogP_LogD_MCE18.ipynb to predict Crippen LogP (RDKit) and LogD (Code), add them to the dataset, and compute MCE-18.
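MCE-18 condenses several structural features into a single medicinal-chemistry complexity score. The sketch below shows only the final combination step, written from our reading of the published definition, MCE-18 = (AR + NAR + CHIRAL + SPIRO + (sp3 + Cyc - Acyc) / (1 + sp3)) x Q1. It assumes the individual components (ring, chirality and spiro flags, the sp3-carbon fractions and the normalised quadratic index Q1) have already been computed, for example with RDKit. `mce18_from_components` is a hypothetical helper, not part of sfcv, so please check it against the notebook and the original MCE-18 paper before relying on it.

```python
def mce18_from_components(
    ar: int,
    nar: int,
    chiral: int,
    spiro: int,
    sp3: float,
    cyc: float,
    acyc: float,
    q1: float,
) -> float:
    """Combine precomputed descriptors into an MCE-18 score (hypothetical helper).

    ar, nar, chiral, spiro: 0/1 flags for the presence of an aromatic ring,
        a non-aromatic ring, a chiral centre and a spiro atom.
    sp3: fraction of sp3-hybridised carbons (Fsp3).
    cyc, acyc: fractions of cyclic and acyclic sp3 carbons.
    q1: normalised quadratic index (assumed to be computed elsewhere).
    """
    # The four structural terms are presence flags, so reject anything else.
    for name, flag in {"ar": ar, "nar": nar, "chiral": chiral, "spiro": spiro}.items():
        if flag not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {flag!r}")
    # MCE-18 = (AR + NAR + CHIRAL + SPIRO + (sp3 + Cyc - Acyc) / (1 + sp3)) * Q1
    return (ar + nar + chiral + spiro + (sp3 + cyc - acyc) / (1 + sp3)) * q1
```

Keeping the descriptor calculation separate from the combination step makes it easy to compare each component with the values produced in the notebook.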
3. Comparing the number of compounds before and after standardization and deduplication
Next, run 02_Table_mol_per_target_before_after_standardization.ipynb to generate the table and parity plot. The results are saved in the benchmark/results/tables and benchmark/results/figures directories.
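The comparison boils down to counting compounds per target before and after collapsing duplicate standardized SMILES; the two counts per target are what a parity plot would put on its axes. A minimal pure-Python sketch, assuming each record is a (target, raw_smiles, standardized_smiles) tuple and that molecules which failed standardization carry an empty or None SMILES. The record layout and the `count_before_after` name are illustrative assumptions, not the notebook's actual data format.

```python
from collections import defaultdict


def count_before_after(records):
    """Return {target: (n_before, n_after)} for (target, raw, standardized) records.

    n_before counts every input row; n_after counts distinct standardized
    SMILES, so entries that become identical after standardization are merged.
    """
    before = defaultdict(int)
    unique_after = defaultdict(set)
    for target, _raw_smiles, std_smiles in records:
        before[target] += 1
        # Assumption: failed standardizations are empty/None and are dropped.
        if std_smiles:
            unique_after[target].add(std_smiles)
    return {t: (before[t], len(unique_after[t])) for t in before}


records = [
    ("hERG", "OCC", "CCO"),
    ("hERG", "C(O)C", "CCO"),  # same molecule written differently
    ("hERG", "c1ccccc1", "c1ccccc1"),
    ("MAPK", "CC", "CC"),
]
print(count_before_after(records))  # {'hERG': (3, 2), 'MAPK': (1, 1)}
```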
4. Comparing and Plotting the Distributions of Properties in Dataset
Run 03_Plots_Table_target_properties.ipynb to summarise the properties in a table and plot their distributions.
Method
1. Data Splitting
1. Implementing SortedStepForwardCV and UnsortedStepForwardCV
Run 04_Implementation_SFCV.ipynb to visualise how SortedStepForwardCV and UnsortedStepForwardCV work.
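As a rough mental model only, the sketch below assumes that SortedStepForwardCV orders compounds by a sorting property (for example the LogD computed earlier), cuts the ordered list into nearly equal bins, and in fold i trains on bins 0..i and tests on bin i + 1. Each fold is then evaluated on compounds further along the property axis than anything seen in training. UnsortedStepForwardCV is modelled here by using a random order instead. The function `step_forward_splits`, its parameters and the binning scheme are hypothetical, and the sfcv classes may differ in detail.

```python
import numpy as np


def step_forward_splits(values, n_bins=10, sort=True, ascending=True, seed=0):
    """Yield (train_idx, test_idx) pairs for an expanding-window split.

    values: 1-D sequence of the sorting property, one value per compound.
    Compounds are ordered (by value, or randomly if sort=False), cut into
    n_bins nearly equal bins, and fold i trains on bins 0..i and tests on
    bin i + 1, giving n_bins - 1 folds.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    values = np.asarray(values)
    if sort:
        order = np.argsort(values, kind="stable")
        if not ascending:
            order = order[::-1]
    else:
        # "Unsorted" variant: same expanding window over a random order.
        order = np.random.default_rng(seed).permutation(len(values))
    bins = np.array_split(order, n_bins)
    for i in range(n_bins - 1):
        train_idx = np.concatenate(bins[: i + 1])  # everything seen so far
        test_idx = bins[i + 1]                     # the next bin forward
        yield train_idx, test_idx
```

With `ascending=False`, each fold instead tests on compounds with lower property values than any in its training set.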
2. Implementing ScaffoldSplitCV
Run 05_Implementation_ScaffoldSplitCV.ipynb to check how ScaffoldSplitCV works. The algorithm groups molecules by their chemical scaffolds, shuffles these groups, and sequentially assigns entire scaffold groups to the training set until a target fraction is reached, with the remaining groups forming the test set.
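The scaffold grouping described above can be sketched in a few lines. This version assumes scaffold strings are precomputed (for instance Bemis-Murcko scaffold SMILES from RDKit's `MurckoScaffold` module) so the example runs without RDKit; `scaffold_split`, `train_frac` and `seed` are illustrative names rather than the ScaffoldSplitCV API.

```python
import random
from collections import defaultdict


def scaffold_split(scaffolds, train_frac=0.8, seed=0):
    """Split indices so that no scaffold appears in both train and test.

    scaffolds: list of scaffold strings, one per molecule (precomputed).
    Returns (train_idx, test_idx) as sorted lists of molecule indices.
    """
    # Group molecule indices by scaffold.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Shuffle whole scaffold groups, not individual molecules.
    group_list = list(groups.values())
    random.Random(seed).shuffle(group_list)

    n_train_target = train_frac * len(scaffolds)
    train_idx, test_idx = [], []
    for group in group_list:
        # Add entire groups to training until the target fraction is reached;
        # every remaining group goes to the test set.
        if len(train_idx) < n_train_target:
            train_idx.extend(group)
        else:
            test_idx.extend(group)
    return sorted(train_idx), sorted(test_idx)
```

Because whole groups are added, the training set can overshoot `train_frac` by up to one scaffold group. Repeating the split with different seeds is one way to obtain several folds; whether ScaffoldSplitCV does exactly that is an assumption.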
3. Implementing RandomSplitCV
Run 06_Implementation_RandomSplitCV.ipynb to check how RandomSplitCV works.
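Assuming RandomSplitCV behaves like shuffled k-fold cross-validation, compounds are permuted once and each of the k chunks serves as the test set exactly once, ignoring both chemistry and property values. `random_split_cv` is a hypothetical name used for illustration; scikit-learn's `KFold(n_splits=5, shuffle=True, random_state=0)` provides the same kind of split, although with different index assignments.

```python
import numpy as np


def random_split_cv(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(n_samples)
    # array_split keeps fold sizes within one compound of each other.
    folds = np.array_split(order, n_folds)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield np.sort(train_idx), np.sort(test_idx)
```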
4. Validating that the Splits produce an (almost) equal number of test compounds per fold
Run 07_Validate_train_test_split.ipynb to visualise the number of molecules in the test set for each fold and target.
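A simple numerical version of this check is to compare the per-fold test-set sizes for each target. The sketch flags targets whose relative spread, (max - min) / mean, exceeds a tolerance; the input layout, the `check_fold_balance` name and the 10% default tolerance are all assumptions chosen for illustration.

```python
def check_fold_balance(test_sizes_by_target, max_rel_spread=0.1):
    """Flag targets whose per-fold test-set sizes differ too much.

    test_sizes_by_target: {target: [n_test_fold_0, n_test_fold_1, ...]}
    Returns {target: relative_spread} for every target whose spread
    (max - min) / mean exceeds max_rel_spread (an illustrative default).
    """
    flagged = {}
    for target, sizes in test_sizes_by_target.items():
        mean = sum(sizes) / len(sizes)
        spread = (max(sizes) - min(sizes)) / mean if mean else 0.0
        if spread > max_rel_spread:
            flagged[target] = spread
    return flagged
```

An empty result means every target's folds are balanced within the chosen tolerance.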
5. Plotting Chemical Space by Split Type
Run 08_Plots_chemical_space_across_split.ipynb to visualise the chemical space with respect to each split type.
6. Plotting & Comparing Distributions of Sorting Properties per Split Type per Fold across Targets
Run 09_Plots_Table_split_properties.ipynb to visualise the distributions of the sorting properties for each split type and fold, averaged over all targets.
2. Metrics
1. Implementing Discovery Yield
Run 10_Implimentation_Discovery_Yield.ipynb to understand and visualise the illustrative example of discovery yield.
2. Implementing Novelty Error
Run 11_Implimentation_Novelty_Error.ipynb to understand and visualise the illustrative example of novelty error.
3. Implementing Benchmark
Run 12_Implementation_Benchmark.ipynb to see how benchmarking was performed.
3. Results
1. Extract Results
Run 13_Table_extract_results.ipynb to extract the results into a digestible format.
2. Plot Results
Run 14_Plots_results.ipynb, 15_Plots_Result_hERG.ipynb, 16_Plots_Result_MAPK.ipynb and 17_Plots_Result_VEGFR.ipynb to visualise the results.
File details
Details for the file sfcv-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sfcv-0.1.1-py3-none-any.whl
- Upload date:
- Size: 126.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2347483f8f7cf53c56829c82332123a06e7f7dadd749b8431756e6c6ab2f36c |
| MD5 | 03bb858f5f11292fca96a24cc5c40819 |
| BLAKE2b-256 | 690ce4e1dc2592f9828b2911337343564cad23f8e4383454b07b827e46489bd4 |