
Python package to predict membrane permeability of cyclic peptides.


CycPepPerm


👩‍💻 Installation

We provide the code as a Python package, so installing it is all that is needed. We recommend installing it into a new conda environment, which keeps the project's dependencies isolated. If conda is not available yet, install Anaconda first. The package can also be installed without a project-specific environment; in that case, skip the first two lines of the following code:

conda create -n cyc-pep-perm python=3.10
conda activate cyc-pep-perm
pip install cyc-pep-perm==0.1.2
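
To confirm that the install landed in the active environment, a small check like the following can help. It is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this illustration and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "cyc-pep-perm" is the distribution name used in the pip command above.
found = installed_version("cyc-pep-perm")
print(found if found is not None else "cyc-pep-perm is not installed in this environment")
```

If the message reports that the package is missing, the most likely cause is that the conda environment was not activated before running pip.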

🛠️ For Developers

The repository can be cloned from GitHub and installed with `pip` or `conda`. The code was built with Python 3.10 on Linux, but other operating systems should work as well.

With conda:

$ git clone https://github.com/schwallergroup/CycPepPerm.git
$ cd CycPepPerm
$ conda env create -f environment.yml
$ conda activate cyc_pep_perm
$ pip install -e .

or with pip:

$ git clone https://github.com/schwallergroup/CycPepPerm.git
$ cd CycPepPerm
$ conda create -n cyc_pep_perm python=3.10
$ conda activate cyc_pep_perm
$ pip install -r requirements.txt
$ pip install -e .

If the options above did not work, please try from scratch:

$ git clone https://github.com/schwallergroup/CycPepPerm.git
$ cd CycPepPerm
$ conda create -c conda-forge -n cyc_pep_perm rdkit=2022.03.5 python=3.10
$ conda activate cyc_pep_perm
$ conda install -c conda-forge scikit-learn=1.0.2
$ conda install -c rdkit -c mordred-descriptor mordred
$ conda install -c conda-forge xgboost
$ conda install -c conda-forge seaborn
$ pip install shap
$ conda install -c conda-forge jupyterlab
$ pip install pandas-ods-reader
$ pip install -e .
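
After a from-scratch install it can be useful to check that every dependency is importable before opening the notebooks. The sketch below does that with the standard library only. The helper `missing_modules` is hypothetical, and the import names are the usual ones for the packages installed above (for example `sklearn` for scikit-learn and `pandas_ods_reader` for pandas-ods-reader).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names for the packages installed in the steps above.
REQUIRED = [
    "rdkit",
    "sklearn",
    "mordred",
    "xgboost",
    "seaborn",
    "shap",
    "jupyterlab",
    "pandas_ods_reader",
]

missing = missing_modules(REQUIRED)
print("all dependencies found" if not missing else f"missing: {', '.join(missing)}")
```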

🔥 Usage

For more examples of how to process data and how to train and evaluate the algorithms, please consult the folder notebooks/. This folder also contains a notebook to perform polynomial fits as described in the paper.

All data paths in the following examples are taken from the hard-coded paths that work when one clones this repository. If you use the python package and download the data separately, please change the paths accordingly.

Data preprocessing

Here we show how we handle the data for our use case. Some simple reformatting is done (see also the notebook notebooks/01_data_preparation.ipynb), starting from an .ods file with DataWarrior output (for the data, see Data and Models).

import os

from cyc_pep_perm.data.processing import DataProcessing

data_dir = "/path/to/data/folder" # ADAPT TO YOUR PATH!

# this can also be a .csv input
datapath = os.path.join(data_dir, "perm_random80_train_raw.ods")

# instantiate the class and make sure the columns match your input file - otherwise change the arguments
dp = DataProcessing(datapath=datapath)

# make use of precomputed descriptors from DataWarrior
df = dp.read_data(filename="perm_random80_train_dw.csv")

# calculate Mordred descriptors
df_mordred = dp.calc_mordred(filename="perm_random80_train_mordred.csv")

Training

Make sure the processed data is ready before training. To make the hyperparameter search more extensive, edit the PARAMS dictionary in the respective Python script (e.g. src/cyc_pep_perm/models/randomforest.py).
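
To get a feel for how quickly a wider search space grows, the sketch below expands a PARAMS-style dictionary into all of its combinations. The keys and values are hypothetical (they are common scikit-learn random forest settings); the real PARAMS dictionary in the package may differ, and the package may sample the space rather than enumerate it.

```python
from itertools import product

# Hypothetical search space; the real PARAMS dictionary in the package may use
# different keys and values.
PARAMS = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
}


def expand_grid(params):
    """Return every combination of the listed values as a list of dicts."""
    keys = list(params)
    return [dict(zip(keys, values)) for values in product(*(params[k] for k in keys))]


grid = expand_grid(PARAMS)
print(len(grid))  # 3 * 3 * 2 = 18 combinations
```

Each extra value multiplies the number of candidates, so widen the search with run time in mind. The training example itself follows.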

import os

from cyc_pep_perm.models.randomforest import RF

data_dir = "/path/to/data/folder" # ADAPT TO YOUR PATH!
train_data = os.path.join(data_dir, "perm_random80_train_dw.csv")
model_dir = "/path/to/model/folder" # ADAPT TO YOUR PATH!
rf_model_trained = os.path.join(model_dir, "rf_random_dw.pkl")

# instantiate class
rf_regressor = RF()

model = rf_regressor.train(
    datapath = train_data,
    savepath = rf_model_trained,
)

y_pred, rmse, r2 = rf_regressor.evaluate()
# will print training results, e.g.:
# RMSE: 8.45
# R2: 0.879
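
For readers unfamiliar with the two reported metrics, here is a minimal sketch of the standard definitions of RMSE and R2 in plain Python. The numbers are made up for illustration, and this is not necessarily how `evaluate()` computes them internally.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not taken from the permeability data set.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE: {rmse(y_true, y_pred):.2f}")  # RMSE: 2.55
print(f"R2: {r2(y_true, y_pred):.3f}")      # R2: 0.948
```

RMSE is in the same units as the target, while R2 is unitless and equals 1 for a perfect fit.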

Prediction

import os

import pandas as pd

from cyc_pep_perm.models.randomforest import RF

data_dir = "/path/to/data/folder" # ADAPT TO YOUR PATH!
test_data = os.path.join(data_dir, "perm_random20_test_dw.csv")
model_dir = "/path/to/model/folder" # ADAPT TO YOUR PATH!
rf_model_trained = os.path.join(model_dir, "rf_random_dw.pkl")

# instantiate class
rf_regressor = RF()

# load trained model
rf_regressor.load(
    modelpath = rf_model_trained,
)

# data to predict on, e.g.:
df = pd.read_csv(test_data)
X = df.drop(columns=["SMILES"])

# predict
y_pred = rf_regressor.predict(X)
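
Once predictions are available, it is often handy to store them next to the SMILES strings. The following is a minimal sketch using only the standard library; the helper `write_predictions` and the output column name `predicted_permeability` are choices made for this illustration, not part of the package.

```python
import csv


def write_predictions(path, smiles, y_pred):
    """Write one row per molecule: its SMILES string and the predicted value."""
    if len(smiles) != len(y_pred):
        raise ValueError("smiles and y_pred must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["SMILES", "predicted_permeability"])
        for smi, pred in zip(smiles, y_pred):
            # Cast to float so NumPy scalars are formatted like plain numbers.
            writer.writerow([smi, f"{float(pred):.3f}"])


# Usage with the objects from the example above (assumes df has a SMILES column):
# write_predictions("predictions.csv", df["SMILES"], y_pred)
```

The length check guards against silently pairing predictions with the wrong molecules if rows were dropped between loading and predicting.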

Data and Models

All data required for reproducing the results in the paper are provided in the folder data/. Because training these models involves randomness, retrained models may give results that differ from the ones reported in the paper. The files in data/ are split into training and test data (random 80/20 split) and come with either the DataWarrior (dw) or the Mordred descriptors. The simple data processing can be found in the notebook notebooks/01_data_preparation.ipynb. The DataWarrior descriptors are computed with external software (DataWarrior). The following files are provided:

  • data/perm_random20_test_dw.csv - test data with DataWarrior descriptors
  • data/perm_random20_test_mordred.csv - test data with Mordred descriptors
  • data/perm_random20_test_raw.ods - test data before processing
  • data/perm_random80_train_dw.csv - training data with DataWarrior descriptors
  • data/perm_random80_train_mordred.csv - training data with Mordred descriptors
  • data/perm_random80_train_raw.ods - training data before processing
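
The random 80/20 split mentioned above can be reproduced in spirit with a seeded shuffle. This is a minimal sketch with the standard library; the seed value and the helper `split_indices` are assumptions for illustration, and the exact procedure used to create the provided files is not stated here.

```python
import random


def split_indices(n_rows, train_fraction=0.8, seed=42):
    """Shuffle row indices with a fixed seed and cut them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_rows * train_fraction))
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```

Fixing the seed makes the split repeatable, which removes one source of run-to-run variation when comparing models.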

The models are provided in the folder models/ and can be loaded with the load_model() method of the respective class. The models provided are:

  • models/rf_random_dw.pkl - Random Forest trained on DataWarrior descriptors
  • models/rf_random_mordred.pkl - Random Forest trained on Mordred descriptors
  • models/xgb_random_dw.pkl - XGBoost trained on DataWarrior descriptors
  • models/xgb_random_mordred.pkl - XGBoost trained on Mordred descriptors

✅ Citation

@Misc{this_repo,
  author = { Rebecca M Neeser },
  title = { cyc_pep_perm - Python package to predict membrane permeability of cyclic peptides. },
  howpublished = {Github},
  year = {2023},
  url = {https://github.com/schwallergroup/CycPepPerm }
}

🛠️ For Developers


👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

🥼 Testing

After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:

$ tox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

$ git clone https://github.com/schwallergroup/CycPepPerm.git
$ cd CycPepPerm
$ tox -e docs
$ open docs/build/html/index.html

The documentation build automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. They also need to be added to the extensions list in docs/source/conf.py.

📦 Making a Release

After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:

$ tox -e finish

This script does the following:

  1. Uses Bump2Version to remove the -dev suffix from the version number in setup.cfg, src/cyc_pep_perm/version.py, and docs/source/conf.py
  2. Packages the code in both a tar archive and a wheel using build
  3. Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
  4. Pushes to GitHub. You'll need to create a GitHub release for the commit where the version was bumped.
  5. Bumps the version to the next patch. If you made big changes and want to bump the minor version instead, run tox -e bumpversion -- minor afterwards.
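
The version changes in steps 1 and 5 can be illustrated with a small sketch. It only mimics the string transformations and does not call Bump2Version; both helper names are hypothetical, and it assumes that the next development version carries the -dev suffix again.

```python
def release_version(dev_version):
    """Step 1: drop the -dev suffix, e.g. '0.1.2-dev' -> '0.1.2'."""
    suffix = "-dev"
    if not dev_version.endswith(suffix):
        raise ValueError(f"expected a version ending in {suffix!r}, got {dev_version!r}")
    return dev_version[: -len(suffix)]


def next_patch_dev(released_version):
    """Step 5: move to the next patch and mark it as a development version (assumed)."""
    major, minor, patch = (int(part) for part in released_version.split("."))
    return f"{major}.{minor}.{patch + 1}-dev"


released = release_version("0.1.2-dev")
print(released)                  # 0.1.2
print(next_patch_dev(released))  # 0.1.3-dev
```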
