Water Solubility Prediction Project

🔍 Overview 🔎

This project predicts the water solubility of chemical compounds using machine learning. The resulting model estimates the water solubility of a new compound from its SMILES code alone, which can be valuable in industries such as pharmaceuticals, agriculture, and environmental science. This repository provides the data used to train and test our models, along with .pkl files containing the optimized parameters of our best model. More importantly, it includes a notebook that traces the project from beginning to end, and a package that predicts the water solubility of one or more SMILES codes, either passed directly or stored in a .csv file.

📝 Project Structure 📝

This project contains two main elements: a Notebook and a Package.

1️⃣ A Notebook containing:

  • Import Relevant Modules and Libraries
  • Data Collection
  • Data Cleaning
  • Calculation of RDKit Molecular Descriptors
  • Selection of Machine Learning Models
  • Fine-tuning
  • Analysis of different models
  • Saving of the best trained model and scaler
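
The last notebook step, saving the trained model and scaler to .pkl files, can be pictured with Python's standard pickle module. The sketch below is not the project's code: a plain dictionary of fitted parameters stands in for whatever scaler object the notebook actually saves, and the helper names and the file name scaler_demo.pkl are hypothetical, chosen so the example runs with the standard library alone.

```python
import pickle
import tempfile
from pathlib import Path
from statistics import mean, pstdev


def fit_scaler(values):
    """Return standardization parameters (a stand-in for a fitted scaler)."""
    return {"mean": mean(values), "std": pstdev(values) or 1.0}


def apply_scaler(params, values):
    """Standardize values with previously fitted parameters."""
    return [(v - params["mean"]) / params["std"] for v in values]


def save_object(obj, path):
    """Serialize a picklable object to a .pkl file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Load an object from a .pkl file (only unpickle files you trust)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Fit on toy descriptor values, save, then reload from disk.
scaler_params = fit_scaler([1.0, 2.0, 3.0, 4.0])
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "scaler_demo.pkl"  # hypothetical file name
    save_object(scaler_params, pkl_path)
    restored = load_object(pkl_path)

# The reloaded parameters scale new data exactly like the originals.
scaled = apply_scaler(restored, [2.5])
```

Saving the scaler together with the model matters because new inputs must be transformed with the same parameters that were fitted on the training data.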

2️⃣ A Package containing two main functions:

  • A function to predict the LogS value for one or more SMILES
  • A function to predict LogS values for SMILES codes stored in a CSV file
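
Both functions report LogS. This README does not state the unit, so the following is a sketch under an assumption: that LogS follows the common convention of being the base-10 logarithm of aqueous solubility in mol/L. Under that assumption, a predicted value can be converted to more familiar units. The helper names and the input values are illustrative only and are not output of the package.

```python
def logs_to_molar(log_s: float) -> float:
    """Convert LogS to mol/L, assuming LogS = log10(solubility in mol/L)."""
    return 10.0 ** log_s


def logs_to_mg_per_ml(log_s: float, molar_mass_g_per_mol: float) -> float:
    """Convert LogS to mg/mL (g/L and mg/mL are numerically equal)."""
    return logs_to_molar(log_s) * molar_mass_g_per_mol


# Illustrative input, not a model prediction: LogS = -2.0 for a compound
# with a molar mass of 180.16 g/mol.
molar = logs_to_molar(-2.0)
mg_per_ml = logs_to_mg_per_ml(-2.0, 180.16)
print(f"{molar:.3g} mol/L, {mg_per_ml:.3g} mg/mL")
```

Each unit decrease in LogS corresponds to a tenfold drop in solubility, which is why small prediction errors in LogS can translate into large relative errors in concentration.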

🔨 Installation 🔨

🌍 Environment 🌍

  1. Clone our repository:
git clone https://github.com/Nohalyan/Projetppchem
  2. Open your terminal or Anaconda Prompt, navigate to the directory /WSPP_Projectppchem containing the wsppchem_env_environment.yml file, and run the following command to create the Conda environment:
conda env create -f environment.yml
  3. Activate the environment once it has been created:
conda activate wsppchem_env
  4. Verify the environment. To check that all the dependencies are installed correctly, list the installed packages:
conda list

📦 Package 📦

  1. Install our package wsppchem with a simple pip install.
pip install wsppchem
  2. Import all the functions using the following command.
from wsppchem.wspp_functions import *
  3. Enjoy! 😁

The two main functions of our package are predict_logS_smiles and predict_logS_csv, which can be used as follows:

predict_logS_smiles(*smiles_codes)
predict_logS_csv(csv_file_path)

The first function, predict_logS_smiles(*smiles_codes), predicts the LogS value for one or more SMILES codes at once. The second function, predict_logS_csv(csv_file_path), predicts LogS values for SMILES codes stored in a .csv file. If you need help, call wspphelp(), which gives more detailed information on both functions along with an example of how to use them.
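
As a rough illustration of preparing input for the CSV route, the sketch below writes a few SMILES codes to a .csv file using only the standard library. The helper write_smiles_csv, the file name molecules.csv, and the single "SMILES" header column are assumptions made for this example; the layout that predict_logS_csv actually expects is not described here, so check wspphelp() before relying on it.

```python
import csv
import tempfile
from pathlib import Path


def write_smiles_csv(path, smiles_list, column_name="SMILES"):
    """Write one SMILES code per row under a single header column.

    The header name is an assumption, not the documented input format.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column_name])
        for smiles in smiles_list:
            writer.writerow([smiles])
    return path


# Ethanol, benzene, and acetic acid as example inputs.
example_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = write_smiles_csv(Path(tmp_dir) / "molecules.csv", example_smiles)
    # Read the file back to confirm what was written.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

print(rows)
```

With the package installed, the same codes could instead be passed directly, as in predict_logS_smiles("CCO", "c1ccccc1", "CC(=O)O"), while a file like the one written here would be handed to predict_logS_csv by its path.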

📗 License 📕

This project is licensed under the MIT License.

📜 References 📜

This project is based on the code of a Jupyter notebook published on GitHub at https://github.com/gashawmg, and on data from https://github.com/PatWalters.

📖 Authors 📖

This project was carried out as part of EPFL's Practical programming in Chemistry course.
