
A framework for building machine- and deep-learning predictors for molecular characteristics using Hydronaut and Chemfeat.

Project description


title: README
author: Jan-Michael Rye


Synopsis

MolPred is a Hydronaut-based framework for building machine- and deep-learning predictors for molecular characteristics using Chemfeat. MolPred will

Links

Usage

The framework can train user-supplied models to predict features of molecules. To train a model, the user should provide a set of International Chemical Identifiers (InChIs) representing the molecules of the training set, along with one or more features associated with these molecules. The user should then customize the example configuration file to select their model and chemical feature sets.
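As a minimal sketch, assuming the training set is supplied as a CSV file with one InChI per row and one column per target feature, the data could be prepared as follows. The column names `inchi` and `target`, the file layout, and the placeholder target values are hypothetical and not MolPred's documented input format.

```python
import csv
import io

# Hypothetical training records: (InChI, target) pairs. The target values are
# placeholders, not measurements.
RECORDS = [
    ("InChI=1S/CH4/h1H4", 0.0),  # methane
    ("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", 1.0),  # ethanol
]


def write_training_csv(records, stream):
    """Write (InChI, target) pairs as CSV rows, rejecting non-InChI strings."""
    writer = csv.writer(stream)
    writer.writerow(["inchi", "target"])  # hypothetical column names
    for inchi, value in records:
        # Every InChI string starts with the "InChI=" prefix.
        if not inchi.startswith("InChI="):
            raise ValueError(f"not an InChI: {inchi!r}")
        writer.writerow([inchi, value])


# Usage: write to an in-memory buffer; a real run would open a file instead.
buffer = io.StringIO()
write_training_csv(RECORDS, buffer)
```

The ethanol InChI contains commas, which `csv.writer` quotes automatically. When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.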

All results are logged with MLflow, and any trained model can be reused for testing or prediction by editing the configuration file to set the operation mode (train, test or predict) and the ID of a previous MLflow run from which to reload the model and feature set.
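The rule that test and predict runs need a previous MLflow run ID can be sketched as a small validation step. The key names `mode` and `run_id`, and the fallback to `train` when no mode is given, are assumptions for this sketch; the example configuration file defines the real keys.

```python
VALID_MODES = ("train", "test", "predict")


def check_run_settings(settings: dict) -> str:
    """Validate a hypothetical {"mode": ..., "run_id": ...} mapping and return the mode."""
    mode = settings.get("mode", "train")  # assumed default, not MolPred's documented one
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    # Testing and prediction reuse a trained model, so they need the MLflow
    # run ID under which that model and its feature set were logged.
    if mode != "train" and not settings.get("run_id"):
        raise ValueError(f"mode {mode!r} requires a previous MLflow run ID")
    return mode
```

For example, `check_run_settings({"mode": "predict", "run_id": "abc123"})` returns `"predict"`, while `{"mode": "test"}` without a run ID is rejected.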

Model

To create a model, the user must define a subclass of molpred.model.base.ModelBase. Some methods, such as train and predict, are required, while others, such as visualize_data and visualize_prediction_metrics, are optional.
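The split between required and optional methods can be illustrated with Python's `abc` module. The `ModelBase` below is a simplified stand-in written for this sketch, not the real `molpred.model.base.ModelBase`, and all method signatures are assumptions; a real subclass should follow the signatures of the actual base class.

```python
from abc import ABC, abstractmethod
from statistics import mean


class ModelBase(ABC):
    """Simplified stand-in for molpred.model.base.ModelBase (signatures assumed)."""

    @abstractmethod
    def train(self, features, targets):
        """Required: fit the model to feature rows and target values."""

    @abstractmethod
    def predict(self, features):
        """Required: return one prediction per feature row."""

    def visualize_data(self, features, targets):
        """Optional: subclasses may override this to log extra plots. No-op by default."""
        return None


class MeanModel(ModelBase):
    """Toy model that always predicts the mean of the training targets."""

    def train(self, features, targets):
        self.value = mean(targets)

    def predict(self, features):
        return [self.value for _ in features]
```

Instantiating a subclass that omits `train` or `predict` raises `TypeError`, which is how `abc` enforces required methods; MolPred may enforce this differently.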

Once the model has been defined, it can be registered using the class's register method and then selected by name from the configuration file (experiment.params.model.name).
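Name-based selection can be sketched as a class registry keyed by name, combined with a lookup of the dotted configuration path `experiment.params.model.name`. The `RegisteredModel` class, its `register` classmethod, the `REGISTRY` dict, and the model name `mean` are hypothetical; MolPred's own `register` method may take different arguments.

```python
REGISTRY: dict[str, type] = {}


class RegisteredModel:
    """Hypothetical base class whose subclasses register themselves by name."""

    NAME = None  # each subclass sets a unique name

    @classmethod
    def register(cls):
        if not cls.NAME:
            raise ValueError(f"{cls.__name__} has no NAME")
        REGISTRY[cls.NAME] = cls
        return cls


class MeanModel(RegisteredModel):
    NAME = "mean"


MeanModel.register()


def select_model(config: dict) -> type:
    """Follow experiment.params.model.name through nested dicts and look it up."""
    node = config
    for key in "experiment.params.model.name".split("."):
        node = node[key]
    try:
        return REGISTRY[node]
    except KeyError:
        raise KeyError(f"no model registered as {node!r}") from None
```

With `{"experiment": {"params": {"model": {"name": "mean"}}}}` as the configuration, `select_model` returns `MeanModel`; an unregistered name raises `KeyError` with a readable message.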

Examples

Scoring

molpred.model.scoring.register_scorer can be used to register custom scikit-learn scorers created with make_scorer. These scorers can then be used by name in the configuration file (experiment.params.model.scorers) to calculate and log metrics for the model during training and testing.
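For reference, a custom scorer is built with scikit-learn's `make_scorer`. This sketch stops short of calling `register_scorer`, since its exact signature is not documented here; the metric `max_abs_error` and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer


def max_abs_error(y_true, y_pred):
    """Largest absolute difference between targets and predictions."""
    return float(np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# greater_is_better=False makes the scorer return the negated error, so that
# "higher is better" holds for every scikit-learn scorer.
max_abs_error_scorer = make_scorer(max_abs_error, greater_is_better=False)

# Toy data: a baseline model that always predicts the mean target (3.0).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0, 6.0])
baseline = DummyRegressor(strategy="mean").fit(X, y)

# Scorers are called as scorer(estimator, X, y_true).
score = max_abs_error_scorer(baseline, X, y)
```

Here the largest error is 3.0, so `score` is -3.0. The scorer object would then be passed to `molpred.model.scoring.register_scorer`, presumably together with the name that `experiment.params.model.scorers` refers to.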

Visualization

All features calculated by Chemfeat are automatically plotted and logged for each run to provide insights into the correlation between the features and the target characteristics.

Numeric Features

All numeric features for a feature set are plotted together using a Seaborn stripplot after normalization.
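The README does not say which normalization is applied, so this sketch assumes min-max scaling of each feature to [0, 1], then reshapes the table into the long format (one record per molecule and feature) that a stripplot expects. The function name and the feature and target keys are hypothetical.

```python
def minmax_long(rows: list[dict], feature_names: list[str], target_key: str) -> list[dict]:
    """Min-max scale each numeric feature and return long-format records."""
    long_rows = []
    for name in feature_names:
        values = [row[name] for row in rows]
        low, high = min(values), max(values)
        span = high - low
        for row in rows:
            # Constant features would divide by zero, so they are mapped to 0.0.
            scaled = (row[name] - low) / span if span else 0.0
            long_rows.append({"feature": name, "value": scaled, "target": row[target_key]})
    return long_rows
```

With pandas and Seaborn installed, `seaborn.stripplot(data=pandas.DataFrame(long_rows), x="feature", y="value", hue="target")` would draw one strip per feature, colored by target category. Scaling first puts features with very different ranges on a shared axis.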

Example of numeric feature plot

Categoric Features

Categoric features whose names share a common prefix and differ only by a numeric suffix are grouped together and displayed as differential counts of each categoric value per target category. The data is displayed using a customized scatterplot that can visually separate data even for fingerprint features of up to 4096 bits. These plots attempt to highlight the indices of features that vary significantly between target categories.
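The grouping step can be sketched with a regular expression that splits a column name into a prefix and a trailing integer index, followed by per-category counts of set bits. Defining the "differential" count as each category's count minus the mean count over all categories is an assumption made for this sketch, as are the function and column names; it does not adjust for category sizes.

```python
import re
from collections import defaultdict

SUFFIX = re.compile(r"^(.*?)(\d+)$")  # shortest prefix + trailing integer index


def group_by_prefix(columns):
    """Map each prefix to its columns' (index, name) pairs, sorted numerically by index."""
    groups = defaultdict(list)
    for name in columns:
        match = SUFFIX.match(name)
        if match:  # columns without a numeric suffix are left out
            groups[match.group(1)].append((int(match.group(2)), name))
    return {prefix: sorted(items) for prefix, items in groups.items()}


def differential_counts(rows, columns, target_key):
    """Per column: count of rows with value 1 in each category, minus the mean over categories."""
    categories = sorted({row[target_key] for row in rows})
    result = {}
    for name in columns:
        counts = {
            cat: sum(1 for row in rows if row[target_key] == cat and row[name] == 1)
            for cat in categories
        }
        avg = sum(counts.values()) / len(categories)
        result[name] = {cat: count - avg for cat, count in counts.items()}
    return result
```

Sorting by the integer index keeps `fp_2` before `fp_10`, which a plain string sort would not. Bits with large positive or negative differential counts are the ones a plot like this would make stand out.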

Example of categoric feature plot


Download files

Source Distribution

molpred-2023.2.tar.gz (200.0 kB)

Uploaded Source

Built Distribution

molpred-2023.2-py3-none-any.whl (23.8 kB)

Uploaded Python 3

File details

Details for the file molpred-2023.2.tar.gz.

File metadata

  • Download URL: molpred-2023.2.tar.gz
  • Upload date:
  • Size: 200.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for molpred-2023.2.tar.gz
Algorithm Hash digest
SHA256 9c4aaed8aeab1a318ab3e93b114781dd751b27679c1d630a32575e80fff61959
MD5 924c5539e0620798f80078ab0d11989f
BLAKE2b-256 45a9e54b6a9a2ce35ffb2535c7a6db051c0f59a4884aaf438d5dac05b3d2e6b1

File details

Details for the file molpred-2023.2-py3-none-any.whl.

File metadata

  • Download URL: molpred-2023.2-py3-none-any.whl
  • Upload date:
  • Size: 23.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for molpred-2023.2-py3-none-any.whl
Algorithm Hash digest
SHA256 12546b55f40459ef4d6ef827286cc4bb0d7a6c9f1d9048ba1c7e399389b6a0af
MD5 4a9145cd035d93cfa863abab5efe643c
BLAKE2b-256 ad08ef7bcc6eb855a2fe798bd6c0310b390e6d01c9af21ff2655d75b0732b42d
