Skip to main content

scikit-learn classes for molecule transformation

Project description

scikit-mol

Fancy logo Fancy logo

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to be able to add molecular vectorization directly into scikit-learn pipelines, so that the final model directly predict on RDKit molecules or SMILES strings

As example with the needed scikit-learn and -mol imports and RDKit mol objects in the mol_list_train and _test lists:

pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learns utilities

The first draft for the project was created at the RDKIT UGM 2022 hackathon 2022-October-14

Implemented

  • Descriptors
    • MolecularDescriptorTransformer

  • Fingerprints
    • MorganFingerprintTransformer
    • MACCSKeysFingerprintTransformer
    • RDKitFingerprintTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MHFingerprintTransformer
    • SECFingerprintTransformer
    • AvalonFingerprintTransformer

  • Conversions
    • SmilesToMol

  • Standardizer
    • Standardizer

- safeinference - SafeInferenceWrapper - set_safe_inference_mode
  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install latest tagged release from pip

pip install scikit-mol

or from conda-forge

conda install -c conda-forge scikit-mol

The conda forge package should get updated shortly after a new tagged release on pypi.

Bleeding edge

pip install git+https://github.com:EBjerrum/scikit-mol.git

Documentation

There are a collection of notebooks in the notebooks directory which demonstrates some different aspects and use cases

Roadmap and Contributing

Help wanted! Are you a PhD student that want a "side-quest" to procrastinate your thesis writing or are you simply interested in computational chemistry, cheminformatics or simply with an interest in QSAR modelling, Python Programming open-source software? Do you want to learn more about machine learning with Scikit-Learn? Or do you use scikit-mol for your current work and would like to pay a little back to the project and see it improved as well? With a little bit of help, this project can be improved much faster! Reach to me (Esben), for a discussion about how we can proceed.

Currently we are working on fixing some deprecation warnings, its not the most exciting work, but it's important to maintain a little. Later on we need to go over the scikit-learn compatibility and update to some of their newer features on their estimator classes. We're also brewing on some feature enhancements and tests, such as new fingerprints and a more versatile standardizer.

There are more information about how to contribute to the project in CONTRIBUTION.md

BUGS

Probably still, please check issues at GitHub and report there

Contributers:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scikit_mol-0.4.3.tar.gz (3.2 MB view details)

Uploaded Source

Built Distribution

scikit_mol-0.4.3-py3-none-any.whl (28.1 kB view details)

Uploaded Python 3

File details

Details for the file scikit_mol-0.4.3.tar.gz.

File metadata

  • Download URL: scikit_mol-0.4.3.tar.gz
  • Upload date:
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for scikit_mol-0.4.3.tar.gz
Algorithm Hash digest
SHA256 b4ddc619817f1635bfec52b421498d0a744831282b442772169ba4d840087474
MD5 dabad36e71bbb315834a80043935c5b0
BLAKE2b-256 dfe30e4611501f5af1a5cf99f44a094f19fe600a0036e4d3f6b7e29d05bc62b0

See more details on using hashes here.

File details

Details for the file scikit_mol-0.4.3-py3-none-any.whl.

File metadata

  • Download URL: scikit_mol-0.4.3-py3-none-any.whl
  • Upload date:
  • Size: 28.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for scikit_mol-0.4.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f31239f5f8eb68331a33a64620f563f5e7913d8781d65caa863c9c3b74e8e2de
MD5 ca01b2f718f92914340e099faaf221f6
BLAKE2b-256 448d9a940e0b80f5530f39afa050408aaa6a5e4eb4d98c5b5ca3c719c3ed5a35

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page