scikit-learn classes for molecule transformation

Project description

scikit-mol

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

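For example, with the needed scikit-learn and scikit-mol imports, and RDKit mol objects in the lists mol_list_train and mol_list_test:

from rdkit import Chem
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from scikit_mol.fingerprints import MorganFingerprintTransformer

pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)  # fingerprints are computed inside the pipeline
pipe.score(mol_list_test, y_test)  # R^2 of the Ridge model on the test molecules
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])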

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Implemented

  • Descriptors
    • MolecularDescriptorTransformer

  • Fingerprints
    • MorganFingerprintTransformer
    • MACCSKeysFingerprintTransformer
    • RDKitFingerprintTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MHFingerprintTransformer
    • SECFingerprintTransformer
    • AvalonFingerprintTransformer

  • Conversions
    • SmilesToMol

  • Standardizer
    • Standardizer

  • safeinference
    • SafeInferenceWrapper
    • set_safe_inference_mode

  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install the latest tagged release from PyPI with pip

pip install scikit-mol

or from conda-forge

conda install -c conda-forge scikit-mol

The conda-forge package should be updated shortly after each new tagged release on PyPI.

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git

Documentation

There is a collection of notebooks in the notebooks directory that demonstrate different aspects and use cases.

Contributing

There is more information about how to contribute to the project in CONTRIBUTION.md.

BUGS

There are probably still bugs; please check the issues on GitHub and report new ones there.

Contributors:

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scikit_mol-0.4.2.tar.gz (3.2 MB)

Uploaded Source

Built Distribution

scikit_mol-0.4.2-py3-none-any.whl (22.0 kB)

Uploaded Python 3

File details

Details for the file scikit_mol-0.4.2.tar.gz.

File metadata

  • Download URL: scikit_mol-0.4.2.tar.gz
  • Upload date:
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for scikit_mol-0.4.2.tar.gz
Algorithm Hash digest
SHA256 a281ec770a0c62176a6fbd23b716e3cef18a2484592f23a95abf82de8f29d1a8
MD5 4985e7f1a099e52a779bf013d9029aea
BLAKE2b-256 d9f937acc0ba8c5b867aee0989109745e74e19efdca5893a784be65f1d1c821c

See more details on using hashes here.

File details

Details for the file scikit_mol-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: scikit_mol-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 22.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for scikit_mol-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 05f459bec41298d276cf48ce6e3da610910d4a6332af89b14ebed745948e4132
MD5 81cd023e7d35af0c8c452eb5af818c9b
BLAKE2b-256 06a06701384cf25e4a98e7c85a3cd028763d5e0bf2b9993ac1dd444f4452a23d
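
A downloaded file can be checked against the published SHA256 digest with the Python standard library. The following is a minimal sketch: the helper names sha256_of_file and matches_published_hash are hypothetical, and the file is assumed to have been downloaded into the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the wheel
EXPECTED_WHEEL_SHA256 = "05f459bec41298d276cf48ce6e3da610910d4a6332af89b14ebed745948e4132"

wheel = Path("scikit_mol-0.4.2-py3-none-any.whl")  # assumed local download location
if wheel.exists():
    print(matches_published_hash(wheel, EXPECTED_WHEEL_SHA256))
```

pip can perform the same check at install time when a requirements file pins the digest with --hash and is installed with --require-hashes.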

