scikit-learn classes for molecule transformation

scikit-mol

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

As an example, with the needed scikit-learn and scikit-mol imports in place and RDKit mol objects in the mol_list_train and mol_list_test lists:

pipe = Pipeline([('mol_transformer', MorganTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.
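
As a rough sketch of how such tuning could look, the example below runs scikit-learn's GridSearchCV over a pipeline whose first step is a featurizer. To stay runnable without RDKit, it uses ToyHashedSmilesTransformer, a hypothetical stand-in that is not part of scikit-mol. Its n_bits and ngram parameters and the toy data are made up for illustration only. The point is the mechanism: any transformer that follows the scikit-learn estimator conventions exposes its parameters to the search as <step name>__<parameter name>.

```python
import zlib

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ToyHashedSmilesTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a fingerprint transformer (NOT scikit-mol).

    Hashes character n-grams of each SMILES string into a count vector.
    """

    def __init__(self, n_bits=64, ngram=2):
        self.n_bits = n_bits
        self.ngram = ngram

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        out = np.zeros((len(X), self.n_bits))
        for row, smiles in enumerate(X):
            for i in range(max(len(smiles) - self.ngram + 1, 1)):
                token = smiles[i:i + self.ngram]
                out[row, zlib.crc32(token.encode()) % self.n_bits] += 1
        return out


# Made-up toy data: SMILES strings with arbitrary target values.
smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1C", "CC(=O)O",
          "CCN", "CCCN", "c1ccccc1O", "CC(=O)C", "CCCC", "CCCCC"]
y = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 0.2, 0.4, 0.9, 1.8, 0.3, 1.2, 1.7])

pipe = Pipeline([("mol_transformer", ToyHashedSmilesTransformer()),
                 ("Regressor", Ridge())])

# Featurizer and model parameters are tuned together in one grid.
param_grid = {
    "mol_transformer__n_bits": [32, 128],
    "mol_transformer__ngram": [1, 2],
    "Regressor__alpha": [0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(smiles, y)
print(search.best_params_)
```

With a scikit-mol transformer in the first step, the same pattern should apply, with the grid keys taken from that transformer's own constructor parameters.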

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Implemented

  • Transformer Classes
    • SmilesToMol
    • Desc2DTransformer
    • MACCSTransformer
    • RDKitFPTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MorganTransformer
    • SECFingerprintTransformer

  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install the latest tagged release with pip:

pip install scikit-mol

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git

Developers

git clone git@github.com:EBjerrum/scikit-mol.git
cd scikit-mol
pip install -e .

Documentation

There is a collection of notebooks in the notebooks directory that demonstrates different aspects and use cases.

BUGS

Probably still some.

Contributors:
