scikit-learn classes for molecule transformation

scikit-mol

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

As an example, with the needed scikit-learn and scikit-mol imports in place and RDKit mol objects in the mol_list_train and mol_list_test lists:

pipe = Pipeline([('mol_transformer', MorganTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.
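As a rough sketch of what tuning the fingerprinting step could look like, the example below searches over featurization and model parameters in one grid. To stay runnable without RDKit, it uses `ToySmilesHasher`, a hypothetical stand-in transformer written for this illustration that is not part of scikit-mol. The SMILES list and the target values (string lengths) are toy data as well. With scikit-mol installed, one of its transformers would presumably take the place of the stand-in, with the grid keys adjusted to that transformer's actual parameter names.

```python
import zlib

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ToySmilesHasher(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a fingerprint transformer (NOT scikit-mol API).

    Hashes character n-grams of each SMILES string into a count vector.
    """

    def __init__(self, n_bits=64, ngram=2):
        self.n_bits = n_bits
        self.ngram = ngram

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        out = np.zeros((len(X), self.n_bits))
        for i, smiles in enumerate(X):
            for j in range(len(smiles) - self.ngram + 1):
                piece = smiles[j:j + self.ngram]
                # crc32 gives a hash that is stable across Python processes
                out[i, zlib.crc32(piece.encode()) % self.n_bits] += 1
        return out


# Toy data: the target is simply the SMILES string length
smiles = ["C", "CC", "CCO", "CCCO", "c1ccccc1", "CC(=O)O", "CCN", "c1ccccc1C(=O)C"]
y = [float(len(s)) for s in smiles]

pipe = Pipeline([("fingerprint", ToySmilesHasher()), ("regressor", Ridge())])

# Parameters are addressed as <step name>__<parameter name>
param_grid = {
    "fingerprint__n_bits": [32, 128],
    "fingerprint__ngram": [1, 2],
    "regressor__alpha": [0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(smiles, y)
print(search.best_params_)
```

Because the transformer sits inside the pipeline, it is re-fitted on each cross-validation split together with the regressor, and its settings are compared on the same footing as the model's.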

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Implemented

  • Transformer Classes
    • SmilesToMol
    • Desc2DTransformer
    • MACCSTransformer
    • RDKitFPTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MorganTransformer
    • SECFingerprintTransformer

  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install the latest tagged release from PyPI using pip:

pip install scikit-mol

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git

Developers

git clone git@github.com:EBjerrum/scikit-mol.git
cd scikit-mol
pip install -e .

Documentation

There is a collection of notebooks in the notebooks directory that demonstrate different aspects and use cases.

BUGS

There are probably still bugs; please check the issues on GitHub and report new ones there.

Contributors:
