scikit-learn classes for molecule transformation

scikit-mol

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model can predict directly on RDKit molecules or SMILES strings.

For example, with RDKit mol objects in the lists mol_list_train and mol_list_test:

from rdkit import Chem
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from scikit_mol.fingerprints import MorganFingerprintTransformer
pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Implemented

  • Descriptors
    • MolecularDescriptorTransformer

  • Fingerprints
    • MorganFingerprintTransformer
    • MACCSKeysFingerprintTransformer
    • RDKitFingerprintTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MHFingerprintTransformer
    • SECFingerprintTransformer
    • AvalonFingerprintTransformer

  • Conversions
    • SmilesToMol

  • Standardizer
    • Standardizer

  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install the latest tagged release from PyPI with pip:

pip install scikit-mol

or from conda-forge:

conda install -c conda-forge scikit-mol

The conda-forge package should be updated shortly after each new tagged release on PyPI.

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git

Documentation

There is a collection of notebooks in the notebooks directory that demonstrate different aspects and use cases.

Contributing

More information about how to contribute to the project can be found in CONTRIBUTION.md.

Bugs

There are probably still bugs. Please check the issues on GitHub and report any new ones there.

Contributors:
