scikit-learn classes for molecule transformation

scikit-mol

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

As an example, with the needed scikit-learn and scikit-mol imports in place and RDKit mol objects in the mol_list_train and mol_list_test lists:

pipe = Pipeline([('mol_transformer', MorganTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.
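
As a rough illustration of the tuning idea, the sketch below places a featurization step inside a Pipeline and lets GridSearchCV search over its parameters together with the regressor's. To stay runnable without RDKit, it uses a hypothetical stand-in transformer (ToyNgramTransformer, which is not part of scikit-mol) that hashes SMILES character n-grams into a fixed-length count vector. The parameter names n_bits and ngram, the toy SMILES strings and the dummy target values are all assumptions made for this example, not the scikit-mol API.

```python
import zlib

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ToyNgramTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a fingerprint transformer (NOT part of scikit-mol).

    Hashes the character n-grams of each SMILES string into a count vector.
    """

    def __init__(self, n_bits=64, ngram=2):
        self.n_bits = n_bits
        self.ngram = ngram

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        out = np.zeros((len(X), self.n_bits))
        for row, smiles in enumerate(X):
            for i in range(len(smiles) - self.ngram + 1):
                chunk = smiles[i:i + self.ngram]
                # crc32 is deterministic across runs, unlike the built-in hash()
                out[row, zlib.crc32(chunk.encode()) % self.n_bits] += 1
        return out


# Toy data (assumption): made-up SMILES with the string length as a dummy target.
smiles_train = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1C", "CC(=O)O",
                "CCN", "CCCN", "c1ccccc1O", "CC(C)O", "CCOC", "CCCCCC"]
y_train = np.array([float(len(s)) for s in smiles_train])

pipe = Pipeline([("featurizer", ToyNgramTransformer()), ("regressor", Ridge())])

# Parameters of any pipeline step are addressed as <step name>__<parameter name>.
param_grid = {
    "featurizer__n_bits": [32, 128],
    "featurizer__ngram": [1, 2],
    "regressor__alpha": [0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(smiles_train, y_train)
print(search.best_params_)
```

With scikit-mol installed, the same step__parameter naming should let the grid address the parameters of the real transformer step (for instance the 'mol_transformer' step in the pipeline above) in place of the stand-in.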

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Implemented

  • Transformer Classes
    • SmilesToMol
    • Desc2DTransformer
    • MACCSTransformer
    • RDKitFPTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MorganTransformer

  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install the latest tagged release with pip:

pip install scikit-mol

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git

Developers

git clone git@github.com:EBjerrum/scikit-mol.git
cd scikit-mol
pip install -e .

Documentation

None yet, but the notebooks directory contains some # %% delimited examples with demonstrations.

BUGS

Probably still some.

TODO

  • Unit test coverage of classes
  • If possible, return the same type as the input (e.g. list to list, NumPy array to NumPy array, pandas Series to pandas Series)
  • Docstrings for classes and methods
  • Make further example notebooks

Ideas

  • LINGOS transformer

Contributors:
