scikit-learn classes for molecule transformation
Project description
scikit-mol
Scikit-Learn classes for molecular vectorization using RDKit
The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.
For example, with the needed scikit-learn and scikit-mol imports, and with RDKit mol objects in the lists mol_list_train and mol_list_test:
pipe = Pipeline([('mol_transformer', MorganTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])
>>> array([4.93858815])
The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.
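As a rough illustration of tuning the fingerprinting step together with the model, the sketch below runs a grid search over a pipeline. To avoid guessing at scikit-mol's exact import paths and parameter names, it uses a hypothetical stand-in class, MorganStandIn, built directly on RDKit and scikit-learn; the parameter names radius and nBits, the toy SMILES, and the target (the heavy-atom count, chosen only because it can be computed from the molecules) are all assumptions made for this example.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class MorganStandIn(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a Morgan fingerprint transformer.

    This is NOT scikit-mol's class; it only mimics the idea.
    """

    def __init__(self, radius=2, nBits=1024):
        self.radius = radius
        self.nBits = nBits

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        rows = []
        for mol in X:
            fp = AllChem.GetMorganFingerprintAsBitVect(
                mol, self.radius, nBits=self.nBits
            )
            arr = np.zeros((self.nBits,), dtype=np.uint8)
            DataStructs.ConvertToNumpyArray(fp, arr)
            rows.append(arr)
        return np.array(rows)


# Toy data: the target is the heavy-atom count, computed from the molecules.
smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1",
          "c1ccccc1C", "c1ccccc1CC", "CC(=O)O", "CCC(=O)O"]
mols = [Chem.MolFromSmiles(s) for s in smiles]
y = np.array([m.GetNumHeavyAtoms() for m in mols], dtype=float)

pipe = Pipeline([("mol_transformer", MorganStandIn()), ("Regressor", Ridge())])

# Step name + double underscore + parameter name addresses a step's parameter.
param_grid = {
    "mol_transformer__radius": [1, 2],
    "mol_transformer__nBits": [256, 1024],
    "Regressor__alpha": [0.1, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=2, scoring="neg_mean_absolute_error")
search.fit(mols, y)
print(search.best_params_)
```

With the real scikit-mol transformers the same pattern should apply, with the grid keys adapted to whatever parameter names the chosen transformer exposes.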
The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.
Implemented
- Transformer Classes
  - SmilesToMol
  - Desc2DTransformer
  - MACCSTransformer
  - RDKitFPTransformer
  - AtomPairFingerprintTransformer
  - TopologicalTorsionFingerprintTransformer
  - MorganTransformer
- Utilities
  - CheckSmilesSanitazion
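The idea behind converting SMILES to molecules and checking sanitization first can be sketched with plain RDKit. The helper below, split_smiles, is hypothetical and is not scikit-mol's implementation of SmilesToMol or CheckSmilesSanitazion; it only shows the kind of check that is useful before feeding SMILES into a pipeline, relying on the fact that Chem.MolFromSmiles returns None when parsing or sanitization fails.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's parse and sanitization error messages for this demo.
RDLogger.DisableLog("rdApp.*")


def split_smiles(smiles_list):
    """Hypothetical helper: return (valid_mols, valid_smiles, invalid_smiles)."""
    valid_mols, valid, invalid = [], [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # None if parsing or sanitization fails
        if mol is None:
            invalid.append(smi)
        else:
            valid_mols.append(mol)
            valid.append(smi)
    return valid_mols, valid, invalid


# "C1CC" has an unclosed ring; "CO(C)C" has an oxygen with three bonds.
mols, ok, bad = split_smiles(["CCO", "c1ccccc1", "C1CC", "CO(C)C"])
print(ok)
print(bad)
```

Filtering the invalid entries (and the matching target values) before fitting keeps None objects out of the fingerprinting step.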
Installation
Users can install the latest tagged release from PyPI with pip:
pip install scikit-mol
Bleeding edge
pip install git+https://github.com/EBjerrum/scikit-mol.git
Developers
git clone git@github.com:EBjerrum/scikit-mol.git
cd scikit-mol
pip install -e .
Documentation
None yet, but the notebooks directory contains some examples, delimited with # %% cell markers, that serve as demonstrations.
BUGS
Probably still some.
TODO
- Unit test coverage of classes
- If possible, return the same type as the input (e.g. list to list, NumPy array to NumPy array, pandas Series to pandas Series)
- Docstrings for classes and methods
- Make further example notebooks
- Standalone usage (not in pipeline)
- Advanced pipelining
- Hyperparameter optimization via external optimizer e.g. https://scikit-optimize.github.io/stable/
Ideas
- LINGOS transformer
Contributors:
- Esben Jannik Bjerrum, esben@cheminformania.com
- Carmen Esposito https://github.com/cespos
- Son Ha, sonha@uni-mainz.de
- Oh-hyeon Choung, ohhyeon.choung@gmail.com
- Andreas Poehlmann, https://github.com/ap--
- Ya Chen, https://github.com/anya-chen
Download files
File details
Details for the file scikit_mol-0.0.2.tar.gz.
File metadata
- Download URL: scikit_mol-0.0.2.tar.gz
- Upload date:
- Size: 17.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3a694878acce64f60a69859f771ad8f7ea9705e62daa4b624f3bcd555f914034
MD5 | 7d7a81ae2d9a8abf5ceb6c3c6a2fa7df
BLAKE2b-256 | 9b08be0e403d82747f5428b0a43c6a4a83a862db266e7fe79be27a53f8cea068
File details
Details for the file scikit_mol-0.0.2-py3-none-any.whl.
File metadata
- Download URL: scikit_mol-0.0.2-py3-none-any.whl
- Upload date:
- Size: 11.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | f7ef614f92ea0e4500f3216b22eb062ffc689a2c13efce91b9f4600111592294
MD5 | 68943ae9e2091bba484ef3ca0ab0557c
BLAKE2b-256 | 1fbe17ec3ff33c46b5e5b3c334d6a4ed49ae044ac2fdae0160d03dc334ba2e81