
scikit-learn classes for molecule transformation


scikit-mol


Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model can predict directly on RDKit molecules or SMILES strings.

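As an example, with RDKit mol objects in the mol_list_train and mol_list_test lists and the corresponding targets in y_train and y_test:

from rdkit import Chem
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from scikit_mol.fingerprints import MorganFingerprintTransformer

pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

# Example output (the value depends on the training data): array([4.93858815])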

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Installation

Users can install the latest tagged release from pip

pip install scikit-mol

or from conda-forge

conda install -c conda-forge scikit-mol

The conda-forge package should be updated shortly after each new tagged release on PyPI.

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git
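To check which version ended up installed after any of the commands above, a small sketch using the standard library's importlib.metadata can help. The helper name installed_version is hypothetical, and it assumes the distribution is registered under the name "scikit-mol" used by pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="scikit-mol"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("scikit-mol version:", installed_version())
```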

Documentation

Example notebooks and API documentation are now hosted on https://scikit-mol.readthedocs.io

We have also published a software note on ChemRxiv: https://doi.org/10.26434/chemrxiv-2023-fzqwd

Other use-examples

Scikit-Mol has been featured in blog posts and used in research; some examples are listed below:

Roadmap and Contributing

Help wanted! Are you a PhD student who wants a "side-quest" to procrastinate on your thesis writing? Are you interested in computational chemistry, cheminformatics, QSAR modelling, Python programming or open-source software? Do you want to learn more about machine learning with Scikit-Learn? Or do you use scikit-mol in your current work and would like to give a little back to the project and see it improved as well? With a little help, this project can be improved much faster! Reach out to me (Esben) for a discussion about how we can proceed.

Currently, we are working on fixing some deprecation warnings. It's not the most exciting work, but a little maintenance is important. Later on, we need to review the scikit-learn compatibility and adopt some of the newer features of their estimator classes. We're also working on some feature enhancements and tests, such as new fingerprints and a more versatile standardizer.

There is more information about how to contribute to the project in CONTRIBUTING.

BUGS

There are probably still some. Please check the issues on GitHub and report new bugs there.

Contributors

Scikit-Mol has been developed as a community effort with contributions from people from many different companies, consortia, foundations and academic institutions.

Cheminformania Consulting, Aptuit, BASF, Bayer AG, Boehringer Ingelheim, Chodera Lab (MSKCC), EPAM Systems, ETH Zürich, Evotec, Johannes Gutenberg University, Martin Luther University, Odyssey Therapeutics, Open Molecular Software Foundation, Openfree.energy, Polish Academy of Sciences, Productivista, Simulations-Plus Inc., University of Vienna



Download files


Source Distribution

scikit_mol-0.6.1.tar.gz (38.5 kB)

Uploaded Source

Built Distribution


scikit_mol-0.6.1-py3-none-any.whl (61.7 kB)

Uploaded Python 3

File details

Details for the file scikit_mol-0.6.1.tar.gz.

File metadata

  • Download URL: scikit_mol-0.6.1.tar.gz
  • Upload date:
  • Size: 38.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for scikit_mol-0.6.1.tar.gz
Algorithm Hash digest
SHA256 1483b380b1bd9bce3195af38eb283fb2d3ee5876902e3d086046d7b537dca4be
MD5 af31db6b120ed62c539c8197c0cbdf08
BLAKE2b-256 0dc4010379af37ee2a4bf14e5bbceaeaae3986a99a68e2fd4ce9d12179059992

See the pip documentation for more details on using hashes.
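A downloaded file can be checked against the SHA256 digests listed on this page with Python's standard hashlib module. This is a minimal sketch: the helper names sha256_of_file and verify_download and the chunked-read size are illustrative assumptions, while the two digests are copied from the hash tables on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed on this page for the 0.6.1 release files
KNOWN_SHA256 = {
    "scikit_mol-0.6.1.tar.gz": "1483b380b1bd9bce3195af38eb283fb2d3ee5876902e3d086046d7b537dca4be",
    "scikit_mol-0.6.1-py3-none-any.whl": "e98b01e3f62c289e7935c1c240c074a7498148fc55f309262e27bba3c249f9f4",
}


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel stops once read() reaches end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a downloaded file with the digest listed for its file name."""
    path = Path(path)
    expected = KNOWN_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"No listed digest for {path.name}")
    return sha256_of_file(path) == expected
```

For example, verify_download("scikit_mol-0.6.1.tar.gz") returns True only if the local archive matches the listed digest byte for byte.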

Provenance

The following attestation bundles were made for scikit_mol-0.6.1.tar.gz:

Publisher: pytest.yaml on EBjerrum/scikit-mol

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file scikit_mol-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: scikit_mol-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 61.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for scikit_mol-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e98b01e3f62c289e7935c1c240c074a7498148fc55f309262e27bba3c249f9f4
MD5 e5bfc539b622a51163fc9f271f7593b5
BLAKE2b-256 5ae80ded4360e61066186b0f62d73ac66241eac57b200d89cfe4720c5f28f771


Provenance

The following attestation bundles were made for scikit_mol-0.6.1-py3-none-any.whl:

Publisher: pytest.yaml on EBjerrum/scikit-mol

