molfeat - the hub for all your molecular featurizers

Molfeat is a hub of molecular featurizers. It supports a wide variety of out-of-the-box molecular featurizers and can be easily extended to include your own custom featurizers.

  • 🚀 Fast, with a simple and efficient API.
  • 🔄 Unify pre-trained molecular embeddings and hand-crafted featurizers in a single package.
  • ➕ Easily add your own featurizers through plugins.
  • 📈 Benefit from increased performance through a trouble-free caching system.

Visit our website at https://molfeat.datamol.io.

Installation

Installing Molfeat

Use mamba:

mamba install -c conda-forge molfeat

Tip: You can replace mamba with conda.

Note: We highly recommend using a Conda Python distribution to install Molfeat. The package is also pip installable if you need it: pip install molfeat.

Optional dependencies

Not all featurizers in the Molfeat core package are supported by default. Some featurizers require additional dependencies. If you try to use a featurizer that requires additional dependencies, Molfeat will raise an error and tell you which dependencies are missing and how to install them.

  • To install dgl: run mamba install -c dglteam "dgl<=2.0" (versions of dgl above 2.0.0 have a known issue related to graphbolt)
  • To install dgllife: run mamba install -c conda-forge dgllife
  • To install fcd_torch: run mamba install -c conda-forge fcd_torch
  • To install pyg: run mamba install -c conda-forge pytorch_geometric
  • To install graphormer-pretrained: run mamba install -c conda-forge graphormer-pretrained
  • To install map4: see https://github.com/reymond-group/map4
  • To install bio-embeddings: run mamba install -c conda-forge 'bio-embeddings >=0.2.2'

If you install Molfeat using pip, you can install optional dependencies together with the main package. For example, pip install "molfeat[all]" installs all the compatible optional dependencies for small molecule featurization. Other options include molfeat[dgl], molfeat[graphormer], molfeat[transformer], molfeat[viz], and molfeat[fcd]. See the optional-dependencies for more information.
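
Before using a featurizer that depends on an optional package, it can help to check which of those packages are importable in the current environment. The following is a minimal sketch that uses only the Python standard library and does not call Molfeat itself. The import names in OPTIONAL_MODULES are assumptions (an import name can differ from the install name; for example, pytorch_geometric is assumed here to import as torch_geometric), so adjust them to your environment.

```python
from importlib.util import find_spec

# Assumed import names for some of the optional packages listed above.
OPTIONAL_MODULES = ["dgl", "dgllife", "fcd_torch", "torch_geometric"]


def check_optional_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per optional module with its availability.
    for name, found in check_optional_modules(OPTIONAL_MODULES).items():
        print(f"{name}: {'available' if found else 'missing'}")
```

Any module reported as missing can then be installed with the matching command from the list above.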

Installing Plugins

The functionality of Molfeat can be extended through plugins. The use of a plugin system ensures that the core package remains easy to install and as light as possible, while making it easy to extend its functionality with plug-and-play components. Additionally, it ensures that plugins can be developed independently from the core package, removing the bottleneck of a central party that reviews and approves new plugins. Consult the molfeat documentation for more details on how to create your own plugins.

Because plugins are developed independently, installation steps vary from plugin to plugin: please consult the documentation of the relevant plugin to learn more.
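
Plug-and-play extension in Python is commonly built on package entry points: a plugin declares an entry point in its packaging metadata, and the host package discovers it at runtime without knowing about the plugin in advance. The sketch below illustrates that general pattern with the standard library only. It is not Molfeat's actual loading code, and the group name "example.plugins" is hypothetical; consult the Molfeat documentation for the real mechanism.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return the sorted names of installed entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports filtering with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict mapping group name to entry points.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # "example.plugins" is a hypothetical group name used for illustration.
    print(discover_plugins("example.plugins"))
```

A host package would typically call load() on each discovered entry point to import the plugin's code, which is why a plugin can be installed and released independently of the core package.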

API tour

import datamol as dm
from molfeat.calc import FPCalculator
from molfeat.trans import MoleculeTransformer
from molfeat.store.modelstore import ModelStore

# Load some dummy data
data = dm.data.freesolv().sample(100).smiles.values

# Featurize a single molecule
calc = FPCalculator("ecfp")
calc(data[0])

# Define a parallelized featurization pipeline
mol_transf = MoleculeTransformer(calc, n_jobs=-1)
mol_transf(data)

# Easily save and load featurizers
mol_transf.to_state_yaml_file("state_dict.yml")
mol_transf = MoleculeTransformer.from_state_yaml_file("state_dict.yml")
mol_transf(data)

# List all available featurizers
store = ModelStore()
store.available_models

# Find a featurizer and learn how to use it
model_card = store.search(name="ChemBERTa-77M-MLM")[0]
model_card.usage()

How to cite

Please cite Molfeat if you use it in your research: DOI.

Contribute

See developers for a comprehensive guide on how to contribute to molfeat. molfeat is a community-led initiative and whether you're a first-time contributor or an open-source veteran, this project greatly benefits from your contributions. To learn more about the community and datamol.io ecosystem, please see community.

Maintainers

  • @cwognum
  • @maclandrol
  • @hadim

License

Molfeat is released under the Apache-2.0 license. See LICENSE.
