C++ module for featurizing molecules

cuik-molmaker

cuik-molmaker is a specialized package designed for molecular featurization, converting chemical structures into formats that can be effectively used as inputs for deep learning models, particularly graph neural networks (GNNs).

cuik-molmaker is a hybrid C++/Python package. The core featurization logic is implemented in C++ for speed, while the Python interface provides an API that fits into GNN training and inference workflows. Because cuik-molmaker calls the C++ API of RDKit, the features it produces are expected to be identical to those produced by RDKit.

Quick start

Setup conda environment

# Set environment variables
export PYTHON_VERSION=3.11
export RDKIT_VERSION=2025.03.2

conda create -n cuik_molmaker_env python=${PYTHON_VERSION} conda-forge::rdkit==${RDKIT_VERSION} conda-forge::pybind11==2.13.6 conda-forge::libboost-devel==1.86.0 conda-forge::libboost-python-devel==1.86.0 

conda activate cuik_molmaker_env

This step is optional if you already have a conda environment with the required dependencies.

Install wheel from NVIDIA PyPI

We provide a handy script to install the wheel from NVIDIA PyPI based on your OS and other dependencies.

python scripts/check_and_install_cuik_molmaker.py
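
After installation, it can be useful to confirm that the package is visible to the active interpreter before running the examples. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and is not part of cuik-molmaker.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter.

    Hypothetical helper for illustration; not part of cuik-molmaker.
    """
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the import statements used in this README.
    for name in ("cuik_molmaker", "rdkit", "numpy"):
        status = "found" if is_importable(name) else "missing"
        print(f"{name}: {status}")
```

If `cuik_molmaker` is reported as missing, the conda environment may not be activated or the install script may not have completed.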

Usage: Computing atom and bond features

import cuik_molmaker
import numpy as np

# List all available atom onehot features
print(cuik_molmaker.list_all_atom_onehot_features())

# Compute atom (atomic number, number of hydrogens, chirality) and bond (bond type) features for acetic acid
acetic_acid_smiles = "CC(=O)O"

# Get atom onehot feature names as NumPy array
atom_onehot_feature_array = cuik_molmaker.atom_onehot_feature_names_to_array(['atomic-number', 'num-hydrogens', 'chirality'])

# Get bond feature names as NumPy array
bond_feature_array = cuik_molmaker.bond_feature_names_to_array(['bond-type-onehot'])

# Set parameters for featurization
explicit_h, offset_carbon, duplicate_edges, add_self_loop = False, False, True, False

# Featurize
all_features = cuik_molmaker.mol_featurizer(
    acetic_acid_smiles,
    atom_onehot_feature_array,
    np.array([]),
    bond_feature_array,
    explicit_h,
    offset_carbon,
    duplicate_edges,
    add_self_loop,
)

# This returns a list of NumPy arrays.
# First index contains atom features
print(all_features[0].shape)

# Second index contains bond features
print(all_features[1].shape)

# Third index contains edge indices in COO format
print(all_features[2].shape)
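
To make the COO edge-index format concrete, here is a small pure-Python sketch for acetic acid. It assumes that `duplicate_edges=True` means each bond is stored once per direction, which the parameter name suggests but the text above does not spell out. The helper `bonds_to_coo` is hypothetical, and the edge ordering and array layout that cuik-molmaker actually returns may differ.

```python
def bonds_to_coo(bonds, duplicate_edges=True):
    """Convert undirected bonds into a COO edge index: [sources, targets].

    Hypothetical helper for illustration; not the cuik-molmaker implementation.
    """
    sources, targets = [], []
    for begin, end in bonds:
        sources.append(begin)
        targets.append(end)
        if duplicate_edges:
            # Assumed meaning: also store the reverse direction of the bond.
            sources.append(end)
            targets.append(begin)
    return [sources, targets]


# Heavy atoms of acetic acid, CC(=O)O, indexed in SMILES order:
# 0: C (methyl), 1: C (carbonyl), 2: O (carbonyl), 3: O (hydroxyl)
acetic_acid_bonds = [(0, 1), (1, 2), (1, 3)]

edge_index = bonds_to_coo(acetic_acid_bonds, duplicate_edges=True)
print(edge_index)
# [[0, 1, 1, 2, 1, 3], [1, 0, 2, 1, 3, 1]]
```

Under this assumption, three bonds yield six directed edges, so the bond feature array would be expected to have one row per directed edge.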

Usage: Computing molecular descriptors

from cuik_molmaker.mol_features import MoleculeFeaturizer

featurizer = MoleculeFeaturizer(molecular_descriptor_type="rdkit2D", rdkit2D_normalization_type="fast")

smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin
               "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1", # diphenhydramine
]
rdkit2D_descriptors = featurizer.featurize(smiles_list)

# Print the shape of the descriptors
print(rdkit2D_descriptors.shape)

Source of acceleration

The hybrid C++/Python design lets the core featurization logic run in C++, which reduces Python overhead. A second source of acceleration is that features for an entire minibatch of SMILES are created at once, which avoids the overhead of per-molecule memory allocation and concatenation.
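
The batching idea can be sketched in plain Python: compute where each molecule's atoms start in the batch, allocate the output once, and write each molecule's rows into place instead of concatenating per-molecule results. This is only an illustration of the general technique, not the package's C++ code; the function names are hypothetical, and nested lists stand in for what would be a single contiguous array.

```python
from itertools import accumulate


def batch_offsets(atom_counts):
    """Return the starting row of each molecule, plus the total row count."""
    return [0] + list(accumulate(atom_counts))


def fill_batch(per_molecule_rows, num_features):
    """Write all molecules' atom rows into one preallocated buffer.

    Hypothetical helper for illustration; not the cuik-molmaker implementation.
    """
    counts = [len(rows) for rows in per_molecule_rows]
    offsets = batch_offsets(counts)
    # One allocation sized for the whole minibatch.
    batch = [[0.0] * num_features for _ in range(offsets[-1])]
    for rows, start in zip(per_molecule_rows, offsets):
        for i, row in enumerate(rows):
            # Copy the row into its final position; no concatenation step.
            batch[start + i][:] = row
    return batch, offsets


# Two toy molecules with 2 and 1 atoms, and 2 made-up features per atom.
mol_a = [[1.0, 0.0], [0.0, 1.0]]
mol_b = [[1.0, 1.0]]

batch, offsets = fill_batch([mol_a, mol_b], num_features=2)
print(offsets)  # [0, 2, 3]
print(batch)    # [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

The offsets also let a consumer recover which rows belong to which molecule after batching.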

Additional Documentation

File         Description
USAGE.md     Examples and instructions for using cuik-molmaker to featurize molecules, including batching.
FEATURES.md  Detailed list and explanation of all atom and bond features available for featurization.
BUILD.md     Step-by-step instructions for building cuik-molmaker from source, including prerequisites and troubleshooting.
TESTING.md   Guidelines and commands for running the test suite to verify installation and functionality.

Hardware Requirements

cuik-molmaker is designed to run on any CPU-based system.

Adoption

cuik-molmaker is currently integrated into the following projects:

  • Chemprop: cuik-molmaker is available for use with conda and Docker installations of Chemprop. It can be enabled by setting the --use-cuikmolmaker-featurization flag on the command line for all use cases: training, prediction, fingerprinting, and hyperparameter optimization.
