C++ module for featurizing molecules

cuik-molmaker

cuik-molmaker is a specialized package designed for molecular featurization, converting chemical structures into formats that can be effectively used as inputs for deep learning models, particularly graph neural networks (GNNs).

cuik-molmaker is a hybrid C++/Python package. The core featurization logic is implemented in C++ for speed, while the Python interface provides an API that fits into modern GNN training and inference workflows. Because cuik-molmaker calls the C++ API of RDKit, the features it produces are expected to be identical to those produced by RDKit.

Quick start

Setup conda environment

# Set environment variables
export PYTHON_VERSION=3.11
export RDKIT_VERSION=2025.03.2

conda create -n cuik_molmaker_env python=${PYTHON_VERSION} conda-forge::rdkit==${RDKIT_VERSION} conda-forge::pybind11==2.13.6 conda-forge::libboost-devel==1.86.0 conda-forge::libboost-python-devel==1.86.0 

conda activate cuik_molmaker_env

This step is optional if you already have a conda environment with the required dependencies.

Install wheel from NVIDIA PyPI

We provide a handy script to install the wheel from NVIDIA PyPI based on your OS and other dependencies.

python scripts/check_and_install_cuik_molmaker.py
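The wheels published for this release cover Python 3.11 to 3.13 on Windows x86-64 and on manylinux (glibc 2.24+) x86-64. As a rough illustration of what "based on your OS and other dependencies" can involve, the sketch below maps the running interpreter and platform to the matching wheel tags. This is not the contents of the install script: `expected_wheel_tags` and the two lookup tables are hypothetical names, and the glibc version is not checked.

```python
import platform
import sys

# Assumption: tags copied from the wheel file names published for release 2025.9.4.
SUPPORTED_PYTHON_TAGS = {"py311", "py312", "py313"}
SUPPORTED_PLATFORMS = {
    ("Windows", "AMD64"): "win_amd64",
    ("Linux", "x86_64"): "manylinux_2_24_x86_64",
}


def expected_wheel_tags(version_info=None, system=None, machine=None):
    """Return (python_tag, platform_tag), or None if no published wheel matches.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine

    python_tag = f"py{version_info[0]}{version_info[1]}"
    platform_tag = SUPPORTED_PLATFORMS.get((system, machine))
    if python_tag not in SUPPORTED_PYTHON_TAGS or platform_tag is None:
        return None
    return python_tag, platform_tag


tags = expected_wheel_tags()
if tags is None:
    print("No prebuilt wheel matches this interpreter/platform; see BUILD.md.")
else:
    print(f"Matching wheel tags: {tags[0]}-none-{tags[1]}")
```

If the function returns `None`, building from source as described in BUILD.md is the likely alternative.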

Usage: Computing atom and bond features

import cuik_molmaker
import numpy as np

# List all available atom onehot features
print(cuik_molmaker.list_all_atom_onehot_features())

# Compute atom (atomic number, number of hydrogens, chirality) and bond (bond type) features for acetic acid
acetic_acid_smiles = "CC(=O)O"

# Get atom onehot feature names as NumPy array
atom_onehot_feature_array = cuik_molmaker.atom_onehot_feature_names_to_array(['atomic-number', 'num-hydrogens', 'chirality'])

# Get bond feature names as NumPy array
bond_feature_array = cuik_molmaker.bond_feature_names_to_array(['bond-type-onehot'])

# Set parameters for featurization
explicit_h, offset_carbon, duplicate_edges, add_self_loop = False, False, True, False

# Featurize
all_features = cuik_molmaker.mol_featurizer(
    acetic_acid_smiles,
    atom_onehot_feature_array,
    np.array([]),  # empty array: no features requested for this argument
    bond_feature_array,
    explicit_h,
    offset_carbon,
    duplicate_edges,
    add_self_loop,
)

# This returns a list of NumPy arrays.
# First index contains atom features
print(all_features[0].shape)

# Second index contains bond features
print(all_features[1].shape)

# Third index contains edge indices in COO format
print(all_features[2].shape)
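To make the three outputs easier to picture, here is a dependency-free sketch of one-hot atom features, one-hot bond features, and COO edge indices for acetic acid. It does not call cuik-molmaker and is not its implementation: the vocabularies, the trailing "unknown" slot, the edge ordering, and the reading of `duplicate_edges` as "store each bond in both directions" are all assumptions made for illustration. See FEATURES.md for the actual feature layout.

```python
# Heavy atoms of acetic acid, CC(=O)O, indexed in SMILES order (hydrogens implicit).
atom_symbols = ["C", "C", "O", "O"]
# Undirected bonds as (begin_atom, end_atom, bond_type).
bonds = [(0, 1, "SINGLE"), (1, 2, "DOUBLE"), (1, 3, "SINGLE")]

# Hypothetical, deliberately tiny vocabularies (assumptions, not the library's).
ELEMENT_VOCAB = ["C", "N", "O"]
BOND_VOCAB = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"]


def one_hot(value, vocab):
    """One slot per vocabulary entry plus a trailing 'unknown' slot."""
    vec = [0] * (len(vocab) + 1)
    vec[vocab.index(value) if value in vocab else len(vocab)] = 1
    return vec


def to_coo(bonds, duplicate_edges=True):
    """Return ([sources, targets], edge_features) in COO layout.

    Assumption: duplicate_edges=True stores each bond once per direction.
    """
    sources, targets, edge_features = [], [], []
    for begin, end, bond_type in bonds:
        directions = [(begin, end), (end, begin)] if duplicate_edges else [(begin, end)]
        for src, dst in directions:
            sources.append(src)
            targets.append(dst)
            edge_features.append(one_hot(bond_type, BOND_VOCAB))
    return [sources, targets], edge_features


atom_features = [one_hot(symbol, ELEMENT_VOCAB) for symbol in atom_symbols]
edge_index, bond_features = to_coo(bonds)

print(len(atom_features), "atoms x", len(atom_features[0]), "atom feature columns")
print(len(bond_features), "directed edges x", len(bond_features[0]), "bond feature columns")
print("edge_index rows (sources, targets):", edge_index)
```

In this layout, column `k` of `edge_index` and row `k` of `bond_features` describe the same directed edge, which is the form most GNN libraries expect.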

Usage: Computing molecular descriptors

from cuik_molmaker.mol_features import MoleculeFeaturizer

featurizer = MoleculeFeaturizer(molecular_descriptor_type="rdkit2D", rdkit2D_normalization_type="fast")

smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin
               "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1", # diphenhydramine
]
rdkit2D_descriptors = featurizer.featurize(smiles_list)

# Print the shape of the descriptors
print(rdkit2D_descriptors.shape)

Source of acceleration

The hybrid C++/Python design keeps the core featurization logic in C++, which reduces Python overhead. A second source of acceleration is that features are created for an entire minibatch of SMILES at once, which avoids the overhead of per-molecule memory allocation and concatenation.
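For context, the sketch below shows the bookkeeping that is typically needed when molecules are featurized one at a time and merged afterwards in Python: every per-molecule result is appended to growing buffers, and edge indices are shifted by the number of atoms already in the batch. Featurizing the minibatch in a single call is meant to make this step unnecessary. This is the general pattern only, not cuik-molmaker internals; `merge_graphs` and the toy feature values are hypothetical.

```python
def merge_graphs(graphs):
    """Merge per-molecule graphs into one batch graph.

    Each graph is (atom_features, edge_index) with edge_index = [sources, targets].
    Returns (batch_atom_features, batch_edge_index, atom_to_molecule).
    """
    batch_atoms, batch_src, batch_dst, atom_to_mol = [], [], [], []
    offset = 0  # number of atoms already placed in the batch
    for mol_id, (atom_features, (sources, targets)) in enumerate(graphs):
        batch_atoms.extend(atom_features)
        # Shift local atom indices so they point into the batch-level atom list.
        batch_src.extend(s + offset for s in sources)
        batch_dst.extend(t + offset for t in targets)
        atom_to_mol.extend([mol_id] * len(atom_features))
        offset += len(atom_features)
    return batch_atoms, [batch_src, batch_dst], atom_to_mol


# Toy inputs: two-column placeholder atom features, edges stored in both directions.
ethanol = ([[1, 0], [1, 0], [0, 1]], [[0, 1, 1, 2], [1, 0, 2, 1]])  # CCO
formaldehyde = ([[1, 0], [0, 1]], [[0, 1], [1, 0]])                 # C=O

atoms, edge_index, atom_to_mol = merge_graphs([ethanol, formaldehyde])
print("atoms in batch:", len(atoms))
print("edge_index:", edge_index)
print("atom -> molecule:", atom_to_mol)
```

With array-based features, the same merge would also require a concatenation (and therefore a fresh allocation) per output, which is the cost the paragraph above refers to.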

Additional Documentation

| File | Description |
| --- | --- |
| USAGE.md | Examples and instructions for using cuik-molmaker to featurize molecules, including batching. |
| FEATURES.md | Detailed list and explanation of all atom and bond features available for featurization. |
| BUILD.md | Step-by-step instructions for building cuik-molmaker from source, including prerequisites and troubleshooting. |
| TESTING.md | Guidelines and commands for running the test suite to verify installation and functionality. |

Hardware Requirements

cuik-molmaker is designed to run on any CPU-based system.

Adoption

cuik-molmaker is currently integrated into the following projects:

  • Chemprop: cuik-molmaker is available with conda and Docker installations of Chemprop. Enable it by passing the --use-cuikmolmaker-featurization flag on the command line. The flag works with all use cases: training, prediction, fingerprinting, and hyperparameter optimization.

Source Distributions

No source distribution files are available for this release.

Built Distributions

| File | Size | Python | Platform |
| --- | --- | --- | --- |
| cuik_molmaker_pin-2025.9.4-py313-none-win_amd64.whl | 241.8 kB | 3.13 | Windows x86-64 |
| cuik_molmaker_pin-2025.9.4-py313-none-manylinux_2_24_x86_64.whl | 305.4 kB | 3.13 | manylinux: glibc 2.24+ x86-64 |
| cuik_molmaker_pin-2025.9.4-py312-none-win_amd64.whl | 241.8 kB | 3.12 | Windows x86-64 |
| cuik_molmaker_pin-2025.9.4-py312-none-manylinux_2_24_x86_64.whl | 304.6 kB | 3.12 | manylinux: glibc 2.24+ x86-64 |
| cuik_molmaker_pin-2025.9.4-py311-none-win_amd64.whl | 241.0 kB | 3.11 | Windows x86-64 |
| cuik_molmaker_pin-2025.9.4-py311-none-manylinux_2_24_x86_64.whl | 305.6 kB | 3.11 | manylinux: glibc 2.24+ x86-64 |
