C++ module for featurizing molecules

cuik-molmaker

cuik-molmaker is a specialized package designed for molecular featurization, converting chemical structures into formats that can be effectively used as inputs for deep learning models, particularly graph neural networks (GNNs).

cuik-molmaker is a hybrid C++/Python package. The core featurization logic is implemented in C++ for speed, while the Python interface provides an API that fits into modern GNN training and inference workflows. Because cuik-molmaker calls the C++ API of RDKit, the features it produces are expected to be identical to those produced by RDKit.

Quick start

Setup conda environment

# Set environment variables
export PYTHON_VERSION=3.11
export RDKIT_VERSION=2025.03.2

conda create -n cuik_molmaker_env python=${PYTHON_VERSION} conda-forge::rdkit==${RDKIT_VERSION} conda-forge::pybind11==2.13.6 conda-forge::libboost-devel==1.86.0 conda-forge::libboost-python-devel==1.86.0 

conda activate cuik_molmaker_env

This step is optional if you already have a conda environment with the required dependencies.

Install wheel from NVIDIA PyPI

We provide a handy script to install the wheel from NVIDIA PyPI based on your OS and other dependencies.

python scripts/check_and_install_cuik_molmaker.py
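After installation, it can be useful to confirm that the package and its main dependencies are visible to the active interpreter. The snippet below is a minimal sketch using only the standard library. The helper name is_importable is hypothetical and not part of cuik-molmaker, and the check only tells you whether a module can be located, not whether it loads correctly.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


# Report whether each package is visible in the current environment.
for name in ("cuik_molmaker", "rdkit", "numpy"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If cuik_molmaker is reported as missing, check that the conda environment created above is active.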

Usage: Computing atom and bond features

import cuik_molmaker
import numpy as np

# List all available atom onehot features
print(cuik_molmaker.list_all_atom_onehot_features())

# Compute atom (atomic number, number of hydrogens, chirality) and bond (bond type) features for acetic acid
acetic_acid_smiles = "CC(=O)O"

# Get atom onehot feature names as NumPy array
atom_onehot_feature_array = cuik_molmaker.atom_onehot_feature_names_to_array(['atomic-number', 'num-hydrogens', 'chirality'])

# Get bond feature names as NumPy array
bond_feature_array = cuik_molmaker.bond_feature_names_to_array(['bond-type-onehot'])

# Set parameters for featurization
explicit_h, offset_carbon, duplicate_edges, add_self_loop = False, False, True, False

# Featurize
all_features = cuik_molmaker.mol_featurizer(
    acetic_acid_smiles, atom_onehot_feature_array, np.array([]), bond_feature_array,
    explicit_h, offset_carbon, duplicate_edges, add_self_loop,
)

# This returns a list of NumPy arrays.
# First index contains atom features
print(all_features[0].shape)

# Second index contains bond features
print(all_features[1].shape)

# Third index contains edge indices in COO format
print(all_features[2].shape)
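To make the COO edge format and the duplicate_edges parameter concrete, here is a small pure-Python sketch that does not call cuik-molmaker. It assumes the heavy atoms of acetic acid are indexed in SMILES order and that duplicating edges means storing each bond once per direction. The helper to_coo is hypothetical, and the edge ordering and array layout that cuik-molmaker actually returns may differ.

```python
# Heavy-atom graph of acetic acid, CC(=O)O, with atoms indexed in SMILES order
# (an assumption): 0 = methyl C, 1 = carboxyl C, 2 = carbonyl O, 3 = hydroxyl O.
undirected_bonds = [(0, 1), (1, 2), (1, 3)]


def to_coo(bonds, duplicate_edges=True):
    """Return [sources, targets] lists describing the bonds in COO form."""
    sources, targets = [], []
    for i, j in bonds:
        sources.append(i)
        targets.append(j)
        if duplicate_edges:
            # Store the reverse direction too, so each bond is seen from both atoms.
            sources.append(j)
            targets.append(i)
    return [sources, targets]


edge_index = to_coo(undirected_bonds)
print(edge_index)               # [[0, 1, 1, 2, 1, 3], [1, 0, 2, 1, 3, 1]]
print(len(edge_index[0]))       # 6 directed edges for 3 bonds
```

Under these assumptions, three bonds give six directed edges, so the bond feature array would be expected to have one row per directed edge.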

Usage: Computing molecular descriptors

from cuik_molmaker.mol_features import MoleculeFeaturizer

featurizer = MoleculeFeaturizer(molecular_descriptor_type="rdkit2D", rdkit2D_normalization_type="fast")

smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin
               "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1", # diphenhydramine
]
rdkit2D_descriptors = featurizer.featurize(smiles_list)

# Print the shape of the descriptors
print(rdkit2D_descriptors.shape)

Source of acceleration

The hybrid C++/Python design of cuik-molmaker lets the core featurization logic run in C++, which reduces Python overhead. A second source of acceleration is computing features for an entire minibatch of SMILES at once, which avoids the overhead of repeated memory allocation and concatenation.
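The batching idea can be illustrated with a generic sketch: size the output buffers once for the whole minibatch, then write each molecule's edges into them with an atom-index offset, instead of building per-molecule arrays and concatenating them. This is not cuik-molmaker's C++ implementation. The function batch_graphs and its input layout are hypothetical and only show the general technique.

```python
def batch_graphs(graphs):
    """Merge per-molecule graphs into one batched graph.

    Each graph is a tuple (num_atoms, [(src, dst), ...]) with local atom indices.
    """
    total_atoms = sum(num_atoms for num_atoms, _ in graphs)
    total_edges = sum(len(edges) for _, edges in graphs)

    # Allocate every output buffer once, sized for the whole batch.
    sources = [0] * total_edges
    targets = [0] * total_edges
    atom_to_mol = [0] * total_atoms

    atom_offset = 0
    edge_pos = 0
    for mol_idx, (num_atoms, edges) in enumerate(graphs):
        for k in range(num_atoms):
            atom_to_mol[atom_offset + k] = mol_idx
        for src, dst in edges:
            # Shift local atom indices into the batch-wide index space.
            sources[edge_pos] = src + atom_offset
            targets[edge_pos] = dst + atom_offset
            edge_pos += 1
        atom_offset += num_atoms
    return atom_to_mol, [sources, targets]


methanol = (2, [(0, 1), (1, 0)])  # CO
acetic_acid = (4, [(0, 1), (1, 0), (1, 2), (2, 1), (1, 3), (3, 1)])  # CC(=O)O
atom_to_mol, edge_index = batch_graphs([methanol, acetic_acid])
print(atom_to_mol)  # [0, 0, 1, 1, 1, 1]
print(edge_index)   # [[0, 1, 2, 3, 3, 4, 3, 5], [1, 0, 3, 2, 4, 3, 5, 3]]
```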

Additional Documentation

File          Description
USAGE.md      Examples and instructions for using cuik-molmaker to featurize molecules, including batching.
FEATURES.md   Detailed list and explanation of all atom and bond features available for featurization.
BUILD.md      Step-by-step instructions for building cuik-molmaker from source, including prerequisites and troubleshooting.
TESTING.md    Guidelines and commands for running the test suite to verify installation and functionality.

Hardware Requirements

cuik-molmaker is designed to run on any CPU-based system.

Adoption

cuik-molmaker is currently integrated into the following projects:

  • Chemprop: cuik-molmaker is available for use with conda and Docker installations of Chemprop. It can be enabled by passing the --use-cuikmolmaker-featurization flag on the command line, and it works with all use cases: training, prediction, fingerprinting, and hyperparameter optimization.
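As a rough illustration of where the flag goes, the sketch below assembles a Chemprop training command as an argument list and prints it without running it. The helper build_chemprop_command, the file name data.csv, and the directory out are hypothetical placeholders. The other Chemprop arguments shown are assumptions about a typical training call, so check the Chemprop documentation for the options your version accepts.

```python
import shlex


def build_chemprop_command(mode, extra_args, use_cuik_molmaker=True):
    """Assemble a Chemprop CLI invocation as a list of arguments."""
    command = ["chemprop", mode, *extra_args]
    if use_cuik_molmaker:
        # Flag named in the text above; it enables cuik-molmaker featurization.
        command.append("--use-cuikmolmaker-featurization")
    return command


# data.csv and out are placeholder paths, not files shipped with either project.
train_cmd = build_chemprop_command(
    "train",
    ["--data-path", "data.csv", "--task-type", "regression", "--output-dir", "out"],
)
print(shlex.join(train_cmd))
```

The resulting list can be passed to subprocess.run once the paths point at real data.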
