C++ module for featurizing molecules

cuik-molmaker

cuik-molmaker is a specialized package designed for molecular featurization, converting chemical structures into formats that can be effectively used as inputs for deep learning models, particularly graph neural networks (GNNs).

cuik-molmaker is a hybrid C++/Python package. The core featurization logic is implemented in C++ for speed, while the Python interface provides an API that integrates with modern GNN training and inference workflows. Because cuik-molmaker calls the C++ API of RDKit, the features it produces are expected to be identical to those produced by RDKit.

Quick start

Setup conda environment

# Set environment variables
export PYTHON_VERSION=3.11
export RDKIT_VERSION=2025.03.2

conda create -n cuik_molmaker_env python=${PYTHON_VERSION} conda-forge::rdkit==${RDKIT_VERSION} conda-forge::pybind11==2.13.6 conda-forge::libboost-devel==1.86.0 conda-forge::libboost-python-devel==1.86.0 

conda activate cuik_molmaker_env

This step is optional if you already have a conda environment with the required dependencies.

Install wheel from NVIDIA PyPI

A helper script installs the wheel from NVIDIA PyPI that matches your OS and installed dependencies.

python scripts/check_and_install_cuik_molmaker.py
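
After the script finishes, it can help to confirm that the package and its RDKit dependency are importable from the active environment. The following is a minimal sketch using only the Python standard library; the helper name `is_importable` is hypothetical and not part of cuik-molmaker.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the modules this README relies on.
for name in ("cuik_molmaker", "rdkit", "numpy"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

If `cuik_molmaker` is reported as missing, check that the conda environment created above is the one currently activated.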

Usage: Computing atom and bond features

import cuik_molmaker
import numpy as np

# List all available atom onehot features
print(cuik_molmaker.list_all_atom_onehot_features())

# Compute atom (atomic number, number of hydrogens, chirality) and bond (bond type) features for acetic acid
acetic_acid_smiles = "CC(=O)O"

# Get atom onehot feature names as NumPy array
atom_onehot_feature_array = cuik_molmaker.atom_onehot_feature_names_to_array(['atomic-number', 'num-hydrogens', 'chirality'])

# Get bond feature names as NumPy array
bond_feature_array = cuik_molmaker.bond_feature_names_to_array(['bond-type-onehot'])

# Set parameters for featurization
explicit_h, offset_carbon, duplicate_edges, add_self_loop = False, False, True, False

# Featurize
all_features = cuik_molmaker.mol_featurizer(
    acetic_acid_smiles,
    atom_onehot_feature_array,
    np.array([]),
    bond_feature_array,
    explicit_h,
    offset_carbon,
    duplicate_edges,
    add_self_loop,
)

# This returns a list of NumPy arrays.
# First index contains atom features
print(all_features[0].shape)

# Second index contains bond features
print(all_features[1].shape)

# Third index contains edge indices in COO format
print(all_features[2].shape)
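
To make the COO edge-index output more concrete, here is a small sketch in plain Python that builds such an index by hand for acetic acid. It does not call cuik-molmaker. It assumes that `duplicate_edges=True` means each bond is stored once per direction, and the function name `bonds_to_coo` is hypothetical. The edge ordering that cuik-molmaker actually returns may differ.

```python
def bonds_to_coo(bonds, duplicate_edges=True):
    """Turn a list of (atom_i, atom_j) bonds into COO form: [sources, targets]."""
    sources, targets = [], []
    for i, j in bonds:
        sources.append(i)
        targets.append(j)
        if duplicate_edges:
            # Assumption: a duplicated edge is the same bond in the reverse direction.
            sources.append(j)
            targets.append(i)
    return [sources, targets]


# Acetic acid, SMILES "CC(=O)O": atoms are C0, C1, O2, O3.
# Bonds: C0-C1, C1=O2, C1-O3 (hydrogens are implicit).
acetic_acid_bonds = [(0, 1), (1, 2), (1, 3)]

edge_index = bonds_to_coo(acetic_acid_bonds)
print(edge_index)
# [[0, 1, 1, 2, 1, 3], [1, 0, 2, 1, 3, 1]]
```

Under this assumption, three bonds yield six directed edges, so an edge index of this kind has two rows and twice as many columns as the molecule has bonds.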

Usage: Computing molecular descriptors

from cuik_molmaker.mol_features import MoleculeFeaturizer

featurizer = MoleculeFeaturizer(molecular_descriptor_type="rdkit2D", rdkit2D_normalization_type="fast")

smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin
               "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1", # diphenhydramine
]
rdkit2D_descriptors = featurizer.featurize(smiles_list)

# Print the shape of the descriptors
print(rdkit2D_descriptors.shape)

Source of acceleration

Implementing the core featurization logic in C++ reduces Python overhead. A second source of acceleration is that features for an entire minibatch of SMILES are created at once, which avoids the cost of allocating memory per molecule and then concatenating the results.
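
The batching idea can be illustrated with a conceptual sketch in plain Python. This is not cuik-molmaker's C++ implementation; the function names and the toy feature rows are hypothetical. The first function grows the batch one molecule at a time, while the second sizes the output once and writes each molecule into its own slice.

```python
from itertools import accumulate


def batch_by_concatenation(per_mol_features):
    """Grow the batch molecule by molecule; every step builds a new list."""
    batch = []
    for feats in per_mol_features:
        batch = batch + feats
    return batch


def batch_preallocated(per_mol_features):
    """Allocate the whole batch once, then fill each molecule's slice."""
    counts = [len(feats) for feats in per_mol_features]
    offsets = [0, *accumulate(counts)]  # start row of each molecule
    batch = [None] * offsets[-1]  # single allocation for the minibatch
    for feats, start, end in zip(per_mol_features, offsets, offsets[1:]):
        batch[start:end] = feats
    return batch, offsets


# Toy per-atom feature rows for three molecules (two features per atom).
mols = [
    [[1, 0], [0, 1]],
    [[1, 1]],
    [[0, 0], [1, 0], [0, 1]],
]

batch, offsets = batch_preallocated(mols)
print(offsets)  # [0, 2, 3, 6]
print(batch == batch_by_concatenation(mols))  # True
```

Both functions produce the same rows; the preallocated version also yields the per-molecule offsets, which are typically needed to map batched atoms back to their molecules.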

Additional Documentation

File          Description
USAGE.md      Examples and instructions for using cuik-molmaker to featurize molecules, including batching.
FEATURES.md   Detailed list and explanation of all atom and bond features available for featurization.
BUILD.md      Step-by-step instructions for building cuik-molmaker from source, including prerequisites and troubleshooting.
TESTING.md    Guidelines and commands for running the test suite to verify installation and functionality.

Hardware Requirements

cuik-molmaker is designed to run on any CPU-based system.

Adoption

cuik-molmaker is currently integrated into the following projects:

  • Chemprop: cuik-molmaker is available for use with conda and Docker installations of Chemprop. Enable it by passing the --use-cuikmolmaker-featurization flag on the command line. The flag works in all use cases: training, prediction, fingerprinting, and hyperparameter optimization.

Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distributions

cuik_molmaker_pin-2025.9.3-py313-none-win_amd64.whl (241.8 kB): Python 3.13, Windows x86-64
cuik_molmaker_pin-2025.9.3-py313-none-manylinux_2_24_x86_64.whl (305.3 kB): Python 3.13, manylinux (glibc 2.24+) x86-64
cuik_molmaker_pin-2025.9.3-py313-none-macosx_11_0_x86_64.whl (268.5 kB): Python 3.13, macOS 11.0+ x86-64
cuik_molmaker_pin-2025.9.3-py312-none-win_amd64.whl (241.8 kB): Python 3.12, Windows x86-64
cuik_molmaker_pin-2025.9.3-py312-none-manylinux_2_24_x86_64.whl (304.6 kB): Python 3.12, manylinux (glibc 2.24+) x86-64
cuik_molmaker_pin-2025.9.3-py312-none-macosx_11_0_x86_64.whl (268.1 kB): Python 3.12, macOS 11.0+ x86-64
cuik_molmaker_pin-2025.9.3-py311-none-win_amd64.whl (241.0 kB): Python 3.11, Windows x86-64
cuik_molmaker_pin-2025.9.3-py311-none-manylinux_2_24_x86_64.whl (305.5 kB): Python 3.11, manylinux (glibc 2.24+) x86-64
cuik_molmaker_pin-2025.9.3-py311-none-macosx_11_0_x86_64.whl (269.9 kB): Python 3.11, macOS 11.0+ x86-64
