C++ module for featurizing molecules

cuik-molmaker

cuik-molmaker is a specialized package designed for molecular featurization, converting chemical structures into formats that can be effectively used as inputs for deep learning models, particularly graph neural networks (GNNs).

cuik-molmaker is a hybrid C++/Python package. The core featurization logic is implemented in C++ for speed, while the Python interface provides an API that fits into GNN training and inference workflows. Because cuik-molmaker calls the C++ API of RDKit, the features it produces are expected to be identical to those produced by RDKit.

Quick start

Setup conda environment

# Set environment variables
export PYTHON_VERSION=3.11
export RDKIT_VERSION=2025.03.2

conda create -n cuik_molmaker_env python=${PYTHON_VERSION} conda-forge::rdkit==${RDKIT_VERSION} conda-forge::pybind11==2.13.6 conda-forge::libboost-devel==1.86.0 conda-forge::libboost-python-devel==1.86.0 

conda activate cuik_molmaker_env

This step is optional if you already have a conda environment with the required dependencies.

Install wheel from NVIDIA PyPI

A helper script selects and installs the wheel from NVIDIA PyPI that matches your OS and other dependencies.

python scripts/check_and_install_cuik_molmaker.py
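After installation, one quick sanity check is to confirm that the relevant modules can be found in the active environment. The sketch below uses only the Python standard library; the helper name `is_importable` is made up for this illustration and is not part of cuik-molmaker.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located in the active environment."""
    return importlib.util.find_spec(module_name) is not None


# Module names taken from the imports used in this README.
for name in ("cuik_molmaker", "rdkit", "numpy"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

This only checks that the modules can be located; it does not verify that their versions are compatible with each other.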

Usage: Computing atom and bond features

import cuik_molmaker
import numpy as np

# List all available atom onehot features
print(cuik_molmaker.list_all_atom_onehot_features())

# Compute atom (atomic number, number of hydrogens, chirality) and bond (bond type) features for acetic acid
acetic_acid_smiles = "CC(=O)O"

# Get atom onehot feature names as NumPy array
atom_onehot_feature_array = cuik_molmaker.atom_onehot_feature_names_to_array(['atomic-number', 'num-hydrogens', 'chirality'])

# Get bond feature names as NumPy array
bond_feature_array = cuik_molmaker.bond_feature_names_to_array(['bond-type-onehot'])

# Set parameters for featurization
explicit_h, offset_carbon, duplicate_edges, add_self_loop = False, False, True, False

# Featurize
all_features = cuik_molmaker.mol_featurizer(
    acetic_acid_smiles,
    atom_onehot_feature_array,
    np.array([]),
    bond_feature_array,
    explicit_h,
    offset_carbon,
    duplicate_edges,
    add_self_loop,
)

# This returns a list of NumPy arrays.
# First index contains atom features
print(all_features[0].shape)

# Second index contains bond features
print(all_features[1].shape)

# Third index contains edge indices in COO format
print(all_features[2].shape)
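For readers unfamiliar with COO edge indices, the following plain-Python sketch shows what such an index looks like for the heavy atoms of acetic acid. It is an illustration only: the helper `to_coo`, the atom numbering, and the edge ordering are assumptions, and the exact layout returned by cuik-molmaker may differ. The `duplicate_edges` argument here mirrors the name used above and is assumed to mean that each bond is stored in both directions.

```python
# Heavy-atom bonds of acetic acid, "CC(=O)O", with atoms numbered in SMILES order:
# 0: C (methyl), 1: C (carboxyl), 2: O (carbonyl), 3: O (hydroxyl)
bonds = [(0, 1), (1, 2), (1, 3)]


def to_coo(bonds, duplicate_edges=True):
    """Return [sources, targets] for an undirected bond list (hypothetical helper).

    With duplicate_edges=True each bond is stored once per direction,
    so a molecule with B bonds yields 2 * B edges.
    """
    sources, targets = [], []
    for begin, end in bonds:
        sources.append(begin)
        targets.append(end)
        if duplicate_edges:
            sources.append(end)
            targets.append(begin)
    return [sources, targets]


edge_index = to_coo(bonds)
print(edge_index)
# [[0, 1, 1, 2, 1, 3], [1, 0, 2, 1, 3, 1]]
```

Under these assumptions, three bonds produce six directed edges, so the number of edge columns is twice the bond count when edges are duplicated.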

Usage: Computing molecular descriptors

from cuik_molmaker.mol_features import MoleculeFeaturizer

featurizer = MoleculeFeaturizer(molecular_descriptor_type="rdkit2D", rdkit2D_normalization_type="fast")

smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin
               "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1", # diphenhydramine
]
rdkit2D_descriptors = featurizer.featurize(smiles_list)

# Print the shape of the descriptors
print(rdkit2D_descriptors.shape)

Source of acceleration

The hybrid C++/Python design lets the core featurization logic run in C++, which reduces Python overhead. A second source of acceleration is that features are created for an entire minibatch of SMILES at once, which avoids the overhead of repeated memory allocation and concatenation.
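The idea of building a minibatch in one pass can be illustrated in plain Python. The sketch below is not cuik-molmaker's implementation; the function name `batch_edges` and the input layout are assumptions made for illustration. It sizes the output once from the per-molecule counts and fills it in place, shifting each molecule's atom indices by a running offset, instead of concatenating per-molecule results.

```python
from itertools import accumulate


def batch_edges(atom_counts, edges_per_mol):
    """Merge per-molecule edge lists into one batch-level COO edge index.

    atom_counts:   number of atoms in each molecule
    edges_per_mol: for each molecule, a list of (source, target) pairs
                   using molecule-local atom indices
    """
    # Offset of each molecule's first atom within the batch: 0, n0, n0 + n1, ...
    offsets = [0] + list(accumulate(atom_counts))[:-1]

    # Allocate the output once, sized for the whole batch.
    total_edges = sum(len(edges) for edges in edges_per_mol)
    sources = [0] * total_edges
    targets = [0] * total_edges

    pos = 0
    for offset, edges in zip(offsets, edges_per_mol):
        for s, t in edges:
            sources[pos] = s + offset
            targets[pos] = t + offset
            pos += 1
    return sources, targets


# Two toy molecules: a 3-atom chain and a 2-atom molecule, edges stored in both directions.
atom_counts = [3, 2]
edges_per_mol = [
    [(0, 1), (1, 0), (1, 2), (2, 1)],
    [(0, 1), (1, 0)],
]
sources, targets = batch_edges(atom_counts, edges_per_mol)
print(sources)  # [0, 1, 1, 2, 3, 4]
print(targets)  # [1, 0, 2, 1, 4, 3]
```

The second molecule's atoms are renumbered 3 and 4 in the batch, so the combined graph can be passed to a GNN as a single disconnected graph.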

Additional Documentation

File         Description
USAGE.md     Examples and instructions for using cuik-molmaker to featurize molecules, including batching.
FEATURES.md  Detailed list and explanation of all atom and bond features available for featurization.
BUILD.md     Step-by-step instructions for building cuik-molmaker from source, including prerequisites and troubleshooting.
TESTING.md   Guidelines and commands for running the test suite to verify installation and functionality.

Hardware Requirements

cuik-molmaker is designed to run on any CPU-based system.

Adoption

cuik-molmaker is currently integrated into the following projects:

  • Chemprop: cuik-molmaker is available with conda and Docker installations of Chemprop. Enable it by passing the --use-cuikmolmaker-featurization flag on the command line. The flag works for all use cases: training, prediction, fingerprinting, and hyperparameter optimization.

Download files

Source distributions

No source distribution files are available for this release.

Built distributions

File                                                              Python  Platform                         Size
cuik_molmaker_pin-2025.9.2-py313-none-win_amd64.whl               3.13    Windows x86-64                   241.8 kB
cuik_molmaker_pin-2025.9.2-py313-none-manylinux_2_24_x86_64.whl   3.13    manylinux: glibc 2.24+ x86-64    305.3 kB
cuik_molmaker_pin-2025.9.2-py313-none-macosx_11_0_x86_64.whl      3.13    macOS 11.0+ x86-64               268.5 kB
cuik_molmaker_pin-2025.9.2-py312-none-win_amd64.whl               3.12    Windows x86-64                   241.8 kB
cuik_molmaker_pin-2025.9.2-py312-none-manylinux_2_24_x86_64.whl   3.12    manylinux: glibc 2.24+ x86-64    304.6 kB
cuik_molmaker_pin-2025.9.2-py312-none-macosx_11_0_x86_64.whl      3.12    macOS 11.0+ x86-64               268.1 kB
cuik_molmaker_pin-2025.9.2-py311-none-win_amd64.whl               3.11    Windows x86-64                   241.0 kB
cuik_molmaker_pin-2025.9.2-py311-none-manylinux_2_24_x86_64.whl   3.11    manylinux: glibc 2.24+ x86-64    305.5 kB
cuik_molmaker_pin-2025.9.2-py311-none-macosx_11_0_x86_64.whl      3.11    macOS 11.0+ x86-64               269.9 kB