C++ module for featurizing molecules

cuik-molmaker

cuik-molmaker is a specialized package designed for molecular featurization, converting chemical structures into formats that can be effectively used as inputs for deep learning models, particularly graph neural networks (GNNs).

cuik-molmaker is a hybrid C++/Python package. The core featurization logic is implemented in C++ for speed, while the Python interface provides an API that fits into modern GNN training and inference workflows. Because cuik-molmaker calls the C++ API of RDKit, the features it produces are expected to be identical to those produced by RDKit.

Quick start

Setup conda environment

# Set environment variables
export PYTHON_VERSION=3.11
export RDKIT_VERSION=2025.03.2

conda create -n cuik_molmaker_env python=${PYTHON_VERSION} conda-forge::rdkit==${RDKIT_VERSION} conda-forge::pybind11==2.13.6 conda-forge::libboost-devel==1.86.0 conda-forge::libboost-python-devel==1.86.0 

conda activate cuik_molmaker_env

This step is optional if you already have a conda environment with the required dependencies.

Install wheel from NVIDIA PyPI

We provide a handy script to install the wheel from NVIDIA PyPI based on your OS and other dependencies.

python scripts/check_and_install_cuik_molmaker.py
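After installing, it can help to confirm that the package is importable from the active environment. The following is a minimal sketch using only the Python standard library; `is_importable` is a hypothetical helper name, and `cuik_molmaker` is the import name used in the usage examples.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "cuik_molmaker" is the import name used in the usage examples.
    status = "found" if is_importable("cuik_molmaker") else "not found"
    print(f"cuik_molmaker: {status}")
```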

Usage: Computing atom and bond features

import cuik_molmaker
import numpy as np

# List all available atom onehot features
print(cuik_molmaker.list_all_atom_onehot_features())

# Compute atom (atomic number, number of hydrogens, chirality) and bond (bond type) features for acetic acid
acetic_acid_smiles = "CC(=O)O"

# Get atom onehot feature names as NumPy array
atom_onehot_feature_array = cuik_molmaker.atom_onehot_feature_names_to_array(['atomic-number', 'num-hydrogens', 'chirality'])

# Get bond feature names as NumPy array
bond_feature_array = cuik_molmaker.bond_feature_names_to_array(['bond-type-onehot'])

# Set parameters for featurization
explicit_h, offset_carbon, duplicate_edges, add_self_loop = False, False, True, False

# Featurize
all_features = cuik_molmaker.mol_featurizer(
    acetic_acid_smiles,
    atom_onehot_feature_array,
    np.array([]),
    bond_feature_array,
    explicit_h,
    offset_carbon,
    duplicate_edges,
    add_self_loop,
)

# mol_featurizer returns a list of NumPy arrays.
# Element 0 contains the atom features
print(all_features[0].shape)

# Element 1 contains the bond features
print(all_features[1].shape)

# Element 2 contains the edge indices in COO format
print(all_features[2].shape)
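
To make these outputs more concrete, the sketch below builds comparable arrays by hand for acetic acid using plain NumPy. It does not call cuik-molmaker, and the column layout, feature ordering, and edge ordering that cuik-molmaker actually produces may differ. The element vocabulary and the helpers `one_hot` and `coo_edge_index` are hypothetical names chosen for illustration. The sketch assumes that COO edge indices are stored as a 2 x num_edges array of (source, target) atom indices, and that duplicating edges means each bond appears once per direction.

```python
import numpy as np

# Heavy atoms of acetic acid, CC(=O)O, in SMILES order (hydrogens implicit).
atom_symbols = ["C", "C", "O", "O"]
# Undirected bonds as (atom_i, atom_j) pairs.
bonds = [(0, 1), (1, 2), (1, 3)]

# Hypothetical, tiny element vocabulary; real featurizers use a larger one.
vocabulary = ["C", "N", "O"]


def one_hot(symbols, vocab):
    """One row per atom, one column per vocabulary entry."""
    features = np.zeros((len(symbols), len(vocab)), dtype=np.float32)
    for row, symbol in enumerate(symbols):
        features[row, vocab.index(symbol)] = 1.0
    return features


def coo_edge_index(bond_list, duplicate_edges=True):
    """Return a (2, num_edges) array of source and target atom indices."""
    edges = list(bond_list)
    if duplicate_edges:
        # Add the reverse direction of every bond.
        edges += [(j, i) for i, j in bond_list]
    return np.array(edges, dtype=np.int64).T


atom_features = one_hot(atom_symbols, vocabulary)
edge_index = coo_edge_index(bonds, duplicate_edges=True)

print(atom_features.shape)  # (4, 3)
print(edge_index.shape)     # (2, 6)
```

With three bonds and both directions stored, the edge index has six columns; without duplication it would have three.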

Usage: Computing molecular descriptors

from cuik_molmaker.mol_features import MoleculeFeaturizer

featurizer = MoleculeFeaturizer(molecular_descriptor_type="rdkit2D", rdkit2D_normalization_type="fast")

smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin
               "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1", # diphenhydramine
]
rdkit2D_descriptors = featurizer.featurize(smiles_list)

# Print the shape of the descriptors
print(rdkit2D_descriptors.shape)

Source of acceleration

Because the core featurization logic is implemented in C++, cuik-molmaker spends little time in Python-level code. A second source of acceleration is that features for an entire minibatch of SMILES are created at once, which avoids the overhead of allocating memory for each molecule separately and concatenating the results afterwards.
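
The batching idea can be illustrated in plain NumPy. The sketch below is not cuik-molmaker's implementation; `featurize_one`, `featurize_concat`, and `featurize_preallocated` are hypothetical stand-ins, and the per-atom features are dummy values. It contrasts building one array per molecule and concatenating them with allocating a single output buffer for the whole minibatch and filling it in place.

```python
import numpy as np

NUM_FEATURES = 8  # Hypothetical feature width, for illustration only.


def featurize_one(num_atoms, out):
    """Write dummy per-atom features into `out`, a (num_atoms, NUM_FEATURES) view."""
    out[:] = np.arange(num_atoms, dtype=np.float32)[:, None]


def featurize_concat(atom_counts):
    """One allocation per molecule, then one more copy in np.concatenate."""
    pieces = []
    for n in atom_counts:
        block = np.empty((n, NUM_FEATURES), dtype=np.float32)
        featurize_one(n, block)
        pieces.append(block)
    return np.concatenate(pieces, axis=0)


def featurize_preallocated(atom_counts):
    """Single allocation for the whole minibatch, filled slice by slice."""
    offsets = np.concatenate(([0], np.cumsum(atom_counts)))
    batch = np.empty((offsets[-1], NUM_FEATURES), dtype=np.float32)
    for i, n in enumerate(atom_counts):
        # Slicing returns a view, so the write lands directly in `batch`.
        featurize_one(n, batch[offsets[i]:offsets[i + 1]])
    return batch


atom_counts = [13, 19, 4]  # Heavy-atom counts of a hypothetical minibatch.
a = featurize_concat(atom_counts)
b = featurize_preallocated(atom_counts)
print(a.shape, np.array_equal(a, b))  # (36, 8) True
```

Both functions return the same array; the preallocated version performs one allocation and no final copy, which is the kind of saving the batched approach is aiming for.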

Additional Documentation

  • USAGE.md: Examples and instructions for using cuik-molmaker to featurize molecules, including batching.
  • FEATURES.md: Detailed list and explanation of all atom and bond features available for featurization.
  • BUILD.md: Step-by-step instructions for building cuik-molmaker from source, including prerequisites and troubleshooting.
  • TESTING.md: Guidelines and commands for running the test suite to verify installation and functionality.

Hardware Requirements

cuik-molmaker is designed to run on any CPU-based system.

Adoption

cuik-molmaker is currently integrated into the following project:

  • Chemprop: cuik-molmaker is available with conda and Docker installations of Chemprop. It is enabled by passing the --use-cuikmolmaker-featurization flag on the command line and works with all use cases: training, prediction, fingerprinting, and hyperparameter optimization.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

cuik_molmaker_pin-2026.3.1-py313-none-win_amd64.whl (242.0 kB): Python 3.13, Windows x86-64
cuik_molmaker_pin-2026.3.1-py313-none-manylinux_2_24_x86_64.whl (305.7 kB): Python 3.13, manylinux (glibc 2.24+) x86-64
cuik_molmaker_pin-2026.3.1-py312-none-win_amd64.whl (242.0 kB): Python 3.12, Windows x86-64
cuik_molmaker_pin-2026.3.1-py312-none-manylinux_2_24_x86_64.whl (305.0 kB): Python 3.12, manylinux (glibc 2.24+) x86-64
cuik_molmaker_pin-2026.3.1-py311-none-win_amd64.whl (241.2 kB): Python 3.11, Windows x86-64
cuik_molmaker_pin-2026.3.1-py311-none-manylinux_2_24_x86_64.whl (305.7 kB): Python 3.11, manylinux (glibc 2.24+) x86-64

File hashes

Hashes for cuik_molmaker_pin-2026.3.1-py313-none-win_amd64.whl
SHA256: 83b54b965d73054626565491015d21ea23b99a3d6ce0a0061fc0a24587fa5f59
MD5: 45911180727516c548886677112ddd5e
BLAKE2b-256: 657cf0952507992936dc81293e3ebe789f8368b1f27e63a097c169627517e51f

Hashes for cuik_molmaker_pin-2026.3.1-py313-none-manylinux_2_24_x86_64.whl
SHA256: e0baca751d30bd34d5c4c82ee81d3e75783ab6589078ccaa47d88b0cd9737e87
MD5: 316dbf4ec05fe3b02aaf5ad21ad4edd2
BLAKE2b-256: 9a94ad7e0465561c2ca6d61c14f32bbf2eedf53c72e095e680ce6de130c82d6b

Hashes for cuik_molmaker_pin-2026.3.1-py312-none-win_amd64.whl
SHA256: 1e8ec362b0c1d74f3d9bd1b6411c1630e04c87c3692d8c7a42128459711abb48
MD5: 009df88d59a2b7f818fbf60d3f2046b4
BLAKE2b-256: d705767dd999cfa7933f94186510c048e6c486a7f85b4d1161bf9cfb675814c7

Hashes for cuik_molmaker_pin-2026.3.1-py312-none-manylinux_2_24_x86_64.whl
SHA256: 38c94a500f197252500f51e15d27598cb8fdd8e0599138e9962eb0c4f7238c63
MD5: 30c35f2a7767afc0a7433ce7d401e4de
BLAKE2b-256: 347c9f73eb59707dc408c6db635c63703d7fb8c1897091f4703907661d91c0cd

Hashes for cuik_molmaker_pin-2026.3.1-py311-none-win_amd64.whl
SHA256: 2ed76d04e89685552132159f46ff7d50e3c7dfae8ecf7525ebe74331d957b344
MD5: 50631721dba4e0603f98a79c6a7e54f1
BLAKE2b-256: ba97b0d3dd30c791cd0c2caf1808571af1a6f0face359200d74cef4903a7f6a4

Hashes for cuik_molmaker_pin-2026.3.1-py311-none-manylinux_2_24_x86_64.whl
SHA256: 555e551ee2960e50d5966c8faba5b196986ce409d244164b2c2b17c0e42a9a87
MD5: 499c80ff803f42de0cd3a26463b067d6
BLAKE2b-256: 917892d78e3918b765800e6b41775a340d7a2bc1a46160dde0b30ecd10ae0a99
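
A downloaded wheel can be checked against the SHA256 digests listed above before it is installed. The following is a minimal sketch using only the Python standard library; `sha256_of_file` and `matches_digest` are hypothetical helper names, and the file path passed in is assumed to be wherever the wheel was saved.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, calling `matches_digest` on a local copy of cuik_molmaker_pin-2026.3.1-py311-none-manylinux_2_24_x86_64.whl with the digest 555e551ee2960e50d5966c8faba5b196986ce409d244164b2c2b17c0e42a9a87 should return True for an intact download.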
