C++ module for featurizing molecules

cuik-molmaker

cuik-molmaker is a specialized package designed for molecular featurization, converting chemical structures into formats that can be effectively used as inputs for deep learning models, particularly graph neural networks (GNNs).

cuik-molmaker is a hybrid C++/Python package. The core featurization logic is implemented in C++ for speed, while the Python interface provides an API that fits into GNN training and inference workflows. Because cuik-molmaker calls the C++ API of RDKit directly, the features it produces are expected to be identical to those produced by RDKit.

Quick start

Setup conda environment

# Set environment variables
export PYTHON_VERSION=3.11
export RDKIT_VERSION=2025.03.2

conda create -n cuik_molmaker_env python=${PYTHON_VERSION} conda-forge::rdkit==${RDKIT_VERSION} conda-forge::pybind11==2.13.6 conda-forge::libboost-devel==1.86.0 conda-forge::libboost-python-devel==1.86.0 

conda activate cuik_molmaker_env

This step is optional if you already have a conda environment with the required dependencies.

Install wheel from NVIDIA PyPI

We provide a handy script to install the wheel from NVIDIA PyPI based on your OS and other dependencies.

python scripts/check_and_install_cuik_molmaker.py
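
After installation, a quick import check can confirm that the environment is usable. The sketch below is an illustration and not part of cuik-molmaker: the helper name check_modules is hypothetical, and it relies only on the standard library's importlib to see whether the modules used in this README can be located.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Modules imported by the usage examples in this README.
    for name, found in check_modules(["cuik_molmaker", "rdkit", "numpy"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any module is reported as missing, revisit the conda environment setup or the wheel installation step.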

Usage: Computing atom and bond features

import cuik_molmaker
import numpy as np

# List all available atom onehot features
print(cuik_molmaker.list_all_atom_onehot_features())

# Compute atom (atomic number, number of hydrogens, chirality) and bond (bond type) features for acetic acid
acetic_acid_smiles = "CC(=O)O"

# Get atom onehot feature names as NumPy array
atom_onehot_feature_array = cuik_molmaker.atom_onehot_feature_names_to_array(['atomic-number', 'num-hydrogens', 'chirality'])

# Get bond feature names as NumPy array
bond_feature_array = cuik_molmaker.bond_feature_names_to_array(['bond-type-onehot'])

# Set parameters for featurization
explicit_h, offset_carbon, duplicate_edges, add_self_loop = False, False, True, False

# Featurize
all_features = cuik_molmaker.mol_featurizer(
    acetic_acid_smiles, atom_onehot_feature_array, np.array([]), bond_feature_array,
    explicit_h, offset_carbon, duplicate_edges, add_self_loop,
)

# mol_featurizer returns a list of NumPy arrays.
# Element 0 holds the atom features
print(all_features[0].shape)

# Element 1 holds the bond features
print(all_features[1].shape)

# Element 2 holds the edge indices in COO format
print(all_features[2].shape)
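
For readers unfamiliar with COO edge indices, the following NumPy-only sketch shows what such an array can look like for acetic acid. It does not call cuik-molmaker. The helper bonds_to_coo is hypothetical, and the way it interprets duplicate_edges (store each bond in both directions) and add_self_loop (append an i -> i edge per atom) is an assumption based on the parameter names. The library's actual edge ordering, orientation, and dtype may differ.

```python
import numpy as np


def bonds_to_coo(bonds, num_atoms, duplicate_edges=True, add_self_loop=False):
    """Build a (2, num_edges) COO edge-index array from a list of undirected bonds."""
    src, dst = [], []
    for i, j in bonds:
        src.append(i)
        dst.append(j)
        if duplicate_edges:
            # Store the reverse direction too, so each bond appears twice.
            src.append(j)
            dst.append(i)
    if add_self_loop:
        for atom in range(num_atoms):
            src.append(atom)
            dst.append(atom)
    return np.array([src, dst], dtype=np.int64)


# Heavy atoms of acetic acid, CC(=O)O: C0, C1, O2, O3.
acetic_acid_bonds = [(0, 1), (1, 2), (1, 3)]
edge_index = bonds_to_coo(acetic_acid_bonds, num_atoms=4)
print(edge_index.shape)  # (2, 6)
```

Row 0 holds source atoms and row 1 holds destination atoms, so three bonds stored in both directions give six columns.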

Usage: Computing molecular descriptors

from cuik_molmaker.mol_features import MoleculeFeaturizer

featurizer = MoleculeFeaturizer(molecular_descriptor_type="rdkit2D", rdkit2D_normalization_type="fast")

smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin
               "CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1", # diphenhydramine
]
rdkit2D_descriptors = featurizer.featurize(smiles_list)

# Print the shape of the descriptors
print(rdkit2D_descriptors.shape)

Source of acceleration

The hybrid C++/Python design lets the core featurization logic run in C++, which reduces Python overhead. A second source of acceleration is that features for an entire minibatch of SMILES are created at once, which avoids the overhead of repeated memory allocation and concatenation.
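
The allocation point can be illustrated with plain NumPy. This is a conceptual sketch, not cuik-molmaker's internal code: both function names and the random "features" are made up for the example. It contrasts growing a batch by repeated concatenation with allocating the whole minibatch once and filling slices in place.

```python
import numpy as np


def concat_per_molecule(per_mol_features):
    """Baseline: grow the batch array by concatenating one molecule at a time."""
    batch = np.empty((0, per_mol_features[0].shape[1]), dtype=np.float32)
    for feats in per_mol_features:
        # Each concatenate allocates a new array and copies everything so far.
        batch = np.concatenate([batch, feats], axis=0)
    return batch


def fill_preallocated(per_mol_features):
    """Batched: allocate once for the whole minibatch, then write each slice in place."""
    total_atoms = sum(f.shape[0] for f in per_mol_features)
    batch = np.empty((total_atoms, per_mol_features[0].shape[1]), dtype=np.float32)
    offset = 0
    for feats in per_mol_features:
        batch[offset:offset + feats.shape[0]] = feats
        offset += feats.shape[0]
    return batch


rng = np.random.default_rng(0)
# Hypothetical minibatch: three molecules with 4, 13, and 19 atoms, 8 features per atom.
mols = [rng.random((n, 8), dtype=np.float32) for n in (4, 13, 19)]
assert np.array_equal(concat_per_molecule(mols), fill_preallocated(mols))
print(fill_preallocated(mols).shape)  # (36, 8)
```

Both functions return the same array; the second performs a single allocation regardless of batch size.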

Additional Documentation

  • USAGE.md: Examples and instructions for using cuik-molmaker to featurize molecules, including batching.
  • FEATURES.md: Detailed list and explanation of all atom and bond features available for featurization.
  • BUILD.md: Step-by-step instructions for building cuik-molmaker from source, including prerequisites and troubleshooting.
  • TESTING.md: Guidelines and commands for running the test suite to verify installation and functionality.

Hardware Requirements

cuik-molmaker is designed to run on any CPU-based system.

Adoption

cuik-molmaker is currently integrated into the following projects:

  • Chemprop: cuik-molmaker is available with conda and Docker installations of Chemprop. Enable it by passing the --use-cuikmolmaker-featurization flag on the command line; the flag works with all use cases: training, prediction, fingerprinting, and hyperparameter optimization.
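
As a rough sketch of how the flag might be passed, the snippet below assembles a Chemprop training command in Python. Only --use-cuikmolmaker-featurization comes from this README. The train subcommand and the --data-path option are assumptions based on the Chemprop v2 command line, the helper name is hypothetical, and data.csv is a placeholder path, so check chemprop train --help for the options your installed version accepts.

```python
import shlex


def chemprop_train_command(data_path, use_cuik_molmaker=True):
    """Assemble the argument list for a `chemprop train` run (assumed Chemprop v2 CLI)."""
    cmd = ["chemprop", "train", "--data-path", data_path]
    if use_cuik_molmaker:
        # Flag named in this README; switches featurization to cuik-molmaker.
        cmd.append("--use-cuikmolmaker-featurization")
    return cmd


cmd = chemprop_train_command("data.csv")  # "data.csv" is a placeholder path
print(shlex.join(cmd))
# To launch the run: subprocess.run(cmd, check=True)
```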

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • cuik_molmaker_pin-2026.3.3-py313-none-win_amd64.whl (242.0 kB): Python 3.13, Windows x86-64
  • cuik_molmaker_pin-2026.3.3-py313-none-manylinux_2_24_x86_64.whl (305.7 kB): Python 3.13, manylinux (glibc 2.24+) x86-64
  • cuik_molmaker_pin-2026.3.3-py312-none-win_amd64.whl (242.0 kB): Python 3.12, Windows x86-64
  • cuik_molmaker_pin-2026.3.3-py312-none-manylinux_2_24_x86_64.whl (305.0 kB): Python 3.12, manylinux (glibc 2.24+) x86-64
  • cuik_molmaker_pin-2026.3.3-py311-none-win_amd64.whl (241.3 kB): Python 3.11, Windows x86-64
  • cuik_molmaker_pin-2026.3.3-py311-none-manylinux_2_24_x86_64.whl (305.8 kB): Python 3.11, manylinux (glibc 2.24+) x86-64

File details

Hashes for cuik_molmaker_pin-2026.3.3-py313-none-win_amd64.whl:
SHA256: 2456cc92b6715c368f4683b2bd95ca90d68cbde08f227a4fc5f07c063d47e7bf
MD5: b2e74bea64d23edd52a5bde3068ae1c0
BLAKE2b-256: 123771499160b97ca5f42dac10e03169fba86a04213784af5192f00c6f1030e4

Hashes for cuik_molmaker_pin-2026.3.3-py313-none-manylinux_2_24_x86_64.whl:
SHA256: fc83155187e9228d3284a26ca6edb6f8c3e0841494d645bcac0a7add32bec1bf
MD5: 2471e007a5b4a9b987c2e6591e240819
BLAKE2b-256: cc98d702c419df87d93c8b50dfef2ad329eb0f6d51b0e5e578ddc5e0c0388e5d

Hashes for cuik_molmaker_pin-2026.3.3-py312-none-win_amd64.whl:
SHA256: 90398c52e6efab4b96b8f4999e81bad749150f61f0263d5735bde7eff7b1e5e8
MD5: 26458c703cd2a46ff45e15c6fe1efc63
BLAKE2b-256: eaf0a4c74cb2d4110b41a5c90dcfc1a9cdcb3826a9b40b557edfda15f4fc69c5

Hashes for cuik_molmaker_pin-2026.3.3-py312-none-manylinux_2_24_x86_64.whl:
SHA256: 56577afc9c810e397227589858d4c54af90251e82bcf4a224b496a054b36d90b
MD5: 247280d618be83e37d9a928e6d8a14c6
BLAKE2b-256: 1d7d853f907ef4cb3912b2e3e523e7a4840f0ced5752a888d5b64287e1e82418

Hashes for cuik_molmaker_pin-2026.3.3-py311-none-win_amd64.whl:
SHA256: c6683a1b7e366b12bbe98504c1eba0d2bb7e4b6e26484ac8d74afcd6b95523e3
MD5: 5c5b6e39df3fc74dd8dfc0887d40e5a7
BLAKE2b-256: 7f3942737478171bfc0286b875d64abe9cd50881e2e1433ccb59c6f4ab5941db

Hashes for cuik_molmaker_pin-2026.3.3-py311-none-manylinux_2_24_x86_64.whl:
SHA256: 2caf867c2f5add013e28ebc3033b34d2ed2773cd0ebffe8c5c83f62b11a1e729
MD5: 1c33c5bb8a89ed98b17c3a4329d26db7
BLAKE2b-256: ac6ee62c3199bb20d81981ae076146e1fbfe35240ebbbbe87965b7236c23675c
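
The published digests can be used to check a downloaded wheel before installing it. The sketch below uses only Python's standard hashlib module; the function names sha256_of_file and verify_wheel are hypothetical helpers, not part of cuik-molmaker or pip.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published value."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Call verify_wheel with the local path of the downloaded wheel and the SHA256 value listed for that file above; a False result means the file should not be installed.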
