High-performance NVIDIA Warp primitives for GPU-enabled computational chemistry and atomistic simulation workflows.

These details have not been verified by PyPI

Project description

NVIDIA ALCHEMI Toolkit-Ops

High-performance NVIDIA Warp primitives for computational chemistry

NVIDIA ALCHEMI Toolkit-Ops is a collection of GPU-optimized, batched primitives for accelerating atomistic simulations. The high-performance compute kernels are written in NVIDIA Warp (warp-lang).

Key Features

Molecular Dynamics kernels: Velocity Verlet (NVE), Langevin (NVT), Nosé-Hoover Chain (NVT), NPT/NPH ensembles, velocity rescaling
Geometry optimization: FIRE and FIRE2 with optional unit cell optimization
Neighbor lists: naive $O(N^2)$ and cell list $O(N)$ algorithms
Dispersion corrections via Becke-Johnson damped DFT-D3
Electrostatic interactions: Ewald, particle mesh Ewald (PME), and damped shifted force (DSF) algorithms
Differentiable physics: analytical stress tensor (virial) support for Ewald and PME, enabling stress-based MLIP training
NVIDIA Warp core with optional, JIT-compatible PyTorch and JAX bindings, including autograd support

Kernels are designed to scale to large systems (>100,000 atoms) and are optimized for high-throughput operation on GPUs (on the order of several microseconds per atom), with batching support.
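
To make the difference between the two neighbor-list strategies listed above concrete, here is a minimal pure-Python sketch for a non-periodic point set. It is an illustration of the general technique, not the Warp kernels shipped in this package, and the function names are hypothetical. The naive version tests every pair; the cell-list version bins atoms into cubic cells of edge length `cutoff` and only tests pairs in the same or adjacent cells.

```python
import itertools
import math
import random
from collections import defaultdict


def naive_pairs(positions, cutoff):
    """O(N^2): test every unordered pair against the cutoff."""
    pairs = set()
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            pairs.add((i, j))
    return pairs


def cell_list_pairs(positions, cutoff):
    """Roughly O(N) at fixed density: bin atoms into cells of edge `cutoff`."""
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        key = tuple(math.floor(c / cutoff) for c in p)
        cells[key].append(idx)

    pairs = set()
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    for key, members in cells.items():
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            if other not in cells:  # skip empty neighboring cells
                continue
            for i in members:
                for j in cells[other]:
                    # i < j records each unordered pair exactly once
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return pairs


random.seed(0)
points = [tuple(random.uniform(0.0, 20.0) for _ in range(3)) for _ in range(200)]
# Both strategies must find the same set of pairs.
assert naive_pairs(points, 3.0) == cell_list_pairs(points, 3.0)
```

Because two atoms within the cutoff can differ by at most one cell index along each axis, checking the 27 surrounding cells is sufficient, and the work per atom stays bounded when the density is fixed.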

Use Cases

We currently see three primary use cases for nvalchemi-toolkit-ops within the ecosystem:

Library maintainers and developers are encouraged to benchmark and explore integrating functionality like neighbor list computation to accelerate existing workflows;
Researchers and model developers ideally should be able to rely on this package (and not implement their own!) for neighbor list computation, interatomic interactions, and so on during method development;
Engineers looking to build applications that involve molecular dynamics, interatomic potentials, and the like can take advantage of optimized and maintained low-level kernels. warp-lang kernels should be sufficiently modular to allow for a high degree of flexibility and reusability.

The combination of being GPU-first and batched should enable the kernels contained in nvalchemi-toolkit-ops to be ready for a wide range of research and production applications.

Example Snippets

We encourage interested readers to browse our hosted documentation. The short snippets below highlight the API and typical use cases with PyTorch; see the hosted documentation for JAX details.

Neighbor list in a 2D unit cell with 50,000 atoms

This example uses PyTorch:

import torch
from nvalchemiops.torch.neighbors import neighbor_list

torch.set_default_dtype(torch.float32)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.set_default_device(device)

NUM_ATOMS = 50_000
# arbitrarily scale positions
positions = torch.randn((NUM_ATOMS, 3)) * 10.0
cell = torch.eye(3, dtype=torch.float32).unsqueeze(0)
pbc = torch.tensor([True, True, False], dtype=torch.bool)
cutoff = 6.0
# use padded matrix representation for neighbors, optimal for
# compiled applications that need constant shapes
neighbor_matrix, num_neighbors, shift_matrix = neighbor_list(
    positions,
    cutoff,
    cell=cell,
    pbc=pbc,
    method="cell_list"
)
# ...or pass `return_neighbor_list=True` for the familiar COO
# `edge_index` format. When `method` is omitted, the neighbor
# algorithm is selected automatically based on system size
edge_index, neighbor_ptr, shifts = neighbor_list(
    positions,
    cutoff,
    cell=cell,
    pbc=pbc,
    return_neighbor_list=True
)
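
The two return formats carry the same information. As a rough illustration, the sketch below converts a padded neighbor matrix into a COO edge list using plain Python lists. The layout is an assumption made for this example: row `i` holds the neighbor indices of atom `i`, only the first `num_neighbors[i]` entries are valid, and unused slots hold a fill value of `-1`. Check the hosted documentation for the conventions `neighbor_list` actually uses. The helper name `padded_to_coo` is hypothetical.

```python
def padded_to_coo(neighbor_matrix, num_neighbors):
    """Convert a padded [num_atoms, max_neighbors] matrix to COO edges.

    Returns two equal-length lists (sources, targets), analogous to the
    two rows of an `edge_index` array.
    """
    sources, targets = [], []
    for i, (row, count) in enumerate(zip(neighbor_matrix, num_neighbors)):
        for j in row[:count]:  # ignore padded slots beyond `count`
            sources.append(i)
            targets.append(j)
    return sources, targets


# Three atoms, at most two neighbors each; -1 marks padding (assumed fill value).
neighbor_matrix = [
    [1, 2],
    [0, -1],
    [0, -1],
]
num_neighbors = [2, 1, 1]

sources, targets = padded_to_coo(neighbor_matrix, num_neighbors)
print(sources)  # [0, 0, 1, 2]
print(targets)  # [1, 2, 0, 0]
```

The padded form trades some memory for constant shapes, which is why the comments above recommend it for compiled applications; the COO form is more compact when neighbor counts vary widely.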

DFT-D3(BJ) corrections on a batch of molecules

This example assumes you have already concatenated a set of molecules into combined tensors and computed their neighborhoods using the neighbor_list API. Here, we'll demonstrate using the matrix representation:

import torch
from nvalchemiops.torch.interactions.dispersion import dftd3
from nvalchemiops.torch.neighbors import neighbor_list

# the following parameters need to be constructed ahead of time
positions = ...  # [num_atoms, 3]
atomic_numbers = ...  # [num_atoms]
cell = ...  # [num_systems, 3, 3]
pbc = ...  # [num_systems, 3]
batch_idx = ...  # [num_atoms]
batch_ptr = ...  # [num_systems + 1]
# construct neighbor matrix
neighbor_matrix, num_neighbors, shift_matrix = neighbor_list(
    positions,
    cutoff=...,  # roughly 20 Angstroms
    cell=cell,
    pbc=pbc,
    batch_idx=batch_idx,
    batch_ptr=batch_ptr
)
# DFT-D3 parameters, which include the reference C6 coefficients, must be provided.
# Refer to the user documentation to see the expected structure and data source.
d3_params = ...
# pass everything to the functional interface
d3_energies, d3_forces, coord_nums, d3_virials = dftd3(
    positions=positions,
    numbers=atomic_numbers,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=shift_matrix,
    batch_idx=batch_idx,
    # functional specific DFT-D3 parameters (PBE shown)
    a1=0.4289, a2=4.4407, s8=0.7875,
    d3_params=d3_params,
    compute_virial=True
)
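
For context on the `a1`, `a2`, and `s8` arguments: in the Becke-Johnson damped form of DFT-D3, the two-body dispersion energy of an atom pair at separation r is the negative sum over n = 6, 8 of s_n C_n / (r^n + f^n), where f = a1 * R0 + a2 and R0 = sqrt(C8 / C6). The pure-Python sketch below evaluates that expression for a single pair. It is illustrative only and not the package's kernel: the function name is hypothetical, the C6 and C8 values are made-up placeholders, `s6` is assumed to be 1, and all quantities are assumed to be expressed in one consistent unit system.

```python
import math


def bj_pair_energy(r, c6, c8, a1, a2, s8, s6=1.0):
    """Two-body D3(BJ) energy for one pair at separation r."""
    r0 = math.sqrt(c8 / c6)  # pair-specific cutoff radius
    f = a1 * r0 + a2         # Becke-Johnson damping length
    e6 = s6 * c6 / (r**6 + f**6)
    e8 = s8 * c8 / (r**8 + f**8)
    return -(e6 + e8)


# PBE parameters from the example above; C6/C8 are placeholder values.
params = dict(a1=0.4289, a2=4.4407, s8=0.7875)
c6, c8 = 40.0, 1200.0

for r in (0.0, 3.0, 6.0, 50.0):
    print(f"r = {r:5.1f}  E = {bj_pair_energy(r, c6, c8, **params):.6e}")
```

The damping keeps the energy finite as r approaches zero, while at large separation the expression approaches the undamped -C6 / r^6 limit.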

Electrostatics via particle mesh Ewald

This example shows how to compute the per-atom and system energies as well as the forces using the particle mesh Ewald interface.

import torch
from nvalchemiops.torch.interactions.electrostatics import particle_mesh_ewald
from nvalchemiops.torch.neighbors import neighbor_list

# the following parameters need to be constructed ahead of time
positions = ...  # [num_atoms, 3]
atomic_numbers = ...  # [num_atoms]
cell = ...  # [num_systems, 3, 3]
pbc = ...  # [num_systems, 3]
atomic_charges = ...  # [num_atoms]
# construct neighbor matrix
neighbor_matrix, num_neighbors, shift_matrix = neighbor_list(
    positions,
    cutoff=...,  # roughly 20 Angstroms
    cell=cell,
    pbc=pbc,
)
# call PME, using automatic parameter tuning
atom_energies, atom_forces = particle_mesh_ewald(
    positions=positions,
    charges=atomic_charges,
    cell=cell,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=shift_matrix,
    accuracy=1e-6
)
system_energy = atom_energies.sum()
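
Ewald-type methods, including PME, rest on splitting the Coulomb kernel into a short-range part that is summed in real space over the neighbor list and a smooth long-range part that is handled in reciprocal space: 1/r = erfc(alpha * r)/r + erf(alpha * r)/r. The standard-library sketch below checks this identity numerically and shows how quickly the real-space term decays, which is what makes a finite cutoff workable. The helper names and the value of `alpha` are chosen for illustration only; they are not what the `accuracy` argument would select.

```python
import math


def short_range(r, alpha):
    """Real-space part of the Coulomb kernel: erfc(alpha * r) / r."""
    return math.erfc(alpha * r) / r


def long_range(r, alpha):
    """Smooth part handled in reciprocal space: erf(alpha * r) / r."""
    return math.erf(alpha * r) / r


alpha = 0.3  # illustrative splitting parameter (inverse length units)
for r in (1.0, 5.0, 10.0, 20.0):
    total = short_range(r, alpha) + long_range(r, alpha)
    # The two parts always recombine to the bare Coulomb kernel.
    assert math.isclose(total, 1.0 / r, rel_tol=1e-9)
    print(f"r = {r:5.1f}  short-range = {short_range(r, alpha):.3e}")
```

A larger `alpha` makes the real-space term decay faster (allowing a shorter cutoff) at the cost of more work in reciprocal space, which is the trade-off that automatic parameter tuning has to balance.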

CUDA 13 Support

CUDA 13 is required for Blackwell GPUs. torch>=2.11.0 and jax[cuda13] publish CUDA 13 wheels on the default PyPI index for x86 platforms. On Arm platforms (e.g. NVIDIA DGX Spark), an --extra-index-url is required for PyTorch.

# Standalone install (x86)
uv venv --seed --python 3.12
uv pip install nvalchemi-toolkit-ops torch==2.11.0

# Standalone install (Arm, e.g. DGX Spark)
uv venv --seed --python 3.12
uv pip install nvalchemi-toolkit-ops \
    torch==2.11.0+cu130 \
    --extra-index-url https://download.pytorch.org/whl/cu130

See the installation guide for details.

Roadmap

Features planned for upcoming releases:

Performance improvements for neighbor lists, DFT-D3, and electrostatics
Explicit 2nd-derivative electrostatics kernels for more efficient MLIP training
Multipole Ewald summation
Batched Nudged Elastic Band (NEB)
Support for custom pair potentials in neighbor list functions
Slab corrections for pseudo-2D periodic systems
Ewald dispersion
Improved pair potential coverage (e.g. ZBL, OQDO, Born-Mayer)
Basis functions and descriptors for MLIPs (e.g. spherical harmonics, radial basis, Wigner D3 matrix)

Contributions & Disclaimers

NVIDIA ALCHEMI Toolkit-Ops is currently in public beta, and we are soliciting feedback from the community. During this time, direct code contributions are not accepted, because our first priority is to define and provide a stable API; until then, the API is subject to change. Feature requests, discussions, and general feedback are welcome and encouraged via GitHub Issues.

Project details

Release history

This version

0.3.1

Apr 14, 2026

0.3.0

Mar 16, 2026

0.2.0

Dec 19, 2025

0.1.0

Dec 5, 2025

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

nvalchemi_toolkit_ops-0.3.1-py3-none-any.whl (464.2 kB)

Uploaded Apr 14, 2026 Python 3

File details

Details for the file nvalchemi_toolkit_ops-0.3.1-py3-none-any.whl.

File metadata

Download URL: nvalchemi_toolkit_ops-0.3.1-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 464.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for nvalchemi_toolkit_ops-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c0f37c0a798c81d0a7d617591968d71f4187ba3c32515db16d2efa712edb4e7`
MD5	`917dca005c87b3c4fc7e8521a3ce2300`
BLAKE2b-256	`d5447c7e8d52460755918810a1486c08ed4cc7bb810c9fbc186f8e2e02c0b856`
