cuPyLMA: a Multi-GPU Levenberg-Marquardt Optimizer powered by cuPyNumeric
Project description
cuPyLMA is a scalable deep-learning optimizer based on the Levenberg-Marquardt algorithm. It supports multi-GPU execution via NVIDIA cuPyNumeric, a NumPy-like distributed scientific computing framework.
To exploit the performance of multiple GPUs, cuPyLMA explicitly stores the full Jacobian matrix required by the Levenberg-Marquardt algorithm. This is in contrast to most existing solutions, which represent the Jacobian implicitly via Jacobian-vector products (JVP) and vector-Jacobian products (VJP) and therefore expose little parallelism.
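For illustration, the two approaches can be contrasted with PyTorch's `torch.func` API (a sketch with a hypothetical tiny model; cuPyLMA's internals may differ):

```python
import torch
from torch.func import functional_call, jacrev, jvp

# Hypothetical tiny model, for illustration only.
model = torch.nn.Linear(3, 2)
params = dict(model.named_parameters())
x = torch.randn(5, 3)

def f(p):
    # Residuals as a flat 1-D vector, as Levenberg-Marquardt expects.
    return functional_call(model, p, (x,)).flatten()

# Explicit approach (cuPyLMA's choice): materialize the full Jacobian,
# one dense block per parameter tensor, so its rows can be processed in parallel.
J = jacrev(f)(params)  # e.g. J['weight'] has shape (10, 2, 3)

# Implicit approach: only Jacobian-vector products, one direction at a time.
tangent = {k: torch.ones_like(v) for k, v in params.items()}
_, Jv = jvp(f, (params,), (tangent,))  # a single directional slice, shape (10,)
```

The explicit Jacobian costs more memory but turns the LM solve into dense linear algebra, which is what cuPyNumeric can distribute across GPUs.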
cuPyLMA's design consists of two components, each holding a separate set of GPUs.
- The model component hosts a PyTorch deep learning model, with a data-parallel replica on each of its GPUs, and computes the Jacobian matrix.
- The optimizer component receives the Jacobian matrix from the model component and solves for the optimal parameter update with the Levenberg-Marquardt algorithm via cuPyNumeric.
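Conceptually, the optimizer component's job is the classic damped normal-equations solve of Levenberg-Marquardt: (JᵀJ + λI)δ = −Jᵀr. A minimal sketch with plain NumPy (cuPyNumeric is API-compatible, so `import cupynumeric as np` would run the same code across GPUs; `lm_update` and the fixed damping are illustrative, not cuPyLMA's actual API):

```python
import numpy as np  # with cuPyNumeric: `import cupynumeric as np`

def lm_update(J, r, damping=1e-3):
    """Solve (J^T J + damping * I) delta = -J^T r for the parameter update."""
    JtJ = J.T @ J
    A = JtJ + damping * np.eye(JtJ.shape[0])
    return np.linalg.solve(A, -J.T @ r)

# Toy linear least-squares problem with residual r(theta) = J @ theta - y.
J = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.zeros(2)
r = J @ theta - y
theta = theta + lm_update(J, r)  # one LM step toward the least-squares solution
```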
Installation
TODO: upload to pip
Usage
The following code shows how to adapt existing PyTorch training code to use cuPyLMA.
```python
import cuPyLMA
import torch

class MyModel(torch.nn.Module):
    ...  # model implementation

model = MyModel()  # instantiate the deep learning model

# Configure the optimizer
devices = [torch.device('cuda:2'), torch.device('cuda:3')]  # CUDA devices held by the model component
loss_fn = torch.nn.MSELoss()  # loss function
residual_fn = lambda a, b: torch.flatten(a - b)  # residual function: the output must be a 1-D array
lma = cuPyLMA.LMA(
    model, devices,
    loss_fn, residual_fn,
)

# Train one step
x_train, y_train = ...  # training data
# Jacobian slice size: the Jacobian matrix is decomposed into row slices
# to reduce peak memory. A good starting point is
# `<batch size> / <#GPUs in the model component>`; if an out-of-memory
# error occurs, reduce it further.
slice_size = ...
loss, terminated = lma.step(x_train, y_train, slice_size)
```
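As a concrete illustration of the slice-size recommendation above, with a batch of 1024 samples and a model component holding 2 GPUs (both numbers hypothetical):

```python
batch_size = 1024   # hypothetical training batch size
model_gpus = 2      # hypothetical number of GPUs in the model component

# Recommended starting point for the Jacobian slice size.
slice_size = batch_size // model_gpus  # 512 rows per slice

# If lma.step runs out of memory, shrink the slice and retry,
# e.g. slice_size //= 2.
```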
Performance
cuPyLMA automatically selects a Jacobian computation strategy that reduces peak memory usage and improves performance.
Changelog
Release 0.1
- First release
Citation
Under construction ...
File details
Details for the file cupylma-0.1.tar.gz.
File metadata
- Download URL: cupylma-0.1.tar.gz
- Upload date:
- Size: 38.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `358919b52768c01a16a7ac6471c9848fef51fa70f873e032e3fd4fcdba9b96d4` |
| MD5 | `fea6c83eb3bfeee17e6671cf435f4547` |
| BLAKE2b-256 | `232f1ec0222f5366e8901985d188b63e4bd3a8fa5c6c73fc064065458b00da2c` |
File details
Details for the file cupylma-0.1-py3-none-any.whl.
File metadata
- Download URL: cupylma-0.1-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `dcabeabd605ba719b262ff159a41d60352c4b54acf3ff562229ad344ee7d172d` |
| MD5 | `8489438967bcc8a1976f8f2eecfc82ff` |
| BLAKE2b-256 | `5ff9d0610c4d41ad1a9904bb6d08c8894d41d01d90d8600f963d5de1eb798778` |