
cuPyLMA: a Multi-GPU Levenberg-Marquardt Optimizer powered by cuPyNumeric

Project description


cuPyLMA is a scalable deep-learning optimizer based on the Levenberg-Marquardt algorithm. It supports multi-GPU execution via NVIDIA cuPyNumeric, a NumPy-like distributed scientific computing framework.

cuPyLMA exploits the performance of multiple GPUs. For performance, cuPyLMA explicitly stores the full Jacobian matrix required by the Levenberg-Marquardt algorithm. This is in contrast to most common solutions, which represent the Jacobian matrix implicitly via Jacobian-vector products (JVP) and vector-Jacobian products (VJP) and therefore lack parallelism.
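The distinction can be illustrated with a toy NumPy sketch (not cuPyLMA code): an explicit Jacobian is materialized column by column, and each column (or row slice) is independent and can be computed in parallel, whereas a single JVP evaluation only yields the product `J @ v` for one direction `v`.

```python
import numpy as np

# Toy residual function r(w): R^3 -> R^4
def residual(w):
    return np.array([w[0]**2, w[0] * w[1], np.sin(w[2]), w[1] + w[2]])

def jacobian_fd(f, w, eps=1e-6):
    """Explicit Jacobian by forward differences. Each column is independent,
    so columns (or row slices) can be computed in parallel across devices."""
    r0 = f(w)
    J = np.zeros((r0.size, w.size))
    for j in range(w.size):
        wp = w.copy()
        wp[j] += eps
        J[:, j] = (f(wp) - r0) / eps
    return J

w = np.array([1.0, 2.0, 0.5])
J = jacobian_fd(residual, w)          # full (4, 3) matrix, all at once

# A JVP, by contrast, gives only one directional product per evaluation:
v = np.array([1.0, 0.0, 0.0])
jvp = (residual(w + 1e-6 * v) - residual(w)) / 1e-6
print(np.allclose(J @ v, jvp, atol=1e-3))  # True: same directional derivative
```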

cuPyLMA's design consists of two components, each holding a separate set of GPUs.

  • Model component: hosts data-parallel replicas of a PyTorch deep learning model, one per GPU, and computes the Jacobian matrix.
  • Optimizer component: receives the Jacobian matrix from the model component and solves for the optimal parameter update with the Levenberg-Marquardt algorithm via cuPyNumeric.
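The update the optimizer component solves is the standard damped normal-equation step of Levenberg-Marquardt. A minimal NumPy sketch of that step, fitting a small linear model (this is the textbook algorithm, not the cuPyNumeric implementation):

```python
import numpy as np

def lm_step(J, r, lam):
    """One Levenberg-Marquardt update: solve (J^T J + lam * I) delta = -J^T r."""
    n = J.shape[1]
    A = J.T @ J + lam * np.eye(n)
    return np.linalg.solve(A, -J.T @ r)

# Fit y = a*x + b with parameters w = [a, b]
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 1.0

def residuals(w):
    return (w[0] * x + w[1]) - y

def jacobian(w):
    # d(residual)/d[a, b]; constant because the model is linear in w
    return np.stack([x, np.ones_like(x)], axis=1)

w = np.zeros(2)
for _ in range(5):
    w = w + lm_step(jacobian(w), residuals(w), lam=1e-3)
# w converges to approximately [3.0, 1.0]
```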

Installation

TODO: upload to pip

Usage

The following code shows how to adapt existing PyTorch training code to use cuPyLMA.

import cuPyLMA
import torch

class MyModel(torch.nn.Module):
    ...  # model implementation

model = MyModel()  # Instantiate the deep learning model

# Configure optimizer
devices = [torch.device('cuda:2'), torch.device('cuda:3')]  # CUDA devices held by the model component
loss_fn = torch.nn.MSELoss() # Loss function
residual_fn = lambda a, b: torch.flatten(a - b)  # Residual function: the output should be a 1-D array
lma = cuPyLMA.LMA(
    model, devices,
    loss_fn, residual_fn
)

# Train one step
x_train, y_train = ...  # training data
slice_size = ...        # Jacobian slice size.
                        # The Jacobian matrix is decomposed into row slices to reduce peak memory.
                        # A recommended starting point is `<batch size> / <#GPUs in the model component>`.
                        # If you run out of memory, reduce it.
loss, terminated = lma.step(x_train, y_train, slice_size)
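The recommended starting point for `slice_size` from the note above can be written as a tiny helper. This function is hypothetical, not part of the cuPyLMA API; it just encodes the rule of thumb:

```python
def recommended_slice_size(batch_size: int, n_model_gpus: int) -> int:
    """Starting point suggested above: batch size divided by the number of
    GPUs in the model component; reduce it further if you run out of memory."""
    return max(1, batch_size // n_model_gpus)

print(recommended_slice_size(1024, 2))  # 512 with these assumed values
```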

Performance

cuPyLMA automatically selects the best strategy for computing the Jacobian matrix, reducing peak memory usage and improving performance.

Changelog

Release 0.1

  • First release

Citation

Under construction ...

