A fast, yet specialized, RMSNorm/LayerNorm implementation

Reason this release was yanked:

Replaced by faster-norm.

Project description

# fast-norm-cuda

This library is under active development. Only some special cases are currently supported, and performance has not yet been fully optimized.

  • [x] RMSNorm
  • [ ] LayerNorm
  • [x] Float16 and BFloat16
  • [ ] More data types
  • [x] More shapes
  • [ ] Faster path when no weight gradient (wgrad) is needed
  • [ ] Performance tuning
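
For readers unfamiliar with the two operations named in the checklist, here is a minimal pure-Python reference sketch of what RMSNorm and LayerNorm compute over a single row. It is not this package's API, which is not documented here. The function names `rms_norm` and `layer_norm`, their parameters, and the `eps` defaults are assumptions chosen for illustration only.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm for one row (hypothetical helper, not the package API).

    y_i = x_i / sqrt(mean(x^2) + eps) * weight_i
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def layer_norm(x, weight, bias, eps=1e-5):
    """Reference LayerNorm for one row (hypothetical helper, not the package API).

    y_i = (x_i - mean(x)) / sqrt(var(x) + eps) * weight_i + bias_i
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]


# Example row with unit weights and zero bias.
row = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(row)
zeros = [0.0] * len(row)

y_rms = rms_norm(row, ones)          # rescaled only; the mean is not removed
y_ln = layer_norm(row, ones, zeros)  # centered and rescaled
```

RMSNorm skips the mean subtraction and the bias term, so it needs one reduction per row instead of two. That is one plausible reason a specialized kernel would support it first.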

## Statement

I completed this work independently, at home, on my personal RTX 3080.

Download files

Source Distribution

fast_norm_cuda-0.2.0.tar.gz (6.1 kB view details)

Uploaded Source

File details

Details for the file fast_norm_cuda-0.2.0.tar.gz.

File metadata

  • Download URL: fast_norm_cuda-0.2.0.tar.gz
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.8

File hashes

Hashes for fast_norm_cuda-0.2.0.tar.gz
  • SHA256: 398660a39f27b8fdb0c008a2773abde23aa669c0976c0b50d63a8fec300f2cb0
  • MD5: 740b2ff1ed0e28e45bd69031fefef939
  • BLAKE2b-256: 42c364eccb2f384d5f96338b937e607cb986d1acab923bfa8bae81f1795b0b22
