23 projects
compressed-tensors-nightly
Library for using compressed safetensors of neural network models
compressed-tensors
Library for using compressed safetensors of neural network models
llmcompressor-nightly
A library for compressing large language models using the latest techniques and research in the field, covering both training-aware and post-training methods. The library is designed to be flexible and easy to use on top of PyTorch and Hugging Face Transformers, allowing for quick experimentation.
llmcompressor
A library for compressing large language models using the latest techniques and research in the field, covering both training-aware and post-training methods. The library is designed to be flexible and easy to use on top of PyTorch and Hugging Face Transformers, allowing for quick experimentation.
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
guidellm
Guidance platform for deploying and managing large language models.
guidellm-nightly
Guidance platform for deploying and managing large language models.
deepsparse-ent
An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
deepsparse
An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparseml-nightly
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
nm-magic-wand-nightly
SparseLinear layers
deepsparse-nightly
An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
sparsezoo-nightly
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
nm-magic-wand
SparseLinear layers
nm-transformers
State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
nm-yolov5
nm-yolov5-nightly
nm-transformers-nightly
State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
sparsify-nightly
Easy-to-use UI for automatically sparsifying neural networks and creating sparsification recipes for better inference performance and a smaller footprint
sparsify
Easy-to-use UI for automatically sparsifying neural networks and creating sparsification recipes for better inference performance and a smaller footprint
optimum-deepsparse
Optimum DeepSparse is an extension of the Hugging Face Transformers library that integrates the DeepSparse inference runtime. DeepSparse offers GPU-class performance on CPUs, making it possible to run sparse Transformers and other deep learning models on commodity hardware. Optimum DeepSparse provides a framework for developers to easily integrate DeepSparse into their applications, regardless of the hardware platform.