14 projects
compressed-tensors
Library for working with compressed safetensors of neural network models
guidellm
Guidance platform for deploying and managing large language models.
llmcompressor
A library for compressing large language models using the latest techniques and research in the field, supporting both training-aware and post-training methods. The library is designed to be flexible and easy to use on top of PyTorch and Hugging Face Transformers, allowing for quick experimentation.
speculators
A unified library for creating, representing, and storing speculative decoding algorithms for LLM serving engines such as vLLM.
deepsparse-ent
[DEPRECATED] An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
deepsparse
[DEPRECATED] An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
sparseml
[DEPRECATED] Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsezoo
[DEPRECATED] Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
sparsify
[DEPRECATED] Easy-to-use UI for automatically sparsifying neural networks and creating sparsification recipes for better inference performance and a smaller footprint
llmcompressor-nightly
A library for compressing large language models using the latest techniques and research in the field, supporting both training-aware and post-training methods. The library is designed to be flexible and easy to use on top of PyTorch and Hugging Face Transformers, allowing for quick experimentation.
compressed-tensors-nightly
Library for working with compressed safetensors of neural network models
nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
nm-magic-wand-nightly
SparseLinear layers
nm-magic-wand
SparseLinear layers