
A lightweight library for quantized LLM fine-tuning and deployment

Project description

🧠 QuantLLM: Lightweight Library for Quantized LLM Fine-Tuning and Deployment

📌 Overview

QuantLLM is a Python library designed for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) efficiently using 4-bit and 8-bit quantization techniques. It provides a modular and flexible framework for:

  • Loading and quantizing models with advanced configurations
  • LoRA / QLoRA-based fine-tuning with customizable parameters
  • Dataset management with preprocessing and splitting
  • Training and evaluation with comprehensive metrics
  • Model checkpointing and versioning
  • Hugging Face Hub integration for model sharing

The goal of QuantLLM is to democratize LLM training, especially in low-resource environments, while keeping the workflow intuitive, modular, and production-ready.
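The description above does not spell out how weights are quantized. As a rough illustration of the general idea only (not QuantLLM's code or API), the sketch below applies symmetric absmax quantization. Each value is divided by a single scale, max(|w|) / (2^(bits-1) - 1), and rounded to a signed integer level. The helper names are hypothetical. Production 4-bit schemes, such as the NF4 data type used by QLoRA, use per-block scales and non-uniform levels, so treat this uniform version as a simplification.

```python
def quantize_absmax(values, bits=8):
    """Symmetric absmax quantization of floats to signed integers (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return [x * scale for x in q]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q8, s8 = quantize_absmax(weights, bits=8)
q4, s4 = quantize_absmax(weights, bits=4)

for name, (q, s) in {"8-bit": (q8, s8), "4-bit": (q4, s4)}.items():
    restored = dequantize(q, s)
    err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{name}: levels={q} max_error={err:.4f}")
```

With only 7 positive levels, the 4-bit version loses noticeably more precision than the 8-bit one. That is the trade-off 4-bit loading accepts in exchange for halving weight memory again.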

🎯 Key Features

✅ Quantized Model Loading: Load any Hugging Face model in 4-bit or 8-bit precision with customizable quantization settings
✅ Advanced Dataset Management: Load, preprocess, and split datasets with flexible configurations
✅ LoRA / QLoRA Fine-Tuning: Memory-efficient fine-tuning with customizable LoRA parameters
✅ Comprehensive Training: Training loop with mixed precision, gradient accumulation, and early stopping
✅ Model Evaluation: Flexible evaluation with custom metrics and batch processing
✅ Checkpoint Management: Save, resume, and manage training checkpoints with versioning
✅ Hub Integration: Push models and checkpoints to the Hugging Face Hub with authentication
✅ Configuration Management: YAML/JSON config support for reproducible experiments
✅ Logging and Monitoring: Comprehensive logging and Weights & Biases integration
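LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so the adapted layer computes W x + (alpha / r) * B (A x), where A is r × d_in and B is d_out × r. The pure-Python sketch below shows that forward pass and why it is memory-efficient: only r * (d_in + d_out) parameters are trained instead of d_in * d_out. The helper names are hypothetical and not QuantLLM functions. The `r` and `alpha` names mirror the rank and scaling settings that LoRA implementations such as Hugging Face PEFT expose (`r`, `lora_alpha`); which of these QuantLLM surfaces is not stated on this page. In QLoRA, W is additionally stored in 4-bit precision.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # project down to rank r, then back up
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_counts(d_out, d_in, r):
    """Trainable parameters for a full update vs. a rank-r LoRA update."""
    return d_out * d_in, r * (d_out + d_in)


# B is commonly initialized to zeros, so the adapted layer starts out
# identical to the frozen base layer.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]  # rank r = 1
x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=2, r=1))  # [7.0, 2.0]
print(lora_forward(W, A, [[0.5], [0.0]], x, alpha=2, r=1))  # [13.0, 2.0]

full, lora = lora_param_counts(4096, 4096, r=8)
print(f"{lora:,} trainable vs {full:,} ({lora / full:.2%})")  # 65,536 trainable vs 16,777,216 (0.39%)
```

For a single 4096 × 4096 projection, a rank-8 adapter trains well under 1% of the parameters a full update would.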

🚀 Getting Started

Installation

pip install quantllm

For detailed usage examples and API documentation, please refer to our:

💻 Hardware Requirements

Minimum Requirements

  • CPU: 4+ cores
  • RAM: 16GB
  • Storage: 20GB free space
  • Python: 3.8+

Recommended Requirements

  • GPU: NVIDIA GPU with 8GB+ VRAM
  • RAM: 32GB
  • Storage: 50GB+ SSD
  • CUDA: 11.7+

Resource Usage Guidelines

Model size    4-bit (GPU memory)    8-bit (GPU memory)    CPU RAM (min)
3B params     ~6 GB                 ~9 GB                 16 GB
7B params     ~12 GB                ~18 GB                32 GB
13B params    ~20 GB                ~32 GB                64 GB
70B params    ~90 GB                ~140 GB               256 GB
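A quick way to sanity-check figures like these is to compute the raw storage of the quantized weights alone: parameters × bits ÷ 8 bytes. The sketch below uses a hypothetical helper and takes 1 GB = 10^9 bytes. It gives a lower bound only. The table's numbers are noticeably higher, presumably because they also budget for activations, LoRA adapter weights and their optimizer state, and framework overhead; the page does not say how they were measured.

```python
def weight_memory_gb(num_params, bits):
    """Raw storage for the quantized weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for label, params in [("3B", 3e9), ("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    w4 = weight_memory_gb(params, 4)
    w8 = weight_memory_gb(params, 8)
    print(f"{label:>4}: 4-bit ~{w4:.1f} GB, 8-bit ~{w8:.1f} GB (weights only)")
```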

🔄 Version Compatibility

QuantLLM   Python   PyTorch   Transformers   CUDA
0.1.x      ≥3.8     ≥2.0.0    ≥4.30.0        ≥11.7
0.2.x      ≥3.9     ≥2.1.0    ≥4.31.0        ≥11.8
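To check an environment against this table, the sketch below encodes the minimum versions for each series and reports which components fall short. The helper names and the simplified version parsing are assumptions: it keeps only the leading numeric part of each dotted component, so local tags like `+cu118` and pre-release tags like `rc1` are ignored. For full PEP 440 semantics, the `packaging` library's `Version` class is the usual tool. In a real environment the installed versions could come from `platform.python_version()`, `torch.__version__`, `torch.version.cuda`, and `transformers.__version__`.

```python
# Minimum versions per QuantLLM series, copied from the compatibility table.
COMPAT = {
    "0.1": {"python": "3.8", "torch": "2.0.0", "transformers": "4.30.0", "cuda": "11.7"},
    "0.2": {"python": "3.9", "torch": "2.1.0", "transformers": "4.31.0", "cuda": "11.8"},
}


def parse_version(v):
    """'2.1.0+cu118' -> (2, 1, 0). Keeps the leading digits of each dotted part."""
    nums = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        nums.append(int(digits))
    return tuple(nums)


def at_least(installed, minimum):
    """Compare versions after zero-padding, so '2.1' counts as '2.1.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def check(series, installed):
    """Return the components whose installed version is below the table minimum."""
    return [
        name
        for name, minimum in COMPAT[series].items()
        if name in installed and not at_least(installed[name], minimum)
    ]


env = {"python": "3.10.12", "torch": "2.0.1+cu117", "transformers": "4.31.0", "cuda": "11.7"}
print(check("0.1", env))  # []
print(check("0.2", env))  # ['torch', 'cuda']
```

Zero-padding matters here: plain tuple comparison would treat (2, 1) as older than (2, 1, 0).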

🗺 Roadmap

  • Multi-GPU training support
  • AutoML for hyperparameter tuning
  • More quantization methods
  • Custom model architecture support
  • Enhanced logging and visualization
  • Model compression techniques
  • Deployment optimizations

🤝 Contributing

We welcome contributions! Please see our CONTRIBUTE.md for guidelines and setup instructions.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📫 Contact & Support



Download files


Source Distributions

No source distribution files available for this release.

Built Distribution


quantllm-1.0.0-py3-none-any.whl (33.3 kB)

Uploaded Python 3

File details

Details for the file quantllm-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: quantllm-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for quantllm-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        00325e5c6f89dcfded798a4b48a004fdaa815012295a5e4fcf9b01bddfe76588
MD5           ea107cb6080b48846f4a5b1c277dc417
BLAKE2b-256   a873f0280b14e9c0fca7319053490be97a4e63dc73f71e88e5b53840cc78160b
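These digests let you confirm that a downloaded wheel is intact before installing it. The sketch below streams the file through Python's standard `hashlib` and compares the result with the SHA256 value listed for quantllm-1.0.0-py3-none-any.whl. The helper names are illustrative, not part of any tool.

```python
import hashlib

# SHA256 digest listed for quantllm-1.0.0-py3-none-any.whl.
EXPECTED_SHA256 = "00325e5c6f89dcfded798a4b48a004fdaa815012295a5e4fcf9b01bddfe76588"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.strip().lower()


# Usage: verify("quantllm-1.0.0-py3-none-any.whl") returns True if the download is intact.
```

pip can also enforce this itself in hash-checking mode: put `quantllm==1.0.0 --hash=sha256:00325e5c6f89dcfded798a4b48a004fdaa815012295a5e4fcf9b01bddfe76588` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.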

