

Project description

🧠 QuantLLM: Lightweight Library for Quantized LLM Fine-Tuning and Deployment

📌 Overview

QuantLLM is a Python library designed for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) efficiently using 4-bit and 8-bit quantization techniques. It provides a modular and flexible framework for:

  • Loading and quantizing models with advanced configurations
  • LoRA / QLoRA-based fine-tuning with customizable parameters
  • Dataset management with preprocessing and splitting
  • Training and evaluation with comprehensive metrics
  • Model checkpointing and versioning
  • Hugging Face Hub integration for model sharing

The goal of QuantLLM is to democratize LLM training, especially in low-resource environments, while keeping the workflow intuitive, modular, and production-ready.

🎯 Key Features

  • ✅ Quantized Model Loading: Load any Hugging Face model in 4-bit or 8-bit precision with customizable quantization settings
  • ✅ Advanced Dataset Management: Load, preprocess, and split datasets with flexible configurations
  • ✅ LoRA / QLoRA Fine-Tuning: Memory-efficient fine-tuning with customizable LoRA parameters
  • ✅ Comprehensive Training: Advanced training loop with mixed precision, gradient accumulation, and early stopping
  • ✅ Model Evaluation: Flexible evaluation with custom metrics and batch processing
  • ✅ Checkpoint Management: Save, resume, and manage training checkpoints with versioning
  • ✅ Hub Integration: Push models and checkpoints to the Hugging Face Hub with authentication
  • ✅ Configuration Management: YAML/JSON config support for reproducible experiments
  • ✅ Logging and Monitoring: Comprehensive logging and Weights & Biases integration

🚀 Getting Started

🔧 Installation

pip install quantllm

📦 Basic Usage

import os

from torch.utils.data import DataLoader

from quantllm import (
    ModelLoader,
    DatasetLoader,
    DatasetPreprocessor,
    DatasetSplitter,
    FineTuningTrainer,
    ModelEvaluator,
    HubManager,
    CheckpointManager,
)
from quantllm.finetune import TrainingLogger
from quantllm.config import (
    DatasetConfig,
    ModelConfig,
    TrainingConfig,
)

# Initialize logger
logger = TrainingLogger()

# 1. Initialize hub manager first
hub_manager = HubManager(
    model_id="your-username/llama-3.2-imdb",
    token=os.getenv("HF_TOKEN")
)

# 2. Model Configuration and Loading
model_config = ModelConfig(
    model_name="meta-llama/Llama-3.2-3B",
    load_in_4bit=True,
    use_lora=True,
    hub_manager=hub_manager
)

model_loader = ModelLoader(model_config)
model = model_loader.get_model()
tokenizer = model_loader.get_tokenizer()

# 3. Dataset Configuration and Loading
dataset_config = DatasetConfig(
    dataset_name_or_path="imdb",
    dataset_type="huggingface",
    text_column="text",
    label_column="label",
    max_length=512,
    train_size=0.8,
    val_size=0.1,
    test_size=0.1,
    hub_manager=hub_manager
)

# Load and prepare dataset
dataset_loader = DatasetLoader(logger)
dataset = dataset_loader.load_hf_dataset(dataset_config)

# Split dataset
dataset_splitter = DatasetSplitter(logger)
train_dataset, val_dataset, test_dataset = dataset_splitter.train_val_test_split(
    dataset,
    train_size=dataset_config.train_size,
    val_size=dataset_config.val_size,
    test_size=dataset_config.test_size
)

# 4. Dataset Preprocessing
preprocessor = DatasetPreprocessor(tokenizer, logger)
train_dataset, val_dataset, test_dataset = preprocessor.tokenize_dataset(
    train_dataset, val_dataset, test_dataset,
    max_length=dataset_config.max_length,
    text_column=dataset_config.text_column,
    label_column=dataset_config.label_column
)

# Create data loaders
train_dataloader = DataLoader(
    train_dataset,
    batch_size=4,
    shuffle=True,
    num_workers=4
)
val_dataloader = DataLoader(
    val_dataset,
    batch_size=4,
    shuffle=False,
    num_workers=4
)
test_dataloader = DataLoader(
    test_dataset,
    batch_size=4,
    shuffle=False,
    num_workers=4
)

# 5. Training Configuration
training_config = TrainingConfig(
    learning_rate=2e-4,
    num_epochs=3,
    batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    logging_steps=50,
    eval_steps=200,
    save_steps=500,
    early_stopping_patience=3,
    early_stopping_threshold=0.01
)

# Initialize checkpoint manager
checkpoint_manager = CheckpointManager(
    output_dir="./checkpoints",
    save_total_limit=3
)

# 6. Initialize Trainer
trainer = FineTuningTrainer(
    model=model,
    training_config=training_config,
    train_dataloader=train_dataloader,
    eval_dataloader=val_dataloader,
    logger=logger,
    checkpoint_manager=checkpoint_manager,
    hub_manager=hub_manager,
    use_wandb=True,
    wandb_config={
        "project": "quantllm-imdb",
        "name": "llama-2-imdb-finetuning"
    }
)

# 7. Train the model
trainer.train()

# 8. Evaluate on test set
evaluator = ModelEvaluator(
    model=model,
    eval_dataloader=test_dataloader,
    metrics=[
        lambda preds, labels, _: (preds.argmax(dim=-1) == labels).float().mean().item()  # Accuracy
    ],
    logger=logger
)

test_metrics = evaluator.evaluate()

# 9. Save final model
trainer.save_model("./final_model")

# 10. Push to Hub if logged in
if hub_manager.is_logged_in():
    hub_manager.push_model(
        model,
        commit_message=f"Final model with test accuracy: {test_metrics.get('accuracy', 0):.4f}"
    )
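ModelEvaluator above takes a list of metric callables; named functions are easier to reuse and log than inline lambdas. As a sketch only, here are pure-Python stand-ins for accuracy and macro-F1 that operate on plain lists of class ids (the snippet's actual callables receive torch tensors and an extra argument, so you would adapt the signature accordingly):

```python
def accuracy(preds, labels):
    """Fraction of positions where the predicted class matches the label."""
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct / len(labels)

def macro_f1(preds, labels):
    """Unweighted mean of per-class F1 scores."""
    classes = set(labels) | set(preds)
    f1_scores = []
    for c in classes:
        tp = sum(p == c and l == c for p, l in zip(preds, labels))
        fp = sum(p == c and l != c for p, l in zip(preds, labels))
        fn = sum(p != c and l == c for p, l in zip(preds, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Macro-F1 complements accuracy on imbalanced label sets, since every class contributes equally regardless of its frequency.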

⚙️ Advanced Usage

Configuration Files

Create a config file (e.g., config.yaml):

model:
  model_name: "meta-llama/Llama-3.2-3B"
  load_in_4bit: true
  use_lora: true
  lora_config:
    r: 16
    lora_alpha: 32
    target_modules: ["q_proj", "v_proj"]

dataset:
  dataset_name_or_path: "imdb"
  text_column: "text"
  label_column: "label"
  max_length: 512
  train_size: 0.8
  val_size: 0.1
  test_size: 0.1

training:
  learning_rate: 2e-4
  num_epochs: 3
  batch_size: 4
  gradient_accumulation_steps: 4
  warmup_steps: 100
  logging_steps: 50
  eval_steps: 200
  save_steps: 500
  early_stopping_patience: 3
  early_stopping_threshold: 0.01

📚 Documentation

Model Loading

model_config = ModelConfig(
    model_name="meta-llama/Llama-3.2-3B",
    load_in_4bit=True,
    use_lora=True,
    hub_manager=hub_manager
)
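To see why load_in_4bit matters, here is a back-of-the-envelope estimate of weight storage for a roughly 3B-parameter model. This counts weights only; activations, the KV cache, and quantization overhead such as scale factors are ignored:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in GiB: each parameter takes
    bits_per_param / 8 bytes."""
    return num_params * bits_per_param / 8 / 1024**3

params = 3e9  # rough parameter count for a 3B model
for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GiB")
```

Going from fp16 to 4-bit cuts weight memory by 4x, which is what brings a 3B model within reach of consumer GPUs.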

Dataset Management

dataset_config = DatasetConfig(
    dataset_name_or_path="imdb",
    dataset_type="huggingface",
    text_column="text",
    label_column="label",
    max_length=512,
    train_size=0.8,
    val_size=0.1,
    test_size=0.1,
    hub_manager=hub_manager
)
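The 0.8/0.1/0.1 ratios above must sum to 1. As an illustration of how such ratios partition example counts, here is a small sketch that rounds any remainder into the training split (DatasetSplitter's actual rounding policy may differ):

```python
def split_sizes(n, train_size, val_size, test_size):
    """Partition n examples by the given ratios; rounding leftovers go
    to the training split so the three sizes always sum to n."""
    if abs(train_size + val_size + test_size - 1.0) > 1e-8:
        raise ValueError("split ratios must sum to 1")
    n_val = int(n * val_size)
    n_test = int(n * test_size)
    return n - n_val - n_test, n_val, n_test
```

For the 25,000-example IMDB training set, 0.8/0.1/0.1 yields 20,000 / 2,500 / 2,500 examples.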

Training Configuration

training_config = TrainingConfig(
    learning_rate=2e-4,
    num_epochs=3,
    batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    logging_steps=50,
    eval_steps=200,
    save_steps=500,
    early_stopping_patience=3,
    early_stopping_threshold=0.01
)
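The early_stopping_patience and early_stopping_threshold parameters interact: training stops after patience consecutive evaluations whose improvement over the best loss so far falls below the threshold. A minimal stand-alone sketch of that rule (the trainer's exact bookkeeping may differ):

```python
class EarlyStopping:
    """Stop when `patience` consecutive evals fail to improve the best
    monitored loss by at least `threshold`."""

    def __init__(self, patience=3, threshold=0.01):
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, loss):
        """Record one eval loss; return True when training should stop."""
        if self.best - loss >= self.threshold:
            self.best = loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

The threshold guards against stopping on noise: tiny loss fluctuations below it do not reset the patience counter.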

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments



Download files

Download the file for your platform.

Source Distribution

quantllm-0.0.1a0.tar.gz (24.9 kB)


Built Distribution


quantllm-0.0.1a0-py3-none-any.whl (29.7 kB)


File details

Details for the file quantllm-0.0.1a0.tar.gz.

File metadata

  • Download URL: quantllm-0.0.1a0.tar.gz
  • Size: 24.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for quantllm-0.0.1a0.tar.gz:

  • SHA256: 5795670bfcb7d95068715240b7a1c2f2fa7a0e1337cf27e0ccdf715e54b1fa15
  • MD5: 5dd167149858ad8df82e3f1425731119
  • BLAKE2b-256: bc0a65373b97c49563d5b837590f00cb95c168d35e6422230fd358f68dac7cec


File details

Details for the file quantllm-0.0.1a0-py3-none-any.whl.

File metadata

  • Download URL: quantllm-0.0.1a0-py3-none-any.whl
  • Size: 29.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for quantllm-0.0.1a0-py3-none-any.whl:

  • SHA256: 14de04a953edc5e7bfb31cbd4292764f80fbefc30010c7f2d7f3102d4134c6e3
  • MD5: 44d9d1929f9a78a3ec32b13f0ca5ca6d
  • BLAKE2b-256: ad28206c1edc73c058c7f8579e440be9dd75ba854a1964c25ca97f19c230d114

