Universal Training Framework for PyTorch and HuggingFace Transformers

🛡️ Selgis ML

Autonomous Self-Healing Training Framework for PyTorch & Transformers.

Selgis (Self-Guided Intelligent Stability) is a library that turns unstable neural-network training into a reliable, predictable process. It automatically detects loss spikes, NaN/Inf values, and plateaus, applying dynamic weight rollbacks and learning-rate surges to recover the run.

Especially effective for LoRA/QLoRA fine-tuning of LLMs (Llama, Qwen, Mistral) on consumer hardware, where standard trainers often crash with out-of-memory (OOM) errors or degrade due to fp16 instability.


🔥 Why Selgis?

Have you ever woken up in the morning to find your overnight run crashed with Loss: NaN at 80%? Or that the model "forgot" everything it learned due to a bad batch? Selgis solves this.

  • 🛡️ Self-Healing Loop: Automatic rollback to the last stable state upon detecting anomalies (loss spikes / NaN).
  • 🧠 Memory-Safe Architecture: State-preservation logic tracks only trainable parameters. This allows training Qwen-4B / Llama-7B on cards with 8-12 GB VRAM without OOM during checkpoints.
  • ⚡ Final Surge: If the model gets stuck on a plateau, Selgis can automatically boost the LR by 5-10x to break through local minima ("defibrillator effect").
  • 📉 Smart Defaults: Built-in LR Finder and adaptive scheduler presets.
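
The self-healing loop concept can be sketched in plain PyTorch. This is a toy illustration of the idea, not Selgis internals; the helper name `train_with_rollback` and its signature are assumptions:

```python
import math
import torch

def train_with_rollback(model, optimizer, loss_fn, batches, spike_threshold=3.0):
    """Toy self-healing loop: snapshot trainable params after each stable
    step; roll back when the loss is NaN/Inf or jumps > spike_threshold x."""
    snapshot = None
    last_loss = None
    rollbacks = 0
    for x, y in batches:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        bad = not math.isfinite(loss.item()) or (
            last_loss is not None and loss.item() > spike_threshold * last_loss
        )
        if bad:
            if snapshot is not None:
                # Restore the last stable trainable-only state instead of crashing.
                model.load_state_dict(snapshot, strict=False)
                rollbacks += 1
            continue  # skip the poisoned step either way
        loss.backward()
        optimizer.step()
        # Snapshot only trainable parameters to keep memory low.
        snapshot = {
            n: p.detach().clone()
            for n, p in model.named_parameters()
            if p.requires_grad
        }
        last_loss = loss.item()
    return rollbacks
```

A poisoned batch (e.g. NaN inputs) then triggers a rollback instead of corrupting the weights.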

📊 Benchmarks

We tested Selgis under extreme conditions on real hardware (Tesla T4 16GB). Here are the results:

Task | Model | Problem | Selgis Solution | Result
--- | --- | --- | --- | ---
LLM Finetuning | Qwen-2.5-4B (QLoRA) | OOM on 12 GB cards + loss spike | Trainable-only state + rollback | Memory: 8.2 GB, loss < 0.001
Seq2Seq | LSTM (1.4M) | Catastrophic spike (acc 52% → 44%) | Rollback + surge | +7% accuracy (recovered to 59.04%)
NLP | BERT-base | Instability on small batch (16) | Stable LR Finder | 100.0% accuracy (in 3 epochs)
CV | CNN (MNIST) | Overfitting & micro-spikes | Micro-rollbacks | 99.09% (held at generalization peak)

"Selgis doesn't just prevent explosions. It returns training to a productive track."
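
The "defibrillator" surge described above can be illustrated with a toy plateau detector. The helper `surge_on_plateau` and its thresholds are hypothetical, not the Selgis API:

```python
import torch

def surge_on_plateau(optimizer, losses, window=5, tol=1e-3, factor=5.0):
    """Toy 'final surge': if the last `window` losses are flat within
    `tol`, multiply the learning rate by `factor` to escape the plateau."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    if max(recent) - min(recent) >= tol:
        return False  # still making progress, leave the LR alone
    for group in optimizer.param_groups:
        group["lr"] *= factor
    return True
```

In the real library the boost is reportedly 5-10x, combined with rollback so a failed surge can be undone.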


🚀 Installation

# Base version (PyTorch only)
pip install selgis

# Full version (with Transformers, LoRA, quantization, and WandB support)
pip install "selgis[all]"

🛠️ Quick Start

1. Robust LLM Training (Llama / Qwen)

Selgis handles protection while you use the familiar Transformers API. Now with native BitsAndBytes quantization support.

from selgis import TransformerTrainer, TransformerConfig

# Configuration with native 4-bit quantization and protection
config = TransformerConfig(
    model_name_or_path="Qwen/Qwen-2.5-3B",
    
    # --- Native Quantization (New in v0.2.0) ---
    quantization_type="4bit", 
    bnb_4bit_compute_dtype="bfloat16",
    bnb_4bit_use_double_quant=True,
    
    # --- PEFT / LoRA ---
    use_peft=True,
    peft_config={
        "r": 16, 
        "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"]
    },
    
    # --- Selgis protection ---
    nan_recovery=True,      # Auto-rollback on NaN/Spike
    state_storage="disk"    # Save RAM (store state on disk)
)

# Start training (Trainer handles model loading and quantization automatically)
trainer = TransformerTrainer(model_or_path=config.model_name_or_path, config=config)
trainer.train() 
# You can go to sleep now. If the loss spikes, Selgis fixes it.

2. Standard PyTorch (Any Model)

from selgis import Trainer, SelgisConfig
import torch
from torch.utils.data import DataLoader, TensorDataset

# Your model
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Dummy data (replace with your own DataLoader)
loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16,
)

# Config
config = SelgisConfig(
    max_epochs=10,
    lr_finder_enabled=True,  # Auto-find the optimal LR before training starts
    spike_threshold=3.0,     # Roll back if the loss jumps 3x
)

trainer = Trainer(
    model=model,
    config=config,
    train_dataloader=loader,
    criterion=torch.nn.CrossEntropyLoss(),
)
trainer.train()

💻 CLI (Command Line Interface)

Selgis ships with a handy CLI for diagnostics and quick execution.

Command | Description
--- | ---
selgis device | Check GPU/CUDA/MPS availability and print device info.
selgis train | Run a minimal demo training on synthetic data (smoke test).
selgis train --config <path> | Run training using a config file (YAML/JSON supported).
selgis version | Print the current library version.

Example environment check:

$ selgis device
🚀 Device: cuda
   GPU: NVIDIA Tesla T4
   Memory: 14.75 GB
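
For `selgis train --config`, a config file can mirror the `SelgisConfig` fields from the Quick Start. A minimal sketch; only `max_epochs`, `lr_finder_enabled`, and `spike_threshold` appear in this README, so any other keys your setup needs should be checked against the API docs:

```yaml
# train_config.yaml — hypothetical minimal config
max_epochs: 10
lr_finder_enabled: true
spike_threshold: 3.0
```

Then run: selgis train --config train_config.yaml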

📚 API Reference

Full technical documentation for SelgisCore, Trainer, Callbacks, and configuration classes is available in API.md.

Key components:

  • SelgisCore: The brain of the system (protection, rollback, state management).
  • TransformerTrainer: Wrapper for the HuggingFace ecosystem with native BitsAndBytes support.
  • HistoryCallback: Automatically saves training history to JSON for later analysis.
  • LRFinder: Tool for finding the optimal learning rate.
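
The idea behind an LR finder (this README does not spell out `LRFinder`'s actual interface, so this is a generic sketch, not the Selgis API) is the classic LR range test: grow the learning rate geometrically each step and record the loss; a good LR usually sits a bit below where the loss curve bottoms out.

```python
import torch

def lr_range_test(model, loss_fn, batches, lr_min=1e-6, lr_max=1.0):
    """Toy LR range test: sweep the LR geometrically from lr_min to lr_max,
    one step per batch, and return the (lr, loss) history."""
    opt = torch.optim.SGD(model.parameters(), lr=lr_min)
    n = len(batches)
    gamma = (lr_max / lr_min) ** (1 / max(n - 1, 1))
    history = []
    lr = lr_min
    for x, y in batches:
        for group in opt.param_groups:
            group["lr"] = lr
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        history.append((lr, loss.item()))
        lr *= gamma
    return history
```

Plotting `history` on a log-x axis makes the stable LR region visible at a glance.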

📄 License

Apache 2.0 License. Free for commercial and research use.

Selgis AI — Make training boring (in a good way).
