Universal Training Framework for PyTorch and HuggingFace Transformers

🛡️ Selgis ML

Autonomous Self-Healing Training Framework for PyTorch & Transformers.

Selgis (Self-Guided Intelligent Stability) is a library that turns unstable neural network training into a reliable, predictable process. It automatically detects loss spikes, NaN/Inf values, and plateaus, and recovers the run with dynamic weight rollbacks and learning-rate surges.

Especially effective for LoRA/QLoRA fine-tuning of LLMs (Llama, Qwen, Mistral) on consumer hardware, where standard trainers often crash with out-of-memory errors or degrade due to fp16 instability.


🔥 Why Selgis?

Have you ever woken up in the morning to find your overnight run crashed with Loss: NaN at 80%? Or that the model "forgot" everything it learned due to a bad batch? Selgis solves this.

  • 🛡️ Self-Healing Loop: Automatic rollback to the last stable state upon detecting anomalies (loss spikes / NaN).
  • 🧠 Memory-Safe Architecture: State-preservation logic snapshots only the trainable parameters ("trainable-only"), so Qwen-4B / Llama-7B can be trained on cards with 8-12 GB VRAM without OOM during checkpointing.
  • ⚡ Final Surge: If the model gets stuck on a plateau, Selgis can automatically boost the LR by 5-10x to break through local minima ("defibrillator effect").
  • 📉 Smart Defaults: Built-in LR Finder and adaptive scheduler presets.
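
The detect-then-restore idea behind the self-healing loop can be sketched in plain PyTorch. This is a conceptual illustration only, not Selgis's actual implementation: the names (`snapshot`, `training_step`, the spike rule) are ours, and the real library adds plateau surges, disk-backed state, and offloading on top.

```python
import torch

# Illustrative-only sketch: detect a loss spike or NaN, then roll the
# trainable weights back to the last stable snapshot instead of crashing.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

spike_threshold = 3.0  # treat a >3x loss jump as an anomaly
last_loss = None
# "Trainable-only" snapshot: clone just the parameters that receive gradients.
snapshot = {n: p.detach().clone()
            for n, p in model.named_parameters() if p.requires_grad}

def training_step(x, y):
    global last_loss, snapshot
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    spiked = (not torch.isfinite(loss).item()
              or (last_loss is not None
                  and loss.item() > spike_threshold * last_loss))
    if spiked:
        # Anomaly: restore the last stable weights and skip this batch.
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in snapshot:
                    p.copy_(snapshot[n])
        return last_loss
    loss.backward()
    optimizer.step()
    last_loss = loss.item()
    snapshot = {n: p.detach().clone()
                for n, p in model.named_parameters() if p.requires_grad}
    return last_loss
```

Because only `requires_grad` parameters are cloned, a LoRA run snapshots megabytes of adapter weights rather than the gigabytes of frozen base weights.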

📊 Benchmarks

We tested Selgis under extreme conditions on real hardware (Tesla T4 16GB). Here are the results:

| Task | Model | Problem | Selgis Solution | Result |
| --- | --- | --- | --- | --- |
| LLM fine-tuning | Qwen-2.5-4B (QLoRA) | OOM on 12 GB cards + loss spike | Trainable-only state + rollback | Memory: 8.2 GB, loss < 0.001 |
| Seq2Seq | LSTM (1.4M) | Catastrophic spike (acc 52% → 44%) | Rollback + surge | +7% accuracy (recovered to 59.04%) |
| NLP | BERT-base | Instability on small batch (16) | Stable LR Finder | 100.0% accuracy (in 3 epochs) |
| CV | CNN (MNIST) | Overfitting & micro-spikes | Micro-rollbacks | 99.09% (held at generalization peak) |

"Selgis doesn't just prevent explosions. It returns training to a productive track."


🚀 Installation

# Base version (PyTorch only)
pip install selgis

# Full version (with Transformers, LoRA, quantization, and WandB support)
pip install "selgis[all]"

🛠️ Quick Start

1. Robust LLM Training (Llama / Qwen)

Selgis handles protection while you use the familiar Transformers API. Now with native BitsAndBytes quantization support.

from selgis import TransformerTrainer, TransformerConfig

# Configuration with native 4-bit quantization and protection
config = TransformerConfig(
    model_name_or_path="Qwen/Qwen-2.5-3B",
    
    # --- Native Quantization (New in v0.2.0) ---
    quantization_type="4bit", 
    bnb_4bit_compute_dtype="bfloat16",
    bnb_4bit_use_double_quant=True,
    
    # --- PEFT / LoRA ---
    use_peft=True,
    peft_config={
        "r": 16, 
        "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"]
    },
    
    # --- Selgis protection ---
    nan_recovery=True,      # Auto-rollback on NaN/Spike
    state_storage="disk",   # Save RAM (store state on disk)

    # --- CPU Offload (New) ---
    cpu_offload=True,       # Offload optimizer states/gradients to CPU
)

# Start training (Trainer handles model loading and quantization automatically)
trainer = TransformerTrainer(model_or_path=config.model_name_or_path, config=config)
trainer.train() 
# You can go to sleep now. If the loss spikes, Selgis fixes it.

2. Standard PyTorch (Any Model)

from selgis import Trainer, SelgisConfig
import torch
from torch.utils.data import DataLoader, TensorDataset

# Your model
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Synthetic data for the example (any DataLoader works here)
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Config
config = SelgisConfig(
    max_epochs=10,
    lr_finder_enabled=True,  # Auto-find optimal LR before start
    spike_threshold=3.0,     # Rollback if loss jumps 3x

    # --- CPU Offload (New) ---
    cpu_offload=True,        # Offload optimizer states/gradients to CPU
)

trainer = Trainer(
    model=model,
    config=config,
    train_dataloader=loader,
    criterion=torch.nn.CrossEntropyLoss(),
)
trainer.train()

💻 CLI (Command Line Interface)

Selgis ships with a handy CLI for diagnostics and quick execution.

| Command | Description |
| --- | --- |
| `selgis device` | Check GPU/CUDA/MPS availability and print device info. |
| `selgis train` | Run a minimal demo training on synthetic data (smoke test). |
| `selgis train --config <path>` | Run training using a config file (YAML/JSON supported). |
| `selgis version` | Print the current library version. |

Example environment check:

$ selgis device
🚀 Device: cuda
   GPU: NVIDIA Tesla T4
   Memory: 14.75 GB
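
For `selgis train --config <path>`, a config file might look like the sketch below. The key names simply mirror the `SelgisConfig` fields shown in the Quick Start; the exact schema the CLI accepts is an assumption here, so consult API.md for the authoritative field list.

```yaml
# hypothetical selgis_config.yaml -- keys mirror SelgisConfig above
max_epochs: 10
lr_finder_enabled: true   # auto-find optimal LR before start
spike_threshold: 3.0      # rollback if loss jumps 3x
nan_recovery: true        # auto-rollback on NaN/spike
state_storage: disk       # store rollback state on disk to save RAM
cpu_offload: true         # offload optimizer states/gradients to CPU
```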

📚 API Reference

Full technical documentation for SelgisCore, Trainer, Callbacks, and configuration classes is available in API.md.

Key components:

  • SelgisCore: The brain of the system (protection, rollback, state management).
  • TransformerTrainer: Wrapper for the HuggingFace ecosystem with native BitsAndBytes support.
  • HistoryCallback: Automatically saves training history to JSON for later analysis.
  • LRFinder: Tool for finding the optimal learning rate.
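
As background for the LRFinder component, the classic LR range test it builds on can be done by hand in a few lines: sweep the learning rate exponentially upward, record the loss at each step, and pick a rate safely below the point where training destabilizes. Everything below is generic PyTorch, not Selgis API, and the one-order-of-magnitude heuristic is just a common rule of thumb.

```python
import torch

# Tiny model and synthetic data for the sweep.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(64, 4), torch.randn(64, 1)

lrs, losses = [], []
lr, factor = 1e-6, 1.3  # grow the LR by 30% each step
while lr < 1.0:
    for g in optimizer.param_groups:
        g["lr"] = lr
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    lrs.append(lr)
    losses.append(loss.item())
    lr *= factor

# Heuristic: take the minimum-loss LR, then back off by 10x for safety.
best = lrs[min(range(len(losses)), key=losses.__getitem__)]
suggested = best / 10
```

A real LR finder would also smooth the loss curve and reset the model weights after the sweep; this sketch only shows the core idea.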

📄 License

Apache 2.0 License. Free for commercial and research use.

Selgis AI — Make training boring (in a good way).
