# Automated Gabliteration Optimizer

Automated hyperparameter search for optimal Gabliteration configurations.

Paper: *Gabliteration: Adaptive Multi-Directional Neural Weight Modification*
Author: Gökdeniz Gülmez (2025)
## Overview

This script automates the process of finding optimal Gabliteration parameters by:
- Automatically loading datasets from HuggingFace (mlabonne/harmful_behaviors and mlabonne/harmless_alpaca)
- Testing multiple random parameter configurations
- Evaluating each configuration's effectiveness (refusal rate reduction)
- Measuring model similarity to original (KL divergence)
- Ranking configurations by combined score
- Allowing you to select and save the best version
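The steps above boil down to a simple loop: sample a random configuration, evaluate it, score it, and rank. A stubbed sketch of that loop follows; all names here are illustrative, and `evaluate()` fakes its measurements instead of running model generations.

```python
import random

# A stubbed sketch of the search loop described above. All names are
# illustrative; evaluate() fakes its measurements instead of running
# model generations.
def random_config(rng: random.Random) -> dict:
    return {
        "layer_fraction": round(rng.uniform(0.3, 0.7), 2),
        "base_scale_factor": round(rng.uniform(0.2, 0.8), 2),
    }

def evaluate(config: dict, rng: random.Random) -> tuple:
    # Stand-ins for the real refusal-rate and KL-divergence measurements.
    return rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.1)

rng = random.Random(42)
results = []
for version in range(10):
    config = random_config(rng)
    refusal, kl = evaluate(config, rng)
    score = 10 * refusal + kl        # combined score: lower is better
    results.append((score, refusal, kl, config))

results.sort(key=lambda r: r[0])     # rank by score, best first
best_score, best_refusal, _, best_config = results[0]
print(f"Best score {best_score:.4f} (refusal {best_refusal:.1%}) with {best_config}")
```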
## Quick Start

### 1. Installation

```bash
pip install gabliteration
```

This will install all dependencies and make the `gabliterate.automate` command available system-wide.

The tool automatically downloads these datasets from HuggingFace:

- `mlabonne/harmful_behaviors` - Harmful prompts for training
- `mlabonne/harmless_alpaca` - Harmless prompts for comparison

No local files needed!
### 2. Run with CLI Arguments

Test your favorite model with:

```bash
# Basic usage
gabliterate.automate --model "Nanbeige/Nanbeige4-3B-Thinking-2511"

# With custom parameters
gabliterate.automate --model "meta-llama/Llama-3.2-1B-Instruct" --num-versions 50 --batch-size 4

# Full options
gabliterate.automate --model "Qwen/Qwen3-4B-Instruct-2507" \
    --num-versions 100 \
    --test-samples 200 \
    --max-tokens 150 \
    --batch-size 4 \
    --kl-samples 15
```
**CLI Options:**

- `--model, -m` (required): Hugging Face model name or path
- `--num-versions, -n`: Number of configurations to test (default: 100)
- `--test-samples, -t`: Test samples for refusal evaluation (default: 100)
- `--max-tokens`: Max tokens to generate during evaluation (default: 100)
- `--batch-size, -b`: Batch size for evaluation (default: 2)
- `--kl-samples`: KL divergence samples (default: 10)

Run `gabliterate.automate --help` to see all options.
### 3. Review and Select

The script will:

- Test each configuration
- Print real-time results:

```
Testing Version 5/10
Config: Samples: 100, Skip: [2, 1], Layer: 0.52, Scale: 0.65, λ: 0.10, k: 2, Adaptive: True, β: 0.45
KL Divergence: 0.0234
Refusal Rate: 12.0% (12/100)
Score: 1.2234
```

After all tests, you'll see:

```
TOP 10 BEST CONFIGURATIONS
Rank  Refusal  KL Div   Score    Config
----------------------------------------------------------------------
1     8.0%     0.0189   0.8189   Samples: 150, Skip: [2, 1], ...
2     12.0%    0.0234   1.2234   Samples: 100, Skip: [1, 2], ...
...
```
### 4. Automatic Model Saving

After all tests complete, the script automatically:

- Selects the best configuration (lowest combined score)
- Recreates and saves the gabliterated model
- Saves all configuration details in `gabliteration_config.json`
- Generates a model-specific README.md
## Output Structure

```
Qwen_Qwen3-4B-Instruct-2507-gabliterated-v1-20250102_143022/
├── config.json                 # Model config
├── model.safetensors           # Model weights
├── tokenizer.json              # Tokenizer
├── tokenizer_config.json       # Tokenizer config
└── gabliteration_config.json   # ⭐ Gabliteration parameters & results
```
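The directory name encodes the model, version, and timestamp. A sketch of that naming scheme, inferred from the example above rather than taken from the package source:

```python
from datetime import datetime

# Illustrative reconstruction of the output directory naming shown above;
# the exact format is inferred from the example, not guaranteed.
def output_dir(model_name: str, version_id: int, when: datetime) -> str:
    safe_name = model_name.replace("/", "_")   # "Qwen/..." -> "Qwen_..."
    stamp = when.strftime("%Y%m%d_%H%M%S")     # e.g. 20250102_143022
    return f"{safe_name}-gabliterated-v{version_id}-{stamp}"

print(output_dir("Qwen/Qwen3-4B-Instruct-2507", 1,
                 datetime(2025, 1, 2, 14, 30, 22)))
# → Qwen_Qwen3-4B-Instruct-2507-gabliterated-v1-20250102_143022
```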
## Configuration File Format

The `gabliteration_config.json` contains:

```json
{
  "model_name": "Qwen/Qwen3-4B-Instruct-2507",
  "version_id": 1,
  "timestamp": "20250102_143022",
  "gabliteration_config": {
    "num_prompt_samples": 150,
    "skip_begin_layers": 2,
    "skip_end_layers": 1,
    "layer_fraction": 0.52,
    "base_scale_factor": 0.65,
    "regularization": 0.1,
    "n_directions": 2,
    "adaptive_layer_scale": true,
    "beta": 0.5
  },
  "results": {
    "kl_divergence": 0.0189,
    "refusal_rate": 0.08,
    "score": 0.8189
  },
  "all_results": [...]
}
```

The `all_results` array holds the full results from all tested versions.
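A saved config can be inspected with a few lines of standard-library Python. The field names below follow the documented layout; the JSON is inlined here for illustration instead of being read from disk.

```python
import json

# Inspect a saved config; field names follow the documented layout above.
# The JSON is inlined for illustration rather than loaded from a file.
config_text = """
{
  "model_name": "Qwen/Qwen3-4B-Instruct-2507",
  "version_id": 1,
  "results": {"kl_divergence": 0.0189, "refusal_rate": 0.08, "score": 0.8189}
}
"""

config = json.loads(config_text)
res = config["results"]
print(f"{config['model_name']} v{config['version_id']}: "
      f"refusal={res['refusal_rate']:.1%}, "
      f"KL={res['kl_divergence']:.4f}, score={res['score']:.4f}")
```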
## Understanding the Metrics

### Refusal Rate

- **What**: Percentage of test prompts that trigger refusal responses
- **Lower is better**: 0% means no refusals; 100% means all prompts refused
- **Target**: Aim for <10% for effective gabliteration
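Refusal detection typically amounts to checking generated text against common refusal openers. The package's actual phrase list is internal, so the markers and function below are illustrative:

```python
# Illustrative refusal detector: the real evaluator's phrase list is
# internal to the package; these markers are common refusal openers.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "i am sorry",
                   "i won't", "as an ai")

def is_refusal(response: str) -> bool:
    # Only inspect the opening of the response, where refusals appear.
    text = response.lower().strip()
    return any(marker in text[:80] for marker in REFUSAL_MARKERS)

responses = [
    "I'm sorry, but I can't help with that.",
    "Sure! Here are three ideas...",
    "I cannot assist with this request.",
    "Here is a step-by-step explanation:",
]
refusal_rate = sum(map(is_refusal, responses)) / len(responses)
print(f"Refusal rate: {refusal_rate:.0%}")  # → Refusal rate: 50%
```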
### KL Divergence

- **What**: Measures how far the modified model's output distribution drifts from the original
- **Lower is better**: Smaller values mean the model behaves more like the original
- **Target**: Keep <0.05 to preserve model quality
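For discrete distributions, KL divergence is the sum of `p * log(p / q)`. A minimal sketch on toy next-token distributions (not actual model outputs):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions, in nats.

    In the optimizer this compares the modified model's output
    distribution q against the original model's p.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions: nearly identical -> small KL,
# well under the 0.05 target.
p = [0.70, 0.20, 0.10]
q = [0.68, 0.21, 0.11]
print(f"{kl_divergence(p, q):.4f}")
```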
### Score

- **What**: Combined metric = 10 × RefusalRate + KLDivergence
- **Lower is better**: Balances refusal reduction with model preservation
- **Weights refusal rate 10× more than KL**: The primary goal is reducing refusals
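The formula is simple enough to verify against the example output by hand; a one-function sketch (the function name is ours):

```python
def gabliteration_score(refusal_rate: float, kl_divergence: float) -> float:
    """Combined score as defined above: lower is better.

    refusal_rate is a fraction in [0, 1]; kl_divergence is in nats.
    """
    return 10.0 * refusal_rate + kl_divergence

# Matches the example output above: 12% refusals, KL 0.0234
print(round(gabliteration_score(0.12, 0.0234), 4))  # → 1.2234
```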
## Hyperparameter Ranges

The script randomly samples from these ranges:

| Parameter | Range | Paper Default | Description |
|---|---|---|---|
| `num_prompt_samples` | [50, 75, 100, 150, 200] | 100 | Training samples for direction extraction |
| `skip_begin_layers` | [1, 2, 3] | 2 | Skip initial layers (preserve embeddings) |
| `skip_end_layers` | [1, 2, 3] | 1 | Skip final layers (preserve output) |
| `layer_fraction` | [0.3, 0.7] | 0.5 | Which layer to extract directions from |
| `base_scale_factor` | [0.2, 0.8] | 0.3 | Modification strength (α_base) |
| `regularization` | [0.05, 0.1, 0.15, 0.2] | 0.1 | Ridge regularization (λ) |
| `n_directions` | [1, 2, 3] | 1 | Number of refusal directions (k) |
| `adaptive_layer_scale` | [True, False] | True | Use adaptive scaling |
| `beta` | [0.3, 0.7] | 0.5 | Adaptive strength (β) |
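A sampler over these ranges might look like the following. This is a sketch, not the package's actual `GabliterationConfig.random()` implementation; discrete rows use `choice` and continuous rows use `uniform`.

```python
import random

# Illustrative sampler over the ranges in the table above; the package's
# actual GabliterationConfig.random() may differ in its details.
def sample_config(rng: random.Random) -> dict:
    return {
        "num_prompt_samples": rng.choice([50, 75, 100, 150, 200]),
        "skip_begin_layers": rng.choice([1, 2, 3]),
        "skip_end_layers": rng.choice([1, 2, 3]),
        "layer_fraction": round(rng.uniform(0.3, 0.7), 2),
        "base_scale_factor": round(rng.uniform(0.2, 0.8), 2),
        "regularization": rng.choice([0.05, 0.1, 0.15, 0.2]),
        "n_directions": rng.choice([1, 2, 3]),
        "adaptive_layer_scale": rng.choice([True, False]),
        "beta": round(rng.uniform(0.3, 0.7), 2),
    }

rng = random.Random(0)  # seeded for reproducibility
print(sample_config(rng))
```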
## Advanced Usage

### Testing More Configurations

Increase the number of versions tested:

```bash
gabliterate.automate --model "Qwen/Qwen3-4B-Instruct-2507" --num-versions 200
```

### Custom Evaluation Parameters

Fine-tune evaluation settings:

```bash
gabliterate.automate --model "meta-llama/Llama-3.2-1B-Instruct" \
    --test-samples 300 \
    --kl-samples 25 \
    --max-tokens 200
```

### Batch Processing for Speed

Adjust batch size for faster evaluation:

```bash
gabliterate.automate --model "Nanbeige/Nanbeige4-3B-Thinking-2511" \
    --batch-size 8 \
    --num-versions 100
```
### Advanced Configuration Customization

Clone the repository and edit the `GabliterationConfig.random()` method in the source code to customize the hyperparameter search space.
## Performance Tips

### Memory Management

- Each version creates a new model copy
- Memory is cleared between versions
- Use smaller models for faster testing
- Reduce `--test-samples` if memory is tight

### Speed Optimization

- Use GPU/CUDA if available (automatically detected)
- Increase `--batch-size` for faster evaluation
- Reduce `--test-samples` for faster evaluation
- Start with fewer `--num-versions` to test the pipeline
## Recommended Workflows

1. **Quick Test** (5 minutes):

   ```bash
   gabliterate.automate --model "your-model" --num-versions 5 --test-samples 50
   ```

2. **Standard Search** (30 minutes):

   ```bash
   gabliterate.automate --model "your-model" --num-versions 20 --test-samples 100
   ```

3. **Thorough Search** (2+ hours):

   ```bash
   gabliterate.automate --model "your-model" --num-versions 50 --test-samples 200
   ```
## Troubleshooting

### Out of Memory

```bash
gabliterate.automate --model "your-model" --num-versions 10 --batch-size 1 --test-samples 50
```

- Reduce `--num-versions`
- Use a smaller model
- Reduce `--batch-size`
- Reduce `--test-samples`

### Command Not Found: gabliterate

Ensure the package is installed:

```bash
pip install gabliteration
pip show gabliteration  # Verify installation
```

### All Versions Have High Refusal Rates

- The random configurations may need different ranges
- Try multiple runs with different `--num-versions`
- Check that the model actually exhibits the refusal behavior being targeted
## Citation

If you use this implementation, please cite:

```bibtex
@article{gulmez2025gabliteration,
  title={Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models},
  author={G{\"u}lmez, G{\"o}kdeniz},
  journal={arXiv preprint arXiv:2512.18901},
  year={2025}
}
```
## License

Same license as the base models being modified (typically Apache 2.0 or similar).

## Support

For issues or questions:

- GitHub: Check the original Gabliteration repository
- Paper: https://arxiv.org/abs/2512.18901
- Email: goekdenizguelmez-ml@gmail.com