
Automated Gabliteration Optimizer

Automated hyperparameter search for optimal Gabliteration configurations.

Paper: Gabliteration: Adaptive Multi-Directional Neural Weight Modification

Author: Gökdeniz Gülmez (2025)

Overview

This script automates the process of finding optimal Gabliteration parameters by:

  1. Automatically loading datasets from HuggingFace (mlabonne/harmful_behaviors and mlabonne/harmless_alpaca)
  2. Testing multiple random parameter configurations
  3. Evaluating each configuration's effectiveness (refusal rate reduction)
  4. Measuring model similarity to original (KL divergence)
  5. Ranking configurations by combined score
  6. Automatically selecting and saving the best configuration
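
The search loop above can be sketched in plain Python. The evaluator and config sampler here are hypothetical stand-ins for illustration, not the package's actual API:

```python
import random

def sample_config():
    # Hypothetical stand-in: draw one configuration from the search space
    return {
        "layer_fraction": random.uniform(0.3, 0.7),
        "base_scale_factor": random.uniform(0.2, 0.8),
    }

def evaluate(config):
    # Stand-in for the real refusal-rate and KL-divergence evaluations
    refusal_rate = random.uniform(0.0, 1.0)
    kl_divergence = random.uniform(0.0, 0.1)
    return refusal_rate, kl_divergence

def search(num_versions):
    results = []
    for version in range(1, num_versions + 1):
        config = sample_config()
        refusal, kl = evaluate(config)
        score = 10 * refusal + kl  # combined score; lower is better
        results.append({"version": version, "config": config,
                        "refusal": refusal, "kl": kl, "score": score})
    # Rank configurations by combined score, best first
    return sorted(results, key=lambda r: r["score"])

ranked = search(10)
best = ranked[0]
```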

Quick Start

1. Installation

pip install gabliteration

This will install all dependencies and make the gabliterate.automate command available system-wide.

The tool automatically downloads these datasets from HuggingFace:

  • mlabonne/harmful_behaviors - Harmful prompts for training
  • mlabonne/harmless_alpaca - Harmless prompts for comparison

No local files needed!

2. Run with CLI Arguments

Test your favorite model with:

# Basic usage
gabliterate.automate --model "Nanbeige/Nanbeige4-3B-Thinking-2511"

# With custom parameters
gabliterate.automate --model "meta-llama/Llama-3.2-1B-Instruct" --num-versions 50 --batch-size 4

# Full options
gabliterate.automate --model "Qwen/Qwen3-4B-Instruct-2507" \
  --num-versions 100 \
  --test-samples 200 \
  --max-tokens 150 \
  --batch-size 4 \
  --kl-samples 15

CLI Options:

  • --model, -m (required): Hugging Face model name or path
  • --num-versions, -n: Number of configurations to test (default: 100)
  • --test-samples, -t: Test samples for refusal evaluation (default: 100)
  • --max-tokens: Max tokens to generate during evaluation (default: 100)
  • --batch-size, -b: Batch size for evaluation (default: 2)
  • --kl-samples: KL divergence samples (default: 10)

Run gabliterate.automate --help to see all options.

3. Review and Select

The script will:

  • Test each configuration
  • Print real-time results:
    Testing Version 5/10
    Config: Samples: 100, Skip: [2, 1], Layer: 0.52, Scale: 0.65, λ: 0.10, k: 2, Adaptive: True, β: 0.45
    KL Divergence: 0.0234
    Refusal Rate: 12.0% (12/100)
    Score: 1.2234
    

After all tests, you'll see:

TOP 10 BEST CONFIGURATIONS
Rank   Refusal    KL Div     Score      Config
----------------------------------------------------------------------
1      8.0%       0.0189     0.8189     Samples: 150, Skip: [2, 1], ...
2      12.0%      0.0234     1.2234     Samples: 100, Skip: [1, 2], ...
...

4. Automatic Model Saving

After all tests complete, the script automatically:

  • Selects the best configuration (lowest combined score)
  • Recreates and saves the gabliterated model
  • Saves all configuration details in gabliteration_config.json
  • Generates a model-specific README.md

Output Structure

Qwen_Qwen3-4B-Instruct-2507-gabliterated-v1-20250102_143022/
├── config.json                   # Model config
├── model.safetensors             # Model weights
├── tokenizer.json                # Tokenizer
├── tokenizer_config.json         # Tokenizer config
└── gabliteration_config.json     # ⭐ Gabliteration parameters & results

Configuration File Format

The gabliteration_config.json contains:

{
  "model_name": "Qwen/Qwen3-4B-Instruct-2507",
  "version_id": 1,
  "timestamp": "20250102_143022",
  "gabliteration_config": {
    "num_prompt_samples": 150,
    "skip_begin_layers": 2,
    "skip_end_layers": 1,
    "layer_fraction": 0.52,
    "base_scale_factor": 0.65,
    "regularization": 0.1,
    "n_directions": 2,
    "adaptive_layer_scale": true,
    "beta": 0.5
  },
  "results": {
    "kl_divergence": 0.0189,
    "refusal_rate": 0.08,
    "score": 0.8189
  },
  "all_results": [...]  // Full results from all tested versions
}

Understanding the Metrics

Refusal Rate

  • What: Percentage of test prompts that trigger refusal responses
  • Lower is better: 0% means no refusals, 100% means all prompts refused
  • Target: Aim for <10% for effective gabliteration
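
Refusal detection is commonly done by checking generated responses against known refusal phrases. A minimal sketch of that heuristic (the phrase list is illustrative, not the package's actual list):

```python
# Illustrative refusal markers; the package's actual list may differ
REFUSAL_MARKERS = ["i can't", "i cannot", "i'm sorry", "i am unable"]

def is_refusal(response: str) -> bool:
    # Case-insensitive substring match against known refusal phrases
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses):
    # Fraction of responses classified as refusals
    refusals = sum(is_refusal(r) for r in responses)
    return refusals / len(responses)

outputs = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is a summary of the topic.",
    "I cannot assist with this request.",
    "Here are the steps you asked for.",
]
rate = refusal_rate(outputs)  # 0.5
```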

KL Divergence

  • What: Measures how different the modified model is from the original
  • Lower is better: Smaller values = model behaves more similarly to original
  • Target: Keep <0.05 to preserve model quality
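
In practice this is computed from next-token probability distributions of the original and modified models. A pure-Python sketch of the KL formula itself, with hand-picked example distributions:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); eps guards against log(0)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

original = [0.70, 0.20, 0.10]  # next-token probabilities, original model
modified = [0.65, 0.25, 0.10]  # same prompt, modified model

kl = kl_divergence(original, modified)  # small positive value
```

Identical distributions give a divergence of zero; the further the modified model drifts, the larger the value.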

Score

  • What: Combined metric = 10×RefusalRate + KLDivergence
  • Lower is better: Balances refusal reduction with model preservation
  • Weights refusal rate 10x more than KL: Primary goal is reducing refusals
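
The scores shown in the sample output above can be reproduced directly from this formula:

```python
def combined_score(refusal_rate, kl_divergence):
    # Lower is better; refusal rate (as a fraction) is weighted 10x over KL
    return 10 * refusal_rate + kl_divergence

# Numbers from the sample ranking above
score_v1 = combined_score(0.08, 0.0189)  # 0.8189
score_v2 = combined_score(0.12, 0.0234)  # 1.2234
```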

Hyperparameter Ranges

The script randomly samples from these ranges:

Parameter             Range                     Paper Default  Description
--------------------------------------------------------------------------------------------------
num_prompt_samples    [50, 75, 100, 150, 200]   100            Training samples for direction extraction
skip_begin_layers     [1, 2, 3]                 2              Skip initial layers (preserve embeddings)
skip_end_layers       [1, 2, 3]                 1              Skip final layers (preserve output)
layer_fraction        [0.3, 0.7]                0.5            Which layer to extract directions from
base_scale_factor     [0.2, 0.8]                0.3            Modification strength (α_base)
regularization        [0.05, 0.1, 0.15, 0.2]    0.1            Ridge regularization (λ)
n_directions          [1, 2, 3]                 1              Number of refusal directions (k)
adaptive_layer_scale  [True, False]             True           Use adaptive scaling
beta                  [0.3, 0.7]                0.5            Adaptive strength (β)
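
Sampling from these ranges can be mimicked with Python's random module. This mirrors the table above but is only an illustration, not the package's GabliterationConfig.random() itself:

```python
import random

def random_config():
    # Discrete sets use random.choice; continuous intervals use random.uniform
    return {
        "num_prompt_samples": random.choice([50, 75, 100, 150, 200]),
        "skip_begin_layers": random.choice([1, 2, 3]),
        "skip_end_layers": random.choice([1, 2, 3]),
        "layer_fraction": random.uniform(0.3, 0.7),
        "base_scale_factor": random.uniform(0.2, 0.8),
        "regularization": random.choice([0.05, 0.1, 0.15, 0.2]),
        "n_directions": random.choice([1, 2, 3]),
        "adaptive_layer_scale": random.choice([True, False]),
        "beta": random.uniform(0.3, 0.7),
    }

cfg = random_config()
```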

Advanced Usage

Testing More Configurations

Increase the number of versions tested:

gabliterate.automate --model "Qwen/Qwen3-4B-Instruct-2507" --num-versions 200

Custom Evaluation Parameters

Fine-tune evaluation settings:

gabliterate.automate --model "meta-llama/Llama-3.2-1B-Instruct" \
  --test-samples 300 \
  --kl-samples 25 \
  --max-tokens 200

Batch Processing for Speed

Adjust batch size for faster evaluation:

gabliterate.automate --model "Nanbeige/Nanbeige4-3B-Thinking-2511" \
  --batch-size 8 \
  --num-versions 100

For Advanced Configuration Customization

Clone the repository and edit GabliterationConfig.random() method in the source code to customize the hyperparameter search space.

Performance Tips

Memory Management

  • Each version creates a new model copy
  • Memory is cleared between versions
  • Use smaller models for faster testing
  • Reduce --test-samples if memory is tight

Speed Optimization

  • Use GPU/CUDA if available (automatically detected)
  • Increase --batch-size for faster evaluation
  • Reduce --test-samples for faster evaluation
  • Start with fewer --num-versions to test the pipeline

Recommended Workflows

  1. Quick Test (5 minutes):

    gabliterate.automate --model "your-model" --num-versions 5 --test-samples 50
    
  2. Standard Search (30 minutes):

    gabliterate.automate --model "your-model" --num-versions 20 --test-samples 100
    
  3. Thorough Search (2+ hours):

    gabliterate.automate --model "your-model" --num-versions 50 --test-samples 200
    

Troubleshooting

Out of Memory

Reduce the workload, for example:

gabliterate.automate --model "your-model" --num-versions 10 --batch-size 1 --test-samples 50

  • Reduce --batch-size
  • Reduce --test-samples
  • Reduce --num-versions
  • Use a smaller model

Command Not Found: gabliterate.automate

Ensure the package is installed:

pip install gabliteration
pip show gabliteration  # Verify installation

All Versions Have High Refusal Rates

  • The sampled hyperparameter ranges may not suit this model
  • Run the search again (each run draws new random configurations), or increase --num-versions
  • Check that the model actually exhibits refusal behavior on the test prompts

Citation

If you use this implementation, please cite:

@article{gulmez2025gabliteration,
  title={Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models},
  author={G{\"u}lmez, G{\"o}kdeniz},
  journal={arXiv preprint arXiv:2512.18901},
  year={2025}
}

License

Same license as the base models being modified (typically Apache 2.0 or similar).

Support

For issues or questions:
