Fully automatic censorship removal for language models

⚔️ Annihilation

Autonomous Language Model Decensoring Framework

License: AGPLv3 | Python 3.10+ | PyTorch 2.2+


⚠️ Work in Progress

⚡ This project is actively under development. Features, APIs, and documentation may change without notice.


🔥 What is Annihilation?

Annihilation is a fully automatic framework for removing censorship (safety alignment) from transformer-based language models. It combines an implementation of directional ablation (abliteration) with TPE-based parameter optimization, so refusals are suppressed without expensive post-training.

Key Features

  • 🤖 Fully Autonomous - No human intervention required; the system automatically finds optimal decensoring parameters
  • State-of-the-Art Performance - Suppresses refusals while preserving model capabilities, tracked as KL divergence from the original model
  • 🔧 Advanced Abliteration - Parametric directional ablation with flexible weight kernels
  • 🧠 Smart Optimization - Co-minimizes refusal count and KL divergence using Optuna's TPE sampler
  • 🎯 Multi-Architecture Support - Works with dense models, MoE architectures, hybrid models, and many multimodal models
  • 📊 Research Tools - Built-in residual geometry analysis and visualization capabilities
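
The co-minimization of refusal count and KL divergence listed above is a two-objective problem, so there is usually no single best trial, only a set of non-dominated ones. The following is a minimal sketch of that idea in plain Python. The `Trial` type, the `pareto_front` helper, and the numbers are hypothetical illustrations and are not part of Annihilation's API.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    """Hypothetical record of one optimization trial (illustration only)."""
    name: str
    refusals: int          # lower is better
    kl_divergence: float   # lower is better


def pareto_front(trials):
    """Keep the trials that no other trial beats on both objectives."""
    front = []
    for t in trials:
        dominated = any(
            o.refusals <= t.refusals
            and o.kl_divergence <= t.kl_divergence
            and (o.refusals < t.refusals or o.kl_divergence < t.kl_divergence)
            for o in trials
        )
        if not dominated:
            front.append(t)
    # Sort by refusal count so the tradeoff curve is easy to read.
    return sorted(front, key=lambda t: t.refusals)


# Made-up numbers, chosen only to show dominated vs non-dominated trials.
trials = [
    Trial("a", 40, 0.02),
    Trial("b", 12, 0.10),
    Trial("c", 12, 0.25),  # dominated by "b": same refusals, higher KL
    Trial("d", 3, 0.60),
    Trial("e", 50, 0.05),  # dominated by "a": more refusals, higher KL
]
front = pareto_front(trials)
front_names = [t.name for t in front]
```

Each trial left in `front` trades fewer refusals for a larger KL divergence, which is the tradeoff a final selection has to weigh.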


🚀 Quick Start

Use a Python virtual environment so Annihilation's dependencies do not collide with packages installed globally.

# Windows PowerShell
python -m venv annihilation-env
.\annihilation-env\Scripts\Activate.ps1
python -m pip install -U pip
python -m pip install -U annihilate-llm

# Decensor any model automatically
annihilate Qwen/Qwen3-4B-Instruct-2507

# macOS/Linux/Android terminal
python -m venv annihilation-env
source annihilation-env/bin/activate
python -m pip install -U pip
python -m pip install -U annihilate-llm

# Decensor any model automatically
annihilate Qwen/Qwen3-4B-Instruct-2507

Requirements

  • Python: 3.10+
  • PyTorch: 2.2+ (hardware-specific installation required)
  • Hardware: GPU recommended (CUDA, ROCm, XPU, or MPS)
  • Optional: Install annihilate-llm[bnb] only on platforms that support bitsandbytes if you want bnb_4bit quantization.

⚙️ Configuration

Annihilation works out of the box with defaults, but offers extensive configuration options:

# View all options
annihilate --help

# Or use a config file
# Rename config.default.toml to config.toml and modify as needed

Key Configuration Options

Option                   Default  Description
n_trials                 200      Number of optimization trials
quantization             none     Model quantization (bnb_4bit)
row_normalization        full     Weight normalization strategy
orthogonalize_direction  true     Direction adjustment method
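
As a rough illustration of overriding these defaults, the sketch below renders a flat set of options as TOML text that could be pasted into a config file. The option names and default values come from the table above. The `render_toml` helper is hypothetical, and treating the options as top-level keys with quoted string values is an assumption, so check config.default.toml for the actual layout.

```python
def render_toml(options):
    """Render a flat dict of scalar options as TOML key/value lines."""
    lines = []
    for key, value in options.items():
        if isinstance(value, bool):
            # bool must be checked before other types: TOML uses lowercase literals.
            rendered = "true" if value else "false"
        elif isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = str(value)
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines) + "\n"


# Defaults as listed in the table; key placement in the file is an assumption.
defaults = {
    "n_trials": 200,
    "quantization": "none",
    "row_normalization": "full",
    "orthogonalize_direction": True,
}

# Fewer trials and 4-bit quantization, for example on a smaller GPU.
overrides = {**defaults, "n_trials": 50, "quantization": "bnb_4bit"}
config_text = render_toml(overrides)
print(config_text)
```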

🔬 How It Works

Annihilation implements parametric directional ablation:

  1. Direction Computation - Calculates refusal directions by computing difference-of-means between first-token residuals for harmful vs harmless prompts

  2. Parametric Ablation - For each transformer component (attention out-projection, MLP down-projection), orthogonalizes weights against the refusal direction using LoRA adapters

  3. Multi-Parameter Optimization - Uses Optuna's TPE sampler to co-optimize:

    • Ablation weight kernel shape (max_weight, position, min_weight, distance)
    • Direction index (layer selection or interpolation)
    • Per-component parameters (attention vs MLP)
  4. Automatic Selection - Chooses from Pareto-optimal trials based on refusal count vs KL divergence tradeoff
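
Steps 1 and 2 can be pictured with a small NumPy sketch on random toy data. It computes a unit-norm difference-of-means direction and then builds a rank-one, LoRA-style update that removes that direction from a weight matrix's output. The function names, array shapes, and the `scale` parameter are assumptions made for illustration. This is not Annihilation's actual implementation, which works on real model residuals and applies per-layer weights from the optimized kernel.

```python
import numpy as np


def difference_of_means_direction(group_a, group_b):
    """Unit-norm difference between the mean vectors of two groups."""
    diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    return diff / np.linalg.norm(diff)


def rank_one_ablation(weight, direction, scale=1.0):
    """Return factors (B, A) such that B @ A == -scale * r r^T W.

    `weight` has shape (d_model, d_in), matching the (out, in) layout of a
    linear layer that writes into the residual stream.
    """
    r = direction.reshape(-1, 1)   # (d_model, 1)
    lora_b = -scale * r            # (d_model, 1)
    lora_a = r.T @ weight          # (1, d_in)
    return lora_b, lora_a


rng = np.random.default_rng(0)
d_model, d_in = 8, 16

# Toy stand-ins for first-token residuals of the two prompt sets.
group_a = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[0]
group_b = rng.normal(size=(32, d_model))

r = difference_of_means_direction(group_a, group_b)
W = rng.normal(size=(d_model, d_in))

B, A = rank_one_ablation(W, r, scale=1.0)
W_ablated = W + B @ A

# With scale=1.0 the layer's output has no component left along r.
leftover = float(np.abs(r @ W_ablated).max())
```

Because the update is the product of a column and a row, it has rank one and fits in a LoRA adapter without modifying the base weights. A `scale` below 1.0 gives a partial ablation, which is one way a per-layer weight kernel could act.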


📊 Benchmarking

After decensoring, you can:

  • 💬 Chat with the model to test behavior
  • 📈 Benchmark using standard evaluation frameworks (MMLU, GSM8K, etc.)
  • 💾 Save the model locally or upload to Hugging Face

🧪 Research Features

Install with research dependencies for visualization tools:

# Quote the extra so shells such as zsh do not treat the brackets as a glob pattern
python -m pip install -U "annihilate-llm[research]"

Features:

  • --plot-residuals - Generate PaCMAP projections of residual vectors
  • --print-residual-geometry - Detailed residual analysis metrics

📜 License

Annihilation is free software distributed under the GNU Affero General Public License v3.

See LICENSE for full details.


⚡ Disclaimer

This tool is provided for research and educational purposes only. The developers do not condone the use of decensored models for harmful activities. Users are responsible for ensuring compliance with applicable laws and model terms of service.


Breaking the Chains | Unleashing Model Potential

"The only way to discover the limits of the possible is to go beyond them into the impossible."
