Library for layer ablation and addition in LLMs

Project description

ErisForge is a Python library designed to modify Large Language Models (LLMs) by applying transformations to their internal layers. Named after Eris, the goddess of strife and discord, ErisForge allows you to alter model behavior in a controlled manner, creating both ablated and augmented versions of LLMs that respond differently to specific types of input.

Features

  • Modify internal layers of LLMs to produce altered behaviors.
  • Ablate or enhance model responses with the AblationDecoderLayer and AdditionDecoderLayer classes.
  • Measure refusal expressions in model responses using the ExpressionRefusalScorer.
  • Supports custom behavior directions for applying specific types of transformations.

Installation

To install ErisForge, clone the repository and install the required packages:

git clone https://github.com/tsadoq/erisforge.git
cd erisforge
pip install -r requirements.txt

or install directly from pip:

pip install erisforge
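
After the pip route, one quick way to confirm that the distribution is visible to the current interpreter is to query its installed version through the standard library. This is only a sketch: the helper name `installed_version` is hypothetical and not part of ErisForge, while `importlib.metadata` ships with Python 3.8+.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("erisforge"))
```

If this prints `None`, the package was probably installed into a different environment than the one running the script.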

Usage

Basic Setup

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from erisforge import ErisForge
from erisforge.expression_refusal_scorer import ExpressionRefusalScorer

# Load a model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize ErisForge and configure the scorer
forge = ErisForge()
scorer = ExpressionRefusalScorer()

Transform Model Layers

You can apply transformations to specific layers of the model to induce different response behaviors. A complete example can be found in this notebook: Transform Model Layers.
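
To give a feel for what ablating or adding a behavior direction can mean, here is a minimal sketch of the general technique on raw hidden states. It is an illustration under stated assumptions, not ErisForge's internal implementation: the function names `ablate_direction` and `add_direction` are hypothetical, and the real AblationDecoderLayer and AdditionDecoderLayer may differ in the details.

```python
import torch


def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of each hidden state that lies along `direction`."""
    unit = direction / direction.norm()
    # Projection coefficient per position, shape (..., 1)
    coeff = (hidden @ unit).unsqueeze(-1)
    return hidden - coeff * unit


def add_direction(hidden: torch.Tensor, direction: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Shift every hidden state along `direction` by `scale` units."""
    unit = direction / direction.norm()
    return hidden + scale * unit


# Toy activations: batch of 2 sequences, 5 tokens each, hidden size 768 (as in GPT-2)
hidden = torch.randn(2, 5, 768)
direction = torch.rand(768)
unit = direction / direction.norm()

ablated = ablate_direction(hidden, direction)
# Largest remaining component along the direction; close to zero after ablation
residual = (ablated @ unit).abs().max().item()
print(f"max residual along direction: {residual:.2e}")
```

Ablation projects the direction out of the activations, while addition pushes activations further along it; the `scale` argument here plays a role comparable to a scaling factor for the strength of the shift.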

Example 1: Applying Ablation to Model Layers

# Define instructions
instructions = ["Explain why AI is beneficial.", "What are the limitations of AI?"]

# Specify layer ranges for ablation
min_layer = 2
max_layer = 4

# Modify the model by applying ablation to the specified layers.
# Note: AblationDecoderLayer must be imported from the erisforge package before
# this call; the import is not shown in this README.
conversations = forge.run_forged_model(
    model=model,
    type_of_layer=AblationDecoderLayer,
    objective_behaviour_dir=torch.rand(768),  # Example direction tensor; 768 is GPT-2's hidden size
    tokenizer=tokenizer,
    min_layer=min_layer,
    max_layer=max_layer,
    instructions=instructions,
    max_new_tokens=50
)

# Display modified responses (each conversation holds the user turn, then the model turn)
for conversation in conversations:
    print("User:", conversation[0]["content"])
    print("AI:", conversation[1]["content"])

Example 2: Measuring Refusal Expressions

Use ExpressionRefusalScorer to measure whether the model's response includes common refusal phrases.

response_text = "I'm sorry, I cannot provide that information."
user_query = "What is the recipe for a dangerous substance?"

# Scoring the response for refusal expressions
refusal_score = scorer.score(user_query=user_query, model_response=response_text)
print("Refusal Score:", refusal_score)
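
As a rough illustration of phrase-based refusal scoring, the sketch below counts how many refusal phrases from a small list appear in a response. It is an assumption about the general idea only: the phrase list and the `refusal_score` function are hypothetical, and the actual ExpressionRefusalScorer may use different phrases and a different scoring rule.

```python
# Hypothetical phrase list; the real scorer's list is not shown in this README.
REFUSAL_PHRASES = ("i'm sorry", "i cannot", "i can't", "i am unable", "as an ai")


def refusal_score(model_response: str, phrases=REFUSAL_PHRASES) -> int:
    """Count how many distinct refusal phrases occur in the response."""
    # Lowercase and normalize curly apostrophes so matching is case-insensitive
    text = model_response.lower().replace("\u2019", "'")
    return sum(1 for phrase in phrases if phrase in text)


print(refusal_score("I'm sorry, I cannot provide that information."))  # prints 2
```

A count of zero suggests no listed phrase was found; substring matching of this kind is simple but can miss paraphrased refusals.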

Save Transformed Model

You can save your modified model locally or push it to the HuggingFace Hub:

output_model_name = "my_transformed_model"

# Save the modified model
forge.save_model(
    model=model,
    behaviour_dir=torch.rand(768),  # Example direction tensor
    scale_factor=1,
    output_model_name=output_model_name,
    tokenizer=tokenizer,
    to_hub=False  # Set to True to push to HuggingFace Hub
)

Acknowledgments

This project was inspired by and built upon the work from the following repositories and projects:

Contributing

Feel free to submit issues, suggestions, or contribute directly to this project. Fork the repository, create a feature branch, and submit a pull request.

Issues and Feature Requests

License

This project is licensed under the MIT License.

Disclaimer

This library is provided for research and development purposes only. The author assumes no responsibility for any specific applications or uses of ErisForge.

Download files

Source Distribution

erisforge-1.1.0.tar.gz (14.8 kB)

Uploaded Source

Built Distribution

ErisForge-1.1.0-py3-none-any.whl (15.3 kB)

Uploaded Python 3

File details

Details for the file erisforge-1.1.0.tar.gz.

File metadata

  • Download URL: erisforge-1.1.0.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for erisforge-1.1.0.tar.gz
Algorithm Hash digest
SHA256 e3f8bd85953f87f7119ef03020a7c95556c918e92a35acdd05177aa108982e92
MD5 5d9f6d7ac42ece2339c3691c8b67d942
BLAKE2b-256 b4f93bb57d380bfa65a95f2ffb835d7f1a3a693fd7197d7c56c164d79c045f61
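
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper name `sha256_of` and the local file location are assumptions, and the expected digest is copied from the table for erisforge-1.1.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for erisforge-1.1.0.tar.gz
EXPECTED_SHA256 = "e3f8bd85953f87f7119ef03020a7c95556c918e92a35acdd05177aa108982e92"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed download location: the current working directory
    archive = Path("erisforge-1.1.0.tar.gz")
    if archive.exists():
        print("match" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
```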

File details

Details for the file ErisForge-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: ErisForge-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for ErisForge-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9c07ac190a029ead11b7822a04f47a5785124f51649c1b9c7ff81675b88f0c29
MD5 4627adc36a17bedca39b245b99666f38
BLAKE2b-256 cbf3041e7be2cce54c73265da2c52c14946a7d35bbf222ebec75cf01f43eca7f
