Library for layer ablation/addition of LLM models
ErisForge is a Python library designed to modify Large Language Models (LLMs) by applying transformations to their internal layers. Named after Eris, the goddess of strife and discord, ErisForge allows you to alter model behavior in a controlled manner, creating both ablated and augmented versions of LLMs that respond differently to specific types of input.
Features
- Modify internal layers of LLMs to produce altered behaviors.
- Ablate or enhance model responses with the AblationDecoderLayer and AdditionDecoderLayer classes.
- Measure refusal expressions in model responses using the ExpressionRefusalScorer.
- Support custom behavior directions for applying specific types of transformations.
Installation
To install ErisForge, clone the repository and install the required packages:
git clone https://github.com/tsadoq/erisforge.git
cd erisforge
pip install -r requirements.txt
Alternatively, install it directly from PyPI with pip:
pip install erisforge
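To confirm that either installation method worked, a short check with the standard library's importlib.metadata can report the installed version. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only queries installed distribution metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("erisforge")
if version is None:
    print("erisforge is not installed; run `pip install erisforge`")
else:
    print(f"erisforge {version} is installed")
```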
Usage
Basic Setup
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from erisforge import ErisForge
from erisforge.expression_refusal_scorer import ExpressionRefusalScorer
# Load a model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Initialize ErisForge and configure the scorer
forge = ErisForge()
scorer = ExpressionRefusalScorer()
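The examples below use torch.rand(768) as a direction, and 768 is GPT-2's hidden size. With another model, the direction length should match that model's hidden size, which transformers exposes as model.config.hidden_size. The following sketch adds a guard for this. The helper check_direction is hypothetical, and a SimpleNamespace stands in for a real model config so the snippet runs without downloading weights.

```python
from types import SimpleNamespace


def check_direction(direction, config):
    """Raise if the direction length differs from the model's hidden size.

    Hypothetical helper; `config` is assumed to expose `hidden_size`
    like a transformers model config does.
    """
    expected = config.hidden_size
    if len(direction) != expected:
        raise ValueError(
            f"direction has length {len(direction)}, "
            f"but the model's hidden size is {expected}"
        )
    return True


# Stand-in for model.config; for the real GPT-2 model, hidden_size is 768.
config = SimpleNamespace(hidden_size=768)
check_direction([0.0] * 768, config)  # passes silently
```

A 1-D torch tensor also supports len(), so the same check works on the torch.rand(768) tensors used in the examples.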
Transform Model Layers
You can apply transformations to specific layers of the model to induce different response behaviors. A complete example can be found in this notebook: Transform Model Layers.
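Applying a transformation to a range of layers amounts to swapping those layers for wrapped versions while leaving the rest untouched. The following pure-Python sketch shows the idea on a plain list. In a real transformers model the list would be an nn.ModuleList (for GPT-2, model.transformer.h). This README does not state whether ErisForge treats max_layer as inclusive, so the sketch assumes a half-open range [min_layer, max_layer). The function wrap_layers is hypothetical.

```python
def wrap_layers(layers, wrapper, min_layer, max_layer):
    """Return a new list in which layers with index in [min_layer, max_layer) are wrapped.

    The half-open range is an assumption made for this sketch, not ErisForge's documented behavior.
    """
    if not 0 <= min_layer <= max_layer <= len(layers):
        raise ValueError(
            f"invalid layer range [{min_layer}, {max_layer}) for {len(layers)} layers"
        )
    return [
        wrapper(layer) if min_layer <= i < max_layer else layer
        for i, layer in enumerate(layers)
    ]


layers = [f"block{i}" for i in range(6)]  # stand-ins for decoder layers
wrapped = wrap_layers(layers, lambda name: f"ablated({name})", 2, 4)
print(wrapped)
# ['block0', 'block1', 'ablated(block2)', 'ablated(block3)', 'block4', 'block5']
```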
Example 1: Applying Ablation to Model Layers
# Define the instructions to run through the modified model
instructions = ["Explain why AI is beneficial.", "What are the limitations of AI?"]
# Specify the range of layers to ablate
min_layer = 2
max_layer = 4
# Note: AblationDecoderLayer must also be imported from erisforge (its module path is not given in this README)
# Run the instructions through the model with ablation applied to the specified layers;
# the call returns the generated conversations, each a list of chat messages
conversations = forge.run_forged_model(
    model=model,
    type_of_layer=AblationDecoderLayer,
    objective_behaviour_dir=torch.rand(768),  # Placeholder direction; 768 is GPT-2's hidden size
    tokenizer=tokenizer,
    min_layer=min_layer,
    max_layer=max_layer,
    instructions=instructions,
    max_new_tokens=50,
)
# Display the modified responses
for conversation in conversations:
    print("User:", conversation[0]["content"])
    print("AI:", conversation[1]["content"])
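The example above uses a random tensor as a placeholder direction. A common approach in the refusal-direction literature is the difference of means: average the hidden states on prompts that trigger the behavior, subtract the average on prompts that do not, and normalize. The following minimal sketch assumes the activations have already been collected, for instance with transformers' output_hidden_states=True at one layer and token position. It is not a documented ErisForge function, and the helper names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def difference_of_means(target_acts, baseline_acts):
    """Unit vector pointing from the baseline mean activation to the target mean.

    Hypothetical helper illustrating the difference-of-means technique.
    """
    diff = [t - b for t, b in zip(mean_vector(target_acts), mean_vector(baseline_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two sets have identical means")
    return [x / norm for x in diff]


target = [[2.0, 0.0], [4.0, 0.0]]    # e.g. activations on prompts showing the behavior
baseline = [[0.0, 0.0], [0.0, 0.0]]  # e.g. activations on prompts that do not
print(difference_of_means(target, baseline))  # [1.0, 0.0]
```

The resulting unit vector could then replace torch.rand(768) as the direction, after conversion to a tensor of the model's hidden size.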
Example 2: Measuring Refusal Expressions
Use ExpressionRefusalScorer to measure whether a model response contains common refusal phrases.
response_text = "I'm sorry, I cannot provide that information."
user_query = "What is the recipe for a dangerous substance?"
# Scoring the response for refusal expressions
refusal_score = scorer.score(user_query=user_query, model_response=response_text)
print("Refusal Score:", refusal_score)
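To show what expression-based refusal scoring can look like, here is a minimal sketch that reports the fraction of refusal patterns found in a response. The phrase list and the scoring rule are assumptions for illustration, and ExpressionRefusalScorer's actual phrases and score scale may differ.

```python
import re

# Hypothetical phrase list for illustration; ExpressionRefusalScorer's real list may differ.
REFUSAL_PATTERNS = [
    r"\bI'?m sorry\b",
    r"\bI cannot\b",
    r"\bI can't\b",
    r"\bas an AI\b",
]


def refusal_fraction(response, patterns=REFUSAL_PATTERNS):
    """Fraction of refusal patterns that appear in `response` (0.0 to 1.0).

    Assumed scoring rule for this sketch, not ErisForge's implementation.
    """
    hits = sum(1 for p in patterns if re.search(p, response, flags=re.IGNORECASE))
    return hits / len(patterns)


print(refusal_fraction("I'm sorry, I cannot provide that information."))  # 0.5
print(refusal_fraction("Sure, here is a summary."))  # 0.0
```

Matching fixed expressions is cheap and transparent, but it misses refusals phrased in unexpected ways, so it works best as a coarse signal across many responses.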
Save Transformed Model
You can save your modified model locally or push it to the HuggingFace Hub:
output_model_name = "my_transformed_model"
# Save the modified model
forge.save_model(
model=model,
behaviour_dir=torch.rand(768), # Example direction tensor
scale_factor=1,
output_model_name=output_model_name,
tokenizer=tokenizer,
to_hub=False # Set to True to push to HuggingFace Hub
)
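save_model receives the original model together with a direction and a scale_factor, which suggests that the transformation is baked into the saved weights. One common way to make a directional ablation permanent is weight orthogonalization: for a matrix W whose outputs live in the hidden (residual) space, W' = W - s · r (rᵀW), so with s = 1 the layer can no longer write along the unit direction r. Whether save_model uses exactly this rule, and how it applies scale_factor, is an assumption. The function orthogonalize is hypothetical.

```python
import math


def orthogonalize(weight, direction, scale_factor=1.0):
    """Return W' = W - s * r (r^T W), where rows of W index the hidden dimension.

    Assumed technique (weight orthogonalization) for illustration only;
    with s = 1, every column of W' is orthogonal to the unit direction r.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    n_cols = len(weight[0])
    # r^T W: projection of each column of W onto r
    proj = [sum(r[i] * weight[i][j] for i in range(len(r))) for j in range(n_cols)]
    return [
        [weight[i][j] - scale_factor * r[i] * proj[j] for j in range(n_cols)]
        for i in range(len(weight))
    ]


W = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]  # shape (hidden_size=3, in_features=2)
r = [0.0, 1.0, 0.0]
print(orthogonalize(W, r))  # [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0]]
```

A scale_factor below 1 would only partially remove the direction under this formulation, and 0 leaves the weights unchanged.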
Acknowledgments
This project was inspired by and built upon the work from the following repositories and projects:
Contributing
Feel free to submit issues, suggestions, or contribute directly to this project. Fork the repository, create a feature branch, and submit a pull request.
License
This project is licensed under the MIT License.
Disclaimer
This library is provided for research and development purposes only. The author assumes no responsibility for any specific applications or uses of ErisForge.
Download files
Source Distribution
Built Distribution
File details
Details for the file erisforge-1.1.0.tar.gz.
File metadata
- Download URL: erisforge-1.1.0.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e3f8bd85953f87f7119ef03020a7c95556c918e92a35acdd05177aa108982e92 |
| MD5 | 5d9f6d7ac42ece2339c3691c8b67d942 |
| BLAKE2b-256 | b4f93bb57d380bfa65a95f2ffb835d7f1a3a693fd7197d7c56c164d79c045f61 |
File details
Details for the file ErisForge-1.1.0-py3-none-any.whl.
File metadata
- Download URL: ErisForge-1.1.0-py3-none-any.whl
- Upload date:
- Size: 15.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c07ac190a029ead11b7822a04f47a5785124f51649c1b9c7ff81675b88f0c29 |
| MD5 | 4627adc36a17bedca39b245b99666f38 |
| BLAKE2b-256 | cbf3041e7be2cce54c73265da2c52c14946a7d35bbf222ebec75cf01f43eca7f |
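The digests listed above can confirm that a downloaded file is intact. The following minimal sketch uses Python's standard hashlib module. The helpers file_digest and verify are hypothetical names, and the commented call shows how they would be used on the source distribution.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory.

    Hypothetical helper; `algorithm` is any name accepted by hashlib.new, e.g. "sha256" or "md5".
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """Return True if the file's digest matches `expected_hex` (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.lower()


# Example: check the source distribution against its listed SHA256 digest.
# verify("erisforge-1.1.0.tar.gz",
#        "e3f8bd85953f87f7119ef03020a7c95556c918e92a35acdd05177aa108982e92")
```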