
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vectors

Project description

LLM Steer

A Python module to steer LLM responses towards a certain topic/subject and to enhance capabilities (e.g., getting correct answers to tricky logic puzzles more often). It is a practical tool for activation engineering: it adds steering vectors to different layers of a Large Language Model (LLM). It is meant to be used with Hugging Face's transformers library.
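For intuition, here is a toy sketch of activation addition in plain Python. It is not llm_steer's implementation: the two "layers" are hypothetical element-wise functions, and `topic` is a made-up vector standing in for the activations a steering text would produce. A steering vector, scaled by a coefficient, is added to one layer's output, and the shift carries through every later layer.

```python
# Toy sketch of activation addition, the general idea behind steering vectors.
# NOT llm_steer's implementation: the "layers" are hypothetical element-wise
# functions on plain Python lists, standing in for transformer blocks.
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale(coeff: float, v: Vector) -> Vector:
    """Multiply every component of v by coeff (coeff may be negative)."""
    return [coeff * a for a in v]


def forward(layers: List[Layer], x: Vector, steering: Dict[int, List[Vector]]) -> Vector:
    """Run x through the layers, adding the steering vectors registered for
    layer i to that layer's output before it reaches layer i + 1."""
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        for v in steering.get(i, []):
            h = [a + b for a, b in zip(h, v)]  # activation addition
    return h


layers: List[Layer] = [
    lambda h: [2 * a for a in h],  # hypothetical layer 0
    lambda h: [a + 1 for a in h],  # hypothetical layer 1
]
topic = [1.0, 0.0]  # made-up stand-in for the activations of a steering text

print(forward(layers, [1.0, 1.0], {}))                        # [3.0, 3.0]
print(forward(layers, [1.0, 1.0], {0: [scale(0.5, topic)]}))  # [3.5, 3.0]
```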

Demo

Google Colab demo: https://colab.research.google.com/github/Mihaiii/llm_steer/blob/main/demo/llm_steer_demo.ipynb

Basic usage

Install it:

pip install llm_steer

Then use:

from llm_steer import Steer
# model and tokenizer: a Hugging Face transformers model and its tokenizer, loaded beforehand
steered_model = Steer(model, tokenizer)

Add a steering vector to a particular layer of the model with a given coefficient and text. The coefficient can also be negative.

steered_model.add(layer_idx=20, coeff=0.4, text="logical")

Get all the applied steering vectors:

steered_model.get_all()

Remove all steering vectors to revert to the initial model:

steered_model.reset_all()
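The add / get_all / reset_all lifecycle can be mimicked with a hypothetical `ToySteer` class. This is a sketch, not llm_steer's code: it takes vectors directly instead of computing them from `text`, works on a single toy hidden state, and its `get_all()` format is made up. It shows that once every entry is removed, the hidden state is back to its unsteered value.

```python
# Hypothetical stand-in (NOT llm_steer's code) mirroring the
# add / get_all / reset_all lifecycle on a single toy hidden state.
# Vectors are passed in directly instead of being computed from `text`,
# and the get_all() return format here is made up.
from typing import List, Tuple

Vector = List[float]
Entry = Tuple[int, float, Vector]


class ToySteer:
    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def add(self, layer_idx: int, coeff: float, vector: Vector) -> None:
        """Register coeff * vector for layer `layer_idx`."""
        self._entries.append((layer_idx, coeff, vector))

    def get_all(self) -> List[Entry]:
        """Return a copy of every registered entry."""
        return list(self._entries)

    def reset_all(self) -> None:
        """Forget every entry, reverting to the unsteered behavior."""
        self._entries.clear()

    def apply(self, layer_idx: int, h: Vector) -> Vector:
        """Add the scaled vectors registered on `layer_idx` to hidden state h."""
        for idx, coeff, vector in self._entries:
            if idx == layer_idx:
                h = [a + coeff * b for a, b in zip(h, vector)]
        return h


steer = ToySteer()
baseline = [1.0, 1.0]
steer.add(layer_idx=20, coeff=0.4, vector=[1.0, 0.0])
print(steer.get_all())            # [(20, 0.4, [1.0, 0.0])]
print(steer.apply(20, baseline))  # [1.4, 1.0]
steer.reset_all()
print(steer.apply(20, baseline))  # [1.0, 1.0]
```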

Q / A

Q: What's the difference between llm_steer and mentioning what you want in the system prompt?

A: I see llm_steer as an enhancer rather than a replacement: it can be used together with the system prompt.


Q: How do I determine the best parameters to use?

A: I don't have a method; it's all trial and error. I recommend starting with the middle layers and a small coefficient, then slowly increasing it.
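A hypothetical helper for that trial-and-error search, assuming you know the model's layer count (for many transformers architectures it is `model.config.num_hidden_layers`). `candidate_settings` is a name made up for this sketch; it only orders (layer_idx, coeff) pairs, smallest coefficients first and layers closest to the middle first. Judging the outputs is still up to you.

```python
# Hypothetical helper (not part of llm_steer): orders candidate
# (layer_idx, coeff) pairs so the search starts near the middle layer
# with small coefficients and only then moves outward / upward.
from typing import Iterator, Sequence, Tuple


def candidate_settings(
    num_layers: int,
    coeffs: Sequence[float] = (0.1, 0.2, 0.4, 0.8),
    spread: int = 2,
) -> Iterator[Tuple[int, float]]:
    """Yield (layer_idx, coeff): smallest coefficients first and, for each
    coefficient, layers ordered by distance from the middle layer."""
    mid = num_layers // 2
    lo, hi = max(0, mid - spread), min(num_layers - 1, mid + spread)
    # sorted() is stable, so ties keep ascending layer order.
    layers = sorted(range(lo, hi + 1), key=lambda idx: abs(idx - mid))
    for coeff in sorted(coeffs):
        for layer_idx in layers:
            yield layer_idx, coeff


print(list(candidate_settings(32, coeffs=(0.1,), spread=1)))
# [(16, 0.1), (15, 0.1), (17, 0.1)]
```

Each pair can then be tried with the documented calls: `steered_model.add(layer_idx=layer_idx, coeff=coeff, text=...)`, generate with your usual transformers code, inspect the output, and call `steered_model.reset_all()` before the next pair.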


Q: What models are supported?

A: I tested it on multiple architectures, including LLaMA, Mistral, Phi and StableLM. Keep in mind that llm_steer is meant to be used together with Hugging Face's transformers library, so it won't work with GGUF models, for example.


Q: I applied steering vectors, but the LLM outputs gibberish. What should I do?

A: Try a lower coeff value or another layer.
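One way to follow that advice systematically is to keep the layer and halve the coefficient until the output looks sane, then move to another layer if nothing works. In this sketch `back_off`, `try_coeff` and `is_gibberish` are hypothetical names: `try_coeff` would wrap `steered_model.reset_all()`, `steered_model.add(...)` and a generation call, and `is_gibberish` is whatever check (manual or automatic) you trust.

```python
# Hypothetical back-off loop (not part of llm_steer): keep the layer, halve
# the coefficient until a user-supplied check accepts the output.
from typing import Callable, Optional


def back_off(
    coeff: float,
    try_coeff: Callable[[float], str],
    is_gibberish: Callable[[str], bool],
    min_abs_coeff: float = 0.05,
) -> Optional[float]:
    """Return the first coefficient (halving each time) whose output passes
    the check, or None once |coeff| drops below min_abs_coeff."""
    while abs(coeff) >= min_abs_coeff:
        if not is_gibberish(try_coeff(coeff)):
            return coeff
        coeff /= 2  # keeps the sign, so negative coefficients work too
    return None  # give up on this layer and try another one
```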


Q: Can I add multiple steering vectors on the same layer? Can I add the same steering vector on multiple layers? Can I add steering vectors with negative coefficients?

A: Yes, and please do. llm_steer is built for experimenting. See the Colab for examples: https://colab.research.google.com/github/Mihaiii/llm_steer/blob/main/demo/llm_steer_demo.ipynb
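Assuming each steering vector is simply added to its layer's hidden state (plain activation addition), several vectors on one layer combine into a single net offset, and opposite coefficients partly cancel. The sketch below computes that net offset per layer; `net_offsets` is a hypothetical helper and the vectors are made up, whereas llm_steer derives them from the `text` you pass to `add()`.

```python
# Toy illustration (not llm_steer's code), assuming each steering vector is
# simply added to its layer's hidden state. The vectors are made up.
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def net_offsets(entries: Sequence[Tuple[int, float, Vector]]) -> Dict[int, Vector]:
    """Sum coeff * vector for every entry, grouped by layer index."""
    totals: Dict[int, Vector] = {}
    for layer_idx, coeff, vector in entries:
        current = totals.get(layer_idx, [0.0] * len(vector))
        totals[layer_idx] = [c + coeff * v for c, v in zip(current, vector)]
    return totals


logical = [1.0, 0.0]
funny = [0.0, 1.0]
entries = [
    (20, 0.4, logical),  # two vectors on the same layer...
    (20, -0.5, funny),   # ...one of them with a negative coefficient
    (22, 0.4, logical),  # the same vector on another layer
]
print(net_offsets(entries))  # {20: [0.4, -0.5], 22: [0.4, 0.0]}
```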


Q: Can I use steering vectors to enhance role-play characteristics (e.g., funnier or cockier personas)?

A: Yes.


Q: Can I use negative steering vectors to stop it from saying "As an AI language model"?

A: Yes.
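A toy view of why a negative coefficient can suppress a phrase: adding `coeff * v` with `coeff < 0` lowers the hidden state's projection onto `v`. Here `ai_disclaimer` is a made-up 2-D direction standing in for the activations of "As an AI language model"; real hidden states are far larger, and whether the phrase actually disappears has to be checked on real outputs.

```python
# Toy illustration (not llm_steer's code): a negative coefficient pushes the
# hidden state away from a direction, lowering its projection onto it.
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def add_scaled(h: Vector, v: Vector, coeff: float) -> Vector:
    """Activation addition: h + coeff * v."""
    return [x + coeff * y for x, y in zip(h, v)]


ai_disclaimer = [1.0, 0.0]  # made-up direction for the unwanted phrase
h = [2.0, 1.0]              # made-up hidden state

print(dot(h, ai_disclaimer))                                   # 2.0
print(dot(add_scaled(h, ai_disclaimer, -1.5), ai_disclaimer))  # 0.5
```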

Credits / Thanks

  • DL Explorers for his video on activation engineering, which goes over an article and a Colab notebook he made. The resources mentioned in that video were the starting point of llm_steer.
  • Gary Bernhardt for his excellent Python for programmers course. I needed a course that could take me through the basics of Python without treating me like a dev noob (as most beginner-level tutorials treat their audience).
  • Andrej Karpathy for his State of GPT video. I always wanted to make an open-source project, but there was already a repo for every idea I had. Not when it comes to tools for LLMs, though!


Download files


Source Distribution

llm_steer-2.0.1.tar.gz (6.7 kB)

Uploaded Source

Built Distribution


llm_steer-2.0.1-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file llm_steer-2.0.1.tar.gz.

File metadata

  • Download URL: llm_steer-2.0.1.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for llm_steer-2.0.1.tar.gz
Algorithm Hash digest
SHA256 5cc07257608cec21e858f568de3fd4fae71f208921e5175e178caa2fa7ba6151
MD5 38525c4647ad2840f0ad436f2599906a
BLAKE2b-256 07a479a79f429fa3ae7bc26e48d62cf71b4aadfdd4ad530365449d6b6fb27130


File details

Details for the file llm_steer-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: llm_steer-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for llm_steer-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3a8c4b7724b976cca012e551100130b76e2ffb46ffe053a56bf785013455c8fa
MD5 e93277dd78585d28d722b8e57616813d
BLAKE2b-256 f561cce606aa5c645866c26ebf7386aa257b80187d69036f9f443c287a4b4aec

