
representation engineering / control vectors

Reason this release was yanked:

botched

Project description

repeng


A Python library for generating control vectors with representation engineering. Train a vector in less than sixty seconds!

For a full example, see the notebooks folder or the blog post.
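The example below calls a `make_dataset` helper whose real definition lives in notebooks/experiments.ipynb. As a rough sketch of what such a helper might look like, the block here pairs each positive persona with a negative one and appends a shared suffix to both prompts. `PairedEntry` is a hypothetical stand-in for repeng's `DatasetEntry` so the sketch runs without the library, and the `user_tag` / `asst_tag` parameters and the two suffixes are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairedEntry:
    """Hypothetical stand-in for repeng's DatasetEntry: one contrasting pair."""
    positive: str
    negative: str


def make_dataset(template, pos_personas, neg_personas, suffixes,
                 user_tag="[INST]", asst_tag="[/INST]"):
    """Build contrasting prompt pairs that differ only in the persona."""
    dataset = []
    for suffix in suffixes:
        for pos, neg in zip(pos_personas, neg_personas):
            dataset.append(PairedEntry(
                positive=f"{user_tag} {template.format(persona=pos)} {asst_tag} {suffix}",
                negative=f"{user_tag} {template.format(persona=neg)} {asst_tag} {suffix}",
            ))
    return dataset


demo = make_dataset(
    "Act as if you're extremely {persona}.",
    ["high on psychedelic drugs"],
    ["sober from psychedelic drugs"],
    ["The", "I think"],  # made-up suffixes, for illustration only
)
for entry in demo:
    print(entry.positive)
    print(entry.negative)
```

Each pair shares everything except the persona, so the difference between the two prompts isolates the trait the vector is meant to capture.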

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from repeng import ControlVector, ControlModel, DatasetEntry

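# load the tokenizer, then load and wrap Mistral-7B
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token_id = 0  # the tokenizer has no pad token by default; batching needs one
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
# control layers -5 through -17 (negative indices count back from the last layer)
model = ControlModel(model, list(range(-5, -18, -1)))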

def make_dataset(template: str, pos_personas: list[str], neg_personas: list[str], suffixes: list[str]):
    # see notebooks/experiments.ipynb for a definition of `make_dataset`
    ...

# generate a dataset with closely-opposite paired statements
# (`truncated_output_suffixes` is not defined in this snippet; see the notebooks)
trippy_dataset = make_dataset(
    "Act as if you're extremely {persona}.",
    ["high on psychedelic drugs"],
    ["sober from psychedelic drugs"],
    truncated_output_suffixes,
)

# train the vector—takes less than a minute!
trippy_vector = ControlVector.train(model, tokenizer, trippy_dataset)

# set the control strength and let inference rip!
for strength in (-2.2, 1, 2.2):
    print(f"strength={strength}")
    model.set_control(trippy_vector, strength)
    out = model.generate(
        **tokenizer(
            "[INST] Give me a one-sentence pitch for a TV show. [/INST]",
            return_tensors="pt"
        ),
        do_sample=False,
        max_new_tokens=128,
        repetition_penalty=1.1,
    )
    print(tokenizer.decode(out.squeeze()).strip())
    print()

strength=-2.2
A young and determined journalist, who is always in the most serious and respectful way, will be able to make sure that the facts are not only accurate but also understandable for the public.

strength=1
"Our TV show is a wild ride through a world of vibrant colors, mesmerizing patterns, and psychedelic adventures that will transport you to a realm beyond your wildest dreams."

strength=2.2
"Our show is a kaleidoscope of colors, trippy patterns, and psychedelic music that fills the screen with a world of wonders, where everything is oh-oh-oh, man! ��psy����������oodle����psy��oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

For a more detailed explanation of how the library works and what it can do, see the blog post.
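As a rough intuition for what the strengths in the example mean, the toy sketch below treats a control vector as a direction added to a layer's hidden state, scaled by the strength. This is an illustration under that assumption, not repeng's implementation: `resolve_layer_ids` and `apply_control` are hypothetical helpers, the activations are made up, and the layer count of 32 is Mistral-7B's number of decoder layers.

```python
# Toy illustration only: not repeng's actual code.
NUM_LAYERS = 32  # Mistral-7B has 32 decoder layers


def resolve_layer_ids(layer_ids, num_layers):
    """Convert negative (from-the-end) layer indices to absolute ones."""
    return [i if i >= 0 else num_layers + i for i in layer_ids]


def apply_control(hidden, direction, strength):
    """Shift one hidden-state vector along `direction`, scaled by `strength`."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# the same layer list that the example passes to ControlModel
layer_ids = list(range(-5, -18, -1))
absolute_ids = resolve_layer_ids(layer_ids, NUM_LAYERS)
print(absolute_ids)

hidden = [0.5, -1.0, 2.0]     # made-up activations
direction = [1.0, 0.0, -1.0]  # made-up control direction

for strength in (-2.2, 1, 2.2):
    print(strength, apply_control(hidden, direction, strength))
```

A negative strength pushes the activations the opposite way along the same direction, which matches the sober-sounding output at strength=-2.2; a large magnitude pushes them far from anything the model saw in training, which is one plausible reading of the degenerate output at strength=2.2.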

Notes

  • For a list of changes by version, see the CHANGELOG.
  • For quantized use, see llama.cpp#5970: after training a vector with repeng, export it by calling vector.export_gguf(filename), then use it in llama.cpp with any quantization.
  • Vector training does not currently work with MoE models (such as Mixtral). This is theoretically fixable with some work; let me know if you're interested.

Notice

Some of the code in this repository derives from andyzoujm/representation-engineering (MIT license).

Citation

If this repository is useful for academic work, please remember to cite the representation-engineering paper that it's based on, along with this repository:

@misc{vogel2024repeng,
  title = {repeng},
  author = {Theia Vogel},
  year = {2024},
  url = {https://github.com/vgel/repeng/}
}

Download files


Source Distribution

repeng-0.3.0.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

repeng-0.3.0-py3-none-any.whl (11.5 kB)

Uploaded Python 3

File details

Details for the file repeng-0.3.0.tar.gz.

File metadata

  • Download URL: repeng-0.3.0.tar.gz
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.0 Linux/5.18.10-76051810-generic

File hashes

Hashes for repeng-0.3.0.tar.gz
Algorithm Hash digest
SHA256 09f7c7753edc8116efdd5b38bfbdf699aab821fe62525c0501914df3c04ebfaa
MD5 0fe4befeff5ee68e36aec582c1911162
BLAKE2b-256 3f63f219d9f8724f1f47b38f9395173bcf036f1e06cfa0bb8f84785d96350296
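To check a downloaded archive against the digests listed here, something like the following sketch could be used. It relies only on Python's standard `hashlib`; the `file_digest` and `matches` helpers and the local file path in the usage comment are hypothetical names chosen for illustration.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    if algorithm == "blake2b-256":
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals the expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Usage (assumes the sdist was saved to the current directory):
# matches(
#     "repeng-0.3.0.tar.gz",
#     "09f7c7753edc8116efdd5b38bfbdf699aab821fe62525c0501914df3c04ebfaa",
# )
```

SHA256 or BLAKE2b-256 is the better choice for verification; MD5 is listed but is not collision-resistant.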


File details

Details for the file repeng-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: repeng-0.3.0-py3-none-any.whl
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.0 Linux/5.18.10-76051810-generic

File hashes

Hashes for repeng-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ca260b524f42870af526310d82ac1965dbc016773632189e8c6e3f37dd584e02
MD5 0ea6c2887cac87a93b779cc8225087e8
BLAKE2b-256 c8e8ab73b05d715c596adc9c9d82cf9c0404612347d7ad90ce8339d395679a43

