
representation engineering / control vectors


repeng


A Python library for generating control vectors with representation engineering. Train a vector in less than sixty seconds!
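As a rough illustration of the idea (a simplified sketch, not repeng's own code), you can read a control vector for one layer off paired activations. Collect the hidden state for the positive and negative prompt of each pair, subtract them, and take the dominant direction of those differences. Here that direction is the top right-singular vector of the stacked difference matrix, with its sign flipped if needed so that positive examples project higher. The function name `direction_from_pairs` and its input shapes are hypothetical, and the sketch assumes you have already extracted the hidden states as NumPy arrays.

```python
import numpy as np


def direction_from_pairs(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Estimate a steering direction for one layer (illustrative sketch).

    pos, neg: hypothetical arrays of shape (n_pairs, hidden_dim) holding the
    hidden state of the positive and negative member of each pair.
    Returns a unit vector of shape (hidden_dim,).
    """
    diffs = pos - neg
    # top right-singular vector of the (uncentered) differences:
    # the single direction that best explains how the pairs differ
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    # SVD's sign is arbitrary; orient it so positives project higher than negatives
    if (pos @ direction).mean() < (neg @ direction).mean():
        direction = -direction
    return direction / np.linalg.norm(direction)
```

In a real model the inputs would be hidden states gathered at each controlled layer. `ControlVector.train` handles that extraction for you, so you would not normally write this yourself.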

For a full example, see the notebooks folder or the blog post.

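import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from repeng import ControlVector, ControlModel, DatasetEntry

# load and wrap Mistral-7B
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token_id = 0  # Mistral's tokenizer defines no pad token
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to(device)
# control layers -5 through -17, counted from the end (13 layers);
# on Mistral-7B's 32 layers that is layers 27 down to 15
model = ControlModel(model, list(range(-5, -18, -1)))

user_tag, asst_tag = "[INST]", "[/INST]"  # Mistral-Instruct chat markers

def make_dataset(template: str, pos_personas: list[str], neg_personas: list[str], suffixes: list[str]):
    # a minimal version; see notebooks/experiments.ipynb for the original definition.
    # Each entry pairs two prompts that differ only in the persona, followed by
    # the same assistant-output prefix.
    dataset = []
    for suffix in suffixes:
        for pos, neg in zip(pos_personas, neg_personas):
            pos_prompt, neg_prompt = template.format(persona=pos), template.format(persona=neg)
            dataset.append(DatasetEntry(
                positive=f"{user_tag} {pos_prompt} {asst_tag} {suffix}",
                negative=f"{user_tag} {neg_prompt} {asst_tag} {suffix}",
            ))
    return dataset

# output prefixes for the dataset: one way to build them is to truncate a JSON
# list of sample outputs at every token boundary (the file path is an assumption)
with open("data/all_truncated_outputs.json") as f:
    output_suffixes = json.load(f)
truncated_output_suffixes = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes)
    for i in range(1, len(tokens))
]

# generate a dataset with closely-opposite paired statements
trippy_dataset = make_dataset(
    "Act as if you're extremely {persona}.",
    ["high on psychedelic drugs"],
    ["sober from psychedelic drugs"],
    truncated_output_suffixes,
)

# train the vector; this takes less than a minute
trippy_vector = ControlVector.train(model, tokenizer, trippy_dataset)

# set the control strength and generate greedily
prompt = "[INST] Give me a one-sentence pitch for a TV show. [/INST]"
for strength in (-2.2, 1, 2.2):
    print(f"strength={strength}")
    model.set_control(trippy_vector, strength)
    out = model.generate(
        **tokenizer(prompt, return_tensors="pt").to(device),
        do_sample=False,
        max_new_tokens=128,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id,
    )
    print(tokenizer.decode(out.squeeze()).strip())
    print()
model.reset()  # remove the control vector when done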

strength=-2.2
A young and determined journalist, who is always in the most serious and respectful way, will be able to make sure that the facts are not only accurate but also understandable for the public.

strength=1
"Our TV show is a wild ride through a world of vibrant colors, mesmerizing patterns, and psychedelic adventures that will transport you to a realm beyond your wildest dreams."

strength=2.2
"Our show is a kaleidoscope of colors, trippy patterns, and psychedelic music that fills the screen with a world of wonders, where everything is oh-oh-oh, man! ��psy����������oodle����psy��oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
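Applying a vector at inference amounts to adding `strength × direction` to the hidden state at each controlled layer. That explains the outputs above: a negative strength pushes toward the opposite persona ("sober"), and a large positive one overwhelms the model until it degrades into noise. The sketch below shows this mechanism on a toy `torch.nn` stack using a standard PyTorch forward hook. The `Block` module and the `add_control` helper are hypothetical stand-ins, and repeng's `ControlModel` wraps the layers in its own way internally.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical stand-in for a transformer layer (hidden -> hidden)."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.linear(x)  # residual-style update


def add_control(layer: nn.Module, direction: torch.Tensor, strength: float):
    """Register a hook that adds strength * direction to the layer's output."""
    def hook(module, inputs, output):
        # returning a value from a forward hook replaces the module's output
        return output + strength * direction
    return layer.register_forward_hook(hook)


torch.manual_seed(0)
dim = 8
layers = nn.Sequential(*[Block(dim) for _ in range(4)])
direction = nn.functional.normalize(torch.randn(dim), dim=0)
x = torch.randn(1, dim)

with torch.no_grad():
    baseline = layers(x)
    handle = add_control(layers[-1], direction, 2.2)
    steered = layers(x)
    handle.remove()  # detach the hook, analogous to resetting the control
    restored = layers(x)

# hooking the last layer shifts the final output by strength * direction
# (up to float rounding); removing the hook restores the original output
print(torch.allclose(steered - baseline, 2.2 * direction, atol=1e-5))
print(torch.allclose(restored, baseline))
```

Hooking an earlier layer instead means the shift also passes through every layer after it. The effect on the final output is then no longer a simple addition.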

For a more detailed explanation of how the library works and what it can do, see the blog post.

Notice

Some of the code in this repository derives from andyzoujm/representation-engineering (MIT license).

Citation

If this repository is useful for academic work, please remember to cite the representation-engineering paper that it's based on, along with this repository:

@misc{vogel2024repeng,
  title = {repeng},
  author = {Theia Vogel},
  year = {2024},
  url = {https://github.com/vgel/repeng/}
}


Download files

Download the file for your platform.

Source Distribution

repeng-0.2.2.tar.gz (9.7 kB)

Built Distribution


repeng-0.2.2-py3-none-any.whl (9.9 kB)

File details

Details for the file repeng-0.2.2.tar.gz.

File metadata

  • Download URL: repeng-0.2.2.tar.gz
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.0 Linux/5.18.10-76051810-generic

File hashes

Hashes for repeng-0.2.2.tar.gz
Algorithm     Hash digest
SHA256        fc6d4ed4c58e8b6e4e933737da5917dabdcbd34c534ca95b1246a65b41d7812b
MD5           c124813e1a5b52ffd3332d5738a148e7
BLAKE2b-256   1f6c084841d2151a4529e1370c50ce83e80aea6675e186cfffaa0751b289904f


File details

Details for the file repeng-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: repeng-0.2.2-py3-none-any.whl
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.0 Linux/5.18.10-76051810-generic

File hashes

Hashes for repeng-0.2.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        7df6d3abc7d756e7dd7f767f70b4942e3efb775d6d0b85016a638fb2c8d17b8a
MD5           e203843e2529f3a0f890ec58cd493e38
BLAKE2b-256   b47b9df1968a90fb60b84129559721c23e1fc5d9f391fe91c1893dda13514e4c
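To check a downloaded file against the SHA256 digests listed for the two distribution files, hash it locally and compare. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative, not part of any tool. pip can also enforce digests on install via `--require-hashes` with `--hash=sha256:...` entries in a requirements file.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests published for repeng 0.2.2
EXPECTED = {
    "repeng-0.2.2.tar.gz": "fc6d4ed4c58e8b6e4e933737da5917dabdcbd34c534ca95b1246a65b41d7812b",
    "repeng-0.2.2-py3-none-any.whl": "7df6d3abc7d756e7dd7f767f70b4942e3efb775d6d0b85016a638fb2c8d17b8a",
}


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path) -> bool:
    """True only if the file's name is known and its SHA256 matches."""
    name = Path(path).name
    return name in EXPECTED and sha256_of(path) == EXPECTED[name]
```

After downloading, `verify("repeng-0.2.2-py3-none-any.whl")` should return `True`. A `False` means the file is unknown, corrupted, or not the published artifact.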

