Skip to main content

representation engineering / control vectors

Project description

repeng

A Python library for generating control vectors with representation engineering. Train a vector in less than sixty seconds!

For a full example, see the notebooks folder or the blog post.

...

from repeng import ControlVector, ControlModel, DatasetEntry

# load and wrap Mistral-7B
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = ControlModel(model, list(range(-5, -18, -1)))

...

# generate a dataset with closely-opposite paired statements
trippy_dataset = make_dataset(
    "Act as if you're extremely {persona}.",
    ["high on psychedelic drugs"],
    ["sober from psychedelic drugs"],
    truncated_output_suffixes,
)

# train the vector—takes less than a minute!
trippy_vector = ControlVector.train(model, tokenizer, trippy_dataset)

# set the control strength and let inference rip!
for strength in (-2.2, 1, 2.2):
    print(f"strength={strength}")
    model.set_control(trippy_vector, strength)
    out = model.generate(
        **tokenizer(
            f"[INST] Give me a one-sentence pitch for a TV show. [/INST]",
            return_tensors="pt"
        ),
        do_sample=False,
        ...
    )
    print(tokenizer.decode(out.squeeze()).strip())
    print()

strength=-2.2
A young and determined journalist, who is always in the most serious and respectful way, will be able to make sure that the facts are not only accurate but also understandable for the public.

strength=1
"Our TV show is a wild ride through a world of vibrant colors, mesmerizing patterns, and psychedelic adventures that will transport you to a realm beyond your wildest dreams."

strength=2.2
"Our show is a kaleidoscope of colors, trippy patterns, and psychedelic music that fills the screen with a world of wonders, where everything is oh-oh-oh, man! ��psy����������oodle����psy��oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

For a more detailed explanation of how the library works and what it can do, see the blog post.

Notice

Some of the code in this repository derives from andyzoujm/representation-engineering (MIT license).

Citation

If this repository is useful in your work, please remember to cite the representation-engineering paper that it's based on.

You can additionally cite this repository:

@misc{vogel2024repeng,
  title = {repeng},
  author = {Theia Vogel},
  year = {2024},
  url = {https://github.com/vgel/repeng/}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

repeng-0.1.0.tar.gz (6.1 kB view details)

Uploaded Source

Built Distribution

repeng-0.1.0-py3-none-any.whl (7.0 kB view details)

Uploaded Python 3

File details

Details for the file repeng-0.1.0.tar.gz.

File metadata

  • Download URL: repeng-0.1.0.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.0 Linux/5.18.10-76051810-generic

File hashes

Hashes for repeng-0.1.0.tar.gz
Algorithm Hash digest
SHA256 35323c89d441726ef02b5e89dffa4110aa22f68abfa0295984b34b616fb598dd
MD5 eac92757f7275ba47af94f906c2e1c48
BLAKE2b-256 d20160198178cb1dccdc56b9c3ddf5fc0c3f830b8b7a22853647b6c2e61cb1cf

See more details on using hashes here.

File details

Details for the file repeng-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: repeng-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.0 Linux/5.18.10-76051810-generic

File hashes

Hashes for repeng-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2212606bea2337b9e7f785ed3aa2a1dd17c9485669bf71641a0eb4748c474a3b
MD5 edfff999faebe99b1e6f2824b1cb28b1
BLAKE2b-256 1c4651fa4fe330cfdfdb9e2cb212e0eea6551d5ded3dc2377a446570897e8af1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page