Locking activations via k-sparse autoencoders.

Project description

Locking_Backdoors_Via_Steering_Language_Models

Suppresses backdoors in reward models by locking to base models.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

locking_activations-0.1.0.tar.gz (2.6 kB)

Uploaded Source

Built Distribution

locking_activations-0.1.0-py3-none-any.whl (2.9 kB)

Uploaded Python 3

File details

Details for the file locking_activations-0.1.0.tar.gz.

File metadata

  • Download URL: locking_activations-0.1.0.tar.gz
  • Upload date:
  • Size: 2.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.11

File hashes

Hashes for locking_activations-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b6f3e7562fdbfa07f370499a9fb846cfa25556625d49cb210f0c1f9ced1ba6db
MD5 b6269841311c808e6e6e33b1b83f15f5
BLAKE2b-256 9d0b9c0c62902debdcd1a0cfc5ee266624bdd2d3a5c68d4ca9105ae96fa45a0b
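
A downloaded archive can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` and the local file path are illustrative assumptions, not part of the package.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for locking_activations-0.1.0.tar.gz
EXPECTED_SHA256 = "b6f3e7562fdbfa07f370499a9fb846cfa25556625d49cb210f0c1f9ced1ba6db"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location: the archive saved in the current working directory.
    archive = Path("locking_activations-0.1.0.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check applies to the wheel by substituting its file name and its own SHA256 digest.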

See more details on using hashes here.

File details

Details for the file locking_activations-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for locking_activations-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 115407effbe28e779a3252cd260acd7d47f3073ec4d0f490078ee9cb91bb5b0d
MD5 8a41b1e89cba2a0bd7857bd1081c5f08
BLAKE2b-256 c699928f31b721096f5d9f0405d22489bd44a2eecccc341266dc9f969493c791

