Tools for applying circuits-style interpretability techniques to RL agents.

CircRL

A small library of mechanistic interpretability tools, primarily focused on interpreting RL policies (though most of the tools are general).

The library has three main components:

  • Hooks: Tools for hooking into PyTorch models. They provide simple, safe wrappers around PyTorch's forward-hook functionality and make it easy to cache activations, patch them, and run arbitrary hook functions.
  • Probing: Tools for training linear probes on model activations (or any other data), including sparse probes.
  • Rollouts: Tools for running rollouts and collecting various kinds of data through a unified interface.
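
To make the Probing component more concrete, here is a minimal sketch of what a sparse linear probe does. It is not CircRL's API: it uses only the Python standard library, synthetic data stands in for model activations, and every name in it (`train_sparse_probe`, `soft_threshold`, and the `l1`, `lr` and `steps` parameters) is hypothetical. The probe is an L1-penalised logistic regression fitted by proximal gradient descent, so weights on uninformative features are pushed to zero.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sparse_probe(activations, labels, l1=0.15, lr=0.5, steps=300):
    """Fit an L1-penalised logistic-regression probe (hypothetical helper)."""
    n, d = len(activations), len(activations[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(activations, labels):
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            residual = sigmoid(logit) - y
            grad_b += residual
            for j in range(d):
                grad_w[j] += residual * x[j]
        # Plain gradient step on the (unpenalised) bias.
        bias -= lr * grad_b / n
        # Gradient step followed by soft-thresholding on the weights.
        weights = [
            soft_threshold(w - lr * g / n, lr * l1)
            for w, g in zip(weights, grad_w)
        ]
    return weights, bias


def accuracy(weights, bias, activations, labels):
    """Fraction of samples whose thresholded probe output matches the label."""
    correct = 0
    for x, y in zip(activations, labels):
        logit = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((1 if logit > 0 else 0) == y)
    return correct / len(labels)


# Synthetic "activations": 5 features, of which only feature 0 carries the label.
rng = random.Random(0)
activations = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(400)]
labels = [1 if x[0] > 0 else 0 for x in activations]

weights, bias = train_sparse_probe(activations, labels)
print("probe weights:", [round(w, 3) for w in weights])
print("train accuracy:", accuracy(weights, bias, activations, labels))
```

In real use the rows of `activations` would be activations cached from a hooked layer, and the features that keep nonzero weights indicate which units the probe relies on.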

Installation

CircRL is available on PyPI, and can be installed with pip:

pip install circrl
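
To confirm the install from Python, one option is to query the installed distribution's metadata with the standard library. This is a small sketch, and the helper name `installed_version` is our own rather than part of CircRL.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("circrl")
    if version is None:
        print("circrl is not installed in this environment")
    else:
        print("circrl version:", version)
```

Checking metadata rather than importing the package avoids loading PyTorch just to verify that the install succeeded.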

Usage

A detailed, self-contained demo of CircRL is available in the CircRL demo notebook.

License

CircRL is licensed under the MIT license.

Citation

If you use CircRL in your research, please cite it as follows:

@misc{circrl,
  author    = {MacDiarmid, Monte},
  title     = {CircRL},
  year      = {2023},
  url       = {https://github.com/montemac/circrl}
}

Download files

Source Distribution

circrl-1.0.0.tar.gz (149.3 kB)

Uploaded Source

Built Distribution

circrl-1.0.0-py3-none-any.whl (10.8 kB)

Uploaded Python 3

File details

Details for the file circrl-1.0.0.tar.gz.

File metadata

  • Download URL: circrl-1.0.0.tar.gz
  • Upload date:
  • Size: 149.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for circrl-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        5ffef4e966564983683092697ecdc2240ce2e546e7f994aa9e2860d94df26ae4
MD5           817f02e048c3506f663116a0db393826
BLAKE2b-256   88b39e000ded2ed6ddebdc55ce47f6738a713818fcbb871d232d42c239da92a8

File details

Details for the file circrl-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: circrl-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for circrl-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        32e1cb40580cf2c3e0f48994325cc50bdecbe1a92f4c7aee3519f43712cd18d3
MD5           111c983c77309937a5cad26996b95f6c
BLAKE2b-256   b4074f32be8d33799c0af8c993df316ee8cb710ee8638cf6b6c5bc6801d7517b