A library for doing research on developmental interpretability

DevInterp

A Python Library for Developmental Interpretability Research

DevInterp is a Python library for conducting research on developmental interpretability, a novel AI safety research agenda rooted in Singular Learning Theory (SLT). DevInterp provides tools for detecting, locating, and ultimately controlling the development of structure over training.

Read more about developmental interpretability.

Warning: This library is still in early development. Don't expect things to work on the first attempt. We are actively working on improving the library and adding new features.

Installation

To install devinterp, simply run pip install devinterp. (Note: This has PyTorch as a dependency.)
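After installing, a quick way to confirm what actually ended up in your environment is to query the installed package versions. This is a minimal sketch using only the standard library's importlib.metadata; the helper name `installed_versions` is hypothetical (not part of devinterp), and `plotly` will only show a version if you also installed the optional vis extra.

```python
from importlib import metadata


def installed_versions(packages=("devinterp", "torch", "plotly")):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```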

Minimal Example

from devinterp.slt.sampler import sample, LLCEstimator
from devinterp.optim import SGLD
from devinterp.utils import default_nbeta

# Assuming you have a PyTorch model assigned to `model` and a DataLoader assigned to `trainloader`
llc_estimator = LLCEstimator(..., nbeta=default_nbeta(trainloader))
sample(model, trainloader, ..., callbacks=[llc_estimator])

llc_mean = llc_estimator.get_results()["llc/mean"]
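To make the quantity behind `llc/mean` more concrete, here is a dependency-free sketch of the standard SGLD-based local learning coefficient estimator: run localized SGLD around a trained parameter w*, then report nβ · (mean sampled loss − L_n(w*)), taking nβ = n / log n (the WBIC inverse temperature scaled by n). It does not use devinterp, and it is not the library's implementation: the toy one-parameter regression, the full-batch gradients, and every function name here (`make_data`, `estimate_llc`, and so on) are hypothetical illustrations.

```python
import math
import random


def make_data(n, true_w=2.0, noise=0.5, seed=0):
    """Hypothetical toy data: y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def loss(w, xs, ys):
    """Empirical mean squared error L_n(w) for the model y ~ w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w, xs, ys):
    """Derivative of L_n with respect to w (full batch, for simplicity)."""
    return sum(-2.0 * x * (y - w * x) for x, y in zip(xs, ys)) / len(xs)


def least_squares_w(xs, ys):
    """Exact minimizer w* of L_n, standing in for a trained checkpoint."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def estimate_llc(xs, ys, w_star, nbeta, lr=1e-3, localization=1.0, num_draws=1000, seed=1):
    """Localized SGLD around w_star, then nbeta * (mean sampled loss - L_n(w_star))."""
    rng = random.Random(seed)
    init_loss = loss(w_star, xs, ys)
    w = w_star
    losses = []
    for _ in range(num_draws):
        # Drift: tempered loss gradient plus a pull back toward w_star.
        drift = nbeta * grad(w, xs, ys) + localization * (w - w_star)
        # Langevin step: half a step along -drift plus Gaussian noise of variance lr.
        w = w - 0.5 * lr * drift + rng.gauss(0.0, math.sqrt(lr))
        losses.append(loss(w, xs, ys))
    return nbeta * (sum(losses) / len(losses) - init_loss)


if __name__ == "__main__":
    xs, ys = make_data(200)
    nbeta = len(xs) / math.log(len(xs))  # n / log n
    llc = estimate_llc(xs, ys, least_squares_w(xs, ys), nbeta)
    print(f"estimated LLC: {llc:.3f}")
```

For a regular model with d parameters, theory predicts an LLC of about d/2, so this one-parameter toy should land somewhere near 0.5, with some Monte Carlo noise from the finite chain.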

Advanced Usage

To see DevInterp in action, check out our example notebooks:

For more advanced usage, see the Diagnostics notebook. For a quick guide on picking hyperparameters, see the Grokking Demo or the Calibration notebook. Documentation can be found here.

For papers that either inspired or used the DevInterp package, click here.

Known Issues

  • LLC (local learning coefficient) estimation is currently more of an art than a science. It will take some time and pain to get it to work reliably.
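One way to make this less painful is to inspect the sampled loss trace before trusting an estimate. The sketch below is a hypothetical helper, not part of devinterp, and it assumes you already have the per-draw losses of a chain and the loss at the starting point. It checks three common heuristics: non-finite losses (the chain diverged), a mean loss below the initial loss (which yields a negative LLC and often means the starting point is not a local minimum or the step size is too large), and drift between the first and second halves of the chain. The `drift_tol` threshold is an arbitrary assumption.

```python
import math
from typing import List


def check_loss_trace(losses: List[float], init_loss: float, drift_tol: float = 0.1) -> List[str]:
    """Return human-readable warnings about a single chain's loss trace."""
    warnings = []
    if not losses or not all(math.isfinite(x) for x in losses):
        warnings.append("non-finite or empty loss trace: chain likely diverged (try a smaller step size)")
        return warnings
    mean_loss = sum(losses) / len(losses)
    if mean_loss < init_loss:
        warnings.append("mean sampled loss below initial loss: the LLC estimate will be negative")
    half = len(losses) // 2
    if half > 0:
        first = sum(losses[:half]) / half
        second = sum(losses[half:]) / (len(losses) - half)
        # Relative change between halves; a large value suggests the chain has not equilibrated.
        if abs(second - first) / max(abs(first), 1e-12) > drift_tol:
            warnings.append("loss trace still drifting between halves: chain may not have equilibrated")
    return warnings
```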

If you run into issues not mentioned here, please first check the GitHub issues, then ask in the DevInterp Discord, and only then open a new GitHub issue.

Contributing

See CONTRIBUTING.md for guidelines on how to contribute.

Credits & Citations

This package was created by Timaeus. The main contributors to this package are Stan van Wingerden, Jesse Hoogland, George Wang, and William Zhou. Zach Furman, Matthew Farrugia-Roberts, Rohan Hitchcock, and Edmund Lau also made valuable contributions or provided useful advice.

If this package was useful in your work, please cite it as:

@misc{devinterpcode,
  title = {DevInterp},
  author = {van Wingerden, Stan and Hoogland, Jesse and Wang, George and Zhou, William},
  year = {2024},
  howpublished = {\url{https://github.com/timaeus-research/devinterp}},
}

Optional Dependencies

DevInterp offers additional visualization functionalities that are not included in the base installation. To enable these features, install the package with the vis extra:

pip install "devinterp[vis]"

This will install plotly, which is required for the visualization utilities provided in vis_utils.py.

Download files

Source Distribution

devinterp-1.3.2.tar.gz (51.3 kB)

Uploaded Source

Built Distribution

devinterp-1.3.2-py3-none-any.whl (63.7 kB)

Uploaded Python 3

File details

Details for the file devinterp-1.3.2.tar.gz.

File metadata

  • Download URL: devinterp-1.3.2.tar.gz
  • Upload date:
  • Size: 51.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for devinterp-1.3.2.tar.gz
  • SHA256: c42b1ea079ac9219f7d849b79cf029663fa3ea48817c1b5001ce326e53deae6d
  • MD5: 121a2c7782dddd82a2210bc87e4cfdd1
  • BLAKE2b-256: 04f581671da0aa92963ff74bbde3f3fea5d8fd1f2acdf82edecd804ffb1339d6
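If you download a distribution file manually, you can check it against the SHA256 digest listed above with the standard library's hashlib. This is a minimal sketch: the helper names `sha256_of` and `verify` are hypothetical, the expected digest is copied from the source distribution entry, and the script assumes the file sits in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for devinterp-1.3.2.tar.gz.
EXPECTED_SDIST_SHA256 = "c42b1ea079ac9219f7d849b79cf029663fa3ea48817c1b5001ce326e53deae6d"


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("devinterp-1.3.2.tar.gz")  # assumed location of the downloaded file
    if sdist.exists():
        print("OK" if verify(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found")
```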

Provenance

The following attestation bundles were made for devinterp-1.3.2.tar.gz:

Publisher: publish.yml on timaeus-research/devinterp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file devinterp-1.3.2-py3-none-any.whl.

File metadata

  • Download URL: devinterp-1.3.2-py3-none-any.whl
  • Upload date:
  • Size: 63.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for devinterp-1.3.2-py3-none-any.whl
  • SHA256: 9c17bf117cf33c00d983d384ae5ae6b7c595eb04da91a2ec7eb61db96c5eef67
  • MD5: a3a96a1bdf781ec0a6a2306bfa873dd4
  • BLAKE2b-256: e556ba95c52ff28a4062814d88dea423f3c1b25ff345e8d36cecab6753037c2b

Provenance

The following attestation bundles were made for devinterp-1.3.2-py3-none-any.whl:

Publisher: publish.yml on timaeus-research/devinterp
