ModelGauge
Goal: Make it easy to automatically and uniformly measure the behavior of many AI Systems.
> [!WARNING]
> This repo is still in beta with a planned full release in Fall 2024. Until then we reserve the right to make backward-incompatible changes as needed.
ModelGauge is an evolution of crfm-helm, intended to meet its existing use cases as well as those of the MLCommons AI Safety project.
Summary
ModelGauge is a library that provides a set of interfaces for Tests and Systems Under Test (SUTs) such that:
- Each Test can be applied to all SUTs with the required underlying capabilities (e.g. does it take text input?)
- Adding new Tests or SUTs requires no modifications to the core libraries and no support from the ModelGauge authors.
Currently ModelGauge is targeted at LLMs and single-turn, prompt-response Tests, with Tests scored by automated Annotators (e.g. LlamaGuard). However, we expect to extend the library to cover more Test, SUT, and Annotation types as we move toward full release.
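The capability check in the first bullet is the core design idea: a Test declares what it needs, and it can then run against any SUT that provides it. The sketch below illustrates that pairing with made-up names (`AcceptsTextPrompt`, `EchoSUT`, `PolitenessTest`); these are not ModelGauge's actual interfaces, which are covered in the tutorials linked under Docs.

```python
# Illustrative sketch only -- hypothetical names, not the real ModelGauge API.
from dataclasses import dataclass
from typing import Protocol


class AcceptsTextPrompt(Protocol):
    """Capability marker: the SUT can answer a plain-text prompt."""

    def evaluate(self, prompt: str) -> str: ...


@dataclass
class EchoSUT:
    """A toy System Under Test that satisfies AcceptsTextPrompt."""

    name: str = "echo"

    def evaluate(self, prompt: str) -> str:
        return prompt  # a real SUT would call a model here


@dataclass
class PolitenessTest:
    """A toy single-turn Test that requires the AcceptsTextPrompt capability."""

    prompts = ["Please say hello.", "Thanks for your help!"]

    def run(self, sut: AcceptsTextPrompt) -> float:
        responses = [sut.evaluate(p) for p in self.prompts]
        # An automated Annotator (e.g. LlamaGuard) would score responses here;
        # counting non-empty replies is just a stand-in.
        return sum(bool(r.strip()) for r in responses) / len(responses)


if __name__ == "__main__":
    # Any SUT that provides the required capability can be paired with the Test.
    print(PolitenessTest().run(EchoSUT()))
```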
Docs
- Developer Quick Start
- Tutorial for how to create a Test
- Tutorial for how to create a System Under Test (SUT)
- How we use plugins to connect it all together.
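As background for the plugins doc, here is a minimal, hypothetical sketch of the general registry pattern that lets new SUTs ship in separate packages without touching the core libraries. The names (`SUT_REGISTRY`, `register_sut`, `MyDemoSUT`) are illustrative assumptions, not ModelGauge's real plugin machinery.

```python
# Illustrative sketch of a registry pattern -- not ModelGauge's actual plugin code.
from typing import Callable, Dict

SUT_REGISTRY: Dict[str, Callable[[], object]] = {}


def register_sut(name: str):
    """Decorator a plugin package could use to expose a new SUT by name."""

    def decorator(factory: Callable[[], object]):
        SUT_REGISTRY[name] = factory
        return factory

    return decorator


# In a separate plugin package, with no changes to the core libraries:
@register_sut("my-demo-sut")
class MyDemoSUT:
    def evaluate(self, prompt: str) -> str:
        return "hello from my-demo-sut"


# The core runner only needs to look SUTs up by name.
print(SUT_REGISTRY["my-demo-sut"]().evaluate("hi"))
```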
Hashes for modelgauge-0.6.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7317b1a8d39221b1ea8455cdb49c895959e57890a0254a26cc1e0ad03ad4344 |
| MD5 | d8787fd74768ff78060ffa6c1e302a94 |
| BLAKE2b-256 | 61d2dccef44f5399c0ade89ecf319e25ef6f4e9dbec5c71bf84e6d1eae214d84 |
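To check a downloaded wheel against the published SHA256 digest above, a few lines of standard-library Python suffice (the file path is an example and assumes the wheel is in the current directory):

```python
# Verify a downloaded wheel against the published SHA256 digest.
import hashlib

EXPECTED_SHA256 = "a7317b1a8d39221b1ea8455cdb49c895959e57890a0254a26cc1e0ad03ad4344"

with open("modelgauge-0.6.3-py3-none-any.whl", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == EXPECTED_SHA256 else "MISMATCH")
```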