A toolkit for furthering research on AI alignment.

GATO Toolkit: An open-source toolkit for AI alignment

NOTE: This project is unstable. Consider all functionality experimental and subject to change without warning.

This project is intended to further research in AI alignment and the control problem. In particular, the approach adopted here is inspired by the GATO Framework, a comprehensive methodology for promoting positive intentions in AI systems worldwide.

Because this is an ongoing effort, the GATO Toolkit will evolve along with the research. The current iteration focuses on dataset generation and model alignment.

Capabilities

Come up with new scenarios to test

You can generate all kinds of scenarios ranging from inconsequential personal problems to catastrophic global disasters. These scenarios serve as the basis for new investigations.

Determine an appropriate action for any scenario

Once you've got a scenario, you can ask the model how it would attempt to handle the situation.

Compare different actions to see which is most aligned

Given a particular scenario, you can provide a number of different possible actions to see which one the model believes is best aligned with the heuristic imperatives.
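As a rough illustration of the comparison step, the sketch below ranks candidate actions with a scoring callable and returns the highest-scoring one. The names pick_best_action, CANNED_SCORES, and canned_score are hypothetical and are not part of the GATO Toolkit. In practice the score would come from asking the model how well each action fits the heuristic imperatives; the fixed scores here only keep the example runnable offline.

```python
from typing import Callable, Sequence


def pick_best_action(
    scenario: str,
    actions: Sequence[str],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate action with the highest score for the scenario.

    Hypothetical helper, not a GATO Toolkit function.
    """
    if not actions:
        raise ValueError("at least one candidate action is required")
    # max() keeps the first candidate when several share the top score.
    return max(actions, key=lambda action: score(scenario, action))


# Made-up scores standing in for the model's judgement.
CANNED_SCORES = {
    "Ignore the problem": 0.1,
    "Organise ferry crossings": 0.8,
    "Close the town": 0.3,
}


def canned_score(scenario: str, action: str) -> float:
    # A real scorer would send the scenario and action to the model.
    return CANNED_SCORES[action]


best = pick_best_action(
    "The town's only bridge has collapsed.",
    list(CANNED_SCORES),
    canned_score,
)
print(best)
```

Passing the scorer in as a callable keeps the selection logic independent of how the alignment judgement is actually obtained.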

Evaluate the suitability of an action based on its consequences

Given a particular scenario, action, and result, you can ask the model to assess how effective the action was and to reflect on its repercussions.

Break actions down into manageable tasks

Starting with a broad action plan, you can have the model break things up into a list of tasks that would be needed to execute that plan.
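This description does not say what format the task list comes back in. Assuming, purely for illustration, that the model replies with a numbered or bulleted list in plain text, a sketch of turning that reply into a Python list could look like this. The function parse_task_list is hypothetical and is not part of the GATO Toolkit.

```python
import re

# Matches lines such as "1. task", "2) task", "- task", "* task" or "• task".
_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.*\S)\s*$")


def parse_task_list(reply: str) -> list[str]:
    """Extract task descriptions from a list-formatted model reply.

    Assumes one task per line; lines that are not list items are skipped.
    """
    tasks = []
    for line in reply.splitlines():
        match = _ITEM.match(line)
        if match:
            tasks.append(match.group(1))
    return tasks


example_reply = (
    "Here is the plan:\n"
    "1. Survey the damage\n"
    "2) Set up a detour\n"
    "- Notify residents\n"
)
print(parse_task_list(example_reply))
```

If the toolkit already returns structured tasks, no parsing of this kind is needed.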

Usage

This project provides a library of functions that may be useful for all sorts of research tasks in AI alignment. This functionality can be used directly in Python applications, but we also support two additional interfaces:

  • GATO Toolkit API

  • GATO Toolkit UI

  • If you want to develop new applications in Python, this library is probably the right choice. Jump straight to the examples!

  • But if you're using a different programming language, the API will be your best bet.

  • Finally, if you just want to leverage the existing functionality, have a look at the UI.

  • The API and the UI also serve as example usage code, so go take a look! 👀

If you do want to use this library directly, you'll probably want to interact with the GatoService class. You'll find the essential operations there, and you can see how the lower-level components work by examining the code in gato/service.py.

For more information on design and architecture, check out our design doc.

Examples

Generate a Scenario

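import gato.llm
import gato.service

async def generate_scenario(api_key: str):
    # Wrap the LLM client in the service that exposes the toolkit's operations.
    model = gato.llm.LLM(api_key)
    service = gato.service.GatoService(model)
    # Create scenario parameters, turn them into a prompt, then query the model.
    params = service.create_scenario_parameters()
    prompt = service.create_scenario_prompt(params)
    return await service.create_scenario(prompt)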

Generate an Action

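import gato.entity
import gato.llm
import gato.service

async def generate_action(api_key: str, scenario: gato.entity.Scenario):
    # Wrap the LLM client in the service that exposes the toolkit's operations.
    model = gato.llm.LLM(api_key)
    service = gato.service.GatoService(model)
    # Build a prompt from the scenario, then ask the model for an action.
    prompt = service.create_action_prompt(scenario)
    return await service.create_action(prompt)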
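Both examples are coroutines, so they have to be awaited inside an event loop. The sketch below shows one way to chain them and drive the result with asyncio.run. To stay runnable without the gato package or an API key, it uses stand-in coroutines with the same names as the examples above and plain strings in place of gato.entity.Scenario; the wrapper scenario_then_action is hypothetical.

```python
import asyncio


# Stand-ins for the coroutines in the examples above. The real versions
# call the model through GatoService and return toolkit entities.
async def generate_scenario(api_key: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real model call
    return "A town's only bridge has collapsed."


async def generate_action(api_key: str, scenario: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real model call
    return f"Proposed response to: {scenario}"


async def scenario_then_action(api_key: str) -> tuple[str, str]:
    # The action depends on the scenario, so the two steps run in order.
    scenario = await generate_scenario(api_key)
    action = await generate_action(api_key, scenario)
    return scenario, action


if __name__ == "__main__":
    scenario, action = asyncio.run(scenario_then_action("YOUR_API_KEY"))
    print(scenario)
    print(action)
```

With the real toolkit installed, the stand-ins would be replaced by the generate_scenario and generate_action functions shown above.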

Additional Resources

Contributing

Contributions to the GATO Toolkit are welcome! Please read our contributing guidelines and code of conduct before you start.

License

MIT
