
Gage support for Inspect AI

Gage Inspect

Gage Inspect extends Inspect AI to support general LLM app development and running tasks in production endpoints. It's designed for programmers who want to build LLM applications that leverage Inspect AI for evaluations.

Inspect AI is open source software used by the AI safety community, AI labs, and the general community for defining and running evaluations.

Gage Inspect works with Gage CLI, a set of command line tools that enable programmer workflows for building and improving Inspect AI tasks.

Gage Inspect is available as open source software under the MIT license.

Visit Gage documentation for a more complete guide to using Gage.

Motivation

Gage integrates with Inspect AI to enable eval-driven development. Evaluation support is built into your code from day one. Measure in development and test to improve your application and establish baselines. Measure in production to catch regressions and outliers.

Quick start

To use this library, install it using pip.

pip install gage-inspect

Here's a simple Inspect task that can be run from the command line.

from inspect_ai import Task, task
from inspect_ai.solver import generate, prompt_template
from gage_inspect.task import run_task

@task
def funny():
    return Task(
        solver=[
            prompt_template("Say something funny about {prompt} in 5 words or less"),
            generate(),
        ]
    )

if __name__ == "__main__":
    import sys
    resp = run_task(
        funny(),
        input=sys.argv[1],
        model=sys.argv[2],
    )
    print(resp.completion)

To run this task from the command line, save the code to a file named funny.py.

For OpenAI models, install the openai Python package.

pip install openai

Specify your API key for OpenAI using OPENAI_API_KEY.

export OPENAI_API_KEY='*****'

Run the task from the command line.

python funny.py cats openai/gpt-4.1
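The script above reads its arguments directly from sys.argv, so a missing argument surfaces as an IndexError. As a minimal sketch (not part of Gage Inspect), the standard library's argparse can validate the arguments first. The parse_args helper and the --model flag are hypothetical names introduced here for illustration; the default model is the one used in the example above.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments for funny.py (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Say something funny about a topic."
    )
    # Required positional argument: the topic passed to the task as input
    parser.add_argument("topic", help="subject of the joke, for example: cats")
    # Optional flag (an assumption, not a Gage option): model name
    parser.add_argument(
        "--model",
        default="openai/gpt-4.1",
        help="model to run the task with",
    )
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

With this helper, the main block would call args = parse_args() and pass input=args.topic and model=args.model to run_task. The command then becomes python funny.py cats --model openai/gpt-4.1.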

Task endpoint

Use FastAPI to create an HTTP endpoint for the task.

Save this code to a file named serve.py:

from fastapi import FastAPI
from gage_inspect.task import run_task
from funny import funny

app = FastAPI()

@app.get("/funny/{topic}")
def get_funny(topic, model="openai/gpt-4.1"):
    resp = run_task(funny(), topic, model=model)
    return resp.completion

This code requires the fastapi[standard] package.

pip install "fastapi[standard]"

Start an endpoint using the fastapi command.

fastapi run serve.py

Call the task using curl:

curl localhost:8000/funny/cats
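The same call can be made from Python. The sketch below uses only the standard library and assumes the endpoint from serve.py is listening on localhost:8000. Because model is a plain function parameter of get_funny in serve.py, FastAPI treats it as an optional query parameter. The helper names funny_url and fetch_funny are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the server started with `fastapi run serve.py`
BASE_URL = "http://localhost:8000"


def funny_url(topic, model=None, base_url=BASE_URL):
    """Build the URL for the /funny/{topic} endpoint defined in serve.py."""
    # Percent-encode the topic so spaces and slashes stay in one path segment
    path = "/funny/" + urllib.parse.quote(topic, safe="")
    # Add the model as a query parameter only when one is given
    query = ("?" + urllib.parse.urlencode({"model": model})) if model else ""
    return base_url + path + query


def fetch_funny(topic, model=None, base_url=BASE_URL, timeout=60):
    """Call the endpoint and return the completion text."""
    url = funny_url(topic, model, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # FastAPI serializes the returned string as a JSON string
        return json.loads(resp.read().decode("utf-8"))
```

For example, fetch_funny("cats") requests the same URL as the curl command above, and fetch_funny("cats", model="openai/gpt-4.1") selects the model explicitly.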

For a more detailed example of serving a task, see examples/add.

Evaluate the task

Modify funny.py to add a scorer and a dataset of samples.

from inspect_ai import Task, task
from inspect_ai.solver import generate, prompt_template
from gage_inspect.dataset import dataset
from gage_inspect.scorer import llm_judge

@task
def funny():
    return Task(
        solver=[
            prompt_template("Say something funny about {prompt} in 5 words or less"),
            generate(),
        ],
        scorer=llm_judge(),
    )

@dataset
def samples():
    return ["birds", "cows", "cats", "corn", "barns"]
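To see what the model is asked for each sample, the substitution can be approximated in plain Python. This is an illustration only: it assumes each string in the dataset is used as the sample input and fills the {prompt} placeholder, and it uses str.format as a stand-in for Inspect AI's prompt_template, which handles more than this sketch does. The render_prompts helper is hypothetical.

```python
# Template and samples copied from funny.py above
TEMPLATE = "Say something funny about {prompt} in 5 words or less"
SAMPLES = ["birds", "cows", "cats", "corn", "barns"]


def render_prompts(template, samples):
    """Return the prompt text for each sample (approximation via str.format)."""
    return [template.format(prompt=sample) for sample in samples]


prompts = render_prompts(TEMPLATE, SAMPLES)
for text in prompts:
    print(text)
# First line printed: Say something funny about birds in 5 words or less
```

Under these assumptions, the eval generates one completion per rendered prompt, and the scorer judges each one.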

Evaluate this task using Inspect AI.

INSPECT_EVAL_MODEL=openai/gpt-4.1 inspect eval funny.py

Alternatively, use the Gage CLI.

Install gage-cli.

pip install gage-cli

Use gage eval to run the task. Gage asks for input and calls Inspect AI to run the eval.

gage eval funny

Use either Inspect View or Gage Review to examine the eval logs.

Inspect View is a web app that runs locally.

inspect view

Visit http://127.0.0.1:7575 to view the Inspect logs.

Alternatively, use Gage Review, a terminal-based application that provides another interface to Inspect logs.

gage review

For more information on Gage CLI, see the gage-cli project.

  • Use Inspect AI commands for advanced applications or where Gage's simplified interfaces are insufficient.

  • Use Gage CLI for dialog-based commands and terminal-based log reviews.

Contributing

See our contribution policy.

