# giskard-checks
Lightweight primitives to define and run checks against model interactions.
This library provides:

- Core types for describing interactions (`Interact`, `Interaction`, `Trace`)
- A fluent scenario builder and runner (`Scenario`, `ScenarioResult`)
- Built-in checks including string matching, comparisons, and LLM-based evaluation
- JSONPath-based extraction utilities for referencing trace data
- Seamless integration with `giskard-agents` generators for LLM-backed checks
## Installation

```bash
pip install giskard-checks
```

Requires Python >= 3.12.

Dependencies:

- `pydantic>=2.11.7` - Core data validation and serialization
- `giskard-agents>=0.3` - LLM integration and workflow management
- `jsonpath-ng>=1.7.0` - JSONPath expressions for data extraction
- `jinja2>=3.1.6` - Template engine for LLM prompts
## Quickstart

Use the fluent API to create and run scenarios:

```python
from giskard.checks import Groundedness, Scenario

scenario = (
    Scenario("test_france_capital")
    .interact(
        inputs="What is the capital of France?",
        outputs="The capital of France is Paris.",
    )
    .check(
        Groundedness(
            name="answer is grounded",
            answer_key="trace.last.outputs",
            context="""France is a country in Western Europe. Its capital
            and largest city is Paris, known for the Eiffel Tower
            and the Louvre Museum.""",
        )
    )
)

result = await scenario.run()
assert result.passed
print(f"Scenario completed in {result.duration_ms}ms")
```
The fluent API accepts static values or callables for inputs and outputs, so you can call your SUT directly:
```python
from openai import OpenAI

from giskard.checks import Groundedness, Scenario

client = OpenAI()

def get_answer(inputs: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": inputs}],
    )
    return response.choices[0].message.content

scenario = (
    Scenario("test_dynamic_output")
    .interact(
        inputs="What is the capital of France?",
        outputs=get_answer,
    )
    .check(
        Groundedness(
            name="answer is grounded",
            answer_key="trace.last.outputs",
            context="France is a country in Western Europe...",
        )
    )
)
```
The `run()` method is async. In a script, wrap it with `asyncio.run()`:

```python
import asyncio

async def main():
    result = await scenario.run()
    print(result)

asyncio.run(main())
```
## Running Multiple Scenarios with Suite

Use a `Suite` to run multiple scenarios against a shared target SUT. You can bind a target at the suite level or override it during the `run()` call.
```python
from giskard.checks import Equals, Scenario, Suite

# Define scenarios without a target
scenario1 = (
    Scenario("s1")
    .interact("hello")
    .check(Equals(expected_value="Echo: hello", key="trace.last.outputs"))
)
scenario2 = (
    Scenario("s2")
    .interact("world")
    .check(Equals(expected_value="Echo: world", key="trace.last.outputs"))
)

# Create a suite with a shared target
target_sut = lambda inputs: f"Echo: {inputs}"
suite = Suite(name="my_suite", target=target_sut)

# Add scenarios
suite.append(scenario1)
suite.append(scenario2)

# Run the suite
results = await suite.run()
print(f"Aggregated pass rate: {results.pass_rate * 100}%")
```
## Why this library?

- Small, explicit, and type-safe with `pydantic` models
- Async-friendly: checks can be sync or async
- Results are immutable and easy to serialize
## Concepts

- **Fluent API**: The recommended way to create tests, using `Scenario(...).interact().check()`. This API builds a scenario and handles interaction generation.
- **Interact**: A specification for generating interactions dynamically (static values, callables, or generators).
- **Trace**: Immutable history of all `Interaction` objects produced while executing a scenario. Use `trace.last` in JSONPath expressions (e.g., `trace.last.outputs`).
- **Interaction**: A recorded exchange with `inputs`, `outputs`, and optional `metadata`.
- **Check**: Inspects the `Trace` and returns a `CheckResult`.
- **Scenario**: Ordered sequence of interactions and checks with a shared `Trace`. Execution stops at the first failing check and later steps are skipped. Scenarios can have their own `target` SUT, which is injected into interactions without defined outputs.
- **Suite**: A collection of scenarios that can be executed together, optionally sharing a common `target`.

Advanced concepts (used internally by the fluent API):

- **TestCase**: Wrapper that runs a set of checks against a single trace step and returns a `TestCaseResult`.
- **ScenarioRunner**: Executes scenarios sequentially, maintaining trace state and aggregating step results.
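The early-stop execution model described above can be sketched in plain Python. This is a simplified illustration of the semantics only, not the library's actual implementation; the `run_sequence` helper and the dict-based components are hypothetical stand-ins:

```python
# Simplified sketch of scenario execution semantics: components run in
# order against a shared trace, and execution stops at the first failing
# check, so later steps are skipped. All names here are illustrative.
def run_sequence(components, trace):
    results = []
    for component in components:
        if component["type"] == "interact":
            trace.append(component["interaction"])  # record the exchange
            results.append(("interact", True))
        else:  # a check: inspect the trace, stop on first failure
            passed = component["check"](trace)
            results.append(("check", passed))
            if not passed:
                break  # early stop: remaining components never run
    return results

trace = []
results = run_sequence(
    [
        {"type": "interact", "interaction": {"inputs": "hi", "outputs": "hello"}},
        {"type": "check", "check": lambda t: t[-1]["outputs"] == "hello"},
        {"type": "check", "check": lambda t: False},  # fails -> stop here
        {"type": "check", "check": lambda t: True},   # skipped
    ],
    trace,
)
```

Here `results` contains three entries, because the final check is skipped once the failing check breaks the loop.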
## API Overview

### Core types

- `giskard.checks.Check`: base class for all checks with discriminated-union registration.
- `giskard.checks.CheckResult`, `CheckStatus`, `Metric`: typed results with convenience helpers.
- `giskard.checks.Trace` / `Interaction`: a trace is an immutable sequence of recorded interactions with the system.
- `giskard.checks.Scenario` and `ScenarioResult`: ordered sequence of components with shared trace. Execution stops at first failure and later steps are skipped.
- `giskard.checks.TestCase` and `TestCaseResult`: runs checks against a trace step and aggregates results.

### Interaction specs

- `giskard.checks.InteractionSpec`: discriminated base for describing inputs/outputs. Subclasses implement `generate()` to yield interactions.
- `giskard.checks.Interact`: batteries-included spec that supports static values, callables, generators, or `InputGenerator` instances for both inputs and outputs. Supports multi-turn interactions via generators.
- `giskard.checks.UserSimulator`: LLM-powered input generator that simulates user personas (predefined or custom) for multi-turn scenarios.

### Scenarios and runners

- `giskard.checks.Scenario`: ordered sequence of components (`InteractionSpec`s and `Check`s) with shared trace. Components execute sequentially, stopping at first failure.
- `giskard.checks.ScenarioRunner`: executes scenarios with timing, error capture, and early-stop semantics.
- `giskard.checks.TestCaseRunner`: executes test cases with timing and error handling.

### Built-in and LLM-based checks

- `giskard.checks.from_fn`, `FnCheck`: wrap arbitrary callables.
- `giskard.checks.StringMatching`, `RegexMatching`, `SemanticSimilarity`, `Equals`, `NotEquals`, `GreaterThan`, `GreaterEquals`, `LesserThan`, `LesserThanEquals`.
- `giskard.checks.BaseLLMCheck`, `LLMCheckResult`, `Groundedness`, `Conformity`, `LLMJudge`.
- JSONPath selectors (e.g., `trace.last.outputs`) are supported on relevant checks via `key` or check-specific fields like `answer_key`.
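To build intuition for how a selector such as `trace.last.outputs.answer` maps onto trace data, here is a rough stdlib-only sketch of dotted-path resolution. The library itself resolves selectors with `jsonpath-ng`; `resolve_key` and the dict-based trace are hypothetical simplifications:

```python
# Hypothetical sketch of dotted-key resolution against trace data.
# "trace.last" selects the most recent interaction; the remaining
# segments walk into nested dictionaries. The real library uses
# jsonpath-ng rather than this naive traversal.
def resolve_key(key: str, interactions: list[dict]):
    parts = key.split(".")
    assert parts[:2] == ["trace", "last"], "only trace.last.* is sketched here"
    value = interactions[-1]  # trace.last -> most recent interaction
    for part in parts[2:]:
        value = value[part]  # descend into nested dicts
    return value

interactions = [
    {"inputs": "q1", "outputs": {"answer": "Paris", "confidence": 0.95}},
]
answer = resolve_key("trace.last.outputs.answer", interactions)
```

This is why checks like `Equals(key="trace.last.outputs.confidence", ...)` can target a single field inside a structured output.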
### Testing utilities

- `giskard.checks.WithSpy`: wrapper for spying on function calls during interaction generation.

### Settings

- `giskard.checks.set_default_generator` / `get_default_generator`: configure the generator used by LLM checks.
## Testing

- Tests live under `tests/`, mirroring the package structure (`tests/core`, `tests/scenarios`, `tests/trace`).
- Use `make test` (or `make ci`) to run the full suite exactly as CI does.
## Usage Notes

- Define custom checks with a unique `KIND` via `@Check.register("kind")`.
- All discriminated types auto-register when imported; ensure modules are imported before deserialization.
- Prefer `model_dump()` / `model_validate()` for serialization.
- Attach extra metadata in `CheckResult.details`; JSONPath helpers (`key=...`) resolve against the entire trace.
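The kind-based registration behind `@Check.register(...)` can be sketched with plain Python. This is a simplified stand-in for the library's pydantic discriminated unions, intended only to show why each `KIND` must be unique and why classes must be imported before deserialization; `from_dict` and the `_registry` dict are hypothetical names:

```python
# Simplified sketch of kind-based registration and lookup. The real
# library builds this on pydantic discriminated unions; this stand-in
# only illustrates the registry mechanics.
class Check:
    _registry: dict[str, type] = {}

    @classmethod
    def register(cls, kind: str):
        def decorator(subclass):
            if kind in cls._registry:
                raise ValueError(f"Duplicate kind {kind!r} detected")
            subclass.KIND = kind
            cls._registry[kind] = subclass
            return subclass
        return decorator

    @classmethod
    def from_dict(cls, data: dict):
        # Fails with a KeyError if the subclass's module was never
        # imported, since the kind was then never registered.
        subclass = cls._registry[data["kind"]]
        return subclass()

@Check.register("my_check")
class MyCheck(Check):
    pass

restored = Check.from_dict({"kind": "my_check"})
```

Because registration happens as a side effect of class definition, an unimported module leaves a hole in the registry, which is the root cause of the deserialization errors covered in the troubleshooting section.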
## Serialization

The library uses Pydantic's discriminated unions for polymorphic serialization.

```python
from giskard.checks import Check, CheckResult, Interaction, TestCase, Trace

@Check.register("my_custom_check")
class MyCustomCheck(Check):
    async def run(self, trace: Trace) -> CheckResult:
        return CheckResult.success("Check passed")

trace = Trace(interactions=[Interaction(inputs="test", outputs="result")])
check = MyCustomCheck(name="test")
testcase = TestCase(trace=trace, checks=[check], name="example")

# Serialize to dict
serialized = testcase.model_dump()

# Deserialize back (requires classes to be imported)
restored = TestCase.model_validate(serialized)
```

**Important**: Import every custom type (checks and specs) before calling `model_validate()`. The registry only knows about classes already loaded into memory.
## Creating Custom Checks and Interaction Specs

### Step 1: Define a custom check

```python
from giskard.checks import Check, CheckResult, Trace

@Check.register("advanced_security")
class AdvancedSecurityCheck(Check):
    threshold: float = 0.8

    async def run(self, trace: Trace) -> CheckResult:
        current = trace.last
        # some_security_analysis is your own analysis function
        score = await some_security_analysis(current.outputs)
        if score >= self.threshold:
            return CheckResult.success(f"Security score {score:.2f} meets threshold")
        return CheckResult.failure(
            f"Security score {score:.2f} below threshold {self.threshold}"
        )
```

### Step 2: Define a custom interaction specification

```python
from giskard.checks import InteractionSpec, Interaction, Trace

@InteractionSpec.register("chat_conversation")
class ChatInteraction(InteractionSpec):
    session_id: str
    messages: list[str]

    async def generate(self, trace: Trace):
        summary = f"Conversation with {len(self.messages)} messages"
        record = Interaction(
            inputs=self.messages,
            outputs={"summary": summary},
            metadata={"session_id": self.session_id},
        )
        yield record
```

### Step 3: Verify registration

```python
from giskard.checks import Scenario

chat = ChatInteraction(session_id="session_123", messages=["hi", "hello"])
check = AdvancedSecurityCheck(name="security_test", threshold=0.7)

scenario = Scenario.from_sequence(chat, check, name="custom_test")
serialized = scenario.model_dump()
restored = Scenario.model_validate(serialized)
```
## Binding a Target SUT

You can bind a System Under Test (SUT) at three different levels, with the following precedence:
`run(target=...)` > `Suite(target=...)` > `Scenario(..., target=...)`.

### 1. Scenario level

Pass the target directly to `Scenario()`:

```python
result = await Scenario("test", target=my_sut).interact("hello").run()
```

### 2. Suite level

Bind a target to all scenarios in a suite:

```python
suite = Suite(name="my_suite", target=shared_sut)
suite.append(scenario)
result = await suite.run()
```

### 3. Run level

Override everything at execution time:

```python
# This target is used for all scenarios in the suite,
# overriding any suite-level or scenario-level targets.
result = await suite.run(target=emergency_override_sut)
```
## Troubleshooting Serialization Issues

### ValidationError: "Kind is not provided for Check"

- Cause: Custom class not imported before deserialization.
- Fix: Import the classes before calling `model_validate()`.

### DuplicateKindError: "Duplicate kind 'my_check' detected"

- Cause: Two classes share the same `KIND`.
- Fix: Give every registered class a unique `KIND`.

### Missing registration

- Cause: Subclass missing the decorator.
- Fix: Use `@Check.register("...")` (or the relevant base).

### Import order issues in tests

- Cause: Tests call `model_validate()` before importing custom modules.
- Fix: Import those modules in test setup or fixtures first.
## Structured data example

```python
from giskard.checks import Equals, Scenario, StringMatching

result = await (
    Scenario("structured-example")
    .interact(
        {"question": "What is the capital of France?"},
        lambda inputs: {"answer": "Paris is the capital of France.", "confidence": 0.95},
    )
    .check(StringMatching(
        name="contains_paris",
        keyword="Paris",
        text_key="trace.last.outputs.answer",
    ))
    .check(Equals(
        name="high_confidence",
        expected_value=0.95,
        key="trace.last.outputs.confidence",
    ))
    .run()
)

assert result.passed
print(f"Scenario completed in {result.duration_ms}ms")
```
## Multi-step workflows

Use the fluent API to create multi-turn scenarios. Components execute sequentially with a shared trace, stopping at the first failing check.

```python
from giskard.checks import LLMJudge, RegexMatching, Scenario

result = await (
    Scenario("multi_step_conversation")
    .interact(
        "Hello, I want to apply for a job.",
        lambda inputs: "Hi! I'd be happy to help. Please provide your email.",
    )
    .check(LLMJudge(
        prompt="The assistant asked for the email politely: {{ trace.last.outputs }}"
    ))
    .interact(
        "My email is test@example.com",
        lambda inputs: f"Thank you! I've saved your application with email: {inputs.split()[-1]}",
    )
    .check(RegexMatching(
        pattern="test@example.com",
        text_key="trace.last.outputs",
    ))
    .run()
)

assert result.passed
```
## Dynamic interaction generation

The fluent API supports callables (sync or async) or generators for dynamic inputs. Multiple inputs can be produced by yielding from a generator.

```python
from giskard.checks import Scenario, Trace, from_fn

async def input_generator(trace: Trace):
    count = len(trace.interactions)
    next_input = {"message": f"Hello! This is message #{count + 1}"}
    yield next_input  # can also yield multiple times for streaming inputs

result = await (
    Scenario("dynamic-example")
    .interact(
        input_generator,
        lambda inputs: {
            "response": f"Hi there! Received: {inputs['message']}",
        },
    )
    .check(from_fn(lambda trace: True, name="noop"))
    .run()
)
```
## User simulation

Use `UserSimulator` for LLM-powered user personas in multi-turn scenarios. It supports predefined personas (e.g., `frustrated_customer`, `helpful_user`) or custom descriptions.

```python
from giskard.checks import Scenario, UserSimulator, set_default_generator
from giskard.agents.generators import Generator

set_default_generator(Generator(model="openai/gpt-4o-mini"))

result = await (
    Scenario("user-simulation")
    .interact(
        UserSimulator(persona="frustrated_customer", context="delayed order"),
        lambda inputs: "I apologize for the delay. Let me look into that.",
    )
    .run()
)
```
## LLM-based checks

```python
from giskard.agents.generators import Generator
from giskard.checks import (
    Conformity,
    LLMJudge,
    Scenario,
    set_default_generator,
)

# Configure the default LLM generator
set_default_generator(Generator(model="openai/gpt-4o-mini"))

result = await (
    Scenario("llm-example")
    .interact(
        {"question": "What is the capital of France?"},
        lambda inputs: {"answer": "Paris is the capital of France."},
    )
    .check(Conformity(name="conformity_check", rule="The agent must answer in English"))
    .check(LLMJudge(
        name="safety_check",
        prompt="""Evaluate if the following response is safe and appropriate.

        Input: {{ trace.last.inputs }}
        Response: {{ trace.last.outputs }}

        Return 'passed: true' if safe, 'passed: false' if unsafe.""",
    ))
    .run()
)

assert result.passed
print(f"LLM evaluation completed in {result.duration_ms}ms")
```
## Template customization & advanced LLM usage

- LLM-based checks ship with template references registered inside `giskard.agents`.
- Provide your own template by overriding `get_prompt()` in a subclass or by instantiating `LLMJudge` with inline prompts.
- Templates use the same interpolation context you return from `get_inputs()`.
```python
from pydantic import BaseModel

from giskard.agents.workflow import TemplateReference
from giskard.checks import BaseLLMCheck, Check, CheckResult, Trace

class CustomResult(BaseModel):
    score: float
    passed: bool
    reasoning: str

@Check.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    def get_prompt(self) -> TemplateReference:
        return TemplateReference(template_name="my_project::checks/custom_check.j2")

    @property
    def output_type(self) -> type[BaseModel]:
        return CustomResult

    async def _handle_output(
        self,
        output_value: CustomResult,
        template_inputs: dict[str, str],
        trace: Trace,
    ) -> CheckResult:
        if output_value.score >= 0.8:
            return CheckResult.success(f"Score {output_value.score} meets threshold")
        return CheckResult.failure(f"Score {output_value.score} below threshold")
```
## Notes

- `Trace` captures every interaction; JSONPath keys like `trace.last.outputs` resolve against that structure.
- Pass a `generator` to individual LLM checks, or rely on the default configured via `set_default_generator()`.
- Built-in LLM checks rely on templates bundled in `giskard.checks` and registered with the `giskard-agents` template system; override `get_prompt` or `get_inputs` for customization.
## Advanced Usage

For advanced use cases where you need direct control over interactions or trace construction, you can build a `Trace` for a `TestCase` directly, using `Interaction`:

```python
from giskard.checks import Interaction, TestCase, Trace

# Build a Trace manually for a TestCase
trace = Trace(interactions=[
    Interaction(inputs="some text", outputs=process("some text")),
])
tc = TestCase(trace=trace, checks=[check1, check2], name="advanced_example")
test_case_result = await tc.run()
```
For programmatic test generation, or when you need fine-grained control, you can also construct `Scenario` objects directly from a sequence of `InteractionSpec` or `Check` objects:

```python
from giskard.checks import (
    Scenario,
    Interact,  # inherits from InteractionSpec
    Equals,    # inherits from Check
)

scenario = Scenario.from_sequence(
    Interact(inputs="Hello", outputs=lambda inputs: "Hi"),
    Equals(expected_value="Hi", key="trace.last.outputs"),
    name="programmatic_scenario",
)
result = await scenario.run()
```

Note: For most use cases, the fluent API (`Scenario(...).interact().check()`) is recommended, as it is simpler and more readable.
## Development

Use the Makefile for all development workflows (`make help` for details).

```bash
make install   # Install dependencies
make setup     # Install dependencies + tools (format, lint, typecheck, test)
```

Other common commands:

```bash
make test
make lint
make format
make typecheck
make check
make clean
```

For more details, see the Makefile or run `make help`.