Behavioral validation for LLM outputs in production workflows.

Gateframe

Schema validation, "does this JSON have the right keys?", is a solved problem. Instructor, Pydantic AI, and similar tools handle it well. gateframe solves a different problem: does this output behave correctly given the context it was generated in? Does it stay within the decision boundaries this workflow requires? When it fails, does it fail in a way your system can recover from, or does it fail silently?

from pydantic import BaseModel
from gateframe import (
    ValidationContract,
    StructuralRule,
    BoundaryRule,
    ConfidenceRule,
    AllowedValues,
    FailureMode,
)

class TriageDecision(BaseModel):
    action: str
    priority: str
    confidence: float
    rationale: str

contract = ValidationContract(
    name="triage_decision",
    rules=[
        StructuralRule(schema=TriageDecision),
        BoundaryRule(
            check=AllowedValues("action", {"treat", "observe", "refer", "discharge"}),
            name="action_boundary",
            failure_message="Action must be one of: treat, observe, refer, discharge.",
        ),
        ConfidenceRule(field="confidence", minimum=0.7),
    ],
)

result = contract.validate({
    "action": "prescribe",       # not in allowed set -> HARD_FAIL
    "priority": "high",
    "confidence": 0.52,          # below 0.7 -> SOFT_FAIL
    "rationale": "...",
})

print(result.passed)             # False
for failure in result.failures:
    print(f"[{failure.failure_mode.value}] {failure.rule_name}: {failure.message}")
# [hard_fail] action_boundary: Action must be one of: treat, observe, refer, discharge.
# [soft_fail] confidence_check: Confidence 0.52 is below minimum threshold 0.7.

The problem

Most LLM pipelines validate outputs the same way: parse the JSON, check the schema, move on. That catches structural errors. It misses the errors that actually cause production incidents:

A model recommends an action that is structurally valid but outside its authorized scope
Confidence is low but the workflow proceeds as if it weren't
A soft failure in step 2 silently degrades the reliability of everything downstream
A validation failure gives you a bare False and no context for debugging

gateframe makes these failures explicit, structured, and recoverable.

Failure modes

gateframe distinguishes four failure types instead of binary pass/fail.

HARD_FAIL: Stop. The output violates a hard constraint that cannot be auto-recovered.

# Model chose an action outside its authorized scope
BoundaryRule(
    check=AllowedValues("action", {"treat", "observe", "refer"}),
    failure_mode=FailureMode.HARD_FAIL,  # default for BoundaryRule
)

SOFT_FAIL: Flag and continue with degraded confidence. Something is off but not critical enough to halt.

# Model confidence is low: continue, but track the degradation
ConfidenceRule(
    field="confidence",
    minimum=0.7,
    failure_mode=FailureMode.SOFT_FAIL,  # default for ConfidenceRule
)

RETRY: Re-prompt with the failure context. The output is likely fixable by trying again.

# Malformed output that might parse correctly on a second attempt
StructuralRule(schema=MyOutput, failure_mode=FailureMode.RETRY)
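
One way a caller might act on RETRY is a bounded re-prompt loop that feeds the failure messages back to the model. The sketch below is not gateframe code: `validate`, `retry_with_feedback`, and `fake_model` are hypothetical stand-ins, and the validator is a plain function that returns a list of failure messages.

```python
import json
from typing import Callable, List, Optional, Tuple


def validate(raw: str) -> List[str]:
    """Hypothetical stand-in validator: returns failure messages (empty list = pass)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"Output is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict) or "action" not in data:
        return ["Missing required field: action"]
    return []


def retry_with_feedback(
    call_model: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Re-prompt until validation passes or the attempt budget is spent."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw = call_model(current_prompt)
        failures = validate(raw)
        if not failures:
            return raw, attempt
        # Append the failure context so the next attempt can correct itself.
        feedback = "\n".join(f"- {message}" for message in failures)
        current_prompt = f"{prompt}\n\nPrevious attempt failed validation:\n{feedback}"
    return None, max_attempts


# Fake model for illustration: malformed JSON first, valid JSON second.
_responses = iter(['{"action": "treat"', '{"action": "treat"}'])


def fake_model(prompt: str) -> str:
    return next(_responses)


output, attempts = retry_with_feedback(fake_model, "Triage this case.")
print(output, attempts)
```

Capping the number of attempts keeps a persistently malformed output from looping forever; what happens once the budget is spent (escalate, hard-fail) is left to the caller.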

SILENT_FAIL: The most dangerous kind. The output looks valid but violates a semantic or boundary rule. gateframe makes these visible instead of letting them pass through undetected.

# Output is structurally valid, but a low-severity case was escalated anyway
SemanticRule(
    check=lambda output, **ctx: output["severity"] != "low" or output["escalated"] is False,
    failure_mode=FailureMode.SILENT_FAIL,
    failure_message="Low-severity cases should not be auto-escalated.",
)
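
To make the four modes concrete, here is a minimal sketch of how calling code might turn a set of failures into a single next step. It is an illustration only: the `Mode` enum is a local stand-in, and the precedence order and action names (`halt`, `reprompt`, and so on) are assumptions, not behavior documented by gateframe.

```python
from enum import Enum
from typing import Iterable


class Mode(Enum):
    """Local stand-in mirroring the four failure modes described above."""
    HARD_FAIL = "hard_fail"
    SOFT_FAIL = "soft_fail"
    RETRY = "retry"
    SILENT_FAIL = "silent_fail"


# Assumed precedence: when several failures occur together, the highest wins.
PRECEDENCE = {
    Mode.HARD_FAIL: 3,
    Mode.RETRY: 2,
    Mode.SILENT_FAIL: 1,
    Mode.SOFT_FAIL: 0,
}

# Hypothetical action names chosen for this sketch.
ACTIONS = {
    Mode.HARD_FAIL: "halt",
    Mode.RETRY: "reprompt",
    Mode.SILENT_FAIL: "flag_for_review",
    Mode.SOFT_FAIL: "continue_degraded",
}


def next_action(failure_modes: Iterable[Mode]) -> str:
    """Pick one action for a validation result based on its most severe failure."""
    modes = list(failure_modes)
    if not modes:
        return "continue"
    most_severe = max(modes, key=lambda mode: PRECEDENCE[mode])
    return ACTIONS[most_severe]


print(next_action([Mode.SOFT_FAIL, Mode.HARD_FAIL]))
print(next_action([Mode.SOFT_FAIL]))
```

The point of the table-driven shape is that the policy lives in data, so a workflow with different risk tolerance can swap `PRECEDENCE` or `ACTIONS` without touching the dispatch function.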

Multi-step workflow validation

Validation state carries forward across steps. A soft failure in step 2 degrades the confidence score that step 4 sees.

from gateframe import WorkflowContext, ValidationContract, EscalationRouter
from gateframe.audit.log import AuditLog

ctx = WorkflowContext(workflow_id="incident_response_001", escalation_threshold=0.5)
router = EscalationRouter()
audit = AuditLog()

# Step 1
result1 = contract_step1.validate(output1)
ctx.update(result1)
audit.record(result1, workflow_context=ctx)

# Step 2: ctx carries forward any degraded confidence from step 1
result2 = contract_step2.validate(output2)
ctx.update(result2)
audit.record(result2, workflow_context=ctx)

print(ctx.confidence)           # degraded from 1.0 by soft failures
print(ctx.threshold_breached)   # True if confidence < escalation_threshold

if ctx.threshold_breached:
    escalation = router.route_threshold_breach(ctx)
    print(escalation.route.value)  # "human_review", "abort", etc.
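
The text above does not say how confidence is degraded, so the following is a toy model rather than gateframe's algorithm. It assumes each soft failure multiplies the running confidence by a fixed penalty (0.8 here); `ToyWorkflowState` and `soft_fail_penalty` are hypothetical names used only to show how early soft failures can push a later step below the escalation threshold.

```python
from dataclasses import dataclass


@dataclass
class ToyWorkflowState:
    """Toy stand-in for a workflow context; not the gateframe class."""
    escalation_threshold: float
    confidence: float = 1.0
    soft_fail_penalty: float = 0.8  # assumption: multiplicative penalty per soft failure

    def update(self, soft_failures: int) -> None:
        # Apply the penalty once for every soft failure reported by this step.
        self.confidence *= self.soft_fail_penalty ** soft_failures

    @property
    def threshold_breached(self) -> bool:
        return self.confidence < self.escalation_threshold


state = ToyWorkflowState(escalation_threshold=0.5)
history = []
# Soft-failure counts for four consecutive steps.
for soft_failures in [0, 1, 2, 1]:
    state.update(soft_failures)
    history.append((round(state.confidence, 4), state.threshold_breached))

for step, (confidence, breached) in enumerate(history, start=1):
    print(f"step {step}: confidence={confidence} breached={breached}")
```

Under these assumptions no single step is alarming on its own, yet the fourth step inherits enough accumulated degradation to cross the threshold, which is the situation the escalation router is meant to catch.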

Provider integrations

gateframe validates outputs from any provider. Integrations are thin wrappers; the gateframe core does not import any LLM SDK.

# OpenAI
from gateframe.integrations.openai import OpenAIValidator
validator = OpenAIValidator(contract, parse_json=True)
result = validator.validate(openai_completion)

# Anthropic
from gateframe.integrations.anthropic import AnthropicValidator
validator = AnthropicValidator(contract, parse_json=True)
result = validator.validate(anthropic_message)

# LiteLLM
from gateframe.integrations.litellm import LiteLLMValidator
validator = LiteLLMValidator(contract, parse_json=True)
result = validator.validate(litellm_response)

# LangChain
from gateframe.integrations.langchain import LangChainValidator
validator = LangChainValidator(contract, parse_json=False)
result = validator.validate(chain_output)

Install the integration you need:

pip install "gateframe[openai]"
pip install "gateframe[anthropic]"
pip install "gateframe[litellm]"
pip install "gateframe[langchain]"
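
As a rough picture of what a thin wrapper involves, the sketch below pulls the text out of a response object, optionally parses it as JSON, and hands the payload to a validation callable. It assumes the OpenAI Chat Completions response shape (`choices[0].message.content`) and uses a `SimpleNamespace` stand-in so no SDK is needed; `extract_text`, `validate_completion`, and `toy_validate` are hypothetical and do not reflect how `OpenAIValidator` is implemented.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def extract_text(completion: Any) -> str:
    """Read the assistant text from a chat-completion-shaped object."""
    return completion.choices[0].message.content


def validate_completion(
    completion: Any,
    validate: Callable[[Any], bool],
    parse_json: bool = True,
) -> bool:
    """Extract the payload, optionally parse it, and delegate to the validator."""
    text = extract_text(completion)
    payload = json.loads(text) if parse_json else text
    return validate(payload)


# Stand-in response object; a real SDK response would be passed here instead.
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content='{"action": "observe", "confidence": 0.9}')
        )
    ]
)


def toy_validate(payload: dict) -> bool:
    # Toy rule: the action must be in the allowed set.
    return payload.get("action") in {"treat", "observe", "refer", "discharge"}


passed = validate_completion(fake_completion, toy_validate)
print(passed)
```

Because the wrapper only touches attributes on the response object, the provider SDK never has to be imported by the validation code itself.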

Audit trail

Every validation event is logged with structured context. Use the built-in exporters or implement your own.

from gateframe.audit.log import AuditLog
from gateframe.audit.exporters import JsonFileExporter

audit = AuditLog(exporters=[JsonFileExporter("audit.jsonl")])
audit.record(result, workflow_context=ctx)
audit.flush()

Each entry includes: timestamp, contract name, rules applied, rules failed, failure details, workflow ID, and accumulated confidence score.
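
The exporter interface is not shown here, so the following is only a guess at the general shape of a custom exporter: an object with an `export` method that writes one JSON object per line. `JsonLinesExporter`, the `export` method name, and the entry keys are all hypothetical; the keys simply mirror the fields listed above.

```python
import io
import json
from datetime import datetime, timezone
from typing import TextIO


class JsonLinesExporter:
    """Hypothetical exporter: writes each audit entry as one JSON line."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream

    def export(self, entry: dict) -> None:
        # sort_keys keeps the output stable, which helps when diffing logs.
        self.stream.write(json.dumps(entry, sort_keys=True) + "\n")


# Illustrative entry; key names are assumptions based on the field list above.
entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "contract_name": "triage_decision",
    "rules_applied": ["action_boundary", "confidence_check"],
    "rules_failed": ["confidence_check"],
    "failure_details": [
        {"rule_name": "confidence_check", "failure_mode": "soft_fail"}
    ],
    "workflow_id": "incident_response_001",
    "confidence": 0.8,
}

buffer = io.StringIO()
exporter = JsonLinesExporter(buffer)
exporter.export(entry)

lines = buffer.getvalue().splitlines()
print(len(lines), json.loads(lines[0])["workflow_id"])
```

Writing to any text stream, rather than a fixed file path, makes the same exporter usable with files, sockets, or an in-memory buffer in tests.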

When to use gateframe

Use it when:

You need to validate LLM output behavior beyond schema checks: decision boundaries, scope enforcement, semantic constraints
You need structured, recoverable failure records rather than bare exceptions
You're running multi-step workflows where soft failures in early steps should affect confidence downstream
You need an audit trail for post-incident debugging

Don't use it when:

You only need schema extraction from LLM outputs: use Instructor or Pydantic AI
You need offline model evaluation or benchmarking: use DeepEval or RAGAS
You need content safety filtering: use a dedicated guardrails tool

Installation

pip install gateframe

For development:

git clone https://github.com/practicalmind-ai/gateframe.git
cd gateframe
pip install -e ".[dev]"
python -m pytest tests/ -v

Examples

triage_workflow: a 3-step medical triage pipeline. Demonstrates StructuralRule, BoundaryRule, ConfidenceRule, and WorkflowContext together. Step 2 has confidence below the threshold, which shows how SOFT_FAIL degrades the workflow score without halting it.

rag_output: RAG answer validation with two scenarios. Scenario B demonstrates simultaneous soft failures (low confidence plus an ungrounded answer) and how they accumulate in the workflow context.

agent_pipeline: a 4-step agent workflow with escalation. Demonstrates how multiple soft failures across steps push cumulative confidence below the escalation threshold.

CLI

# Inspect a contract file: lists all contracts and their rules
gateframe inspect contracts.py

# Replay an audit log
gateframe replay audit.jsonl
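
For ad hoc analysis outside the CLI, a JSON Lines audit file can also be read directly. The sketch below does not reproduce what `gateframe replay` does; it assumes each line is a JSON object with a `rules_failed` list (a hypothetical key name) and counts how often each rule failed.

```python
import json
from collections import Counter

# Inline sample standing in for the contents of an audit file.
SAMPLE_LOG = "\n".join(
    [
        json.dumps({"contract_name": "triage_decision",
                    "rules_failed": ["action_boundary", "confidence_check"]}),
        json.dumps({"contract_name": "triage_decision", "rules_failed": []}),
        json.dumps({"contract_name": "triage_decision",
                    "rules_failed": ["confidence_check"]}),
    ]
)


def count_failures(log_text: str) -> Counter:
    """Tally failed rule names across all entries in a JSON Lines log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        counts.update(entry.get("rules_failed", []))
    return counts


failure_counts = count_failures(SAMPLE_LOG)
print(failure_counts.most_common())
```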

License

MIT. See LICENSE.

Release history

0.2.0 (this version): Mar 30, 2026
0.1.0: Mar 27, 2026

