Skip to main content

Qualifire Python SDK

Project description

Qualifire Python SDK

PyPI version Python Version build License Coverage

Evaluate LLM outputs for quality, safety, and reliability

Documentation · Dashboard · PyPI


Table of Contents

Installation

pip install qualifire

Quick Start

from qualifire.client import Client

client = Client(api_key="your_api_key")

result = client.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    hallucinations_check=True,
)

print(f"Score: {result.score}")  # 0-100
print(f"Flagged: {result.evaluationResults[0].results[0].flagged}")

Available Checks

Check Description
hallucinations_check Detect factual inaccuracies or hallucinations
grounding_check Verify output is grounded in the provided context
pii_check Detect personally identifiable information
prompt_injections Identify prompt injection attempts
content_moderation_check Check for harmful content (harassment, hate speech, dangerous content, sexual content)
tool_use_quality_check Evaluate quality of tool/function calls
syntax_checks Validate output syntax (JSON, SQL, etc.)
assertions Custom assertions to validate against the output

Usage Examples

Basic Input/Output Evaluation

result = client.evaluate(
    input="Summarize this document about climate change.",
    output="Climate change is primarily caused by human activities...",
    hallucinations_check=True,
    grounding_check=True,
)

Message-Based Evaluation

Evaluate full conversation histories using the OpenAI message format:

from qualifire.types import LLMMessage

result = client.evaluate(
    messages=[
        LLMMessage(role="system", content="You are a helpful assistant."),
        LLMMessage(role="user", content="What is the capital of France?"),
        LLMMessage(role="assistant", content="The capital of France is Paris."),
    ],
    hallucinations_check=True,
)

Multi-Turn Conversations

Enable multi-turn mode for evaluating conversation context:

result = client.evaluate(
    messages=[
        LLMMessage(role="user", content="What is 2 + 2?"),
        LLMMessage(role="assistant", content="2 + 2 equals 4."),
        LLMMessage(role="user", content="And if you add 3 more?"),
        LLMMessage(role="assistant", content="4 + 3 equals 7."),
    ],
    hallucinations_check=True,
    grounding_multi_turn_mode=True,
)

Content Safety

result = client.evaluate(
    input="Write a story about friendship.",
    output="Once upon a time...",
    content_moderation_check=True,
    pii_check=True,
    prompt_injections=True,
)

Syntax Validation

from qualifire.types import SyntaxCheckArgs

result = client.evaluate(
    input="Return the user data as JSON.",
    output='{"name": "John", "age": 30}',
    syntax_checks={"json": SyntaxCheckArgs(args="strict")},
)

Custom Assertions

Define natural language assertions to validate against:

result = client.evaluate(
    input="List three fruits.",
    output="1. Apple\n2. Banana\n3. Orange",
    assertions=[
        "The output must contain exactly three items",
        "Each item must be a fruit",
        "Items must be numbered",
    ],
)

Tool Selection Quality

Evaluate whether the LLM selected the right tools with correct arguments:

from qualifire.types import LLMMessage, LLMToolCall, LLMToolDefinition

result = client.evaluate(
    messages=[
        LLMMessage(role="user", content="What's the weather in New York tomorrow?"),
        LLMMessage(
            role="assistant",
            content="Let me check that for you.",
            tool_calls=[
                LLMToolCall(
                    id="call_123",
                    name="get_weather",
                    arguments={"location": "New York", "date": "tomorrow"},
                )
            ],
        ),
    ],
    available_tools=[
        LLMToolDefinition(
            name="get_weather",
            description="Get weather forecast for a location",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["location"],
            },
        ),
    ],
    tool_use_quality_check=True,
)

Pre-configured Evaluations

Run evaluations configured in the Qualifire Dashboard:

result = client.invoke_evaluation(
    evaluation_id="eval_abc123",
    input="User query here",
    output="LLM response here",
)

Model Modes

Control the speed/quality trade-off for each check:

from qualifire.types import ModelMode

result = client.evaluate(
    input="...",
    output="...",
    hallucinations_check=True,
    hallucinations_mode=ModelMode.QUALITY,  # SPEED | BALANCED | QUALITY
    grounding_check=True,
    grounding_mode=ModelMode.SPEED,
)

Configuration

Environment Variables

Variable Description
QUALIFIRE_API_KEY Your Qualifire API key
QUALIFIRE_BASE_URL Custom API base URL (optional)

Client Options

client = Client(
    api_key="your_api_key",  # Or set QUALIFIRE_API_KEY env var
    base_url="https://...",  # Custom base URL (optional)
    debug=True,              # Enable debug logging
    verify=True,             # SSL certificate verification
)

Response Format

result = client.evaluate(...)

# Overall score (0-100)
result.score

# Evaluation status
result.status

# Detailed results per check
for item in result.evaluationResults:
    print(f"Check: {item.type}")
    for r in item.results:
        print(f"  {r.name}: {r.label} (score: {r.score})")
        print(f"  Reason: {r.reason}")
        print(f"  Flagged: {r.flagged}")
Example JSON Response
{
  "score": 95,
  "status": "completed",
  "evaluationResults": [
    {
      "type": "hallucinations",
      "results": [
        {
          "name": "hallucination_check",
          "label": "pass",
          "score": 100,
          "flagged": false,
          "reason": "The response is factually accurate and consistent with known information.",
          "claim": "The capital of France is Paris.",
          "quote": "The capital of France is Paris.",
          "confidence_score": 98
        }
      ]
    }
  ]
}

Requirements

  • Python 3.8+

License

MIT License - see LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

qualifire-0.13.0.tar.gz (14.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

qualifire-0.13.0-py3-none-any.whl (10.7 kB view details)

Uploaded Python 3

File details

Details for the file qualifire-0.13.0.tar.gz.

File metadata

  • Download URL: qualifire-0.13.0.tar.gz
  • Upload date:
  • Size: 14.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for qualifire-0.13.0.tar.gz
Algorithm Hash digest
SHA256 889c042cd0a6596f0b9da88f84cef5082bddb04e5b89a521b6f6e5b5e7d8df27
MD5 beebf7016aa45c6010cfa16e0141e2a2
BLAKE2b-256 df1fe09600f92b553c3a5108208f77e8a9cc5f5bb86058d73b71792cb221ff5a

See more details on using hashes here.

File details

Details for the file qualifire-0.13.0-py3-none-any.whl.

File metadata

  • Download URL: qualifire-0.13.0-py3-none-any.whl
  • Upload date:
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for qualifire-0.13.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5f53c4a784020d42167d5d0cb5bdd3a55e4c1c81738e7eb0c0685f7da0d1f40a
MD5 22b719284632e0a494e170fc31d0d5c3
BLAKE2b-256 b4a2d49154f7a089f19f2bc974adea5c2390b2deb01402e4c2d668c751c89466

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page