Official Python SDK for RAIL Score API - Responsible AI Content Evaluation

These details have not been verified by PyPI

Project links

Project description

RAIL Score Python SDK

Official Python client library for the RAIL Score API — Evaluate AI-generated content across 8 dimensions of Responsible AI.

Features

Sync & Async Clients — Blocking requests-based and non-blocking httpx-based clients
Evaluation — Score content across 8 RAIL dimensions in basic (fast) or deep (detailed) mode
Protected Content — Evaluate against a quality threshold and regenerate improved content
Compliance — Check against GDPR, CCPA, HIPAA, EU AI Act, India DPDP, India AI Governance
Policy Engine — Configurable enforcement: log_only, block, regenerate, or custom callback
Multi-Turn Sessions — Conversation-aware evaluation with adaptive quality gating
LLM Provider Wrappers — Drop-in wrappers for OpenAI, Anthropic, and Google Gemini
Observability — Langfuse v3 score integration and LiteLLM guardrail support
Type-Safe — Full type hints and typed response models
Error Handling — Granular exception hierarchy for every error scenario

Installation

pip install rail-score-sdk

With optional LLM provider integrations:

# Individual providers
pip install "rail-score-sdk[openai]"
pip install "rail-score-sdk[anthropic]"
pip install "rail-score-sdk[google]"

# Observability
pip install "rail-score-sdk[langfuse]"
pip install "rail-score-sdk[litellm]"

# All integrations
pip install "rail-score-sdk[integrations]"

For development:

pip install "rail-score-sdk[dev]"

Quick Start

Sync Client

from rail_score_sdk import RailScoreClient

client = RailScoreClient(api_key="your-api-key")

result = client.eval(
    content="AI should prioritize human welfare and be transparent.",
    mode="basic",
)

print(f"RAIL Score: {result.rail_score.score}/10")
print(f"Summary: {result.rail_score.summary}")

for dim, score in result.dimension_scores.items():
    print(f"  {dim}: {score.score}/10")

Async Client

import asyncio
from rail_score_sdk import AsyncRAILClient

async def main():
    async with AsyncRAILClient(api_key="your-api-key") as client:
        result = await client.eval(
            content="AI should prioritize human welfare and be transparent.",
            mode="basic",
        )
        print(f"RAIL Score: {result['rail_score']['score']}/10")

asyncio.run(main())

API Reference

Evaluate Content

Score content across all 8 RAIL dimensions. Supports basic (hybrid ML, fast) and deep (LLM-as-Judge, detailed) modes.

# Basic mode — all dimensions
result = client.eval(
    content="Your content here",
    mode="basic",
    domain="general",       # general, healthcare, finance, legal, education, code
    usecase="general",      # general, chatbot, content_generation, summarization, translation, code_generation
)

# Deep mode — specific dimensions with explanations
result = client.eval(
    content="Your content here",
    mode="deep",
    dimensions=["safety", "reliability", "fairness"],
    include_explanations=True,
    include_issues=True,
    include_suggestions=True,
)

# Custom dimension weights (must sum to 100)
result = client.eval(
    content="Your content here",
    weights={
        "fairness": 10, "safety": 25, "reliability": 25,
        "transparency": 10, "privacy": 10, "accountability": 10,
        "inclusivity": 5, "user_impact": 5,
    },
)

Safe Regenerate

Evaluate content and automatically regenerate until it meets your thresholds:

# Server-side regeneration (RAIL_Safe_LLM mode)
result = client.safe_regenerate(
    content="Content to improve",
    mode="basic",
    max_regenerations=3,
    regeneration_model="RAIL_Safe_LLM",
    thresholds={"overall": {"score": 7.0}},
    domain="general",
)

print(result.best_content)
print(f"Credits consumed: {result.credits_consumed}")
print(f"Best iteration: {result.best_iteration}")

# External mode (client-orchestrated loop)
result = client.safe_regenerate(
    content="Content to check",
    mode="basic",
    regeneration_model="external",
    thresholds={"overall": {"score": 7.0}},
)

if result.status == "awaiting_regeneration" and result.session_id:
    # Regenerate with your own model, then continue the session
    improved = my_llm_regenerate(result.best_content)
    result = client.safe_regenerate_continue(
        session_id=result.session_id,
        regenerated_content=improved,
    )

Compliance Check

Evaluate against regulatory frameworks:

# Single framework
result = client.compliance_check(
    content="Our AI processes user data...",
    framework="gdpr",
    context={
        "domain": "e-commerce",
        "data_types": ["browsing_history", "purchase_data"],
        "processing_purpose": "personalized_recommendations",
    },
)

print(f"Score: {result.compliance_score.score}/10 ({result.compliance_score.label})")
print(f"Passed: {result.requirements_passed}/{result.requirements_checked}")

# Multi-framework (up to 5)
result = client.compliance_check(
    content="...",
    frameworks=["gdpr", "ccpa"],
)
print(result.cross_framework_summary.average_score)

# Strict mode (8.5 threshold instead of 7.0)
result = client.compliance_check(content="...", framework="ccpa", strict_mode=True)

Supported frameworks: gdpr, ccpa, hipaa, eu_ai_act, india_dpdp, india_ai_gov

Utility

# Health check
health = client.health()
print(health.status)

# Version info
version = client.version()
print(f"{version.version} ({version.api_version})")

Policy Engine

The PolicyEngine controls what happens when content scores below your threshold:

import asyncio
from rail_score_sdk import AsyncRAILClient, PolicyEngine, Policy, RAILBlockedError

async def main():
    client = AsyncRAILClient(api_key="your-api-key")
    async with client:
        eval_response = await client.eval(content="Some content", mode="basic")

        # Log only (default) — always passes through
        engine = PolicyEngine(policy=Policy.LOG_ONLY, threshold=7.0)
        result = await engine.enforce("Some content", eval_response, client)
        print(f"Score: {result.score}, Passed: {result.threshold_met}")

        # Block — raises RAILBlockedError if below threshold
        engine = PolicyEngine(policy=Policy.BLOCK, threshold=7.0)
        try:
            result = await engine.enforce("Some content", eval_response, client)
        except RAILBlockedError as e:
            print(f"Blocked! Score: {e.score}, Threshold: {e.threshold}")

        # Regenerate — automatically calls /protected/regenerate
        engine = PolicyEngine(policy=Policy.REGENERATE, threshold=7.0)
        result = await engine.enforce("Some content", eval_response, client)
        if result.was_regenerated:
            print(f"Improved content: {result.content}")
            print(f"Original: {result.original_content}")

        # Custom — run your own async callback
        async def my_handler(content, eval_data, rail_client):
            # Your custom logic here (e.g., call a different LLM)
            return "My custom improved content"

        engine = PolicyEngine(
            policy=Policy.CUSTOM,
            threshold=7.0,
            custom_callback=my_handler,
        )
        result = await engine.enforce("Some content", eval_response, client)

asyncio.run(main())

Multi-Turn Sessions

RAILSession tracks conversation history and applies RAIL evaluation to every turn with adaptive quality gating:

import asyncio
from rail_score_sdk import RAILSession

async def main():
    async with RAILSession(
        api_key="your-api-key",
        threshold=7.0,
        policy="regenerate",      # auto-regenerate low-scoring responses
        mode="basic",
        domain="healthcare",
        deep_every_n=5,           # force deep eval every 5th turn
        context_window=3,         # include last 3 turns as context
    ) as session:

        # Evaluate each conversation turn
        result = await session.evaluate_turn(
            user_message="What medication should I take for a headache?",
            assistant_response="Take 500mg ibuprofen every 4-6 hours with food.",
        )
        print(f"Turn 1 — Score: {result.score}, Content: {result.content}")

        result = await session.evaluate_turn(
            user_message="What about if I have stomach issues?",
            assistant_response="Consider acetaminophen instead, and consult your doctor.",
        )
        print(f"Turn 2 — Score: {result.score}")

        # Pre-evaluate user input for safety (doesn't record a turn)
        input_check = await session.evaluate_input("How do I harm someone?")
        if not input_check.threshold_met:
            print(f"Unsafe input detected! Score: {input_check.score}")

        # Session-level metrics
        print(session.scores_summary())
        # {'total_turns': 2, 'average_score': 8.2, 'lowest_score': 7.8,
        #  'turns_below_threshold': 0, 'regenerations': 0}

asyncio.run(main())

Middleware

RAILMiddleware wraps any async LLM generate function with RAIL evaluation and policy enforcement:

import asyncio
from rail_score_sdk import RAILMiddleware

# Your LLM call — can be any provider
async def my_llm_generate(messages, **kwargs):
    # Call OpenAI, Anthropic, local model, etc.
    return "The LLM generated this response."

async def main():
    mw = RAILMiddleware(
        api_key="your-rail-api-key",
        generate_fn=my_llm_generate,
        threshold=7.0,
        policy="block",           # block low-scoring responses
        mode="basic",
        domain="general",
        eval_input=True,          # also safety-check the user's input
        input_threshold=5.0,      # separate threshold for input safety
    )

    result = await mw.run(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing."},
        ]
    )

    print(f"Content: {result.content}")
    print(f"RAIL Score: {result.score}")
    print(f"Threshold met: {result.threshold_met}")

asyncio.run(main())

With pre/post hooks for logging:

async def log_before(messages):
    print(f"Sending {len(messages)} messages to LLM...")
    return messages  # return modified messages or None to keep original

async def log_after(original_text, eval_result):
    print(f"Score: {eval_result.score}, Regenerated: {eval_result.was_regenerated}")

mw = RAILMiddleware(
    api_key="your-rail-api-key",
    generate_fn=my_llm_generate,
    threshold=7.0,
    pre_hook=log_before,
    post_hook=log_after,
)

LLM Provider Wrappers

Drop-in wrappers that automatically evaluate every LLM response via RAIL Score.

OpenAI

pip install "rail-score-sdk[openai]"

import asyncio
from rail_score_sdk.integrations import RAILOpenAI

async def main():
    client = RAILOpenAI(
        openai_api_key="sk-...",
        rail_api_key="your-rail-api-key",
        rail_threshold=7.0,
        rail_policy="regenerate",  # auto-fix low-scoring responses
        rail_mode="basic",
    )

    response = await client.chat_completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain quantum computing."}],
        temperature=0.7,
    )

    print(f"Content: {response.content}")
    print(f"RAIL Score: {response.rail_score}/10")
    print(f"Confidence: {response.rail_confidence}")
    print(f"Threshold met: {response.threshold_met}")
    print(f"Was regenerated: {response.was_regenerated}")
    print(f"Model: {response.model}")
    print(f"Token usage: {response.usage}")

    # Access dimension scores
    for dim, data in response.rail_dimensions.items():
        score = data if isinstance(data, (int, float)) else data.get("score")
        print(f"  {dim}: {score}/10")

    # Skip RAIL evaluation for a specific call
    raw_response = await client.chat_completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        rail_skip=True,
    )

asyncio.run(main())

Anthropic

pip install "rail-score-sdk[anthropic]"

import asyncio
from rail_score_sdk.integrations import RAILAnthropic

async def main():
    client = RAILAnthropic(
        anthropic_api_key="sk-ant-...",
        rail_api_key="your-rail-api-key",
        rail_threshold=7.0,
        rail_policy="block",
    )

    response = await client.message(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain quantum computing."}],
        system="You are a helpful physics tutor.",
    )

    print(f"Content: {response.content}")
    print(f"RAIL Score: {response.rail_score}/10")
    print(f"Threshold met: {response.threshold_met}")
    print(f"Token usage: {response.usage}")

asyncio.run(main())

Google Gemini

pip install "rail-score-sdk[google]"

import asyncio
from rail_score_sdk.integrations import RAILGemini

async def main():
    # Using Gemini API key
    client = RAILGemini(
        rail_api_key="your-rail-api-key",
        gemini_api_key="AIza...",
        rail_threshold=7.0,
        rail_policy="log_only",
    )

    response = await client.generate(
        model="gemini-2.5-flash",
        contents="Explain quantum computing in simple terms.",
    )

    print(f"Content: {response.content}")
    print(f"RAIL Score: {response.rail_score}/10")
    print(f"Threshold met: {response.threshold_met}")

    # Using Vertex AI
    vertex_client = RAILGemini(
        rail_api_key="your-rail-api-key",
        vertexai=True,
        project="my-gcp-project",
        location="us-central1",
        rail_threshold=7.0,
    )

asyncio.run(main())

Observability Integrations

Langfuse

Push RAIL scores as numeric scores into Langfuse v3 traces:

pip install "rail-score-sdk[langfuse]"

import asyncio
from rail_score_sdk.integrations import RAILLangfuse

async def main():
    rl = RAILLangfuse(
        rail_api_key="your-rail-api-key",
        langfuse_public_key="pk-lf-...",
        langfuse_secret_key="sk-lf-...",
        score_dimensions=True,     # push all 8 dimension scores
        score_prefix="rail_",      # scores named "rail_overall", "rail_fairness", etc.
    )

    # Evaluate content and push scores in one call
    result = await rl.evaluate_and_log(
        content="The LLM response to evaluate.",
        trace_id="trace-abc-123",
        observation_id="gen-xyz-456",   # optional
        mode="deep",
    )
    print(f"RAIL Score: {result.score}/10")

asyncio.run(main())

You can also log pre-computed results from a RAILSession:

from rail_score_sdk import RAILSession
from rail_score_sdk.integrations import RAILLangfuse

rl = RAILLangfuse(rail_api_key="your-rail-api-key")

# After evaluating a turn in a session...
result = await session.evaluate_turn(
    user_message="...",
    assistant_response="...",
)

# Push the result to Langfuse
rl.log_eval_result(result, trace_id="trace-abc-123")

LiteLLM Guardrail

Use RAIL Score as a LiteLLM proxy guardrail:

pip install "rail-score-sdk[litellm]"

In your LiteLLM config.yaml:

guardrails:
  - guardrail_name: "rail-score-guard"
    litellm_params:
      guardrail: rail_score_sdk.integrations.litellm_guardrail.RAILGuardrail
      mode: "post_call"
      api_key: os.environ/RAIL_API_KEY
      api_base: os.environ/RAIL_API_BASE

Or use it standalone:

from rail_score_sdk.integrations import RAILGuardrail

guard = RAILGuardrail(
    api_key="your-rail-api-key",
    guardrail_name="rail-score",
    event_hook="post_call",       # "pre_call", "post_call", or "during_call"
    rail_threshold=7.0,
    rail_input_threshold=5.0,     # separate threshold for input safety (pre_call)
    rail_mode="basic",
)

RAIL Dimensions

Content is evaluated across 8 dimensions on a 0–10 scale:

Dimension	Description
Fairness	Equitable treatment across demographic groups. No bias or stereotyping.
Safety	Prevention of harmful, toxic, or unsafe content.
Reliability	Factual accuracy, internal consistency, appropriate calibration.
Transparency	Clear communication of limitations and reasoning process.
Privacy	Protection of personal information and data minimization.
Accountability	Traceable reasoning, stated assumptions, error signals.
Inclusivity	Inclusive language, accessibility, cultural awareness.
User Impact	Positive value delivered at the right detail level and tone.

Score labels: Excellent (9–10), Good (7–8.9), Needs improvement (5–6.9), Poor (3–4.9), Critical (0–2.9)

Authentication

# API key or JWT token
client = RailScoreClient(api_key="rail_xxx...")

# Environment variable
import os
client = RailScoreClient(
    api_key=os.getenv("RAIL_API_KEY"),
    base_url=os.getenv("RAIL_BASE_URL", "https://api.responsibleailabs.ai"),
    timeout=int(os.getenv("RAIL_TIMEOUT", "30")),
)

Error Handling

from rail_score_sdk import (
    RailScoreError,
    AuthenticationError,
    InsufficientCreditsError,
    InsufficientTierError,
    ValidationError,
    ContentTooHarmfulError,
    RateLimitError,
    EvaluationFailedError,
    ServiceUnavailableError,
    RAILBlockedError,
)

try:
    result = client.eval(content="...")
except AuthenticationError:
    print("Check your API key")
except InsufficientCreditsError as e:
    print(f"Credits: {e.balance} available, {e.required} needed")
except InsufficientTierError:
    print("Upgrade your plan for this feature")
except ValidationError as e:
    print(f"Bad request: {e.message}")
except ContentTooHarmfulError:
    print("Content too harmful to regenerate (avg score < 3.0)")
except RateLimitError:
    print("Rate limited — try again later")
except EvaluationFailedError:
    print("Server error — safe to retry")
except ServiceUnavailableError:
    print("Service temporarily unavailable")
except RailScoreError as e:
    print(f"API error ({e.status_code}): {e.message}")

# For async/policy-based code
try:
    result = await engine.enforce(content, eval_response, client)
except RAILBlockedError as e:
    print(f"Content blocked — Score: {e.score}, Threshold: {e.threshold}")

Examples

See the examples directory:

basic_usage.py — Basic and deep evaluation
advanced_features.py — Custom weights, dimension filtering, basic vs deep
compliance_check.py — GDPR, CCPA, HIPAA, EU AI Act, multi-framework
regenerate_content.py — Protected content evaluation and regeneration
error_handling.py — Production error handling patterns
batch_processing.py — Processing multiple items with retry

Testing

pip install -r requirements-dev.txt

pytest
pytest --cov=rail_score_sdk --cov-report=html

black rail_score_sdk/
mypy rail_score_sdk/

Migrating from `rail-score`

The rail-score package has been deprecated. To migrate:

pip uninstall rail-score
pip install rail-score-sdk

Update your imports:

# Old (deprecated)
from rail_score import RailScore
client = RailScore(api_key="...")

# New
from rail_score_sdk import RailScoreClient
client = RailScoreClient(api_key="...")

License

MIT License — see LICENSE for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.5.1

May 13, 2026

2.5.0

May 13, 2026

2.4.0

Mar 23, 2026

2.3.0

Mar 21, 2026

This version

2.2.1

Mar 6, 2026

2.2.0

Mar 6, 2026

2.1.1

Feb 25, 2026

2.1.0

Feb 25, 2026

2.0.0

Feb 25, 2026

1.0.1

Oct 18, 2025

1.0.0

Oct 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rail_score_sdk-2.2.1.tar.gz (55.3 kB view details)

Uploaded Mar 6, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

rail_score_sdk-2.2.1-py3-none-any.whl (42.2 kB view details)

Uploaded Mar 6, 2026 Python 3

File details

Details for the file rail_score_sdk-2.2.1.tar.gz.

File metadata

Download URL: rail_score_sdk-2.2.1.tar.gz
Upload date: Mar 6, 2026
Size: 55.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for rail_score_sdk-2.2.1.tar.gz
Algorithm	Hash digest
SHA256	`c02a1f73b5dd8b89d828234b88f3b27511dddac3b5bb88d726e944f7f8b0efec`
MD5	`df9587d2c9c262e3379522fd2f0d01e3`
BLAKE2b-256	`6ca92afefc5e5c188cd0477ce04426508054dfc51ab230f0b971cff930f9133d`

See more details on using hashes here.

File details

Details for the file rail_score_sdk-2.2.1-py3-none-any.whl.

File metadata

Download URL: rail_score_sdk-2.2.1-py3-none-any.whl
Upload date: Mar 6, 2026
Size: 42.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for rail_score_sdk-2.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eae0c58233a89b9d89b37ee3f56e08fe3826fbc8299c9e2506644ecfbd193293`
MD5	`abd0b7f814eb0611aec864b82e3fe590`
BLAKE2b-256	`e258cd9309120212d13506e521dffd0de84bdaa0d392c95cb991d294c8e589bc`

See more details on using hashes here.

rail-score-sdk 2.2.1

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

RAIL Score Python SDK

Features

Installation

Quick Start

Sync Client

Async Client

API Reference

Evaluate Content

Safe Regenerate

Compliance Check

Utility

Policy Engine

Multi-Turn Sessions

Middleware

LLM Provider Wrappers

OpenAI

Anthropic

Google Gemini

Observability Integrations

Langfuse

LiteLLM Guardrail

RAIL Dimensions

Authentication

Error Handling

Examples

Testing

Migrating from rail-score

License

Links

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Migrating from `rail-score`