
Variably Python SDK

Official Python SDK for Variably — LLM evaluation, experimentation, and prompt optimization.

Installation

pip install variably-sdk

For Docker/Kubernetes deployments, add to your requirements.txt:

variably-sdk>=2.6.1

Quick Start — Observe Mode

Add one line to your existing LLM app and get multi-dimensional evaluation across 40+ metrics in 6 categories: Quality, Safety, Semantic, Grounding, Coherence, and Advanced.

No experiment setup. No prompt migration. Just log and see scores.

1. Set your environment variables

VARIABLY_API_KEY=vbl_your_key_here
VARIABLY_BASE_URL=https://api.variably.tech

2. Add one line after your LLM call

from variably import observe

# Your existing code (unchanged)
response = your_llm_call(user_query)

# Add this line:
observe(prompt=user_query, response=response)

Auto-extract tokens & model from provider response

# OpenAI
from openai import OpenAI
from variably import observe

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_query}],
)

observe(
    prompt=user_query,
    response=completion.choices[0].message.content,
    provider_response=completion,  # auto-extracts model, tokens
)

# Anthropic
import anthropic
from variably import observe

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": user_query}],
)

observe(
    prompt=user_query,
    response=message.content[0].text,
    provider_response=message,  # auto-extracts model, tokens
)

RAG applications — grounding & hallucination scoring

observe(
    prompt=user_query,
    response=llm_answer,
    provider_response=completion,
    reference_materials=[
        {"id": "chunk-1", "content": "Retrieved text...", "source": "docs.pdf"},
        {"id": "chunk-2", "content": "Another chunk...", "source": "faq.md"},
    ],
    retrieval_query=user_query,
)

Multi-turn chat — coherence scoring

observe(
    prompt=latest_user_message,
    response=llm_answer,
    provider_response=completion,
    conversation_history=[
        {"role": "user", "content": "What is diabetes?"},
        {"role": "assistant", "content": "Diabetes is a chronic condition..."},
        {"role": "user", "content": "What are the symptoms?"},
    ],
    session_id="conv-123",
)

observe() accepts the following parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The user's input / question |
| response | str | Yes | The LLM's generated response |
| provider_response | object | No | Raw OpenAI/Anthropic/Google response — auto-extracts model, tokens |
| model | str | No | Model name (auto-extracted if provider_response given) |
| provider | str | No | "openai", "anthropic", etc. (auto-detected) |
| latency_ms | int | No | Response generation time in milliseconds |
| prompt_tokens | int | No | Input token count (auto-extracted if provider_response given) |
| completion_tokens | int | No | Output token count (auto-extracted) |
| cost | float | No | Cost in USD |
| reference_materials | list[dict] | No | RAG chunks: [{"id", "content", "source"}] — enables grounding scoring |
| retrieval_query | str | No | Query sent to the retriever — enables retrieval quality scoring |
| conversation_history | list[dict] | No | Prior turns: [{"role", "content"}] — enables coherence scoring |
| tags | list[str] | No | Grouping labels, e.g. ["production", "rag"] |
| user_id | str | No | Your user's ID |
| session_id | str | No | Conversation session ID (groups multi-turn) |
| metadata | dict | No | Any extra key-value data |
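For example, a fuller call that combines several of the optional parameters above (the metadata values and query are illustrative):

import time

from openai import OpenAI
from variably import observe

client = OpenAI()
user_query = "What are the symptoms of Type 2 diabetes?"

start = time.time()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_query}],
)
latency = int((time.time() - start) * 1000)

observe(
    prompt=user_query,
    response=completion.choices[0].message.content,
    provider_response=completion,  # auto-extracts model, provider, tokens
    latency_ms=latency,
    tags=["production", "rag"],
    user_id="user-123",
    session_id="conv-123",
    metadata={"app_version": "1.4.2"},
)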

Prompt Experimentation

Variably provides two modes for LLM prompt experimentation:

BYOR (Bring Your Own Runtime)

You call your own LLM. Variably handles variant allocation and 41-dimensional evaluation.

from variably import VariablyClient
import time

client = VariablyClient({"api_key": "your-api-key"})

user_context = {"user_id": "user-123"}
input_variables = {"query": "What are the symptoms of Type 2 diabetes?"}

# Step 1: Get the allocated variant
variant = client.get_variant("rag-prompt-experiment", user_context, input_variables)
print(f"Variant: {variant.variant_key}, Model: {variant.model}")

# Step 2: Call your LLM with the variant's prompt template
prompt = variant.prompt_template.format(**input_variables)
start = time.time()
llm_response = call_your_llm(prompt, model=variant.model)  # your LLM call
latency = int((time.time() - start) * 1000)

# Step 3: Submit the response for 41-dimensional evaluation
result = client.submit_response(
    experiment_key="rag-prompt-experiment",
    variant_key=variant.variant_key,
    executed_prompt=prompt,
    response=llm_response,
    user_context=user_context,
    input_variables=input_variables,
    provider=variant.provider,
    model=variant.model,
    latency_ms=latency,
)
print(f"Submitted: {result.status}")

Managed Execution

Variably selects the variant, calls the LLM, and evaluates — all in one call.

response = client.evaluate_prompt(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
    evaluation_mode="full",  # "full" | "fast"
)

print(f"Content: {response.content}")
print(f"Model: {response.model}, Latency: {response.latency_ms}ms")
print(f"Tokens: {response.token_usage}")
print(f"Quality Score: {response.quality_score}")

Managed Execution with Streaming (v2.1.0+)

Same as managed execution, but tokens stream in real-time — ideal for chatbot UIs.

from variably import VariablyClient

client = VariablyClient({"api_key": "your-api-key"})

stream = client.evaluate_prompt_stream(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
)

# Tokens arrive one-by-one for real-time display
for token in stream:
    print(token, end="", flush=True)

print()  # newline after stream ends

# After iteration, metadata is available (token usage, latency, quality score)
meta = stream.metadata
if meta:
    print(f"Model: {meta.model}, Latency: {meta.latency_ms}ms")
    print(f"Tokens: {meta.token_usage}")

Context-Aware Evaluation (Better RAG Quality) — v2.2.0+

For RAG chatbots, passing conversation history and retrieved chunks enables groundedness scoring, hallucination detection, and conversational coherence — dimensions that are impossible to evaluate in isolation.

The evaluation_context parameter is not sent to the LLM — it's only used by Variably's evaluator for richer scoring.

# Step 1: Collect conversation history from your session
workflow_history = [
    {"role": "user", "content": "What causes diabetes?"},
    {"role": "assistant", "content": "Key factors include genetics, diet..."},
    {"role": "user", "content": "What about potatoes?"},
]

# Step 2: Collect retrieved RAG chunks (after your retrieval step)
reference_materials = [
    {
        "id": "chunk-001",
        "content": "Unhealthy diets high in refined sugars, fats...",
        "source": "Kenya National Clinical Guidelines",
        "type": "chunk",
        "relevance_score": 0.89,
    },
    {
        "id": "chunk-002",
        "content": "Modifiable risk factors include obesity...",
        "source": "Kenya National Clinical Guidelines",
        "type": "chunk",
        "relevance_score": 0.82,
    },
]

# Step 3: Pass evaluation_context in your evaluate call
response = client.evaluate_prompt(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What about potatoes?", "context": context_text},
    evaluation_mode="full",
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
        "retrieval_query": "potato consumption glycemic index diabetes risk",
    },
)

# Same works with streaming
stream = client.evaluate_prompt_stream(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What about potatoes?", "context": context_text},
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
    },
)
for token in stream:
    print(token, end="", flush=True)

What this enables:

| Dimension | Description | Requires |
|---|---|---|
| faithfulness | % of claims grounded in retrieved chunks | reference_materials |
| hallucination_rate | % of claims with no source in context | reference_materials |
| context_utilization | % of relevant chunks actually used | reference_materials |
| attribution_accuracy | Do citations map to correct chunks? | reference_materials |
| conversation_consistency | No contradictions with prior turns | workflow_history |
| context_retention | Maintains topic awareness across turns | workflow_history |
| transparency | Discloses when going beyond source material | reference_materials |

BYOR mode also supports evaluation_context — pass it in submit_response():

result = client.submit_response(
    experiment_key="my-experiment",
    variant_key=variant.variant_key,
    executed_prompt=prompt,
    response=llm_response,
    user_context=user_context,
    input_variables=input_variables,
    provider=variant.provider,
    model=variant.model,
    latency_ms=latency,
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
    },
)

evaluation_context Schema

| Field | Type | Description |
|---|---|---|
| reference_materials | list[dict] | RAG chunks / source documents for groundedness scoring |
| reference_materials[].id | str | Unique chunk identifier |
| reference_materials[].content | str | Chunk text content |
| reference_materials[].source | str (optional) | Source document URL or name |
| reference_materials[].type | str (optional) | e.g. "chunk", "document" |
| reference_materials[].relevance_score | float (optional) | Retriever similarity score |
| workflow_history | list[dict] | Conversation turns for coherence scoring |
| workflow_history[].role | str | "user" or "assistant" |
| workflow_history[].content | str | Message content |
| retrieval_query | str (optional) | The rewritten query sent to the retriever |

See Context-Aware RAG Evaluation for the full concept doc with architecture diagrams and integration examples.

Integration with LangGraph / FastAPI streaming

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from variably import VariablyClient

app = FastAPI()
client = VariablyClient({"api_key": "your-api-key"})

class ChatRequest(BaseModel):
    message: str
    session_id: str

async def stream_with_variably(query: str, session_id: str):
    """Yield NDJSON events from Variably streaming evaluation."""
    stream = client.evaluate_prompt_stream(
        experiment_key="my-experiment",
        user_context={"user_id": session_id},
        input_variables={"query": query},
    )

    for token in stream:
        yield json.dumps({"type": "token", "content": token}) + "\n"

    # Send final metadata
    if stream.metadata:
        yield json.dumps({
            "type": "stream_end",
            "content": stream.metadata.content,
        }) + "\n"

@app.post("/api/chat")
async def chat(request: ChatRequest):
    return StreamingResponse(
        stream_with_variably(request.message, request.session_id),
        media_type="application/x-ndjson",
    )

Backend API: SSE Streaming Endpoint

The streaming endpoint uses Server-Sent Events (SSE). Here's the raw API:

Endpoint: POST /api/v1/internal/sdk/prompt-experiments/evaluate-stream

Headers:

X-API-Key: your-api-key
Content-Type: application/json

Request body (same as non-streaming evaluate):

{
  "experiment_key": "rag-prompt-experiment",
  "user_context": {
    "userId": "user-123",
    "sessionId": "sess-456"
  },
  "input_variables": {
    "query": "What are the symptoms of Type 2 diabetes?"
  },
  "evaluation_context": {
    "reference_materials": [{"id": "chunk-1", "content": "...", "source": "...", "type": "chunk"}],
    "workflow_history": [{"role": "user", "content": "..."}],
    "retrieval_query": "diabetes symptoms type 2"
  }
}

Response (SSE stream):

event: token
data: {"content": "Type"}

event: token
data: {"content": " 2"}

event: token
data: {"content": " diabetes"}

event: token
data: {"content": " symptoms"}

event: token
data: {"content": " include..."}

event: metadata
data: {"experiment_id": "exp-123", "variant_id": "variant-a", "execution_id": "eval-789", "provider": "anthropic", "model": "claude-3-5-haiku-20241022", "prompt_tokens": 150, "completion_tokens": 85, "total_tokens": 235, "cost_usd": 0.000425, "latency_ms": 1250}

event: done
data: {}

curl example:

curl -N -X POST http://localhost:8080/api/v1/internal/sdk/prompt-experiments/evaluate-stream \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "experiment_key": "rag-prompt-experiment",
    "user_context": {"userId": "user-123", "sessionId": "sess-456"},
    "input_variables": {"query": "What are the symptoms of Type 2 diabetes?"}
  }'

Error handling: If an error occurs during streaming, an error event is sent:

event: error
data: {"message": "LLM generation failed: rate limit exceeded"}

Configuration

from variably import VariablyConfig, VariablyClient

config = VariablyConfig(
    api_key="your-api-key",
    base_url="https://api.variably.com",  # default: http://localhost:8080
    environment="production",  # default: development
    timeout=5000,  # timeout in milliseconds, default: 5000
    retry_attempts=3,  # default: 3
    enable_analytics=True,  # default: True
    cache={
        "ttl": 300,  # TTL in seconds, default: 300 (5 minutes)
        "max_size": 1000,  # default: 1000
        "enabled": True  # default: True
    },
    log_level="INFO"  # DEBUG, INFO, WARNING, ERROR
)

client = VariablyClient(config)

Advanced Usage

Environment Variables

You can create a client using environment variables:

from variably import create_client_from_env

# Uses these environment variables:
# VARIABLY_API_KEY (required)
# VARIABLY_BASE_URL
# VARIABLY_ENVIRONMENT
# VARIABLY_TIMEOUT
# VARIABLY_RETRY_ATTEMPTS
# VARIABLY_ENABLE_ANALYTICS
# VARIABLY_LOG_LEVEL

client = create_client_from_env()

Different Flag Types

# Boolean flags
bool_value = client.evaluate_flag_bool("feature-enabled", False, user_context)

# String flags
string_value = client.evaluate_flag_string("theme", "light", user_context)

# Number flags
number_value = client.evaluate_flag_number("max-items", 10, user_context)

# JSON flags
json_value = client.evaluate_flag_json("config", {"timeout": 5000}, user_context)

# Get full evaluation details
result = client.evaluate_flag("feature-flag", "default", user_context)
print(f"Value: {result.value}, Reason: {result.reason}, Cache Hit: {result.cache_hit}")

Batch Evaluation

flags = client.evaluate_flags([
    "feature-a",
    "feature-b", 
    "feature-c"
], user_context)

print(flags["feature-a"].value)

Event Tracking

from datetime import datetime

# Single event
client.track({
    "name": "purchase_completed",
    "user_id": "user-123",
    "properties": {
        "amount": 99.99,
        "currency": "USD",
        "items": ["item-1", "item-2"]
    },
    "timestamp": datetime.utcnow()  # optional, auto-generated if not provided
})

# Batch events
client.track_batch([
    {"name": "page_view", "user_id": "user-123", "properties": {"page": "/home"}},
    {"name": "button_click", "user_id": "user-123", "properties": {"button": "cta"}}
])

Cache Management

# Clear cache
client.clear_cache()

# Get cache stats
stats = client.cache.get_stats()
print(stats)  # {"size": 10, "max_size": 1000, "enabled": True, "ttl": 300}

Metrics

# Get SDK metrics
metrics = client.get_metrics()
print(metrics)
# {
#     "api_calls": 25,
#     "cache_hits": 15,
#     "cache_misses": 10,
#     "errors": 1,
#     "average_latency": 45.2,
#     "cache_hit_rate": 0.6,
#     "error_rate": 0.04,
#     "flags_evaluated": 20,
#     "gates_evaluated": 5,
#     "events_tracked": 12,
#     "start_time": "2023-10-01T12:00:00Z",
#     "uptime_seconds": 3600
# }

Context Manager

# Use with context manager for automatic cleanup
with VariablyClient({"api_key": "your-api-key"}) as client:
    result = client.evaluate_flag_bool("feature", False, user_context)
    # client.close() is called automatically

Custom Logger

from variably import VariablyClient, create_logger

# Create custom logger
logger = create_logger(
    name="my-app",
    level="DEBUG",
    structured=True,  # JSON logging
    silent=False
)

# Client will use the custom logger
client = VariablyClient({
    "api_key": "your-api-key",
    "log_level": "DEBUG"
})

Error Handling

from variably import (
    VariablyError,
    NetworkError,
    AuthenticationError,
    ValidationError,
    RateLimitError,
    TimeoutError,
    ConfigurationError
)

try:
    result = client.evaluate_flag("my-flag", False, user_context)
except AuthenticationError:
    print("Invalid API key")
except NetworkError as e:
    print(f"Network error: {e.status_code}")
except ValidationError as e:
    print(f"Validation error in field: {e.field}")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after} seconds")
except TimeoutError:
    print("Request timed out")
except ConfigurationError as e:
    print(f"Configuration error in parameter: {e.parameter}")
except VariablyError as e:
    print(f"Variably SDK error: {e}")

Type Hints

The SDK includes full type hints for better IDE support:

from variably import VariablyClient, UserContext, FlagResult

user_context: UserContext = {
    "user_id": "user-123",
    "email": "user@example.com",
    "attributes": {
        "plan": "premium",
        "signup_date": "2023-01-01"
    }
}

result: FlagResult = client.evaluate_flag("feature", False, user_context)

Async Support

For async applications, you can wrap the synchronous client:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from variably import VariablyClient

class AsyncVariablyClient:
    def __init__(self, config):
        self.client = VariablyClient(config)
        self.executor = ThreadPoolExecutor(max_workers=4)
    
    async def evaluate_flag_bool(self, flag_key, default_value, user_context):
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(
            self.executor,
            self.client.evaluate_flag_bool,
            flag_key, default_value, user_context
        )
    
    async def close(self):
        self.client.close()
        self.executor.shutdown(wait=True)

# Usage
async def main():
    client = AsyncVariablyClient({"api_key": "your-api-key"})
    
    result = await client.evaluate_flag_bool("feature", False, {
        "user_id": "user-123"
    })
    
    await client.close()

asyncio.run(main())

Development

Setup

# Install development dependencies
pip install -e ".[dev]"

Testing

pytest

Code Quality

# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Lint
flake8 src/ tests/

# Type check
mypy src/

Publishing to PyPI

Prerequisites

  1. Create a PyPI account at https://pypi.org/account/register/
  2. Generate an API token at https://pypi.org/manage/account/token/
    • Scope: select "Entire account" for first upload, or project-specific after that
  3. Install build tools:
    pip3 install build twine
    

Note: when installed to user site-packages, the build and twine console scripts may not be on your PATH. Always use python3 -m build and python3 -m twine instead of the bare build/twine commands.

Configure PyPI credentials

Create ~/.pypirc:

[distutils]
index-servers = pypi

[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE

Secure the file:

chmod 600 ~/.pypirc

Build and publish

The version in the build output (e.g., variably_sdk-2.0.0-py3-none-any.whl) comes directly from pyproject.toml's version field. PyPI rejects re-uploads of the same version — you must bump the version to publish again.

# 1. Clean previous builds
rm -rf dist/ build/ src/*.egg-info

# 2. Build sdist and wheel
python3 -m build

# 3. Verify the package (optional but recommended)
python3 -m twine check dist/*

# 4. Upload to TestPyPI first (optional, for dry-run)
python3 -m twine upload --repository testpypi dist/*

# 5. Upload to PyPI
python3 -m twine upload dist/*

Verify the published package

pip3 install variably-sdk==2.1.0
python3 -c "from variably import VariablyClient, PromptVariant; print('OK')"

Version bumping checklist

When releasing a new version, update these three files then clean-build-publish:

  1. src/variably/version.py → __version__
  2. pyproject.toml → version
  3. src/variably/http_client.py → User-Agent header string

# Example: bumping from 2.0.0 to 2.0.1
# After updating the 3 files above:
rm -rf dist/ build/ src/*.egg-info
python3 -m build
python3 -m twine upload dist/*

Requirements

  • Python 3.7+
  • requests >= 2.25.0

License

MIT License - see LICENSE file for details.
