Variably Python SDK
Official Python SDK for Variably — LLM evaluation, experimentation, and prompt optimization.
Installation
pip install variably-sdk
For Docker/Kubernetes deployments, add to your requirements.txt:
variably-sdk>=2.6.1
Quick Start — Observe Mode
Add one line to your existing LLM app and get multi-dimensional evaluation across 40+ metrics in six categories: Quality, Safety, Semantic, Grounding, Coherence, and Advanced.
No experiment setup. No prompt migration. Just log and see scores.
1. Set your environment variables
VARIABLY_API_KEY=vbl_your_key_here
VARIABLY_BASE_URL=https://api.variably.tech
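If you prefer to keep these in a local .env file, here is a minimal sketch using the python-dotenv package (an assumption; any way of exporting the variables works) to load them into the environment before the SDK is used:
# Optional: load VARIABLY_* variables from a .env file (requires `pip install python-dotenv`)
from dotenv import load_dotenv

load_dotenv()  # populates os.environ from .env in the working directory
# The SDK can now read VARIABLY_API_KEY and VARIABLY_BASE_URL from the environment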
2. Add one line after your LLM call
from variably import observe
# Your existing code (unchanged)
response = your_llm_call(user_query)
# Add this line:
observe(prompt=user_query, response=response)
Auto-extract tokens & model from provider response
# OpenAI
from openai import OpenAI
from variably import observe
client = OpenAI()
completion = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": user_query}],
)
observe(
prompt=user_query,
response=completion.choices[0].message.content,
provider_response=completion, # auto-extracts model, tokens
)
# Anthropic
import anthropic
from variably import observe
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": user_query}],
)
observe(
prompt=user_query,
response=message.content[0].text,
provider_response=message, # auto-extracts model, tokens
)
RAG applications — grounding & hallucination scoring
observe(
prompt=user_query,
response=llm_answer,
provider_response=completion,
reference_materials=[
{"id": "chunk-1", "content": "Retrieved text...", "source": "docs.pdf"},
{"id": "chunk-2", "content": "Another chunk...", "source": "faq.md"},
],
retrieval_query=user_query,
)
Multi-turn chat — coherence scoring
observe(
prompt=latest_user_message,
response=llm_answer,
provider_response=completion,
conversation_history=[
{"role": "user", "content": "What is diabetes?"},
{"role": "assistant", "content": "Diabetes is a chronic condition..."},
{"role": "user", "content": "What are the symptoms?"},
],
session_id="conv-123",
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | str | Yes | The user's input / question |
| `response` | str | Yes | The LLM's generated response |
| `provider_response` | object | No | Raw OpenAI/Anthropic/Google response — auto-extracts model, tokens |
| `model` | str | No | Model name (auto-extracted if `provider_response` given) |
| `provider` | str | No | "openai", "anthropic", etc. (auto-detected) |
| `latency_ms` | int | No | Response generation time in milliseconds |
| `prompt_tokens` | int | No | Input token count (auto-extracted if `provider_response` given) |
| `completion_tokens` | int | No | Output token count (auto-extracted) |
| `cost` | float | No | Cost in USD |
| `reference_materials` | list[dict] | No | RAG chunks: `[{"id", "content", "source"}]` — enables grounding scoring |
| `retrieval_query` | str | No | Query sent to the retriever — enables retrieval quality scoring |
| `conversation_history` | list[dict] | No | Prior turns: `[{"role", "content"}]` — enables coherence scoring |
| `tags` | list[str] | No | Grouping labels, e.g. `["production", "rag"]` |
| `user_id` | str | No | Your user's ID |
| `session_id` | str | No | Conversation session ID (groups multi-turn) |
| `metadata` | dict | No | Any extra key-value data |
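Putting the optional fields together, here is a sketch of a fully annotated observe() call; the latency, cost, tag, and identifier values are illustrative, and only prompt and response are required:
import time

from openai import OpenAI
from variably import observe

client = OpenAI()

start = time.time()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_query}],
)
latency_ms = int((time.time() - start) * 1000)

observe(
    prompt=user_query,
    response=completion.choices[0].message.content,
    provider_response=completion,       # auto-extracts model, provider, tokens
    latency_ms=latency_ms,               # measured wall-clock latency
    cost=0.0012,                         # optional: your own cost estimate in USD
    tags=["production", "rag"],          # grouping labels
    user_id="user-123",
    session_id="conv-123",               # groups multi-turn conversations
    metadata={"feature": "support-chat"},
)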
Prompt Experimentation
Variably provides two modes for LLM prompt experimentation:
BYOR (Bring Your Own Runtime)
You call your own LLM. Variably handles variant allocation and 41-dimensional evaluation.
from variably import VariablyClient
import time
client = VariablyClient({"api_key": "your-api-key"})
user_context = {"user_id": "user-123"}
input_variables = {"query": "What are the symptoms of Type 2 diabetes?"}
# Step 1: Get the allocated variant
variant = client.get_variant("rag-prompt-experiment", user_context, input_variables)
print(f"Variant: {variant.variant_key}, Model: {variant.model}")
# Step 2: Call your LLM with the variant's prompt template
prompt = variant.prompt_template.format(**input_variables)
start = time.time()
llm_response = call_your_llm(prompt, model=variant.model) # your LLM call
latency = int((time.time() - start) * 1000)
# Step 3: Submit the response for 41-dimensional evaluation
result = client.submit_response(
experiment_key="rag-prompt-experiment",
variant_key=variant.variant_key,
executed_prompt=prompt,
response=llm_response,
user_context=user_context,
input_variables=input_variables,
provider=variant.provider,
model=variant.model,
latency_ms=latency,
)
print(f"Submitted: {result.status}")
Managed Execution
Variably selects the variant, calls the LLM, and evaluates — all in one call.
response = client.evaluate_prompt(
experiment_key="rag-prompt-experiment",
user_context={"user_id": "user-123"},
input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
evaluation_mode="full", # "full" | "fast"
)
print(f"Content: {response.content}")
print(f"Model: {response.model}, Latency: {response.latency_ms}ms")
print(f"Tokens: {response.token_usage}")
print(f"Quality Score: {response.quality_score}")
Managed Execution with Streaming (v2.1.0+)
Same as managed execution, but tokens stream in real-time — ideal for chatbot UIs.
from variably import VariablyClient
client = VariablyClient({"api_key": "your-api-key"})
stream = client.evaluate_prompt_stream(
experiment_key="rag-prompt-experiment",
user_context={"user_id": "user-123"},
input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
)
# Tokens arrive one-by-one for real-time display
for token in stream:
    print(token, end="", flush=True)
print()  # newline after stream ends

# After iteration, metadata is available (token usage, latency, quality score)
meta = stream.metadata
if meta:
    print(f"Model: {meta.model}, Latency: {meta.latency_ms}ms")
    print(f"Tokens: {meta.token_usage}")
Context-Aware Evaluation (Better RAG Quality) — v2.2.0+
For RAG chatbots, passing conversation history and retrieved chunks enables groundedness scoring, hallucination detection, and conversational coherence — dimensions that are impossible to evaluate in isolation.
The evaluation_context parameter is not sent to the LLM — it's only used by Variably's evaluator for richer scoring.
# Step 1: Collect conversation history from your session
workflow_history = [
{"role": "user", "content": "What causes diabetes?"},
{"role": "assistant", "content": "Key factors include genetics, diet..."},
{"role": "user", "content": "What about potatoes?"},
]
# Step 2: Collect retrieved RAG chunks (after your retrieval step)
reference_materials = [
{
"id": "chunk-001",
"content": "Unhealthy diets high in refined sugars, fats...",
"source": "Kenya National Clinical Guidelines",
"type": "chunk",
"relevance_score": 0.89,
},
{
"id": "chunk-002",
"content": "Modifiable risk factors include obesity...",
"source": "Kenya National Clinical Guidelines",
"type": "chunk",
"relevance_score": 0.82,
},
]
# Step 3: Pass evaluation_context in your evaluate call
response = client.evaluate_prompt(
experiment_key="rag-prompt-experiment",
user_context={"user_id": "user-123"},
input_variables={"query": "What about potatoes?", "context": context_text},
evaluation_mode="full",
evaluation_context={
"reference_materials": reference_materials,
"workflow_history": workflow_history,
"retrieval_query": "potato consumption glycemic index diabetes risk",
},
)
# Same works with streaming
stream = client.evaluate_prompt_stream(
experiment_key="rag-prompt-experiment",
user_context={"user_id": "user-123"},
input_variables={"query": "What about potatoes?", "context": context_text},
evaluation_context={
"reference_materials": reference_materials,
"workflow_history": workflow_history,
},
)
for token in stream:
    print(token, end="", flush=True)
What this enables:
| Dimension | Description | Requires |
|---|---|---|
| `faithfulness` | % of claims grounded in retrieved chunks | `reference_materials` |
| `hallucination_rate` | % of claims with no source in context | `reference_materials` |
| `context_utilization` | % of relevant chunks actually used | `reference_materials` |
| `attribution_accuracy` | Do citations map to the correct chunks? | `reference_materials` |
| `conversation_consistency` | No contradictions with prior turns | `workflow_history` |
| `context_retention` | Maintains topic awareness across turns | `workflow_history` |
| `transparency` | Discloses when going beyond source material | `reference_materials` |
BYOR mode also supports evaluation_context — pass it in submit_response():
result = client.submit_response(
experiment_key="my-experiment",
variant_key=variant.variant_key,
executed_prompt=prompt,
response=llm_response,
user_context=user_context,
input_variables=input_variables,
provider=variant.provider,
model=variant.model,
latency_ms=latency,
evaluation_context={
"reference_materials": reference_materials,
"workflow_history": workflow_history,
},
)
evaluation_context Schema
| Field | Type | Description |
|---|---|---|
| `reference_materials` | list[dict] | RAG chunks / source documents for groundedness scoring |
| `reference_materials[].id` | str | Unique chunk identifier |
| `reference_materials[].content` | str | Chunk text content |
| `reference_materials[].source` | str (optional) | Source document URL or name |
| `reference_materials[].type` | str (optional) | e.g. "chunk", "document" |
| `reference_materials[].relevance_score` | float (optional) | Retriever similarity score |
| `workflow_history` | list[dict] | Conversation turns for coherence scoring |
| `workflow_history[].role` | str | "user" or "assistant" |
| `workflow_history[].content` | str | Message content |
| `retrieval_query` | str (optional) | The rewritten query sent to the retriever |
See Context-Aware RAG Evaluation for the full concept doc with architecture diagrams and integration examples.
Integration with LangGraph / FastAPI streaming
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    session_id: str

async def stream_with_variably(query: str, session_id: str):
    """Yield NDJSON events from Variably streaming evaluation."""
    # `client` is the VariablyClient created earlier
    stream = client.evaluate_prompt_stream(
        experiment_key="my-experiment",
        user_context={"user_id": session_id},
        input_variables={"query": query},
    )
    for token in stream:
        yield json.dumps({"type": "token", "content": token}) + "\n"
    # Send final metadata once the stream is exhausted
    if stream.metadata:
        yield json.dumps({
            "type": "stream_end",
            "content": stream.metadata.content,
        }) + "\n"

@app.post("/api/chat")
async def chat(request: ChatRequest):
    return StreamingResponse(
        stream_with_variably(request.message, request.session_id),
        media_type="application/x-ndjson",
    )
Backend API: SSE Streaming Endpoint
The streaming endpoint uses Server-Sent Events (SSE). Here's the raw API:
Endpoint: POST /api/v1/internal/sdk/prompt-experiments/evaluate-stream
Headers:
X-API-Key: your-api-key
Content-Type: application/json
Request body (same as non-streaming evaluate):
{
"experiment_key": "rag-prompt-experiment",
"user_context": {
"userId": "user-123",
"sessionId": "sess-456"
},
"input_variables": {
"query": "What are the symptoms of Type 2 diabetes?"
},
"evaluation_context": {
"reference_materials": [{"id": "chunk-1", "content": "...", "source": "...", "type": "chunk"}],
"workflow_history": [{"role": "user", "content": "..."}],
"retrieval_query": "diabetes symptoms type 2"
}
}
Response (SSE stream):
event: token
data: {"content": "Type"}
event: token
data: {"content": " 2"}
event: token
data: {"content": " diabetes"}
event: token
data: {"content": " symptoms"}
event: token
data: {"content": " include..."}
event: metadata
data: {"experiment_id": "exp-123", "variant_id": "variant-a", "execution_id": "eval-789", "provider": "anthropic", "model": "claude-3-5-haiku-20241022", "prompt_tokens": 150, "completion_tokens": 85, "total_tokens": 235, "cost_usd": 0.000425, "latency_ms": 1250}
event: done
data: {}
curl example:
curl -N -X POST http://localhost:8080/api/v1/internal/sdk/prompt-experiments/evaluate-stream \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"experiment_key": "rag-prompt-experiment",
"user_context": {"userId": "user-123", "sessionId": "sess-456"},
"input_variables": {"query": "What are the symptoms of Type 2 diabetes?"}
}'
Error handling: If an error occurs during streaming, an error event is sent:
event: error
data: {"message": "LLM generation failed: rate limit exceeded"}
Configuration
from variably import VariablyConfig, VariablyClient
config = VariablyConfig(
api_key="your-api-key",
base_url="https://api.variably.com", # default: http://localhost:8080
environment="production", # default: development
timeout=5000, # timeout in milliseconds, default: 5000
retry_attempts=3, # default: 3
enable_analytics=True, # default: True
cache={
"ttl": 300, # TTL in seconds, default: 300 (5 minutes)
"max_size": 1000, # default: 1000
"enabled": True # default: True
},
log_level="INFO" # DEBUG, INFO, WARNING, ERROR
)
client = VariablyClient(config)
Advanced Usage
Environment Variables
You can create a client using environment variables:
from variably import create_client_from_env
# Uses these environment variables:
# VARIABLY_API_KEY (required)
# VARIABLY_BASE_URL
# VARIABLY_ENVIRONMENT
# VARIABLY_TIMEOUT
# VARIABLY_RETRY_ATTEMPTS
# VARIABLY_ENABLE_ANALYTICS
# VARIABLY_LOG_LEVEL
client = create_client_from_env()
Different Flag Types
# Boolean flags
bool_value = client.evaluate_flag_bool("feature-enabled", False, user_context)
# String flags
string_value = client.evaluate_flag_string("theme", "light", user_context)
# Number flags
number_value = client.evaluate_flag_number("max-items", 10, user_context)
# JSON flags
json_value = client.evaluate_flag_json("config", {"timeout": 5000}, user_context)
# Get full evaluation details
result = client.evaluate_flag("feature-flag", "default", user_context)
print(f"Value: {result.value}, Reason: {result.reason}, Cache Hit: {result.cache_hit}")
Batch Evaluation
flags = client.evaluate_flags([
"feature-a",
"feature-b",
"feature-c"
], user_context)
print(flags["feature-a"].value)
Event Tracking
from datetime import datetime
# Single event
client.track({
"name": "purchase_completed",
"user_id": "user-123",
"properties": {
"amount": 99.99,
"currency": "USD",
"items": ["item-1", "item-2"]
},
"timestamp": datetime.utcnow() # optional, auto-generated if not provided
})
# Batch events
client.track_batch([
{"name": "page_view", "user_id": "user-123", "properties": {"page": "/home"}},
{"name": "button_click", "user_id": "user-123", "properties": {"button": "cta"}}
])
Cache Management
# Clear cache
client.clear_cache()
# Get cache stats
stats = client.cache.get_stats()
print(stats) # {"size": 10, "max_size": 1000, "enabled": True, "ttl": 300}
Metrics
# Get SDK metrics
metrics = client.get_metrics()
print(metrics)
# {
# "api_calls": 25,
# "cache_hits": 15,
# "cache_misses": 10,
# "errors": 1,
# "average_latency": 45.2,
# "cache_hit_rate": 0.6,
# "error_rate": 0.04,
# "flags_evaluated": 20,
# "gates_evaluated": 5,
# "events_tracked": 12,
# "start_time": "2023-10-01T12:00:00Z",
# "uptime_seconds": 3600
# }
Context Manager
# Use with context manager for automatic cleanup
with VariablyClient({"api_key": "your-api-key"}) as client:
    result = client.evaluate_flag_bool("feature", False, user_context)
# client.close() is called automatically
Custom Logger
from variably import VariablyClient, create_logger
# Create custom logger
logger = create_logger(
name="my-app",
level="DEBUG",
structured=True, # JSON logging
silent=False
)
# Client will use the custom logger
client = VariablyClient({
"api_key": "your-api-key",
"log_level": "DEBUG"
})
Error Handling
from variably import (
VariablyError,
NetworkError,
AuthenticationError,
ValidationError,
RateLimitError,
TimeoutError,
ConfigurationError
)
try:
    result = client.evaluate_flag("my-flag", False, user_context)
except AuthenticationError:
    print("Invalid API key")
except NetworkError as e:
    print(f"Network error: {e.status_code}")
except ValidationError as e:
    print(f"Validation error in field: {e.field}")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after} seconds")
except TimeoutError:
    print("Request timed out")
except ConfigurationError as e:
    print(f"Configuration error in parameter: {e.parameter}")
except VariablyError as e:
    print(f"Variably SDK error: {e}")
Type Hints
The SDK includes full type hints for better IDE support:
from typing import Dict, Any
from variably import VariablyClient, UserContext, FlagResult
user_context: UserContext = {
"user_id": "user-123",
"email": "user@example.com",
"attributes": {
"plan": "premium",
"signup_date": "2023-01-01"
}
}
result: FlagResult = client.evaluate_flag("feature", False, user_context)
Async Support
For async applications, you can wrap the synchronous client:
import asyncio
from concurrent.futures import ThreadPoolExecutor
from variably import VariablyClient
class AsyncVariablyClient:
    def __init__(self, config):
        self.client = VariablyClient(config)
        self.executor = ThreadPoolExecutor(max_workers=4)

    async def evaluate_flag_bool(self, flag_key, default_value, user_context):
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(
            self.executor,
            self.client.evaluate_flag_bool,
            flag_key, default_value, user_context,
        )

    async def close(self):
        self.client.close()
        self.executor.shutdown(wait=True)

# Usage
async def main():
    client = AsyncVariablyClient({"api_key": "your-api-key"})
    result = await client.evaluate_flag_bool("feature", False, {
        "user_id": "user-123",
    })
    await client.close()

asyncio.run(main())
Development
Setup
# Install development dependencies
pip install -e ".[dev]"
Testing
pytest
Code Quality
# Format code
black src/ tests/
# Sort imports
isort src/ tests/
# Lint
flake8 src/ tests/
# Type check
mypy src/
Publishing to PyPI
Prerequisites
- Create a PyPI account at https://pypi.org/account/register/
- Generate an API token at https://pypi.org/manage/account/token/
- Scope: select "Entire account" for first upload, or project-specific after that
- Install build tools:
pip3 install build twine
Note: build and twine install to user site-packages and may not be on your PATH. Always use python3 -m build and python3 -m twine instead of bare build/twine.
Configure PyPI credentials
Create ~/.pypirc:
[distutils]
index-servers = pypi
[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE
Secure the file:
chmod 600 ~/.pypirc
Build and publish
The version in the build output (e.g., variably_sdk-2.0.0-py3-none-any.whl) comes directly from pyproject.toml's version field. PyPI rejects re-uploads of the same version — you must bump the version to publish again.
# 1. Clean previous builds
rm -rf dist/ build/ src/*.egg-info
# 2. Build sdist and wheel
python3 -m build
# 3. Verify the package (optional but recommended)
python3 -m twine check dist/*
# 4. Upload to TestPyPI first (optional, for dry-run)
python3 -m twine upload --repository testpypi dist/*
# 5. Upload to PyPI
python3 -m twine upload dist/*
Verify the published package
pip3 install variably-sdk==2.1.0
python3 -c "from variably import VariablyClient, PromptVariant; print('OK')"
Version bumping checklist
When releasing a new version, update these three files then clean-build-publish:
- src/variably/version.py: __version__
- pyproject.toml: version
- src/variably/http_client.py: User-Agent header string
# Example: bumping from 2.0.0 to 2.0.1
# After updating the 3 files above:
rm -rf dist/ build/ src/*.egg-info
python3 -m build
python3 -m twine upload dist/*
Requirements
- Python 3.7+
- requests >= 2.25.0
License
MIT License - see LICENSE file for details.