Variably Python SDK

Official Python SDK for Variably — feature flags, LLM experimentation, and prompt optimization.

Installation

pip install variably-sdk

Quick Start

from variably import VariablyClient

# Initialize the client
client = VariablyClient({
    "api_key": "your-api-key",
    "base_url": "https://api.variably.com",  # optional, defaults to localhost:8080
    "environment": "production"  # optional
})

# Evaluate a boolean feature flag
user_context = {
    "user_id": "user-123",
    "email": "user@example.com",
    "country": "US"
}

is_feature_enabled = client.evaluate_flag_bool(
    "new-checkout-flow",
    False,  # default value
    user_context
)

if is_feature_enabled:
    # Show new checkout flow
    pass

# Evaluate a feature gate
has_access = client.evaluate_gate("premium-features", user_context)

# Track events
client.track({
    "name": "button_clicked",
    "user_id": "user-123",
    "properties": {
        "button_name": "checkout",
        "page": "product-detail"
    }
})

# Clean up resources
client.close()

Prompt Experimentation

Variably provides two modes for LLM prompt experimentation:

BYOR (Bring Your Own Runtime)

You call your own LLM. Variably handles variant allocation and 41-dimensional evaluation.

from variably import VariablyClient
import time

client = VariablyClient({"api_key": "your-api-key"})

user_context = {"user_id": "user-123"}
input_variables = {"query": "What are the symptoms of Type 2 diabetes?"}

# Step 1: Get the allocated variant
variant = client.get_variant("rag-prompt-experiment", user_context, input_variables)
print(f"Variant: {variant.variant_key}, Model: {variant.model}")

# Step 2: Call your LLM with the variant's prompt template
prompt = variant.prompt_template.format(**input_variables)
start = time.time()
llm_response = call_your_llm(prompt, model=variant.model)  # your LLM call
latency = int((time.time() - start) * 1000)

# Step 3: Submit the response for 41-dimensional evaluation
result = client.submit_response(
    experiment_key="rag-prompt-experiment",
    variant_key=variant.variant_key,
    executed_prompt=prompt,
    response=llm_response,
    user_context=user_context,
    input_variables=input_variables,
    provider=variant.provider,
    model=variant.model,
    latency_ms=latency,
)
print(f"Submitted: {result.status}")

Managed Execution

Variably selects the variant, calls the LLM, and evaluates — all in one call.

response = client.evaluate_prompt(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
    evaluation_mode="full",  # "full" | "fast"
)

print(f"Content: {response.content}")
print(f"Model: {response.model}, Latency: {response.latency_ms}ms")
print(f"Tokens: {response.token_usage}")
print(f"Quality Score: {response.quality_score}")

Managed Execution with Streaming (v2.1.0+)

Same as managed execution, but tokens stream in real-time — ideal for chatbot UIs.

from variably import VariablyClient

client = VariablyClient({"api_key": "your-api-key"})

stream = client.evaluate_prompt_stream(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
)

# Tokens arrive one-by-one for real-time display
for token in stream:
    print(token, end="", flush=True)

print()  # newline after stream ends

# After iteration, metadata is available (token usage, latency, quality score)
meta = stream.metadata
if meta:
    print(f"Model: {meta.model}, Latency: {meta.latency_ms}ms")
    print(f"Tokens: {meta.token_usage}")

Context-Aware Evaluation (Better RAG Quality) — v2.2.0+

For RAG chatbots, passing conversation history and retrieved chunks enables groundedness scoring, hallucination detection, and conversational coherence — dimensions that are impossible to evaluate in isolation.

The evaluation_context parameter is not sent to the LLM — it's only used by Variably's evaluator for richer scoring.

# Step 1: Collect conversation history from your session
workflow_history = [
    {"role": "user", "content": "What causes diabetes?"},
    {"role": "assistant", "content": "Key factors include genetics, diet..."},
    {"role": "user", "content": "What about potatoes?"},
]

# Step 2: Collect retrieved RAG chunks (after your retrieval step)
reference_materials = [
    {
        "id": "chunk-001",
        "content": "Unhealthy diets high in refined sugars, fats...",
        "source": "Kenya National Clinical Guidelines",
        "type": "chunk",
        "relevance_score": 0.89,
    },
    {
        "id": "chunk-002",
        "content": "Modifiable risk factors include obesity...",
        "source": "Kenya National Clinical Guidelines",
        "type": "chunk",
        "relevance_score": 0.82,
    },
]

# Build the prompt's {context} input from the retrieved chunk text
context_text = "\n\n".join(chunk["content"] for chunk in reference_materials)

# Step 3: Pass evaluation_context in your evaluate call
response = client.evaluate_prompt(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What about potatoes?", "context": context_text},
    evaluation_mode="full",
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
        "retrieval_query": "potato consumption glycemic index diabetes risk",
    },
)

# Same works with streaming
stream = client.evaluate_prompt_stream(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What about potatoes?", "context": context_text},
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
    },
)
for token in stream:
    print(token, end="", flush=True)

What this enables:

Dimension                 Description                                   Requires
faithfulness              % of claims grounded in retrieved chunks      reference_materials
hallucination_rate        % of claims with no source in context         reference_materials
context_utilization       % of relevant chunks actually used            reference_materials
attribution_accuracy      Do citations map to correct chunks?           reference_materials
conversation_consistency  No contradictions with prior turns            workflow_history
context_retention         Maintains topic awareness across turns        workflow_history
transparency              Discloses when going beyond source material   reference_materials

BYOR mode also supports evaluation_context — pass it in submit_response():

result = client.submit_response(
    experiment_key="my-experiment",
    variant_key=variant.variant_key,
    executed_prompt=prompt,
    response=llm_response,
    user_context=user_context,
    input_variables=input_variables,
    provider=variant.provider,
    model=variant.model,
    latency_ms=latency,
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
    },
)

evaluation_context Schema

Field                                  Type        Description
reference_materials                    list[dict]  RAG chunks / source documents for groundedness scoring
reference_materials[].id               str         Unique chunk identifier
reference_materials[].content          str         Chunk text content
reference_materials[].source           str         (optional) Source document URL or name
reference_materials[].type             str         (optional) e.g. "chunk", "document"
reference_materials[].relevance_score  float       (optional) Retriever similarity score
workflow_history                       list[dict]  Conversation turns for coherence scoring
workflow_history[].role                str         "user" or "assistant"
workflow_history[].content             str         Message content
retrieval_query                        str         (optional) The rewritten query sent to the retriever
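If you want editor and type-checker support while building this payload, you can mirror the schema with TypedDicts in your own code. These classes are only a sketch of the table above, not types exported by the SDK (TypedDict needs Python 3.8+, or typing_extensions on 3.7):

from typing import List, TypedDict  # use typing_extensions on Python 3.7

class _ReferenceMaterialRequired(TypedDict):
    id: str       # unique chunk identifier
    content: str  # chunk text content

class ReferenceMaterial(_ReferenceMaterialRequired, total=False):
    source: str             # source document URL or name
    type: str               # e.g. "chunk", "document"
    relevance_score: float  # retriever similarity score

class WorkflowTurn(TypedDict):
    role: str     # "user" or "assistant"
    content: str  # message content

class EvaluationContext(TypedDict, total=False):
    reference_materials: List[ReferenceMaterial]
    workflow_history: List[WorkflowTurn]
    retrieval_query: str  # the rewritten query sent to the retriever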

See Context-Aware RAG Evaluation for the full concept doc with architecture diagrams and integration examples.

Integration with LangGraph / FastAPI streaming

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from variably import VariablyClient

app = FastAPI()
client = VariablyClient({"api_key": "your-api-key"})  # or reuse your app-wide client

class ChatRequest(BaseModel):
    message: str
    session_id: str

async def stream_with_variably(query: str, session_id: str):
    """Yield NDJSON events from Variably streaming evaluation."""
    stream = client.evaluate_prompt_stream(
        experiment_key="my-experiment",
        user_context={"user_id": session_id},
        input_variables={"query": query},
    )

    for token in stream:
        yield json.dumps({"type": "token", "content": token}) + "\n"

    # Send final metadata
    if stream.metadata:
        yield json.dumps({
            "type": "stream_end",
            "content": stream.metadata.content,
        }) + "\n"

@app.post("/api/chat")
async def chat(request: ChatRequest):
    return StreamingResponse(
        stream_with_variably(request.message, request.session_id),
        media_type="application/x-ndjson",
    )
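A Python caller can consume that NDJSON endpoint line by line. A minimal sketch with requests (the route, port, and field names follow the hypothetical FastAPI example above; adjust them to your app):

import json
import requests

# Hypothetical local endpoint matching the FastAPI example above
with requests.post(
    "http://localhost:8000/api/chat",
    json={"message": "What about potatoes?", "session_id": "sess-456"},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "token":
            print(event["content"], end="", flush=True)
        elif event["type"] == "stream_end":
            print()  # full response text is in event["content"]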

Backend API: SSE Streaming Endpoint

The streaming endpoint uses Server-Sent Events (SSE). Here's the raw API:

Endpoint: POST /api/v1/internal/sdk/prompt-experiments/evaluate-stream

Headers:

X-API-Key: your-api-key
Content-Type: application/json

Request body (same as non-streaming evaluate):

{
  "experiment_key": "rag-prompt-experiment",
  "user_context": {
    "userId": "user-123",
    "sessionId": "sess-456"
  },
  "input_variables": {
    "query": "What are the symptoms of Type 2 diabetes?"
  },
  "evaluation_context": {
    "reference_materials": [{"id": "chunk-1", "content": "...", "source": "...", "type": "chunk"}],
    "workflow_history": [{"role": "user", "content": "..."}],
    "retrieval_query": "diabetes symptoms type 2"
  }
}

Response (SSE stream):

event: token
data: {"content": "Type"}

event: token
data: {"content": " 2"}

event: token
data: {"content": " diabetes"}

event: token
data: {"content": " symptoms"}

event: token
data: {"content": " include..."}

event: metadata
data: {"experiment_id": "exp-123", "variant_id": "variant-a", "execution_id": "eval-789", "provider": "anthropic", "model": "claude-3-5-haiku-20241022", "prompt_tokens": 150, "completion_tokens": 85, "total_tokens": 235, "cost_usd": 0.000425, "latency_ms": 1250}

event: done
data: {}

curl example:

curl -N -X POST http://localhost:8080/api/v1/internal/sdk/prompt-experiments/evaluate-stream \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "experiment_key": "rag-prompt-experiment",
    "user_context": {"userId": "user-123", "sessionId": "sess-456"},
    "input_variables": {"query": "What are the symptoms of Type 2 diabetes?"}
  }'

Error handling: If an error occurs during streaming, an error event is sent:

event: error
data: {"message": "LLM generation failed: rate limit exceeded"}

Configuration

from variably import VariablyConfig, VariablyClient

config = VariablyConfig(
    api_key="your-api-key",
    base_url="https://api.variably.com",  # default: http://localhost:8080
    environment="production",  # default: development
    timeout=5000,  # timeout in milliseconds, default: 5000
    retry_attempts=3,  # default: 3
    enable_analytics=True,  # default: True
    cache={
        "ttl": 300,  # TTL in seconds, default: 300 (5 minutes)
        "max_size": 1000,  # default: 1000
        "enabled": True  # default: True
    },
    log_level="INFO"  # DEBUG, INFO, WARNING, ERROR
)

client = VariablyClient(config)

Advanced Usage

Environment Variables

You can create a client using environment variables:

from variably import create_client_from_env

# Uses these environment variables:
# VARIABLY_API_KEY (required)
# VARIABLY_BASE_URL
# VARIABLY_ENVIRONMENT
# VARIABLY_TIMEOUT
# VARIABLY_RETRY_ATTEMPTS
# VARIABLY_ENABLE_ANALYTICS
# VARIABLY_LOG_LEVEL

client = create_client_from_env()
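For a quick local test you can set these variables from Python before creating the client; in production they would normally come from your deployment environment. Note that every value is read as a string:

import os
from variably import create_client_from_env

os.environ["VARIABLY_API_KEY"] = "your-api-key"
os.environ["VARIABLY_ENVIRONMENT"] = "production"
os.environ["VARIABLY_TIMEOUT"] = "5000"  # milliseconds, as a string

client = create_client_from_env()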

Different Flag Types

# Boolean flags
bool_value = client.evaluate_flag_bool("feature-enabled", False, user_context)

# String flags
string_value = client.evaluate_flag_string("theme", "light", user_context)

# Number flags
number_value = client.evaluate_flag_number("max-items", 10, user_context)

# JSON flags
json_value = client.evaluate_flag_json("config", {"timeout": 5000}, user_context)

# Get full evaluation details
result = client.evaluate_flag("feature-flag", "default", user_context)
print(f"Value: {result.value}, Reason: {result.reason}, Cache Hit: {result.cache_hit}")

Batch Evaluation

flags = client.evaluate_flags([
    "feature-a",
    "feature-b", 
    "feature-c"
], user_context)

print(flags["feature-a"].value)

Event Tracking

from datetime import datetime

# Single event
client.track({
    "name": "purchase_completed",
    "user_id": "user-123",
    "properties": {
        "amount": 99.99,
        "currency": "USD",
        "items": ["item-1", "item-2"]
    },
    "timestamp": datetime.utcnow()  # optional, auto-generated if not provided
})

# Batch events
client.track_batch([
    {"name": "page_view", "user_id": "user-123", "properties": {"page": "/home"}},
    {"name": "button_click", "user_id": "user-123", "properties": {"button": "cta"}}
])

Cache Management

# Clear cache
client.clear_cache()

# Get cache stats
stats = client.cache.get_stats()
print(stats)  # {"size": 10, "max_size": 1000, "enabled": True, "ttl": 300}

Metrics

# Get SDK metrics
metrics = client.get_metrics()
print(metrics)
# {
#     "api_calls": 25,
#     "cache_hits": 15,
#     "cache_misses": 10,
#     "errors": 1,
#     "average_latency": 45.2,
#     "cache_hit_rate": 0.6,
#     "error_rate": 0.04,
#     "flags_evaluated": 20,
#     "gates_evaluated": 5,
#     "events_tracked": 12,
#     "start_time": "2023-10-01T12:00:00Z",
#     "uptime_seconds": 3600
# }

Context Manager

# Use with context manager for automatic cleanup
with VariablyClient({"api_key": "your-api-key"}) as client:
    result = client.evaluate_flag_bool("feature", False, user_context)
    # client.close() is called automatically

Custom Logger

from variably import VariablyClient, create_logger

# Create custom logger
logger = create_logger(
    name="my-app",
    level="DEBUG",
    structured=True,  # JSON logging
    silent=False
)

# Client will use the custom logger
client = VariablyClient({
    "api_key": "your-api-key",
    "log_level": "DEBUG"
})

Error Handling

from variably import (
    VariablyError,
    NetworkError,
    AuthenticationError,
    ValidationError,
    RateLimitError,
    TimeoutError,
    ConfigurationError
)

try:
    result = client.evaluate_flag("my-flag", False, user_context)
except AuthenticationError:
    print("Invalid API key")
except NetworkError as e:
    print(f"Network error: {e.status_code}")
except ValidationError as e:
    print(f"Validation error in field: {e.field}")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after} seconds")
except TimeoutError:
    print("Request timed out")
except ConfigurationError as e:
    print(f"Configuration error in parameter: {e.parameter}")
except VariablyError as e:
    print(f"Variably SDK error: {e}")

Type Hints

The SDK includes full type hints for better IDE support:

from variably import VariablyClient, UserContext, FlagResult

user_context: UserContext = {
    "user_id": "user-123",
    "email": "user@example.com",
    "attributes": {
        "plan": "premium",
        "signup_date": "2023-01-01"
    }
}

result: FlagResult = client.evaluate_flag("feature", False, user_context)

Async Support

For async applications, you can wrap the synchronous client:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from variably import VariablyClient

class AsyncVariablyClient:
    def __init__(self, config):
        self.client = VariablyClient(config)
        self.executor = ThreadPoolExecutor(max_workers=4)
    
    async def evaluate_flag_bool(self, flag_key, default_value, user_context):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self.executor,
            self.client.evaluate_flag_bool,
            flag_key, default_value, user_context
        )
    
    async def close(self):
        self.client.close()
        self.executor.shutdown(wait=True)

# Usage
async def main():
    client = AsyncVariablyClient({"api_key": "your-api-key"})
    
    result = await client.evaluate_flag_bool("feature", False, {
        "user_id": "user-123"
    })
    
    await client.close()

asyncio.run(main())

Development

Setup

# Install development dependencies
pip install -e ".[dev]"

Testing

pytest

Code Quality

# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Lint
flake8 src/ tests/

# Type check
mypy src/

Publishing to PyPI

Prerequisites

  1. Create a PyPI account at https://pypi.org/account/register/
  2. Generate an API token at https://pypi.org/manage/account/token/
    • Scope: select "Entire account" for first upload, or project-specific after that
  3. Install build tools:
    pip3 install build twine
    

Note: build and twine install to user site-packages and may not be on your PATH. Always use python3 -m build and python3 -m twine instead of bare build/twine.

Configure PyPI credentials

Create ~/.pypirc:

[distutils]
index-servers = pypi

[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE

Secure the file:

chmod 600 ~/.pypirc

Build and publish

The version in the build output (e.g., variably_sdk-2.0.0-py3-none-any.whl) comes directly from pyproject.toml's version field. PyPI rejects re-uploads of the same version — you must bump the version to publish again.

# 1. Clean previous builds
rm -rf dist/ build/ src/*.egg-info

# 2. Build sdist and wheel
python3 -m build

# 3. Verify the package (optional but recommended)
python3 -m twine check dist/*

# 4. Upload to TestPyPI first (optional, for dry-run)
python3 -m twine upload --repository testpypi dist/*

# 5. Upload to PyPI
python3 -m twine upload dist/*

Verify the published package

pip3 install variably-sdk==2.1.0
python3 -c "from variably import VariablyClient, PromptVariant; print('OK')"

Version bumping checklist

When releasing a new version, update these three files then clean-build-publish:

  1. src/variably/version.py: __version__
  2. pyproject.toml: version
  3. src/variably/http_client.py: User-Agent header string

# Example: bumping from 2.0.0 to 2.0.1
# After updating the 3 files above:
rm -rf dist/ build/ src/*.egg-info
python3 -m build
python3 -m twine upload dist/*

Requirements

  • Python 3.7+
  • requests >= 2.25.0

License

MIT License - see LICENSE file for details.
