Langfuse Custom Tracer
Langfuse v4 tracing for Google Gemini and Anthropic Claude with automatic cost tracking
🎯 What is This?
A lightweight Python library that adds observability and cost tracking to your LLM applications using Langfuse.
- Automatic token counting for all supported LLM providers
- Dynamic cost calculation with real-time pricing from remote JSON (no redeploy needed!)
- Nested trace visualization in Langfuse
- Simple context manager API built on OpenTelemetry
- Zero setup - works with just API keys
- TTL-based caching for optimal performance
- Graceful degradation - never crashes on network failures
🚀 Quick Start
1. Install
# Basic installation
pip install langfuse-custom-tracer
# With environment variable support
pip install langfuse-custom-tracer[env]
# With Gemini support
pip install langfuse-custom-tracer[gemini]
# With Anthropic support
pip install langfuse-custom-tracer[anthropic]
# Everything (all providers)
pip install langfuse-custom-tracer[all]
2. Get API Keys
- Langfuse: Sign up at cloud.langfuse.com
- Gemini: Get key from ai.google.dev (optional)
- Anthropic: Get key from console.anthropic.com (optional)
3. Set Environment Variables
Create a .env file:
# Langfuse (get from your dashboard)
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
# Gemini API (optional)
GEMINI_API_KEY=...
# Anthropic API (optional)
ANTHROPIC_API_KEY=...
4. Use It (Gemini Example)
import os
from langfuse_custom_tracer import load_env, create_langfuse_client, GeminiTracer
import google.generativeai as genai
# Load environment variables
load_env()
# Initialize
lf = create_langfuse_client(
    os.getenv("LANGFUSE_SECRET_KEY"),
    os.getenv("LANGFUSE_PUBLIC_KEY")
)
tracer = GeminiTracer(lf)
# Configure Gemini
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")
# Use with tracing
with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
    with tracer.generation("extract-data", model="gemini-2.0-flash",
                           input="Extract name, amount, date") as gen:
        response = model.generate_content("Extract name, amount, date from invoice")
        usage = tracer.extract_usage(response, model="gemini-2.0-flash")
        # usage now includes pricing_source and pricing_version (automatically tracked)
        gen.update(output=response.text, usage_details=usage)
    span.update(output="Extraction complete")

tracer.flush()  # Send to Langfuse
4b. Use It (Anthropic Example)
import os
from langfuse_custom_tracer import load_env, create_langfuse_client, AnthropicTracer
from anthropic import Anthropic
# Load environment variables
load_env()
# Initialize
lf = create_langfuse_client(
    os.getenv("LANGFUSE_SECRET_KEY"),
    os.getenv("LANGFUSE_PUBLIC_KEY")
)
tracer = AnthropicTracer(lf)
# Create Anthropic client
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
# Use with tracing
with tracer.trace("invoice-processing", input={"file": "invoice.pdf"}) as span:
    with tracer.generation("extract-data", model="claude-3-5-sonnet-20241022",
                           input="Extract name, amount, date") as gen:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,  # required by the Anthropic Messages API
            messages=[{"role": "user", "content": "Extract name, amount, date from invoice"}]
        )
        usage = tracer.extract_usage(response, model="claude-3-5-sonnet-20241022")
        # usage now includes pricing_source and pricing_version (automatically tracked)
        gen.update(output=response.content[0].text, usage_details=usage)
    span.update(output="Extraction complete")

tracer.flush()  # Send to Langfuse
📊 What You'll See in Langfuse
Dashboard View
📊 Trace: invoice-processing (ID: trace-123)
├─ ⏱ Duration: 2.3s
├─ 👤 User: (none set)
├─ 🏷️ Tags: [production, batch]
│
└─ 🤖 Generation: extract-data
   ├─ Model: gemini-2.0-flash
   ├─ Status: ✅ Success
   ├─ Tokens: Input 156 | Output 89 | Total 245
   ├─ Cost: $0.0000768 (auto-calculated from dynamic pricing)
   │  ├─ Input: $0.0000234 (156 tokens @ $0.15/1M)
   │  ├─ Output: $0.0000534 (89 tokens @ $0.60/1M)
   │  ├─ Pricing Source: json
   │  └─ Pricing Version: 2026-04-22-v1
   ├─ Latency: 1.8s
   └─ Output: "Name: John Doe, Amount: $500, Date: 2025-03-31"
Cost Aggregation
All calls are automatically aggregated on the dashboard:
- Total tokens: 245,300 across all traces
- Total cost: $0.18 for the day
- By model: Gemini 2.0 Flash: $0.15, Gemini 1.5 Pro: $0.03
Nested Traces
Langfuse automatically detects nesting via OpenTelemetry context:
with tracer.trace("main-pipeline"):      # Parent span
    with tracer.trace("step-1"):         # Child span 1
        with tracer.generation(...):     # Grandchild span
            ...
    with tracer.trace("step-2"):         # Child span 2
        ...
Result in Langfuse: Clean hierarchical tree
🎮 Full API Reference
create_langfuse_client()
lf = create_langfuse_client(
    secret_key="sk-lf-...",             # Required
    public_key="pk-lf-...",             # Required
    host="https://cloud.langfuse.com"   # Optional, default EU
)
Hosts:
- EU: https://cloud.langfuse.com (default)
- US: https://us.cloud.langfuse.com
load_env()
Load environment variables from .env file:
from langfuse_custom_tracer import load_env
# Load from .env in current directory
load_env()
# Load from custom file
load_env(".env.production")
Requires python-dotenv. Install with: pip install langfuse-custom-tracer[env]
BaseTracer.trace()
Create a root span (top-level trace):
with tracer.trace(
    name="my-pipeline",
    input={"file": "data.csv"},
    metadata={"version": "1.0"},
    user_id="user-123",
    session_id="session-456",
    tags=["production", "batch"]
) as span:
    # Do work here
    span.update(output={"rows_processed": 1000})
Parameters:
- name (str): Span name
- input (any): Input data (shown in Langfuse)
- metadata (dict): Custom metadata
- user_id (str): User identifier
- session_id (str): Session identifier
- tags (list): String tags for filtering
BaseTracer.generation()
Create a generation span (LLM call):
with tracer.generation(
    name="extract",
    model="gemini-2.0-flash",
    input="Extract data",
    metadata={"temperature": 0.7}
) as gen:
    response = model.generate_content("Extract data")
    usage = tracer.extract_usage(response, model="gemini-2.0-flash")
    gen.update(output=response.text, usage_details=usage)
Parameters:
- name (str): Generation name
- model (str): Model identifier
- input (any): Prompt/input
- metadata (dict): Custom metadata
GeminiTracer.extract_usage() / AnthropicTracer.extract_usage()
Extract token counts and calculate costs:
# Gemini
usage = tracer.extract_usage(
    response,                   # Gemini response object
    model="gemini-2.0-flash"    # Model name for pricing
)

# Anthropic
usage = tracer.extract_usage(
    response,                             # Anthropic message object
    model="claude-3-5-sonnet-20241022"    # Model name for pricing
)

# Returns:
# {
#   "input": 156,                     # Prompt tokens
#   "output": 89,                     # Completion tokens
#   "total": 245,                     # Total tokens
#   "unit": "TOKENS",
#   "inputCost": 0.0000234,           # Input cost in USD (from dynamic pricing)
#   "outputCost": 0.0000534,          # Output cost in USD (from dynamic pricing)
#   "totalCost": 0.0000768,           # Total cost in USD
#   "cachedTokens": 10,               # (optional) cached tokens (Gemini & Anthropic)
#   "pricing_source": "json",         # Where pricing came from (json or default)
#   "pricing_version": "2026-04-22-v1"  # Pricing version for audit trail
# }
BaseTracer.flush()
Send pending traces to Langfuse (blocking):
tracer.flush() # Wait for all events to be sent
Required for short-lived scripts. Long-running servers batch automatically.
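For short-lived scripts, one convenient pattern is to register the flush with the standard library's atexit module, so buffered traces are sent even if the script returns early. A minimal sketch (tracer is the instance created earlier; this is a usage pattern, not a library feature):

```python
import atexit

# Ensure buffered traces are sent when the interpreter exits.
# tracer.flush() is the documented blocking flush; atexit is stdlib.
atexit.register(tracer.flush)
```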
🤖 Auto Tracing (Zero Boilerplate)
NEW! Automatic tracing with just one import. No manual wrapping required!
What is Auto Tracing?
Instead of manually wrapping each LLM call, just import and enable auto-tracing:
from langfuse_custom_tracer import observe
# Enable auto-tracing (patches Gemini & Anthropic SDK globally)
observe()
# Now every LLM call is automatically traced and sent to Langfuse!
import google.generativeai as genai
genai.configure(api_key="...")
model = genai.GenerativeModel("gemini-2.0-flash")
# This is automatically traced (no manual spans needed!)
response = model.generate_content("Hello, world!")
Features
- ✅ Zero boilerplate - One import, automatic tracing
- ✅ User tracking - Tag all calls to a specific user
- ✅ Session grouping - Group related calls together
- ✅ Score & rate - Attach quality scores after the fact
- ✅ Async support - Works with asyncio (contextvars-based)
- ✅ Error handling - Captures failures gracefully
- ✅ Dynamic pricing - Automatic cost calculation
- ✅ Latency tracking - Measures wall-clock time
Basic Setup
import os
from langfuse_custom_tracer import load_env, observe
# Load credentials from .env
load_env()
# Enable auto-tracing (must be done before importing LLM SDKs)
observe()
# Now all Gemini and Anthropic calls are automatically traced!
import google.generativeai as genai
from anthropic import Anthropic
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
gemini_model = genai.GenerativeModel("gemini-2.0-flash")
anthropic_client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
# Both calls are automatically traced
gemini_response = gemini_model.generate_content("Who invented Python?")
anthropic_response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "Who invented Python?"}]
)
User Tracking
Tag all calls to a specific user:
from langfuse_custom_tracer import observe, set_user
observe()
# In a web app, set user at request start
@app.post("/chat")
async def chat(request: ChatRequest, user: User = Depends(get_current_user)):
set_user(user.id) # Tag all subsequent calls to this user
# This call is automatically traced and tagged to user.id
response = await model.generate_content(request.message)
return {"reply": response.text}
In Langfuse, you'll see:
- Users tab: All calls aggregated by user
- Cost per user: Total tokens and estimated cost
- User sessions: All conversations for a specific user
Session Tracking
Group related calls into sessions:
from langfuse_custom_tracer import observe, set_user, set_session
observe()
@app.post("/new-conversation")
async def new_conversation(user: User = Depends(get_current_user)):
set_user(user.id)
session_id = set_session() # Start a new session, get ID back
# All subsequent calls in this context are grouped
response1 = model.generate_content("Question 1")
response2 = model.generate_content("Question 2")
response3 = model.generate_content("Question 3")
# Later, you can retrieve the session ID:
store_session_id(user.id, session_id)
In Langfuse, you'll see:
- Sessions tab: All calls in a conversation grouped together
- Timeline view: See the flow of a multi-turn conversation
- Session metrics: Total cost and tokens per conversation
Scoring & Feedback
Attach quality scores to traces after the call completes:
from langfuse_custom_tracer import observe, set_user, score, get_trace_id
observe()
set_user("user-123")
# Make LLM call
response = model.generate_content("Explain quantum computing")
# Later (even in a different request), score the trace
trace_id = get_trace_id() # Get the trace ID from current context
# User clicks "thumbs up" button
score("thumbs_up", 1.0, trace_id=trace_id, comment="Very helpful!")
# Or score by quality metrics
score("relevance", 0.95, trace_id=trace_id, data_type="NUMERIC")
score("hallucination", False, trace_id=trace_id, data_type="BOOLEAN")
Async Support
Auto-tracing works seamlessly with asyncio:
import asyncio
from langfuse_custom_tracer import observe, set_user
import google.generativeai as genai
observe()
async def process_user_batch(user_id: str, messages: list):
    set_user(user_id)  # ContextVar isolates per asyncio task

    # Each task gets its own user_id
    tasks = [
        model.generate_content_async(msg)
        for msg in messages
    ]

    # All calls tagged to the same user_id (via ContextVar)
    results = await asyncio.gather(*tasks)
    return results
# Each user gets their own isolated context
asyncio.run(process_user_batch("user-1", ["msg1", "msg2"]))
asyncio.run(process_user_batch("user-2", ["msg3", "msg4"]))
Auto Tracing API Reference
| Function | Purpose |
|---|---|
| observe() | Enable auto-tracing (patches SDK globally) |
| set_user(user_id) | Tag all subsequent calls to a user |
| set_session(session_id=None) | Start a session (auto-generates UUID if not provided) |
| end_session() | Clear current session |
| get_trace_id() | Get trace ID of most recent call |
| score(name, value, trace_id=None, comment=None, data_type="NUMERIC") | Attach score to trace |
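Putting the table together, a minimal end-to-end sketch. It assumes observe() has already patched the SDKs and that the Gemini model is configured as in the setup above:

```python
import os
import google.generativeai as genai
from langfuse_custom_tracer import (
    observe, set_user, set_session, end_session, get_trace_id, score
)

observe()                    # patch the SDKs once, at startup

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")

set_user("user-123")         # attribute subsequent calls to this user
session_id = set_session()   # no argument: a UUID is auto-generated

response = model.generate_content("Hello!")   # auto-traced, no spans needed

trace_id = get_trace_id()                     # trace ID of that last call
score("thumbs_up", 1.0, trace_id=trace_id)    # attach feedback to it

end_session()                # stop grouping subsequent calls
```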
What Gets Traced Automatically
Each auto-traced call captures:
- ✅ Model name
- ✅ Input prompt
- ✅ Output response
- ✅ Token counts (input, output, cached)
- ✅ Cost (dynamic, from remote pricing)
- ✅ Latency (wall-clock time in ms)
- ✅ User ID and session ID
- ✅ Status (SUCCESS or ERROR; see the error-path sketch below)
- ✅ Pricing source and version
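The error path, sketched under the behavior stated above (failures are captured with status ERROR while the exception still propagates to your code):

```python
# Assumes observe() is enabled and model is configured as above. Any
# failure (network error, invalid key, blocked content) is captured.
try:
    response = model.generate_content("Hello")
except Exception as err:
    # Per the list above, the failed call should already be recorded
    # with status ERROR before the exception reaches this handler.
    print(f"LLM call failed (traced as ERROR): {err}")
```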
Supported SDKs
| Provider | SDK | Methods Patched | Status |
|---|---|---|---|
| Gemini (Legacy) | google-generativeai | GenerativeModel.generate_content() | ✅ Supported |
| Gemini (New) | google-genai | Models.generate_content() | ✅ Supported |
| Anthropic | anthropic | Messages.create() | ✅ Supported |
🧠 Supported Models
Gemini ✅
All Google Gemini models with Q1 2026 pricing:
| Model | Input | Output | Cache |
|---|---|---|---|
| gemini-2.5-pro | $1.25/1M | $10.00/1M | $0.3125/1M |
| gemini-2.0-flash | $0.15/1M | $0.60/1M | $0.0375/1M |
| gemini-2.0-flash-lite | $0.075/1M | $0.30/1M | $0.01875/1M |
| gemini-1.5-pro | $1.25/1M | $5.00/1M | $0.3125/1M |
| gemini-1.5-flash | $0.075/1M | $0.30/1M | $0.01875/1M |
| gemini-1.5-flash-8b | $0.0375/1M | $0.15/1M | $0.01/1M |
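As a sanity check on these rates, here is the cost arithmetic for the dashboard example above (156 input and 89 output tokens on gemini-2.0-flash):

```python
# Cost arithmetic for the dashboard example, using the
# gemini-2.0-flash rates from this table (USD per 1M tokens).
input_tokens, output_tokens = 156, 89
input_cost = input_tokens * 0.15 / 1_000_000    # $0.0000234
output_cost = output_tokens * 0.60 / 1_000_000  # $0.0000534
total_cost = input_cost + output_cost           # $0.0000768
```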
Anthropic Claude ✅
All Claude models with Q1 2026 pricing (with prompt caching support):
| Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| claude-3-5-sonnet-20241022 | $3.00/1M | $15.00/1M | $0.30/1M | $3.75/1M |
| claude-3-5-haiku-20241022 | $0.80/1M | $4.00/1M | $0.08/1M | $1.00/1M |
| claude-3-opus-20240229 | $15.00/1M | $75.00/1M | $1.50/1M | $18.75/1M |
| claude-3-sonnet-20240229 | $3.00/1M | $15.00/1M | $0.30/1M | $3.75/1M |
| claude-3-haiku-20240307 | $0.80/1M | $4.00/1M | $0.08/1M | $1.00/1M |
📁 Project Structure
langfuse-custom-tracer/
├── langfuse_custom_tracer/
│   ├── __init__.py              # Package exports
│   ├── client.py                # Langfuse client setup
│   ├── pricing_manager.py       # Dynamic pricing manager (NEW)
│   ├── auto.py                  # Automatic tracer patching
│   ├── context.py               # Context management
│   ├── factory.py               # Tracer factory
│   ├── scoring.py               # Cost scoring
│   └── tracers/
│       ├── __init__.py
│       ├── base.py              # BaseTracer (abstract)
│       ├── gemini.py            # GeminiTracer (19 tests, dynamic pricing)
│       └── anthropic.py         # AnthropicTracer (43 tests, dynamic pricing)
├── tests/
│   ├── conftest.py              # Pytest fixtures
│   ├── test_pricing_manager.py  # 19 tests (new)
│   ├── test_gemini_tracer.py    # 19 tests
│   ├── test_anthropic_tracer.py # 43 tests
│   ├── test_base_tracer.py      # Base tests
│   ├── test_auto_patch.py       # Auto patching tests
│   ├── test_factory.py          # Factory tests
│   └── test_client.py           # Client tests
├── pricing.json                 # Dynamic pricing data (NEW)
├── examples/
│   └── env_setup_example.py     # Usage example
├── SETUP.md                     # Setup guide
├── TESTING.md                   # Testing guide
├── FEATURE_COMPLETE.md          # Feature implementation details
└── pyproject.toml               # Package config
🧪 Testing
81 unit tests with 64% coverage:
# Run all tests
pytest
# Run with coverage report
pytest --cov
# Run specific test
pytest tests/test_gemini_tracer.py::TestGeminiTracer::test_extract_usage_basic -v
# Run Anthropic tests
pytest tests/test_anthropic_tracer.py -v
# Run pricing manager tests
pytest tests/test_pricing_manager.py -v
All tests pass ✅
Test Coverage Breakdown:
- PricingManager: 19 tests, 79% coverage (remote pricing fetching, caching, fallback)
- AnthropicTracer: 43 tests, 100% coverage
- GeminiTracer: 19 tests, 76% coverage
- Total: 81 tests, 64% coverage
Execution time: ~1 second. All tests passing: ✅ 81/81
🔒 Security
- Never commit .env files - Already in .gitignore
- API keys required - Will raise ImportError if missing
- HTTPS only - All Langfuse communication encrypted
- No keys in code - Always use environment variables
💰 Dynamic Pricing (NEW!)
Major improvement: Pricing is now decoupled from source code and managed via remote JSON!
from langfuse_custom_tracer import get_pricing_manager
# Get pricing for a model
pm = get_pricing_manager()
price, version, source = pm.get_price("gemini-2.5-flash")
print(f"Input: ${price['input']} per 1M tokens (v{version}, from {source})")
Benefits:
- ✅ Update pricing without redeploying the library
- ✅ TTL-based caching (default 10 minutes)
- ✅ Graceful fallback if remote is unavailable
- ✅ All traces include pricing metadata
- ✅ Support for 25+ models across Gemini, Claude, GPT
How it works:
- Library fetches pricing from remote JSON (GitHub by default)
- Caches the data for 10 minutes
- If remote is unavailable, uses cached or default pricing
- Every trace includes pricing_source and pricing_version
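For illustration only, the fetch-cache-fallback flow could look roughly like this; TTL_SECONDS, DEFAULT_PRICING, and get_pricing are hypothetical names, not the library's actual internals:

```python
import json
import time
import urllib.request

TTL_SECONDS = 600  # 10-minute cache, matching the documented default
DEFAULT_PRICING = {"gemini-2.0-flash": {"input": 0.15, "output": 0.60}}
_cache = {"data": None, "fetched_at": 0.0}

def get_pricing(url: str) -> dict:
    now = time.time()
    if _cache["data"] is not None and now - _cache["fetched_at"] < TTL_SECONDS:
        return _cache["data"]                 # fresh cache hit
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            _cache["data"] = json.load(resp)  # refresh from remote JSON
            _cache["fetched_at"] = now
            return _cache["data"]
    except Exception:
        # Graceful degradation: stale cache if available, else defaults
        return _cache["data"] or DEFAULT_PRICING
```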
Custom pricing URL:
pm = get_pricing_manager(url="https://your-domain.com/pricing.json")
# Or via environment variable
# export PRICING_JSON_URL=https://your-domain.com/pricing.json
See FEATURE_COMPLETE.md for full details.
📚 Examples
Example 1: Gemini Extraction Task
from langfuse_custom_tracer import create_langfuse_client, GeminiTracer
import google.generativeai as genai
import os
lf = create_langfuse_client(
    os.getenv("LANGFUSE_SECRET_KEY"),
    os.getenv("LANGFUSE_PUBLIC_KEY")
)
tracer = GeminiTracer(lf)
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash")
# Simple extraction
with tracer.trace("email-analysis") as span:
    with tracer.generation("extract", model="gemini-2.0-flash",
                           input="Extract sender, subject, body") as gen:
        response = model.generate_content(
            "From the email below, extract sender, subject, body:\n..."
        )
        usage = tracer.extract_usage(response, model="gemini-2.0-flash")
        gen.update(output=response.text, usage_details=usage)

tracer.flush()
Example 1b: Anthropic Extraction Task
from langfuse_custom_tracer import create_langfuse_client, AnthropicTracer
from anthropic import Anthropic
import os
lf = create_langfuse_client(
    os.getenv("LANGFUSE_SECRET_KEY"),
    os.getenv("LANGFUSE_PUBLIC_KEY")
)
tracer = AnthropicTracer(lf)
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
# Simple extraction with Claude
with tracer.trace("email-analysis") as span:
    with tracer.generation("extract", model="claude-3-5-sonnet-20241022",
                           input="Extract sender, subject, body") as gen:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,  # required by the Anthropic Messages API
            messages=[{
                "role": "user",
                "content": "From the email below, extract sender, subject, body:\n..."
            }]
        )
        usage = tracer.extract_usage(response, model="claude-3-5-sonnet-20241022")
        gen.update(output=response.content[0].text, usage_details=usage)

tracer.flush()
Example 2: Multi-Step Pipeline
with tracer.trace("document-processing", user_id="user-123",
metadata={"doc_type": "invoice"}) as span:
# Step 1: Extract text
with tracer.trace("step-1-extract"):
with tracer.generation("ocr", model="gemini-2.0-flash-lite"):
text = model.generate_content("Extract text from image")
# ...
# Step 2: Classify
with tracer.trace("step-2-classify"):
with tracer.generation("classify", model="gemini-2.0-flash"):
classification = model.generate_content(f"Classify: {text}")
# ...
# Step 3: Extract fields
with tracer.trace("step-3-extract-fields"):
with tracer.generation("extract", model="gemini-2.0-flash"):
fields = model.generate_content(f"Extract fields: {text}")
# ...
tracer.flush()
In Langfuse you'll see:
- Total latency: sum of all steps
- Total cost: $0.0015
- Token breakdown by step
- Each step as a child span
Example 3: Error Handling
with tracer.trace("risky-operation"):
with tracer.generation("call", model="gemini-2.0-flash"):
try:
response = model.generate_content("...")
usage = tracer.extract_usage(response)
gen.update(output=response.text, usage_details=usage)
except Exception as e:
gen.update(status_code=500, error=str(e))
raise
tracer.flush()
📚 Documentation
- SETUP.md - Installation and configuration
- TESTING.md - Testing guide and running tests
- examples/env_setup_example.py - More examples
🤝 Contributing
This is an early-stage project. Contributions welcome!
Next features:
- Additional LLM providers (Ollama, Groq, Azure)
- Async support
- Batch operations
- Response filtering
📄 License
MIT - See LICENSE file
💬 Support
- Documentation: Read the docs
- Issues: Report bugs on GitHub
- Questions: Check TESTING.md for common issues
Built with ❤️ for the LLM community
Langfuse is open-source observability for LLM applications