
fasteval-langfuse

Langfuse integration for fasteval - evaluate production traces with fasteval's research-backed metrics.

Installation

pip install fasteval-core fasteval-langfuse

Quick Start

Evaluate Production Traces

Fetch traces from Langfuse and evaluate them with fasteval metrics:

from fasteval_langfuse import langfuse_traces
from fasteval_langfuse.sampling import RandomSamplingStrategy
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.hallucination(threshold=0.9)
@langfuse_traces(
    project="production",
    filter_tags=["customer-support"],
    time_range="last_24h",
    sampling=RandomSamplingStrategy(sample_size=200)
)
def test_production_traces(trace_id, input, output, context, metadata):
    # Evaluate the trace
    fe.score(output, input=input)

# Run with pytest - scores automatically pushed to Langfuse
# pytest test_production.py -v

Sampling Strategies

Reduce evaluation costs with intelligent sampling:

from fasteval_langfuse.sampling import (
    RandomSamplingStrategy,
    StratifiedSamplingStrategy,
    ScoreBasedSamplingStrategy,
)

# Random sampling - 200 random traces
@langfuse_traces(
    project="prod",
    sampling=RandomSamplingStrategy(sample_size=200, seed=42)
)
def test_random_sample(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Stratified sampling - even distribution across user types
@langfuse_traces(
    project="prod",
    sampling=StratifiedSamplingStrategy(
        strata_key="metadata.user_type",
        samples_per_stratum=30
    )
)
def test_across_segments(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Score-based sampling - focus on failures
@langfuse_traces(
    project="prod",
    sampling=ScoreBasedSamplingStrategy(
        score_name="user_rating",
        low_score_threshold=3.0,
        low_score_rate=1.0,      # 100% of low ratings
        high_score_rate=0.05     # 5% of high ratings
    )
)
def test_failures(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

Built-in Sampling Strategies

  • NoSamplingStrategy: Evaluate all matching traces (default)
  • RandomSamplingStrategy: Unbiased random sampling
  • StratifiedSamplingStrategy: Even distribution across groups
  • ScoreBasedSamplingStrategy: Oversample low-scoring traces
  • RecentFirstSamplingStrategy: Prioritize recent traces
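To make the score-based strategy concrete, here is a minimal stand-in for the oversampling logic described above (illustrative only, not ScoreBasedSamplingStrategy's actual code): traces below the threshold are kept at `low_rate`, everything else at `high_rate`.

```python
import random

def score_based_sample(traces, score_name, low_threshold, low_rate, high_rate, seed=42):
    """Keep low-scoring traces at low_rate and the rest at high_rate.

    `traces` is a list of dicts with a "scores" mapping; a hypothetical
    shape chosen for illustration.
    """
    rng = random.Random(seed)  # seeded for reproducible samples
    kept = []
    for trace in traces:
        score = trace["scores"].get(score_name)
        rate = low_rate if score is not None and score < low_threshold else high_rate
        if rng.random() < rate:
            kept.append(trace)
    return kept

# 100 high-rated traces (kept at 5%) plus one low-rated trace (always kept)
traces = [{"id": i, "scores": {"user_rating": 5.0}} for i in range(100)]
traces.append({"id": 100, "scores": {"user_rating": 1.0}})
sample = score_based_sample(traces, "user_rating", 3.0, low_rate=1.0, high_rate=0.05)
```

With `low_rate=1.0` every failing trace survives sampling, so the evaluation budget concentrates on the traces most likely to reveal problems.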

Dataset Integration

Evaluate against Langfuse datasets. Every dataset field is passed as a parameter; declare only the ones your test needs:

from fasteval_langfuse import langfuse_dataset

# Basic usage
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_qa_dataset(input, expected_output):
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Using custom metadata fields
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_with_metadata(input, expected_output, difficulty, category):
    # difficulty and category come from item.metadata
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Only what you need
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="inputs-only")
def test_minimal(input):
    # Only declare input, ignore other fields
    response = my_agent(input)
    fe.score(response, input=input)

Configuration

from fasteval_langfuse import configure_langfuse, LangfuseConfig

configure_langfuse(LangfuseConfig(
    public_key="pk-...",                # Or from LANGFUSE_PUBLIC_KEY env
    secret_key="sk-...",                # Or from LANGFUSE_SECRET_KEY env
    host="https://cloud.langfuse.com",  # Or self-hosted
    default_project="production",
    auto_push_scores=True,              # Push scores back automatically
    score_name_prefix="fasteval_",      # Prefix for score names
))
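The explicit-argument-wins-over-environment precedence noted in the comments above can be sketched like this (a hypothetical `resolve` helper, not the library's code):

```python
import os

# Seed demo values only if the variables are not already set.
os.environ.setdefault("LANGFUSE_PUBLIC_KEY", "pk-demo")
os.environ.setdefault("LANGFUSE_SECRET_KEY", "sk-demo")

def resolve(explicit, env_var):
    """An explicit config value wins; otherwise fall back to the environment."""
    return explicit if explicit is not None else os.environ.get(env_var)

public_key = resolve(None, "LANGFUSE_PUBLIC_KEY")    # falls back to env
secret_key = resolve("sk-override", "LANGFUSE_SECRET_KEY")  # explicit wins
```

Keeping secrets in `LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY` rather than in code also keeps them out of version control.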

RAG Evaluation with Context

The decorator automatically extracts context from trace metadata:

@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
@langfuse_traces(
    project="prod",
    filter_tags=["rag"]
)
def test_rag_quality(trace_id, input, output, context, metadata):
    # context is auto-extracted from metadata keys:
    # - "context", "retrieved_docs", "documents", "retrieval_context"
    
    # Or manually extract if needed:
    if not context:
        context = metadata.get("custom_docs_key")
    
    fe.score(output, context=context, input=input)
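The fallback order for the metadata keys listed above can be sketched as a plain function (illustrative; not the library's internal code):

```python
# Keys tried in order, taken from the list documented above.
CONTEXT_KEYS = ("context", "retrieved_docs", "documents", "retrieval_context")

def extract_context(metadata):
    """Return the first non-empty matching context field, or None."""
    for key in CONTEXT_KEYS:
        if metadata.get(key):
            return metadata[key]
    return None

ctx = extract_context({"retrieved_docs": ["doc A", "doc B"], "other": 1})
```

Traces whose retrieval output lives under a custom key fall through to `None`, which is exactly the case the manual `metadata.get(...)` fallback in the test above handles.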

Benefits

  • 💰 Cost Reduction: Cut LLM evaluation costs by 90%+ with sampling
  • ⚡ Faster Feedback: Evaluate in minutes instead of hours
  • 📊 Research-Backed Metrics: Use fasteval's validated evaluation metrics
  • 🎯 Focus on Issues: Oversample failures with ScoreBasedSamplingStrategy
  • 🔌 Zero Instrumentation: Evaluate existing traces without code changes
  • 🔄 Automatic Scoring: Evaluation results sync back to Langfuse automatically
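The 90%+ figure follows directly from the sample size: evaluating a fixed sample instead of every trace cuts LLM calls proportionally. With hypothetical numbers:

```python
total_traces = 10_000  # hypothetical daily trace volume
sample_size = 200      # e.g. RandomSamplingStrategy(sample_size=200)

# Fraction of LLM evaluation calls avoided by sampling.
reduction = 1 - sample_size / total_traces  # 0.98, i.e. 98% fewer calls
```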

License

MIT

