Langfuse integration for fasteval - evaluate production traces with fasteval metrics

fasteval-langfuse

Langfuse integration for fasteval - evaluate production traces with fasteval's research-backed metrics.

Installation

pip install fasteval-core fasteval-langfuse

Quick Start

Evaluate Production Traces

Fetch traces from Langfuse and evaluate them with fasteval metrics:

from fasteval_langfuse import langfuse_traces
from fasteval_langfuse.sampling import RandomSamplingStrategy
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.hallucination(threshold=0.9)
@langfuse_traces(
    project="production",
    filter_tags=["customer-support"],
    time_range="last_24h",
    sampling=RandomSamplingStrategy(sample_size=200)
)
def test_production_traces(trace_id, input, output, context, metadata):
    # Evaluate the trace
    fe.score(output, input=input)

# Run with pytest - scores automatically pushed to Langfuse
# pytest test_production.py -v

Sampling Strategies

Reduce evaluation costs by scoring a sample of the matching traces instead of all of them:

from fasteval_langfuse.sampling import (
    RandomSamplingStrategy,
    StratifiedSamplingStrategy,
    ScoreBasedSamplingStrategy,
)

# Random sampling - 200 random traces
@langfuse_traces(
    project="prod",
    sampling=RandomSamplingStrategy(sample_size=200, seed=42)
)
def test_random_sample(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Stratified sampling - even distribution across user types
@langfuse_traces(
    project="prod",
    sampling=StratifiedSamplingStrategy(
        strata_key="metadata.user_type",
        samples_per_stratum=30
    )
)
def test_across_segments(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Score-based sampling - focus on failures
@langfuse_traces(
    project="prod",
    sampling=ScoreBasedSamplingStrategy(
        score_name="user_rating",
        low_score_threshold=3.0,
        low_score_rate=1.0,      # 100% of low ratings
        high_score_rate=0.05     # 5% of high ratings
    )
)
def test_failures(trace_id, input, output, context, metadata):
    fe.score(output, input=input)
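
The selection rule behind score-based sampling can be sketched in plain Python. This is a minimal illustration of the idea, not the library's implementation: the function `score_based_sample`, the trace layout (a dict with a `scores` mapping), the strict `<` comparison against the threshold, and the handling of unscored traces are all assumptions made for the example.

```python
import random


def score_based_sample(traces, score_name, low_score_threshold,
                       low_score_rate, high_score_rate, seed=None):
    """Keep each trace with a probability that depends on its existing score."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    selected = []
    for trace in traces:
        score = trace.get("scores", {}).get(score_name)
        if score is None:
            continue  # assumption: traces without the score are skipped
        # Assumption: "low" means strictly below the threshold.
        rate = low_score_rate if score < low_score_threshold else high_score_rate
        if rng.random() < rate:
            selected.append(trace)
    return selected


# Hypothetical traces with user ratings cycling through 1.0 to 5.0.
traces = [{"id": i, "scores": {"user_rating": float(i % 5 + 1)}} for i in range(100)]

picked = score_based_sample(
    traces,
    score_name="user_rating",
    low_score_threshold=3.0,
    low_score_rate=1.0,    # keep every low rating
    high_score_rate=0.05,  # keep roughly 5% of the rest
    seed=42,
)
low_picked = [t for t in picked if t["scores"]["user_rating"] < 3.0]
print(len(low_picked), "low-rated traces kept out of", len(picked), "selected")
```

With a rate of 1.0 every low-rated trace is kept, because `random()` always returns a value below 1.0. The high-rated traces are thinned independently, so their count varies with the seed.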

Built-in Sampling Strategies

  • NoSamplingStrategy: Evaluate all matching traces (default)
  • RandomSamplingStrategy: Unbiased random sampling
  • StratifiedSamplingStrategy: Even distribution across groups
  • ScoreBasedSamplingStrategy: Oversample low-scoring traces
  • RecentFirstSamplingStrategy: Prioritize recent traces
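
As a rough illustration of what "even distribution across groups" means, the sketch below groups traces by a dotted key such as `metadata.user_type` and draws the same number from each group. It is a standalone approximation under stated assumptions: `get_by_path`, `stratified_sample`, and the trace dict layout are hypothetical names and shapes, and the real StratifiedSamplingStrategy may resolve keys or break ties differently.

```python
import random
from collections import defaultdict


def get_by_path(record, dotted_key):
    """Follow a dotted key such as 'metadata.user_type' through nested dicts."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # missing keys fall into a single "None" stratum
        value = value[part]
    return value


def stratified_sample(traces, strata_key, samples_per_stratum, seed=None):
    """Pick up to `samples_per_stratum` traces from each distinct stratum value."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for trace in traces:
        strata[get_by_path(trace, strata_key)].append(trace)

    selected = []
    # Sort strata by name so the output order does not depend on input order.
    for _, members in sorted(strata.items(), key=lambda kv: str(kv[0])):
        k = min(samples_per_stratum, len(members))  # small strata contribute all they have
        selected.extend(rng.sample(members, k))
    return selected


# Hypothetical traces: 10 "enterprise" users and 90 "free" users.
traces = [
    {"id": i, "metadata": {"user_type": "enterprise" if i % 10 == 0 else "free"}}
    for i in range(100)
]
sample = stratified_sample(traces, "metadata.user_type", samples_per_stratum=5, seed=42)

counts = defaultdict(int)
for trace in sample:
    counts[trace["metadata"]["user_type"]] += 1
print(dict(counts))
```

Although enterprise users make up only a tenth of the hypothetical traces, each user type contributes five traces to the sample. A purely random sample of the same size would give no such guarantee.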

Dataset Integration

Evaluate against Langfuse datasets. The fields of each dataset item are passed to the test function as parameters, so declare only the ones you need:

from fasteval_langfuse import langfuse_dataset

# Basic usage
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_qa_dataset(input, expected_output):
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Using custom metadata fields
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_with_metadata(input, expected_output, difficulty, category):
    # difficulty and category come from item.metadata
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Only what you need
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="inputs-only")
def test_minimal(input):
    # Only declare input, ignore other fields
    response = my_agent(input)
    fe.score(response, input=input)
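
One common way to implement "declare what you need" is to inspect the test function's signature and pass only the fields it names. The sketch below shows that technique with the standard library. It is not taken from fasteval-langfuse: the helper `call_with_declared_fields`, the item layout, and the rule that top-level fields take precedence over metadata entries are assumptions for illustration.

```python
import inspect


def call_with_declared_fields(test_func, item):
    """Call test_func with only the dataset fields that its signature names."""
    # Flatten the item: top-level fields plus metadata entries (assumed layout).
    available = {k: v for k, v in item.items() if k != "metadata"}
    for key, value in item.get("metadata", {}).items():
        available.setdefault(key, value)  # top-level fields win on name clashes

    params = inspect.signature(test_func).parameters
    named = [
        p for p in params.values()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    ]
    missing = [
        p.name for p in named
        if p.name not in available and p.default is inspect.Parameter.empty
    ]
    if missing:
        raise TypeError(f"dataset item has no field(s): {missing}")

    kwargs = {p.name: available[p.name] for p in named if p.name in available}
    return test_func(**kwargs)


# Hypothetical dataset item.
item = {
    "input": "What is 2 + 2?",
    "expected_output": "4",
    "metadata": {"difficulty": "easy", "category": "math"},
}


def minimal(input):
    return input


def with_metadata(input, expected_output, difficulty):
    return (input, expected_output, difficulty)


print(call_with_declared_fields(minimal, item))
print(call_with_declared_fields(with_metadata, item))
```

A function that names a field the item does not have fails early with a `TypeError`, which is usually easier to debug than a silent `None`.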

Configuration

from fasteval_langfuse import configure_langfuse, LangfuseConfig

configure_langfuse(LangfuseConfig(
    public_key="pk-...",                # Or from LANGFUSE_PUBLIC_KEY env
    secret_key="sk-...",                # Or from LANGFUSE_SECRET_KEY env
    host="https://cloud.langfuse.com",  # Or self-hosted
    default_project="production",
    auto_push_scores=True,              # Push scores back automatically
    score_name_prefix="fasteval_",      # Prefix for score names
))
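
The comments above say the keys may come either from arguments or from environment variables, and that pushed scores get a name prefix. A minimal sketch of that resolution logic follows. The helpers `resolve_credential` and `prefixed_score_name` are hypothetical, and the assumptions that an explicit argument takes precedence over the environment and that the prefix is simply prepended to the metric name are not confirmed by the source.

```python
import os


def resolve_credential(explicit, env_var, env=None):
    """Return the explicit value if given, else the environment variable."""
    env = os.environ if env is None else env
    value = explicit or env.get(env_var)  # assumption: explicit argument wins
    if not value:
        raise ValueError(f"missing credential: pass it explicitly or set {env_var}")
    return value


def prefixed_score_name(metric_name, prefix="fasteval_"):
    """Build the score name under the assumption that the prefix is prepended."""
    return f"{prefix}{metric_name}"


# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {"LANGFUSE_PUBLIC_KEY": "pk-from-env"}

print(resolve_credential(None, "LANGFUSE_PUBLIC_KEY", env=fake_env))
print(resolve_credential("pk-explicit", "LANGFUSE_PUBLIC_KEY", env=fake_env))
print(prefixed_score_name("correctness"))
```

Keeping secrets in environment variables rather than in the configuration call avoids committing them to source control.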

RAG Evaluation with Context

The decorator automatically extracts context from trace metadata:

@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
@langfuse_traces(
    project="prod",
    filter_tags=["rag"]
)
def test_rag_quality(trace_id, input, output, context, metadata):
    # context is auto-extracted from metadata keys:
    # - "context", "retrieved_docs", "documents", "retrieval_context"
    
    # Or manually extract if needed:
    if not context:
        context = metadata.get("custom_docs_key")
    
    fe.score(output, context=context, input=input)
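
The lookup described in the comments above can be approximated as a first-match search over the listed metadata keys. This sketch is an assumption-laden illustration: the function `extract_context`, the precedence order (the order in which the keys are listed), and the normalisation of a single string into a one-element list are not confirmed by the source.

```python
# Metadata keys named in the README, checked in the order listed (assumed precedence).
CONTEXT_KEYS = ("context", "retrieved_docs", "documents", "retrieval_context")


def extract_context(metadata, extra_keys=()):
    """Return the first non-empty context found in metadata, or None."""
    for key in (*CONTEXT_KEYS, *extra_keys):
        value = metadata.get(key)
        if value:
            # Normalise a single string to a list so callers handle one shape.
            return [value] if isinstance(value, str) else list(value)
    return None


# Hypothetical trace metadata.
print(extract_context({"retrieved_docs": ["doc A", "doc B"]}))
print(extract_context({"custom_docs_key": "doc C"}, extra_keys=("custom_docs_key",)))
print(extract_context({"user_type": "free"}))
```

Passing project-specific keys through `extra_keys` covers the same case as the manual `metadata.get("custom_docs_key")` fallback in the test function.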

Benefits

  • 💰 Cost Reduction: Reduce LLM evaluation costs by 90%+ by scoring a sample instead of every trace
  • Faster Feedback: Evaluate in minutes rather than hours
  • 📊 Research-Backed Metrics: Use fasteval's validated evaluation metrics
  • 🎯 Focus on Issues: Oversample failures with ScoreBasedSamplingStrategy
  • Zero Instrumentation: Evaluate existing traces without code changes
  • 🔄 Automatic Scoring: Evaluation results sync to Langfuse automatically
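
The size of the cost reduction depends on how the sample compares with the number of matching traces. The arithmetic below uses hypothetical numbers (5,000 matching traces and a sample of 200) and assumes cost scales linearly with the number of traces evaluated. The helper `evaluation_reduction` is made up for this example.

```python
def evaluation_reduction(total_traces, sample_size):
    """Fraction of evaluations avoided by scoring only a sample."""
    if total_traces <= 0:
        raise ValueError("total_traces must be positive")
    evaluated = min(sample_size, total_traces)  # cannot evaluate more than exist
    return 1 - evaluated / total_traces


# Hypothetical volume: 5,000 matching traces, sample of 200.
print(f"{evaluation_reduction(5_000, 200):.0%}")  # 96%

# A 90% reduction needs a sample of at most 10% of the matching traces.
print(f"{evaluation_reduction(2_000, 200):.0%}")  # 90%
```

If fewer traces match than the sample size, every trace is evaluated and the reduction is zero.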

License

MIT
