Langfuse integration for fasteval - evaluate production traces with fasteval metrics

Project description

fasteval-langfuse

Langfuse integration for fasteval - evaluate production traces with fasteval's research-backed metrics.

Installation

pip install fasteval-core fasteval-langfuse

Quick Start

Evaluate Production Traces

Fetch traces from Langfuse and evaluate them with fasteval metrics:

from fasteval_langfuse import langfuse_traces
from fasteval_langfuse.sampling import RandomSamplingStrategy
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.hallucination(threshold=0.9)
@langfuse_traces(
    project="production",
    filter_tags=["customer-support"],
    time_range="last_24h",
    sampling=RandomSamplingStrategy(sample_size=200)
)
def test_production_traces(trace_id, input, output, context, metadata):
    # Evaluate the trace
    fe.score(output, input=input)

# Run with pytest - scores automatically pushed to Langfuse
# pytest test_production.py -v
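Under the hood, a decorator like `langfuse_traces` presumably fetches matching traces and invokes the test once per trace. Here is a minimal, self-contained sketch of that fan-out pattern; the `traces_from` decorator, `fake_fetch`, and the trace dict shape are illustrative assumptions, not the library's actual internals:

```python
def traces_from(fetch):
    """Decorator: call the wrapped test once per trace returned by fetch()."""
    def decorate(test_fn):
        def run_all():
            results = []
            for t in fetch():
                results.append(test_fn(t["id"], t["input"], t["output"],
                                        t.get("context"), t.get("metadata", {})))
            return results
        return run_all
    return decorate

def fake_fetch():
    # Stand-in for a Langfuse API call returning production traces.
    return [{"id": "t1", "input": "hi", "output": "hello"},
            {"id": "t2", "input": "bye", "output": "goodbye"}]

@traces_from(fake_fetch)
def check_nonempty(trace_id, input, output, context, metadata):
    return bool(output)

results = check_nonempty()  # one result per fetched trace
```

In the real integration, pytest collects each trace as a separate test case instead of returning a list, but the fan-out idea is the same.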

Sampling Strategies

Reduce evaluation costs with intelligent sampling:

from fasteval_langfuse.sampling import (
    RandomSamplingStrategy,
    StratifiedSamplingStrategy,
    ScoreBasedSamplingStrategy,
)

# Random sampling - 200 random traces
@langfuse_traces(
    project="prod",
    sampling=RandomSamplingStrategy(sample_size=200, seed=42)
)
def test_random_sample(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Stratified sampling - even distribution across user types
@langfuse_traces(
    project="prod",
    sampling=StratifiedSamplingStrategy(
        strata_key="metadata.user_type",
        samples_per_stratum=30
    )
)
def test_across_segments(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Score-based sampling - focus on failures
@langfuse_traces(
    project="prod",
    sampling=ScoreBasedSamplingStrategy(
        score_name="user_rating",
        low_score_threshold=3.0,
        low_score_rate=1.0,      # 100% of low ratings
        high_score_rate=0.05     # 5% of high ratings
    )
)
def test_failures(trace_id, input, output, context, metadata):
    fe.score(output, input=input)
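Conceptually, stratified sampling groups traces by the strata key and draws evenly from each group, so small segments are not drowned out by large ones. A plain-Python sketch of that idea (the `stratified_sample` helper and the flat `user_type` key are simplifications; the real API addresses nested metadata with a dotted path like `metadata.user_type`):

```python
import random
from collections import defaultdict

def stratified_sample(traces, strata_key, per_stratum, seed=0):
    """Group traces by a metadata key, then take up to per_stratum from each group."""
    groups = defaultdict(list)
    for t in traces:
        groups[t["metadata"].get(strata_key)].append(t)
    rng = random.Random(seed)
    picked = []
    for _, members in sorted(groups.items()):
        rng.shuffle(members)
        picked.extend(members[:per_stratum])
    return picked

# 5 "pro" and 5 "free" traces; sample 2 from each stratum.
traces = [{"id": i, "metadata": {"user_type": "free" if i % 2 else "pro"}}
          for i in range(10)]
sample = stratified_sample(traces, "user_type", 2)
```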

Built-in Sampling Strategies

  • NoSamplingStrategy: Evaluate all matching traces (default)
  • RandomSamplingStrategy: Unbiased random sampling
  • StratifiedSamplingStrategy: Even distribution across groups
  • ScoreBasedSamplingStrategy: Oversample low-scoring traces
  • RecentFirstSamplingStrategy: Prioritize recent traces
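The score-based strategy's two rates amount to a per-trace coin flip whose probability depends on the trace's score. A self-contained sketch of that logic (the `score_based_sample` function and trace shape are illustrative, not the library's implementation):

```python
import random

def score_based_sample(traces, score_name, low_threshold, low_rate, high_rate, seed=42):
    """Keep low-scoring traces with probability low_rate, others with high_rate."""
    rng = random.Random(seed)
    kept = []
    for trace in traces:
        score = trace["scores"].get(score_name)
        rate = low_rate if score is not None and score < low_threshold else high_rate
        if rng.random() < rate:
            kept.append(trace)
    return kept

# Two low ratings (1, 2) and four high ratings (5).
traces = [{"id": i, "scores": {"user_rating": r}}
          for i, r in enumerate([1, 2, 5, 5, 5, 5])]
# low_rate=1.0 keeps every low-rated trace; high_rate=0.0 drops every high-rated one.
sampled = score_based_sample(traces, "user_rating", 3.0, 1.0, 0.0)
```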

Dataset Integration

Evaluate against Langfuse datasets. Every dataset column is available as a function parameter; declare only the ones you need:

from fasteval_langfuse import langfuse_dataset

# Basic usage
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_qa_dataset(input, expected_output):
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Using custom metadata fields
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_with_metadata(input, expected_output, difficulty, category):
    # difficulty and category come from item.metadata
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Only what you need
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="inputs-only")
def test_minimal(input):
    # Only declare input, ignore other fields
    response = my_agent(input)
    fe.score(response, input=input)

Configuration

from fasteval_langfuse import configure_langfuse, LangfuseConfig

configure_langfuse(LangfuseConfig(
    public_key="pk-...",                # Or from LANGFUSE_PUBLIC_KEY env
    secret_key="sk-...",                # Or from LANGFUSE_SECRET_KEY env
    host="https://cloud.langfuse.com",  # Or self-hosted
    default_project="production",
    auto_push_scores=True,              # Push scores back automatically
    score_name_prefix="fasteval_",      # Prefix for score names
))
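Since the keys can also come from the environment, configuration resolution likely works along these lines. This is a self-contained sketch with a minimal stand-in `LangfuseConfig` dataclass; only the `LANGFUSE_PUBLIC_KEY`/`LANGFUSE_SECRET_KEY` variable names are taken from the comments above, and `config_from_env` is a hypothetical helper:

```python
import os
from dataclasses import dataclass

@dataclass
class LangfuseConfig:  # minimal stand-in mirroring the fields shown above
    public_key: str
    secret_key: str
    host: str = "https://cloud.langfuse.com"
    auto_push_scores: bool = True
    score_name_prefix: str = "fasteval_"

def config_from_env(**overrides):
    """Resolve credentials from the environment; explicit kwargs win."""
    fields = {
        "public_key": os.environ.get("LANGFUSE_PUBLIC_KEY", ""),
        "secret_key": os.environ.get("LANGFUSE_SECRET_KEY", ""),
    }
    fields.update(overrides)
    return LangfuseConfig(**fields)

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-demo"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-demo"
cfg = config_from_env()
```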

RAG Evaluation with Context

The decorator automatically extracts context from trace metadata:

@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
@langfuse_traces(
    project="prod",
    filter_tags=["rag"]
)
def test_rag_quality(trace_id, input, output, context, metadata):
    # context is auto-extracted from metadata keys:
    # - "context", "retrieved_docs", "documents", "retrieval_context"
    
    # Or manually extract if needed:
    if not context:
        context = metadata.get("custom_docs_key")
    
    fe.score(output, context=context, input=input)
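The auto-extraction described in the comments amounts to checking a fixed list of metadata keys in order and taking the first non-empty value. A minimal sketch of that fallback logic; the `extract_context` helper is illustrative, and only the key names come from the comment above:

```python
# Keys checked in order, per the list in the comment above.
CONTEXT_KEYS = ("context", "retrieved_docs", "documents", "retrieval_context")

def extract_context(metadata):
    """Return the first context-like value found under the known keys, else None."""
    for key in CONTEXT_KEYS:
        value = metadata.get(key)
        if value:
            return value
    return None

meta = {"retrieved_docs": ["doc A", "doc B"], "user_id": "u1"}
ctx = extract_context(meta)
```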

Benefits

  • 💰 Cost Reduction: Cut LLM evaluation costs by 90%+ with sampling
  • ⚡ Faster Feedback: Evaluate in minutes instead of hours
  • 📊 Research-Backed Metrics: Use fasteval's validated evaluation metrics
  • 🎯 Focus on Issues: Oversample failures with ScoreBasedSamplingStrategy
  • 🔌 Zero Instrumentation: Evaluate existing traces without code changes
  • 🔄 Automatic Scoring: Evaluation results automatically sync to Langfuse

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

fasteval_langfuse-2.1.5.tar.gz (11.6 kB)

Uploaded Source

Built Distribution


fasteval_langfuse-2.1.5-py3-none-any.whl (16.8 kB)

Uploaded Python 3

File details

Details for the file fasteval_langfuse-2.1.5.tar.gz.

File metadata

  • Download URL: fasteval_langfuse-2.1.5.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fasteval_langfuse-2.1.5.tar.gz
  • SHA256: 1de4ec21f510c8406652f01af9c507b6f63257a56e2a2d26d5af58885f51d19b
  • MD5: e1faef28e62fcb2d75fbc8196f2265ce
  • BLAKE2b-256: 225b6997efd7db8ff903920c5e2655f065561319e88034fb49e5bf5982793c07


Provenance

The following attestation bundles were made for fasteval_langfuse-2.1.5.tar.gz:

Publisher: release.yml on intuit/fasteval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fasteval_langfuse-2.1.5-py3-none-any.whl.

File hashes

Hashes for fasteval_langfuse-2.1.5-py3-none-any.whl
  • SHA256: 60e6bed00f45e2cffb78853a95f15498bc6bc8dc423ca7dd2f6f9993644bf982
  • MD5: 3f3100c047c059df653d86d1ffdf3f72
  • BLAKE2b-256: fec4f6efb26517e4367fba80f2c2403bfe72d86beb1d0b56ac36b0a3973e2ff7


Provenance

The following attestation bundles were made for fasteval_langfuse-2.1.5-py3-none-any.whl:

Publisher: release.yml on intuit/fasteval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
