Langfuse integration for fasteval - evaluate production traces with fasteval metrics

fasteval-langfuse

Langfuse integration for fasteval - evaluate production traces with fasteval's research-backed metrics.

Installation

pip install fasteval-core fasteval-langfuse

Quick Start

Evaluate Production Traces

Fetch traces from Langfuse and evaluate them with fasteval metrics:

from fasteval_langfuse import langfuse_traces
from fasteval_langfuse.sampling import RandomSamplingStrategy
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.hallucination(threshold=0.9)
@langfuse_traces(
    project="production",
    filter_tags=["customer-support"],
    time_range="last_24h",
    sampling=RandomSamplingStrategy(sample_size=200)
)
def test_production_traces(trace_id, input, output, context, metadata):
    # Evaluate the trace
    fe.score(output, input=input)

# Run with pytest - scores automatically pushed to Langfuse
# pytest test_production.py -v

Sampling Strategies

Reduce evaluation costs by scoring a sample of the matching traces instead of every trace:

import fasteval as fe
from fasteval_langfuse import langfuse_traces
from fasteval_langfuse.sampling import (
    RandomSamplingStrategy,
    StratifiedSamplingStrategy,
    ScoreBasedSamplingStrategy,
)

# Random sampling - 200 random traces
@langfuse_traces(
    project="prod",
    sampling=RandomSamplingStrategy(sample_size=200, seed=42)
)
def test_random_sample(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Stratified sampling - even distribution across user types
@langfuse_traces(
    project="prod",
    sampling=StratifiedSamplingStrategy(
        strata_key="metadata.user_type",
        samples_per_stratum=30
    )
)
def test_across_segments(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Score-based sampling - focus on failures
@langfuse_traces(
    project="prod",
    sampling=ScoreBasedSamplingStrategy(
        score_name="user_rating",
        low_score_threshold=3.0,
        low_score_rate=1.0,      # 100% of low ratings
        high_score_rate=0.05     # 5% of high ratings
    )
)
def test_failures(trace_id, input, output, context, metadata):
    fe.score(output, input=input)
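
To make the score-based idea concrete, here is a minimal pure-Python sketch of the technique, independent of fasteval-langfuse internals. The function `score_based_sample`, the trace dictionary layout, and the choice to treat scores strictly below the threshold as "low" are all assumptions made for illustration; the library's actual behavior may differ.

```python
import random


def score_based_sample(traces, score_name, low_score_threshold,
                       low_score_rate, high_score_rate, seed=None):
    """Keep each trace with a probability chosen by its existing score.

    Hypothetical helper: traces are plain dicts with a "scores" mapping.
    """
    rng = random.Random(seed)  # seeded generator for reproducible samples
    kept = []
    for trace in traces:
        score = trace["scores"].get(score_name)
        if score is None:
            continue  # assumption: traces without the score are skipped
        # Assumption: "low" means strictly below the threshold.
        rate = low_score_rate if score < low_score_threshold else high_score_rate
        if rng.random() < rate:
            kept.append(trace)
    return kept


# 1000 synthetic traces: every tenth one has a low user rating.
demo_traces = [
    {"id": i, "scores": {"user_rating": 1.0 if i % 10 == 0 else 5.0}}
    for i in range(1000)
]
demo_sample = score_based_sample(
    demo_traces, "user_rating",
    low_score_threshold=3.0, low_score_rate=1.0, high_score_rate=0.05,
    seed=42,
)
low_rated = [t for t in demo_sample if t["scores"]["user_rating"] < 3.0]
print(len(low_rated), "low-rated traces kept out of", len(demo_sample), "sampled")
```

With `low_score_rate=1.0` every low-rated trace is kept, while only a small fraction of the highly rated ones is evaluated.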

Built-in Sampling Strategies

  • NoSamplingStrategy: Evaluate all matching traces (default)
  • RandomSamplingStrategy: Unbiased random sampling
  • StratifiedSamplingStrategy: Even distribution across groups
  • ScoreBasedSamplingStrategy: Oversample low-scoring traces
  • RecentFirstSamplingStrategy: Prioritize recent traces
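
As an illustration of what "even distribution across groups" means, the following sketch implements stratified sampling in plain Python. It does not use or reproduce the library's code: `get_path`, `stratified_sample`, and the trace dictionary layout are hypothetical, and capping each group at its own size is an assumption.

```python
import random
from collections import Counter, defaultdict


def get_path(record, dotted_key):
    """Follow a dotted key such as 'metadata.user_type' through nested dicts."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def stratified_sample(traces, strata_key, samples_per_stratum, seed=None):
    """Draw up to samples_per_stratum traces from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for trace in traces:
        strata[get_path(trace, strata_key)].append(trace)
    sample = []
    for group in strata.values():
        # Small groups contribute everything they have.
        k = min(samples_per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample


# Synthetic, heavily skewed traffic: 500 free, 40 pro, 10 enterprise users.
skewed_traces = (
    [{"id": f"free-{i}", "metadata": {"user_type": "free"}} for i in range(500)]
    + [{"id": f"pro-{i}", "metadata": {"user_type": "pro"}} for i in range(40)]
    + [{"id": f"ent-{i}", "metadata": {"user_type": "enterprise"}} for i in range(10)]
)
picked = stratified_sample(skewed_traces, "metadata.user_type",
                           samples_per_stratum=30, seed=42)
counts = Counter(t["metadata"]["user_type"] for t in picked)
print(dict(counts))
```

A purely random sample of the same size would be dominated by the "free" group; stratifying keeps the smaller segments represented.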

Dataset Integration

Evaluate against Langfuse datasets. Every dataset column is available as a test parameter; declare only the ones you need in the function signature:

import fasteval as fe
from fasteval_langfuse import langfuse_dataset

# my_agent is a placeholder for the application under test

# Basic usage
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_qa_dataset(input, expected_output):
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Using custom metadata fields
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_with_metadata(input, expected_output, difficulty, category):
    # difficulty and category come from item.metadata
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Only what you need
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="inputs-only")
def test_minimal(input):
    # Only declare input, ignore other fields
    response = my_agent(input)
    fe.score(response, input=input)
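
One way such "declare what you need" parameter passing can work is by inspecting the test function's signature. The sketch below shows that general technique with the standard `inspect` module; `call_with_declared_fields` and the item layout are hypothetical and are not taken from the fasteval-langfuse source.

```python
import inspect


def call_with_declared_fields(test_fn, item):
    """Call test_fn with only the dataset fields it names as parameters.

    Hypothetical helper: `item` is a plain dict with "input",
    "expected_output", and an optional "metadata" mapping.
    """
    # Metadata entries first, so top-level fields win on a name clash.
    available = dict(item.get("metadata") or {})
    available["input"] = item.get("input")
    available["expected_output"] = item.get("expected_output")

    wanted = inspect.signature(test_fn).parameters
    missing = [name for name in wanted if name not in available]
    if missing:
        raise TypeError(f"dataset item has no field(s): {missing}")
    return test_fn(**{name: available[name] for name in wanted})


demo_item = {
    "input": "What is 2 + 2?",
    "expected_output": "4",
    "metadata": {"difficulty": "easy", "category": "math"},
}


def needs_metadata(input, expected_output, difficulty):
    return (input, expected_output, difficulty)


def needs_input_only(input):
    return input


result_full = call_with_declared_fields(needs_metadata, demo_item)
result_min = call_with_declared_fields(needs_input_only, demo_item)
print(result_full)
print(result_min)
```

Because the call is built from the signature, a test that declares only `input` never has to accept or ignore the remaining columns.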

Configuration

from fasteval_langfuse import configure_langfuse, LangfuseConfig

configure_langfuse(LangfuseConfig(
    public_key="pk-...",                # Or from LANGFUSE_PUBLIC_KEY env
    secret_key="sk-...",                # Or from LANGFUSE_SECRET_KEY env
    host="https://cloud.langfuse.com",  # Or self-hosted
    default_project="production",
    auto_push_scores=True,              # Push scores back automatically
    score_name_prefix="fasteval_",      # Prefix for score names
))
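
The comments above say the keys can come either from explicit arguments or from environment variables. A common way to implement that kind of fallback is sketched below; `resolve_setting` is a hypothetical helper, and the rule that an explicit value takes precedence over the environment is an assumption, not documented behavior. The last line also assumes the score name prefix is simply prepended to the metric name.

```python
import os


def resolve_setting(explicit_value, env_var, env=None):
    """Return the explicit value if given, else fall back to the environment."""
    env = os.environ if env is None else env
    if explicit_value is not None:
        return explicit_value  # assumption: explicit arguments win
    value = env.get(env_var)
    if value is None:
        raise ValueError(f"set {env_var} or pass the value explicitly")
    return value


# A stand-in mapping so the example does not touch the real environment.
demo_env = {
    "LANGFUSE_PUBLIC_KEY": "pk-from-env",
    "LANGFUSE_SECRET_KEY": "sk-from-env",
}

public_key = resolve_setting(None, "LANGFUSE_PUBLIC_KEY", env=demo_env)
secret_key = resolve_setting("sk-explicit", "LANGFUSE_SECRET_KEY", env=demo_env)

# Assumed naming scheme: prefix + metric name.
score_name = "fasteval_" + "correctness"
print(public_key, secret_key, score_name)
```

Reading secrets from the environment keeps them out of test files and version control.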

RAG Evaluation with Context

The langfuse_traces decorator extracts retrieval context from trace metadata and passes it to the test as the context argument:

import fasteval as fe
from fasteval_langfuse import langfuse_traces

@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
@langfuse_traces(
    project="prod",
    filter_tags=["rag"]
)
def test_rag_quality(trace_id, input, output, context, metadata):
    # context is auto-extracted from metadata keys:
    # - "context", "retrieved_docs", "documents", "retrieval_context"
    
    # Or manually extract if needed:
    if not context:
        context = metadata.get("custom_docs_key")
    
    fe.score(output, context=context, input=input)
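
The key lookup described in the comments above can be pictured as a first-match search over the listed metadata keys. The sketch below is an assumption about how such extraction might behave, not the library's implementation: `extract_context`, the lookup order, and the normalization of a single string to a one-item list are all hypothetical.

```python
# Keys named in the README, checked in the order listed (order is assumed).
CONTEXT_KEYS = ("context", "retrieved_docs", "documents", "retrieval_context")


def extract_context(metadata, extra_keys=()):
    """Return the first non-empty context found in trace metadata, else None."""
    metadata = metadata or {}
    for key in (*CONTEXT_KEYS, *extra_keys):
        value = metadata.get(key)
        if value:
            # Normalize a single string to a one-item list of documents.
            return [value] if isinstance(value, str) else list(value)
    return None


from_docs = extract_context({"retrieved_docs": ["doc A", "doc B"]})
from_custom = extract_context({"custom_docs_key": "doc C"},
                              extra_keys=("custom_docs_key",))
nothing = extract_context({"user_type": "free"})
print(from_docs, from_custom, nothing)
```

Passing `extra_keys` plays the same role as the manual `metadata.get("custom_docs_key")` fallback in the test above.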

Benefits

  • 💰 Cost Reduction: Evaluating a sample of 10% or fewer of the matching traces cuts LLM evaluation costs by 90% or more
  • Faster Feedback: Evaluate in minutes instead of hours
  • 📊 Research-Backed Metrics: Use fasteval's validated evaluation metrics
  • 🎯 Focus on Issues: Oversample failures with ScoreBasedSamplingStrategy
  • Zero Instrumentation: Evaluate existing traces without code changes
  • 🔄 Automatic Scoring: Evaluation results automatically sync to Langfuse

License

MIT
