
fasteval-langfuse

Langfuse integration for fasteval - evaluate production traces with fasteval's research-backed metrics.

Installation

pip install fasteval-core fasteval-langfuse

Quick Start

Evaluate Production Traces

Fetch traces from Langfuse and evaluate them with fasteval metrics:

from fasteval_langfuse import langfuse_traces
from fasteval_langfuse.sampling import RandomSamplingStrategy
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.hallucination(threshold=0.9)
@langfuse_traces(
    project="production",
    filter_tags=["customer-support"],
    time_range="last_24h",
    sampling=RandomSamplingStrategy(sample_size=200)
)
def test_production_traces(trace_id, input, output, context, metadata):
    # Evaluate the trace
    fe.score(output, input=input)

# Run with pytest - scores automatically pushed to Langfuse
# pytest test_production.py -v

Sampling Strategies

Reduce evaluation costs with intelligent sampling:

from fasteval_langfuse.sampling import (
    RandomSamplingStrategy,
    StratifiedSamplingStrategy,
    ScoreBasedSamplingStrategy,
)

# Random sampling - 200 random traces
@langfuse_traces(
    project="prod",
    sampling=RandomSamplingStrategy(sample_size=200, seed=42)
)
def test_random_sample(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Stratified sampling - even distribution across user types
@langfuse_traces(
    project="prod",
    sampling=StratifiedSamplingStrategy(
        strata_key="metadata.user_type",
        samples_per_stratum=30
    )
)
def test_across_segments(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

# Score-based sampling - focus on failures
@langfuse_traces(
    project="prod",
    sampling=ScoreBasedSamplingStrategy(
        score_name="user_rating",
        low_score_threshold=3.0,
        low_score_rate=1.0,      # 100% of low ratings
        high_score_rate=0.05     # 5% of high ratings
    )
)
def test_failures(trace_id, input, output, context, metadata):
    fe.score(output, input=input)

Built-in Sampling Strategies

  • NoSamplingStrategy: Evaluate all matching traces (default)
  • RandomSamplingStrategy: Unbiased random sampling
  • StratifiedSamplingStrategy: Even distribution across groups
  • ScoreBasedSamplingStrategy: Oversample low-scoring traces
  • RecentFirstSamplingStrategy: Prioritize recent traces
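To make the score-based strategy concrete, here is a minimal stand-in for the oversampling logic described above (illustrative only, not ScoreBasedSamplingStrategy's actual code): traces below the threshold are kept at `low_rate`, everything else at `high_rate`.

```python
import random

def score_based_sample(traces, score_name, low_threshold, low_rate, high_rate, seed=42):
    """Keep low-scoring traces at low_rate and the rest at high_rate.

    `traces` is a list of dicts with a "scores" mapping; a hypothetical
    shape chosen for illustration.
    """
    rng = random.Random(seed)  # seeded for reproducible samples
    kept = []
    for trace in traces:
        score = trace["scores"].get(score_name)
        rate = low_rate if score is not None and score < low_threshold else high_rate
        if rng.random() < rate:
            kept.append(trace)
    return kept

# 100 high-rated traces (kept at 5%) plus one low-rated trace (always kept)
traces = [{"id": i, "scores": {"user_rating": 5.0}} for i in range(100)]
traces.append({"id": 100, "scores": {"user_rating": 1.0}})
sample = score_based_sample(traces, "user_rating", 3.0, low_rate=1.0, high_rate=0.05)
```

With `low_rate=1.0` every failing trace survives sampling, so the evaluation budget concentrates on the traces most likely to reveal problems.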

Dataset Integration

Evaluate against Langfuse datasets. Every dataset field is passed as a parameter; declare only the ones your test needs:

from fasteval_langfuse import langfuse_dataset

# Basic usage
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_qa_dataset(input, expected_output):
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Using custom metadata fields
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="qa-golden-set", version="v2")
def test_with_metadata(input, expected_output, difficulty, category):
    # difficulty and category come from item.metadata
    response = my_agent(input)
    fe.score(response, expected_output, input=input)

# Only what you need
@fe.correctness(threshold=0.8)
@langfuse_dataset(name="inputs-only")
def test_minimal(input):
    # Only declare input, ignore other fields
    response = my_agent(input)
    fe.score(response, input=input)

Configuration

from fasteval_langfuse import configure_langfuse, LangfuseConfig

configure_langfuse(LangfuseConfig(
    public_key="pk-...",                # Or from LANGFUSE_PUBLIC_KEY env
    secret_key="sk-...",                # Or from LANGFUSE_SECRET_KEY env
    host="https://cloud.langfuse.com",  # Or self-hosted
    default_project="production",
    auto_push_scores=True,              # Push scores back automatically
    score_name_prefix="fasteval_",      # Prefix for score names
))
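The explicit-argument-wins-over-environment precedence noted in the comments above can be sketched like this (a hypothetical `resolve` helper, not the library's code):

```python
import os

# Seed demo values only if the variables are not already set.
os.environ.setdefault("LANGFUSE_PUBLIC_KEY", "pk-demo")
os.environ.setdefault("LANGFUSE_SECRET_KEY", "sk-demo")

def resolve(explicit, env_var):
    """An explicit config value wins; otherwise fall back to the environment."""
    return explicit if explicit is not None else os.environ.get(env_var)

public_key = resolve(None, "LANGFUSE_PUBLIC_KEY")    # falls back to env
secret_key = resolve("sk-override", "LANGFUSE_SECRET_KEY")  # explicit wins
```

Keeping secrets in `LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY` rather than in code also keeps them out of version control.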

RAG Evaluation with Context

The decorator automatically extracts context from trace metadata:

@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
@langfuse_traces(
    project="prod",
    filter_tags=["rag"]
)
def test_rag_quality(trace_id, input, output, context, metadata):
    # context is auto-extracted from metadata keys:
    # - "context", "retrieved_docs", "documents", "retrieval_context"
    
    # Or manually extract if needed:
    if not context:
        context = metadata.get("custom_docs_key")
    
    fe.score(output, context=context, input=input)
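The fallback order for the metadata keys listed above can be sketched as a plain function (illustrative; not the library's internal code):

```python
# Keys tried in order, taken from the list documented above.
CONTEXT_KEYS = ("context", "retrieved_docs", "documents", "retrieval_context")

def extract_context(metadata):
    """Return the first non-empty matching context field, or None."""
    for key in CONTEXT_KEYS:
        if metadata.get(key):
            return metadata[key]
    return None

ctx = extract_context({"retrieved_docs": ["doc A", "doc B"], "other": 1})
```

Traces whose retrieval output lives under a custom key fall through to `None`, which is exactly the case the manual `metadata.get(...)` fallback in the test above handles.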

Benefits

  • 💰 Cost Reduction: Cut LLM evaluation costs by 90%+ with sampling
  • ⚡ Faster Feedback: Evaluate in minutes instead of hours
  • 📊 Research-Backed Metrics: Use fasteval's validated evaluation metrics
  • 🎯 Focus on Issues: Oversample failures with ScoreBasedSamplingStrategy
  • 🔌 Zero Instrumentation: Evaluate existing traces without code changes
  • 🔄 Automatic Scoring: Evaluation results sync back to Langfuse automatically
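The 90%+ figure follows directly from the sample size: evaluating a fixed sample instead of every trace cuts LLM calls proportionally. With hypothetical numbers:

```python
total_traces = 10_000  # hypothetical daily trace volume
sample_size = 200      # e.g. RandomSamplingStrategy(sample_size=200)

# Fraction of LLM evaluation calls avoided by sampling.
reduction = 1 - sample_size / total_traces  # 0.98, i.e. 98% fewer calls
```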

License

MIT

