A DSPy metric function learning package

DSPy Metric Learning

A package for learning and optimizing metric functions for DSPy. It uses language models to score predictions, so you can build evaluation metrics tailored to your generative AI applications.

Note: This package is currently in pre-alpha stage. The API is likely to change significantly in future releases.


🌟 Features

  • LLM-based Evaluation: Define metric functions as DSPy modules using language models
  • Custom Scoring: Pass your preferred language models for rating predictions
  • Data Management: Store and manage scored outputs in an organized directory structure
  • Interactive Labeling: Simple REPL interface for human labeling of examples
  • Optimization: DSPy-powered optimization for metric function modules
  • Multi-metric Support: Create and manage multiple specialized metric functions
  • Comprehensive Testing: Extensive test suite with 92% code coverage

📦 Installation

⚠️ Pre-Alpha Release: This package is in very early development. The API is unstable and major architectural changes are expected.

pip install dspy-metric-learning

You can also install directly from the repository for the latest development version:

git clone https://github.com/tom-doerr/dspy_metric_learning.git
cd dspy_metric_learning
pip install -e .

🚀 Quick Start

import dspy
from metric_learner import MetricModule

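# Initialize a language model
# (dspy.OpenAI is the older client; recent DSPy releases use dspy.LM("openai/gpt-3.5-turbo") instead)
lm = dspy.OpenAI(model="gpt-3.5-turbo")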

# Create a metric module
metric = MetricModule(lm=lm)

# Score a prediction
score = metric(
    input="What is the capital of France?",
    prediction="Paris is the capital of France.",
    gold="Paris"
)

print(f"Score: {score}")  # Example output: Score: 0.92 (the exact value depends on the model)

📚 Usage Examples

1. Creating a Metric Module

from metric_learner import MetricModule

# Create with a custom prompt template
# (lm is the language model initialized in the Quick Start)
metric = MetricModule(
    lm=lm,
    prompt_template=(
        "Rate the factual accuracy of the answer '{prediction}' "
        "for the question '{input}' on a scale from 0 to 1."
    )
)
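
The template above uses named placeholders such as {input} and {prediction}. How MetricModule fills them and reads the score back is not documented here, so the following is only a minimal sketch of the general idea using plain Python string formatting. The helpers build_prompt and parse_score are hypothetical and are not part of this package.

```python
import re

TEMPLATE = (
    "Rate the factual accuracy of the answer '{prediction}' "
    "for the question '{input}' on a scale from 0 to 1."
)


def build_prompt(template: str, **fields: str) -> str:
    """Fill the named placeholders in a prompt template (hypothetical helper)."""
    return template.format(**fields)


def parse_score(reply: str) -> float:
    """Take the first number in a model reply and clamp it to [0, 1] (hypothetical helper)."""
    match = re.search(r"-?\d*\.?\d+", reply)
    if match is None:
        raise ValueError(f"no numeric score found in reply: {reply!r}")
    return min(1.0, max(0.0, float(match.group())))


prompt = build_prompt(
    TEMPLATE,
    input="What is the capital of France?",
    prediction="Paris is the capital of France.",
)
print(prompt)

# Canned replies stand in for real language model output.
print(parse_score("Score: 0.9"))
print(parse_score("I would rate this 7"))
```

Taking the first number is a deliberate simplification: a reply that restates the scale before giving the rating would be parsed incorrectly, so a real implementation would likely constrain the output format instead.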

2. Managing Data

from metric_learner import MetricDataManager

# Create a data manager
data_manager = MetricDataManager(metric_name="factual_accuracy")

# Save an instance
data_manager.save_instance(
    input="What is the tallest mountain?",
    prediction="Mount Everest is the tallest mountain on Earth.",
    gold="Mount Everest",
    score=0.9
)

# Load instances
instances = data_manager.load_instances()
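
The on-disk layout that MetricDataManager uses is not described in this README. As an illustration of what "an organized directory structure" could mean, here is a sketch that writes one JSON file per instance under a per-metric directory. The file naming, the record fields, and the standalone save_instance and load_instances functions are all assumptions made for this example, not the package's actual format.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_instance(data_dir, metric_name, input, prediction, gold=None, score=None):
    """Write one instance to <data_dir>/<metric_name>/<timestamp>.json (assumed layout)."""
    metric_dir = Path(data_dir) / metric_name
    metric_dir.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    record = {
        "datetime": timestamp,
        "input": input,
        "prediction": prediction,
        "gold": gold,
        "score": score,      # score assigned by the metric
        "user_score": None,  # left empty until a human labels the instance
    }
    path = metric_dir / f"{timestamp}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def load_instances(data_dir, metric_name):
    """Read every stored instance for one metric, oldest first."""
    metric_dir = Path(data_dir) / metric_name
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(metric_dir.glob("*.json"))
    ]


# A temporary directory keeps the example from touching real data.
with tempfile.TemporaryDirectory() as tmp:
    save_instance(
        tmp,
        "factual_accuracy",
        input="What is the tallest mountain?",
        prediction="Mount Everest is the tallest mountain on Earth.",
        gold="Mount Everest",
        score=0.9,
    )
    instances = load_instances(tmp, "factual_accuracy")

print(len(instances), instances[0]["score"])
```

Keeping each metric in its own directory means several specialized metrics can share one data root without their instances mixing.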

3. Optimizing a Metric

from metric_learner import optimize_metric_module

# Get labeled dataset
dataset = data_manager.get_labeled_dataset()

# Optimize the metric
optimized_metric = optimize_metric_module(metric, dataset)
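
Optimizing a metric needs some measure of how closely its scores track the human labels. DSPy metric functions conventionally take (example, prediction, trace=None), so a custom agreement function might look like the sketch below. Whether optimize_metric_module passes objects shaped like this is not confirmed by this README, and the attribute names user_score and score are assumptions.

```python
from types import SimpleNamespace


def score_agreement(example, prediction, trace=None):
    """Return 1.0 when the predicted score matches the human score, falling to 0.0 as they diverge."""
    error = abs(float(example.user_score) - float(prediction.score))
    return max(0.0, 1.0 - error)


# SimpleNamespace objects stand in for a labeled example and two candidate predictions.
labeled = SimpleNamespace(user_score=0.9)
close = SimpleNamespace(score=0.8)
far = SimpleNamespace(score=0.1)

print(round(score_agreement(labeled, close), 3))
print(round(score_agreement(labeled, far), 3))
```

A function like this rewards the optimizer for every step toward the human label, where a strict exact-match check on floating-point scores would almost never succeed.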

🔍 Examples

The examples/ directory contains several example scripts:

Example               Description
basic_usage.py        Simple demonstration of core functionality
multiple_metrics.py   Using multiple specialized metrics
streamlit_app.py      Interactive web interface for labeling and optimization
complete_workflow.py  End-to-end workflow from data collection to optimization

Run the complete workflow example:

python examples/complete_workflow.py

Run the Streamlit app (in headless mode):

streamlit run examples/streamlit_app.py --server.headless=true

📖 API Reference

Core Components

Component               Description
MetricModule            Core class for defining and using metric functions
MetricDataManager       Manages storage and retrieval of labeled instances
optimize_metric_module  Function to optimize a metric module using labeled data
MetricEvaluator         Evaluates the performance of a metric module
label_instances         Interactive REPL interface for labeling instances

MetricModule

class MetricModule(dspy.Module):
    """Module for evaluating predictions using a language model."""

Parameters:

  • lm: Language model to use for scoring
  • prompt_template: Optional custom prompt template for the metric
  • demonstrations: Optional list of demonstration examples

Methods:

  • __call__(input, prediction, gold=None): Score a prediction

MetricDataManager

class MetricDataManager:
    """Manages storage and retrieval of metric data."""

Parameters:

  • metric_name: Name of the metric
  • data_dir: Optional directory for storing data

Methods:

  • save_instance(input, prediction, gold=None, score=None): Save an instance
  • load_instances(): Load all instances
  • update_user_score(datetime, score): Update user score for an instance
  • get_labeled_dataset(): Get a dataset of labeled instances

Optimization Functions

# Optimize a metric module
optimized_module = optimize_metric_module(
    metric_module,    # MetricModule to optimize
    dataset,          # Dataset of labeled examples
    metric_fn=None,   # Optional custom metric function
    optimizer_class=None  # Optional custom optimizer class
)

# Evaluate a metric module
evaluator = MetricEvaluator(metric_module, data_manager)
metrics = evaluator.evaluate()  # Returns MSE, correlation, etc.
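
The comment above mentions MSE and correlation. For reference, this is how those two statistics can be computed between a metric's scores and the matching human labels in plain Python. It is a standalone sketch, not MetricEvaluator's implementation, and the sample scores are made up for illustration.

```python
import math


def mean_squared_error(predicted, human):
    """Average squared difference between two equal-length score lists."""
    return sum((p - h) ** 2 for p, h in zip(predicted, human)) / len(predicted)


def pearson_correlation(predicted, human):
    """Pearson correlation; returns NaN when either list has zero variance."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    covariance = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        return float("nan")
    return covariance / math.sqrt(var_p * var_h)


# Illustrative values only.
metric_scores = [0.9, 0.2, 0.6, 0.4]
human_scores = [1.0, 0.0, 0.5, 0.5]

mse = mean_squared_error(metric_scores, human_scores)
correlation = pearson_correlation(metric_scores, human_scores)
print(f"MSE: {mse:.4f}")
print(f"Correlation: {correlation:.3f}")
```

The two numbers answer different questions: MSE penalizes scores that are offset from the human labels, while correlation only checks that the metric ranks examples in a similar order.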

Interactive Labeling

# Start an interactive labeling session
label_instances(
    data_manager,     # MetricDataManager instance
    quit_after=None,  # Optional number of instances to label
    skip_labeled=True # Whether to skip already labeled instances
)
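
To make the quit_after and skip_labeled options concrete, here is a sketch of a labeling loop with the same two parameters. It is a hypothetical stand-in rather than label_instances itself: it works on plain dictionaries instead of a MetricDataManager, and the user_score field and the prompt wording are assumptions.

```python
def label_loop(instances, read=input, quit_after=None, skip_labeled=True):
    """Ask for a score in [0, 1] per instance; return how many were labeled."""
    labeled = 0
    for instance in instances:
        if quit_after is not None and labeled >= quit_after:
            break
        if skip_labeled and instance.get("user_score") is not None:
            continue
        print(f"Input:      {instance['input']}")
        print(f"Prediction: {instance['prediction']}")
        answer = read("Score 0-1 (blank to skip, q to quit): ").strip()
        if answer.lower() == "q":
            break
        if not answer:
            continue
        try:
            value = float(answer)
        except ValueError:
            print("Not a number; skipping.")
            continue
        if not 0.0 <= value <= 1.0:
            print("Out of range; skipping.")
            continue
        instance["user_score"] = value
        labeled += 1
    return labeled


# Scripted answers stand in for keyboard input so the sketch runs unattended.
answers = iter(["0.8", "", "1.5", "0.3"])
instances = [
    {"input": "Q1", "prediction": "A1", "user_score": None},
    {"input": "Q2", "prediction": "A2", "user_score": 0.5},  # already labeled
    {"input": "Q3", "prediction": "A3", "user_score": None},
    {"input": "Q4", "prediction": "A4", "user_score": None},
    {"input": "Q5", "prediction": "A5", "user_score": None},
]
count = label_loop(instances, read=lambda prompt: next(answers))
print(f"Labeled {count} instances")
```

Passing the reader as a parameter keeps the loop testable; in interactive use the default read=input prompts at the terminal.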

🧪 Testing

Run unit tests:

python -m pytest tests/

Run integration tests:

python -m pytest integration_tests/

Run specific test categories:

python -m pytest -m "integration and not slow"

👥 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ using DSPy
