A DSPy metric function learning package
DSPy Metric Learning
A package for learning and optimizing metric functions for DSPy. It uses language models to build evaluation metrics for generative AI applications.
Note: This package is currently in pre-alpha stage. The API is likely to change significantly in future releases.
🌟 Features
- LLM-based Evaluation: Define metric functions as DSPy modules using language models
- Custom Scoring: Pass your preferred language models for rating predictions
- Data Management: Store and manage scored outputs in an organized directory structure
- Interactive Labeling: Simple REPL interface for human labeling of examples
- Optimization: DSPy-powered optimization for metric function modules
- Multi-metric Support: Create and manage multiple specialized metric functions
- Comprehensive Testing: Extensive test suite with 92% code coverage
📦 Installation
⚠️ Pre-Alpha Release: This package is in very early development. The API is unstable and major architectural changes are expected.
pip install dspy-metric-learning
You can also install directly from the repository for the latest development version:
git clone https://github.com/tom-doerr/dspy_metric_learning.git
cd dspy_metric_learning
pip install -e .
🚀 Quick Start
import dspy
from metric_learner import MetricModule, MetricDataManager
# Initialize a language model
lm = dspy.OpenAI(model="gpt-3.5-turbo")
# Create a metric module
metric = MetricModule(lm=lm)
# Score a prediction
score = metric(
    input="What is the capital of France?",
    prediction="Paris is the capital of France.",
    gold="Paris",
)
print(f"Score: {score}")  # Example output (varies by model): Score: 0.92
📚 Usage Examples
1. Creating a Metric Module
from metric_learner import MetricModule
# Create with custom prompt template
metric = MetricModule(
    lm=lm,
    prompt_template=(
        "Rate the factual accuracy of the answer '{prediction}' "
        "for the question '{input}' on a scale from 0 to 1."
    ),
)
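As a rough illustration of how a template like the one above could be filled in, the sketch below uses Python's built-in `str.format`. Whether `MetricModule` substitutes placeholders this way internally is an assumption, and the helper `render_prompt` is hypothetical and not part of the package.

```python
# Hypothetical helper: NOT part of metric_learner. It assumes the
# {input} and {prediction} placeholders are filled with str.format,
# which the package may or may not do internally.
PROMPT_TEMPLATE = (
    "Rate the factual accuracy of the answer '{prediction}' "
    "for the question '{input}' on a scale from 0 to 1."
)


def render_prompt(template: str, input: str, prediction: str) -> str:
    """Substitute the {input} and {prediction} placeholders."""
    return template.format(input=input, prediction=prediction)


prompt = render_prompt(
    PROMPT_TEMPLATE,
    input="What is the capital of France?",
    prediction="Paris is the capital of France.",
)
print(prompt)
```

With `str.format`, any literal braces in a template would need to be doubled (`{{` and `}}`).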
2. Managing Data
from metric_learner import MetricDataManager
# Create a data manager
data_manager = MetricDataManager(metric_name="factual_accuracy")
# Save an instance
data_manager.save_instance(
    input="What is the tallest mountain?",
    prediction="Mount Everest is the tallest mountain on Earth.",
    gold="Mount Everest",
    score=0.9,
)
# Load instances
instances = data_manager.load_instances()
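To make the idea of storing instances in a directory structure concrete, here is a minimal standalone sketch. It does not reproduce the package's on-disk format: the layout (one JSON file per instance under a folder named after the metric), the timestamp file names, and the two functions below are all assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical layout, not the package's actual format:
#   <data_dir>/<metric_name>/<UTC timestamp>.json


def save_instance(data_dir: Path, metric_name: str, record: dict) -> Path:
    """Write one record as a JSON file inside the metric's folder."""
    metric_dir = data_dir / metric_name
    metric_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = metric_dir / f"{stamp}.json"
    path.write_text(json.dumps(record), encoding="utf-8")
    return path


def load_instances(data_dir: Path, metric_name: str) -> list:
    """Read every stored record for one metric, sorted by file name."""
    metric_dir = data_dir / metric_name
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(metric_dir.glob("*.json"))
    ]


record = {
    "input": "What is the tallest mountain?",
    "prediction": "Mount Everest is the tallest mountain on Earth.",
    "gold": "Mount Everest",
    "score": 0.9,
}

# Use a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    saved_path = save_instance(root, "factual_accuracy", record)
    loaded = load_instances(root, "factual_accuracy")

print(loaded)
```

Keeping each metric in its own folder means several specialized metrics can share one data directory without their records mixing.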
3. Optimizing a Metric
from metric_learner import optimize_metric_module
# Get labeled dataset
dataset = data_manager.get_labeled_dataset()
# Optimize the metric
optimized_metric = optimize_metric_module(metric, dataset)
🔍 Examples
The examples/ directory contains several example scripts:
| Example | Description |
|---|---|
| basic_usage.py | Simple demonstration of core functionality |
| multiple_metrics.py | Using multiple specialized metrics |
| streamlit_app.py | Interactive web interface for labeling and optimization |
| complete_workflow.py | End-to-end workflow from data collection to optimization |
Run the complete workflow example:
python examples/complete_workflow.py
Run the Streamlit app (in headless mode):
streamlit run examples/streamlit_app.py --server.headless=true
📖 API Reference
Core Components
| Component | Description |
|---|---|
| MetricModule | Core class for defining and using metric functions |
| MetricDataManager | Manages storage and retrieval of labeled instances |
| optimize_metric_module | Function to optimize a metric module using labeled data |
| MetricEvaluator | Evaluates the performance of a metric module |
| label_instances | Interactive REPL interface for labeling instances |
MetricModule
class MetricModule(dspy.Module):
    """Module for evaluating predictions using a language model."""
Parameters:
- lm: Language model to use for scoring
- prompt_template: Optional custom prompt template for the metric
- demonstrations: Optional list of demonstration examples
Methods:
- __call__(input, prediction, gold=None): Score a prediction
MetricDataManager
class MetricDataManager:
    """Manages storage and retrieval of metric data."""
Parameters:
- metric_name: Name of the metric
- data_dir: Optional directory for storing data
Methods:
- save_instance(input, prediction, gold=None, score=None): Save an instance
- load_instances(): Load all instances
- update_user_score(datetime, score): Update user score for an instance
- get_labeled_dataset(): Get a dataset of labeled instances
Optimization Functions
# Optimize a metric module
optimized_module = optimize_metric_module(
    metric_module,         # MetricModule to optimize
    dataset,               # Dataset of labeled examples
    metric_fn=None,        # Optional custom metric function
    optimizer_class=None,  # Optional custom optimizer class
)
# Evaluate a metric module
evaluator = MetricEvaluator(metric_module, data_manager)
metrics = evaluator.evaluate() # Returns MSE, correlation, etc.
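For readers unfamiliar with the quantities mentioned above, the following sketch computes mean squared error and a correlation coefficient between a metric's scores and human labels. It is independent of `MetricEvaluator`: the function names and sample numbers are made up, and the choice of Pearson correlation is an assumption, since the source only says "correlation".

```python
import math

# Standalone illustration; these helpers are NOT part of metric_learner.


def mean_squared_error(predicted, labeled):
    """Average squared difference between metric scores and human labels."""
    return sum((p - y) ** 2 for p, y in zip(predicted, labeled)) / len(predicted)


def pearson_correlation(predicted, labeled):
    """Pearson's r; assumes both sequences have non-zero variance."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_y = sum(labeled) / n
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(predicted, labeled))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_y = sum((y - mean_y) ** 2 for y in labeled)
    return cov / math.sqrt(var_p * var_y)


# Made-up scores purely for demonstration.
metric_scores = [0.9, 0.2, 0.6, 0.4]
human_scores = [1.0, 0.0, 0.5, 0.5]

mse = mean_squared_error(metric_scores, human_scores)
r = pearson_correlation(metric_scores, human_scores)
print(f"MSE: {mse:.4f}, Pearson r: {r:.3f}")
```

A low MSE together with a high correlation suggests the learned metric tracks human judgment both in absolute value and in ranking.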
Interactive Labeling
# Start an interactive labeling session
label_instances(
    data_manager,       # MetricDataManager instance
    quit_after=None,    # Optional number of instances to label
    skip_labeled=True,  # Whether to skip already labeled instances
)
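The sketch below shows one way a labeling loop with `quit_after` and `skip_labeled` semantics could behave. It is not the package's implementation: `label_loop`, the `ask` parameter, and the `user_score` and `prediction` dictionary keys are hypothetical, chosen only so the example can run without a terminal.

```python
# Hypothetical sketch; NOT the package's label_instances implementation.
# Instances are assumed to be dicts with "prediction" and "user_score" keys.


def label_loop(instances, ask=input, quit_after=None, skip_labeled=True):
    """Ask for a score per instance and return how many were labeled."""
    labeled = 0
    for inst in instances:
        # Stop once the requested number of labels has been collected.
        if quit_after is not None and labeled >= quit_after:
            break
        # Optionally skip instances that already carry a human score.
        if skip_labeled and inst.get("user_score") is not None:
            continue
        answer = ask(f"Score for {inst['prediction']!r} (0-1, q to quit): ").strip()
        if answer.lower() == "q":
            break
        inst["user_score"] = float(answer)  # raises ValueError on bad input
        labeled += 1
    return labeled


instances = [
    {"prediction": "Paris", "user_score": 0.8},  # already labeled
    {"prediction": "Lyon", "user_score": None},
    {"prediction": "Berlin", "user_score": None},
]

# Scripted answers stand in for keyboard input so the example is repeatable.
answers = iter(["0.1"])
count = label_loop(instances, ask=lambda _prompt: next(answers), quit_after=1)
print(count, instances)
```

Passing the prompt function as a parameter keeps the loop testable; with the default `ask=input` it reads from the keyboard instead.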
🧪 Testing
Run unit tests:
python -m pytest tests/
Run integration tests:
python -m pytest integration_tests/
Run specific test categories:
python -m pytest -m "integration and not slow"
👥 Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.