A Python package for calculating key metrics to assess LLM performance in various tasks, including extracting structured data.
LLMDataLens
LLMDataLens is a powerful and flexible framework for evaluating LLM-based applications with structured output. It provides a comprehensive suite of tools for assessing the performance of language models across various metrics, with a focus on experiment tracking and reproducibility.
Features
- Structured Output Evaluation: Assess LLM outputs against ground truth data with precision.
- Customizable Metrics: Easily define and use custom metrics for comprehensive performance assessment.
- Experiment Tracking: Built-in experiment management for reproducibility and comparison.
- Prompt Versioning: Keep track of prompt evolution and its impact on model performance.
- Model Version Tracking: Monitor performance across different model versions.
- Flexible Integration: Seamlessly integrate with existing LLM pipelines and workflows.
- Extensible Architecture: Add custom metrics, evaluators, and experiment trackers with ease.
Quick Start
Installation
Install LLMDataLens directly from PyPI:
pip install llm-data-lens
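To confirm that the install succeeded, one option is to query the installed distribution's version with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the distribution name `llm-data-lens` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "llm-data-lens") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string when the package is installed, a notice otherwise.
print(installed_version() or "llm-data-lens is not installed")
```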
For development or to get the latest version from the repository:
- Clone the repository:
git clone https://github.com/codingmindset/LLMDataLens.git
cd LLMDataLens
- Install the package using Poetry:
poetry install
Basic Usage
Here's a simple example to get you started:
from llmdatalens.evaluators import StructuredOutputEvaluator
from llmdatalens.core import LLMOutputData, GroundTruthData
from llmdatalens.core.metrics_registry import MetricNames
# Create an evaluator with specific metrics
evaluator = StructuredOutputEvaluator(
    metrics=[MetricNames.OverallAccuracy, MetricNames.AverageLatency],
    experiment_name="Invoice Processing Experiment"
)
# Add LLM output and ground truth data
llm_output = LLMOutputData(
    raw_output="Processed invoice: $100",
    structured_output={"invoice_amount": 100},
    metadata={
        "model_info": {"name": "GPT-3.5", "version": "1.0"},
        "prompt_info": {"text": "Extract invoice amount:"}
    }
)
ground_truth = GroundTruthData(
    data={"invoice_amount": 100}
)
evaluator.add_llm_output(llm_output, latency=0.5, confidence=0.9)
evaluator.add_ground_truth(ground_truth)
# Evaluate
result = evaluator.evaluate()
# Print results
print(result.metrics)
# Access experiment data
experiment = evaluator.experiment_manager.get_experiment(evaluator.experiment_id)
print(f"Experiment: {experiment.name}")
print(f"Number of runs: {len(experiment.runs)}")
print(f"Prompts used: {len(experiment.prompts)}")
print(f"Models used: {list(experiment.models.keys())}")
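To make the two metrics in this example more concrete, here is a minimal, self-contained sketch of what a field-level accuracy and an average latency could compute. It does not use LLMDataLens and is not the library's implementation: `field_accuracy` and `average_latency` are hypothetical helpers, and exact-equality matching per field is an assumption.

```python
from statistics import mean
from typing import Any, Dict, List


def field_accuracy(predicted: Dict[str, Any], expected: Dict[str, Any]) -> float:
    """Fraction of ground-truth fields whose predicted value matches exactly."""
    if not expected:
        return 0.0
    correct = sum(
        1
        for key, value in expected.items()
        if key in predicted and predicted[key] == value
    )
    return correct / len(expected)


def average_latency(latencies: List[float]) -> float:
    """Mean latency in seconds; 0.0 when no runs were recorded."""
    return mean(latencies) if latencies else 0.0


# Illustrative data only: one of the two fields matches the ground truth.
predicted = {"invoice_amount": 100, "currency": "EUR"}
expected = {"invoice_amount": 100, "currency": "USD"}

print(field_accuracy(predicted, expected))  # 0.5
print(average_latency([0.5, 1.5]))  # 1.0
```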
Advanced Features
Custom Metrics
Create and register custom metrics easily:
from llmdatalens.core.metrics_registry import register_metric
from llmdatalens.core.enums import MetricField
@register_metric("CustomF1Score", field=MetricField.Accuracy, input_keys=["y_true", "y_pred"])
def calculate_custom_f1_score(y_true, y_pred):
    """This description will be shown in the metrics registry."""
    # Your custom F1 score calculation here
    pass
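As an illustration of what the body of such a metric might contain, the sketch below computes a binary F1 score in plain Python. It is independent of LLMDataLens (no decorator is applied). The function name `binary_f1` and the requirement that `y_true` and `y_pred` are equal-length sequences of 0/1 labels are assumptions made for this example.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Without any true positive, precision and recall are zero or undefined;
    # returning 0.0 is a common convention for that case.
    if true_pos == 0:
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Two true positives, one false positive, one false negative.
print(binary_f1([1, 0, 1, 1], [1, 1, 0, 1]))
```

Returning 0.0 when there are no true positives avoids a division by zero; a different convention (for example raising an error) may suit some evaluations better.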
Experiment Tracking
Track experiments, prompts, and model versions:
# Get prompt history
prompt_history = evaluator.experiment_manager.get_prompt_history(evaluator.experiment_id)
# Get model history
model_history = evaluator.experiment_manager.get_model_history(evaluator.experiment_id)
# Compare runs
for run in experiment.runs:
    print(f"Run {run.id}: {run.metrics}")
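When comparing many runs, it can help to rank them by a single metric. The sketch below does this on stand-in data and does not call LLMDataLens. The `Run` dataclass, the `best_run` helper, and the assumption that each run's metrics form a flat mapping from metric name to number are all hypothetical, so the actual shape of `run.metrics` in the library may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """Stand-in for an experiment run: an id plus a flat metric mapping."""
    id: str
    metrics: Dict[str, float] = field(default_factory=dict)


def best_run(runs: List[Run], metric: str, higher_is_better: bool = True) -> Run:
    """Return the run with the best value for `metric`, skipping runs that lack it."""
    candidates = [run for run in runs if metric in run.metrics]
    if not candidates:
        raise ValueError(f"No run reports the metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda run: run.metrics[metric])


# Illustrative values only, not measured results.
runs = [
    Run("run-1", {"OverallAccuracy": 0.82, "AverageLatency": 0.61}),
    Run("run-2", {"OverallAccuracy": 0.91, "AverageLatency": 0.74}),
]

print(best_run(runs, "OverallAccuracy").id)  # run-2
print(best_run(runs, "AverageLatency", higher_is_better=False).id)  # run-1
```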
For more detailed examples, check the examples/ directory in the repository. (More examples will be added soon!)
Documentation
(Coming soon!)
Project Structure
llmdatalens/
├── src/
│   └── llmdatalens/
│       ├── core/
│       │   ├── base_model.py
│       │   ├── enums.py
│       │   └── metrics_registry.py
│       ├── evaluators/
│       │   └── structured_output_evaluator.py
│       └── experiment/
│           ├── experiment_manager.py
│           └── models.py
├── tests/
│   ├── test_core/
│   ├── test_evaluators/
│   └── test_experiment/
├── examples/
├── docs/
├── pyproject.toml
└── README.md
Contributing
We welcome contributions to LLMDataLens! Here's how you can help:
- Fork the repository
- Create a new branch (git checkout -b feature/AmazingFeature)
- Make your changes
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Please read our Contributing Guidelines for more details.
License
LLMDataLens is released under the MIT License. See the LICENSE file for details.
Contact
If you have any questions, suggestions, or just want to say hi, feel free to reach out:
- Email: elvin@codingmindset.io
- X: @codingmindset
- GitHub Issues: For bug reports and feature requests
Acknowledgements
- Thanks to all our contributors and users for their valuable feedback and support.
- Special thanks to the open-source community for the amazing tools and libraries that made this project possible.
Built with ❤️ by Coding Mindset
Citing LLMDataLens
If you use LLMDataLens in your research, please cite it as follows:
@software{llmdatalens,
title = {LLMDataLens: A Framework for Evaluating LLM-based Applications},
author = {Elvin Gomez},
year = {2024},
url = {https://github.com/codingmindset/LLMDataLens.git},
}