
A Python package for calculating key metrics to assess LLM performance across tasks, including structured data extraction.

Project description

LLMDataLens


LLMDataLens is a framework for evaluating LLM-based applications that produce structured output. It provides tools for measuring language model performance across a range of metrics, with a focus on experiment tracking and reproducibility.

🌟 Features

  • Structured Output Evaluation: Compare LLM outputs against ground-truth data.
  • Customizable Metrics: Define and register your own metrics alongside the built-in ones.
  • Experiment Tracking: Built-in experiment management for reproducibility and comparison.
  • Prompt Versioning: Track how prompts evolve and how each change affects model performance.
  • Model Version Tracking: Monitor performance across different model versions.
  • Flexible Integration: Integrate with existing LLM pipelines and workflows.
  • Extensible Architecture: Add custom metrics, evaluators, and experiment trackers.

🚀 Quick Start

Installation

Install LLMDataLens directly from PyPI:

pip install llm-data-lens

For development or to get the latest version from the repository:

  1. Clone the repository:

    git clone https://github.com/codingmindset/LLMDataLens.git
    cd LLMDataLens
    
  2. Install the package using Poetry:

    poetry install
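The distribution on PyPI is named llm-data-lens, while the package you import is llmdatalens. As a quick post-install check, here is a minimal sketch that uses only the standard library's importlib.metadata (Python 3.8+) to report which version is installed. The helper name installed_version is hypothetical and not part of LLMDataLens.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "llm-data-lens") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of LLMDataLens.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "llm-data-lens is not installed")
```

After pip install llm-data-lens or poetry install, this should print a version string such as the 0.1.2 release listed further down this page.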
    

Basic Usage

Here's a simple example to get you started:

from llmdatalens.evaluators import StructuredOutputEvaluator
from llmdatalens.core import LLMOutputData, GroundTruthData
from llmdatalens.core.metrics_registry import MetricNames

# Create an evaluator with specific metrics
evaluator = StructuredOutputEvaluator(
    metrics=[MetricNames.OverallAccuracy, MetricNames.AverageLatency],
    experiment_name="Invoice Processing Experiment"
)

# Add LLM output and ground truth data
llm_output = LLMOutputData(
    raw_output="Processed invoice: $100",
    structured_output={"invoice_amount": 100},
    metadata={
        "model_info": {"name": "GPT-3.5", "version": "1.0"},
        "prompt_info": {"text": "Extract invoice amount:"}
    }
)
ground_truth = GroundTruthData(
    data={"invoice_amount": 100}
)

evaluator.add_llm_output(llm_output, latency=0.5, confidence=0.9)
evaluator.add_ground_truth(ground_truth)

# Evaluate
result = evaluator.evaluate()

# Print results
print(result.metrics)

# Access experiment data
experiment = evaluator.experiment_manager.get_experiment(evaluator.experiment_id)
print(f"Experiment: {experiment.name}")
print(f"Number of runs: {len(experiment.runs)}")
print(f"Prompts used: {len(experiment.prompts)}")
print(f"Models used: {list(experiment.models.keys())}")
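This README does not spell out how OverallAccuracy and AverageLatency are computed. Assuming one plausible reading, accuracy is the fraction of ground-truth fields that the structured output reproduces exactly, and average latency is the arithmetic mean of the latency values passed to add_llm_output. The plain-Python sketch below illustrates that reading on dictionaries shaped like structured_output and GroundTruthData.data. The function names are hypothetical and are not the LLMDataLens API, which may define these metrics differently.

```python
from statistics import mean
from typing import Any, Dict, List, Sequence


def field_accuracy(predicted: Dict[str, Any], truth: Dict[str, Any]) -> float:
    """Fraction of ground-truth fields that the prediction reproduces exactly.

    Hypothetical definition for illustration; LLMDataLens may compute it differently.
    """
    if not truth:
        return 0.0
    matches = sum(
        1 for key, value in truth.items() if key in predicted and predicted[key] == value
    )
    return matches / len(truth)


def overall_accuracy(
    predictions: Sequence[Dict[str, Any]], truths: Sequence[Dict[str, Any]]
) -> float:
    """Mean per-record field accuracy; predictions and truths are paired by position."""
    scores = [field_accuracy(p, t) for p, t in zip(predictions, truths)]
    return mean(scores) if scores else 0.0


def average_latency(latencies: List[float]) -> float:
    """Arithmetic mean of per-output latencies (e.g. in seconds)."""
    return mean(latencies) if latencies else 0.0


if __name__ == "__main__":
    preds = [{"invoice_amount": 100}, {"invoice_amount": 95, "currency": "USD"}]
    truths = [{"invoice_amount": 100}, {"invoice_amount": 90, "currency": "USD"}]
    print(overall_accuracy(preds, truths))  # 0.75
    print(average_latency([0.5, 0.7]))
```

The second record matches one of its two fields, so the per-record scores are 1.0 and 0.5 and their mean is 0.75.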

📊 Advanced Features

Custom Metrics

Define a custom metric and register it with the register_metric decorator:

from llmdatalens.core.metrics_registry import register_metric
from llmdatalens.core.enums import MetricField

@register_metric("CustomF1Score", field=MetricField.Accuracy, input_keys=["y_true", "y_pred"])
def calculate_custom_f1_score(y_true, y_pred):
    """ This description will be shown in the metrics registry """
    # Your custom F1 score calculation goes here
    pass
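The template above leaves the calculation open. Below is a minimal sketch of what could go in its body, assuming y_true and y_pred arrive as equal-length sequences of binary labels (1 for the positive class, 0 otherwise). The metrics registry's actual input format is not documented here, so adapt the unpacking to whatever input_keys supplies. F1 is the harmonic mean of precision and recall, computed here in the equivalent count form 2·TP / (2·TP + FP + FN). The helper name binary_f1 is hypothetical.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score: the harmonic mean of precision and recall.

    Computed in the equivalent count form 2*TP / (2*TP + FP + FN).
    Assumes equal-length sequences of 0/1 labels, where 1 marks the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    # No positives in either sequence: F1 is undefined; return 0.0 by convention here.
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


if __name__ == "__main__":
    print(binary_f1([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.8
```

Inside the decorated calculate_custom_f1_score you could then return binary_f1(y_true, y_pred) instead of pass.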

Experiment Tracking

Track experiments, prompts, and model versions:

# Get prompt history
prompt_history = evaluator.experiment_manager.get_prompt_history(evaluator.experiment_id)

# Get model history
model_history = evaluator.experiment_manager.get_model_history(evaluator.experiment_id)

# Compare runs (experiment is the object retrieved in the Basic Usage example)
for run in experiment.runs:
    print(f"Run {run.id}: {run.metrics}")
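To pick the best run rather than printing each one, you can rank runs by a single metric. This sketch assumes run.metrics behaves like a dict mapping metric names (assumed here to be strings such as "OverallAccuracy") to numbers; that shape is an assumption, so the example uses a small stand-in Run dataclass (hypothetical, not the LLMDataLens run model) to stay self-contained. The metric values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Run:
    """Hypothetical stand-in for an experiment run; not the LLMDataLens model."""
    id: str
    metrics: Dict[str, float] = field(default_factory=dict)


def best_run(
    runs: Iterable[Run], metric: str, higher_is_better: bool = True
) -> Optional[Run]:
    """Return the run with the best value of `metric`, skipping runs that lack it."""
    scored = [r for r in runs if metric in r.metrics]
    if not scored:
        return None
    chooser = max if higher_is_better else min
    return chooser(scored, key=lambda r: r.metrics[metric])


if __name__ == "__main__":
    # Illustrative values, not real results.
    runs = [
        Run("run-1", {"OverallAccuracy": 0.82, "AverageLatency": 0.61}),
        Run("run-2", {"OverallAccuracy": 0.91, "AverageLatency": 0.74}),
    ]
    print(best_run(runs, "OverallAccuracy").id)  # run-2
    print(best_run(runs, "AverageLatency", higher_is_better=False).id)  # run-1
```

For metrics where lower is better, such as latency, pass higher_is_better=False.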

For more detailed examples, check the examples/ directory in the repository. (More examples will be added soon!)

📘 Documentation

(Coming soon!)

🛠️ Project Structure

llmdatalens/
├── src/
│   └── llmdatalens/
│       ├── core/
│       │   ├── base_model.py
│       │   ├── enums.py
│       │   └── metrics_registry.py
│       ├── evaluators/
│       │   └── structured_output_evaluator.py
│       └── experiment/
│           ├── experiment_manager.py
│           └── models.py
├── tests/
│   ├── test_core/
│   ├── test_evaluators/
│   └── test_experiment/
├── examples/
├── docs/
├── pyproject.toml
└── README.md

๐Ÿค Contributing

We welcome contributions to LLMDataLens! Here's how you can help:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/AmazingFeature)
  3. Make your changes
  4. Commit your changes (git commit -m 'Add some AmazingFeature')
  5. Push to the branch (git push origin feature/AmazingFeature)
  6. Open a Pull Request

Please read our Contributing Guidelines for more details.

📄 License

LLMDataLens is released under the MIT License. See the LICENSE file for details.

📬 Contact

If you have any questions, suggestions, or just want to say hi, feel free to reach out:

๐Ÿ™ Acknowledgements

  • Thanks to all our contributors and users for their valuable feedback and support.
  • Special thanks to the open-source community for the amazing tools and libraries that made this project possible.

Built with ❤️ by Coding Mindset


Citing LLMDataLens

If you use LLMDataLens in your research, please cite it as follows:

@software{llmdatalens,
  title = {LLMDataLens: A Framework for Evaluating LLM-based Applications},
  author = {Elvin Gomez},
  year = {2024},
  url = {https://github.com/codingmindset/LLMDataLens.git},
}



Download files

Download the file for your platform.

Source Distribution

llm_data_lens-0.1.2.tar.gz (11.8 kB)

Uploaded Source

Built Distribution


llm_data_lens-0.1.2-py3-none-any.whl (12.4 kB)

Uploaded Python 3

File details

Details for the file llm_data_lens-0.1.2.tar.gz.

File metadata

  • Download URL: llm_data_lens-0.1.2.tar.gz
  • Upload date:
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.5 Darwin/23.1.0

File hashes

Hashes for llm_data_lens-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       dc62d107c3b313eb81813ff8cdb3e5e32b9c0c96b8a749e38340f0c5cf24dd1c
MD5          d49b32e9e3d29bbb486dd86612a2b01e
BLAKE2b-256  9d57aad16f7ff38530d86ce78d4f2bb9d9bf82d24b152de53e820eae6da10885
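To confirm that a downloaded file matches the SHA256 digest listed for it, you can hash it locally with the standard library's hashlib. This is a minimal sketch; the helper name sha256_of and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published for llm_data_lens-0.1.2.tar.gz
EXPECTED_SDIST_SHA256 = "dc62d107c3b313eb81813ff8cdb3e5e32b9c0c96b8a749e38340f0c5cf24dd1c"

if __name__ == "__main__":
    archive = Path("llm_data_lens-0.1.2.tar.gz")
    if archive.exists():
        ok = sha256_of(archive) == EXPECTED_SDIST_SHA256
        print("hash matches" if ok else "HASH MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce this at install time through its hash-checking mode, using --hash=sha256:... entries in a requirements file together with --require-hashes.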


File details

Details for the file llm_data_lens-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: llm_data_lens-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 12.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.5 Darwin/23.1.0

File hashes

Hashes for llm_data_lens-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       9112ad5859bc8db7752d7a279d5bcdb8a36c88b0bbd47d904a84068fe62032cc
MD5          659d949875997e8a63dee4f7075f5f6f
BLAKE2b-256  c98f72434ea939e526ecef74cc8e162a8a09c6a7df6333f6b8d4fe359f78bcf4

