
A Python package for calculating key metrics to assess LLM performance in various tasks, including extracting structured data.


LLMDataLens


LLMDataLens is a powerful and flexible framework for evaluating LLM-based applications with structured output. It provides a comprehensive suite of tools for assessing the performance of language models across various metrics, with a focus on experiment tracking and reproducibility.

🌟 Features

  • Structured Output Evaluation: Assess LLM outputs against ground truth data with precision.
  • Customizable Metrics: Easily define and use custom metrics for comprehensive performance assessment.
  • Experiment Tracking: Built-in experiment management for reproducibility and comparison.
  • Prompt Versioning: Keep track of prompt evolution and its impact on model performance.
  • Model Version Tracking: Monitor performance across different model versions.
  • Flexible Integration: Seamlessly integrate with existing LLM pipelines and workflows.
  • Extensible Architecture: Add custom metrics, evaluators, and experiment trackers with ease.

🚀 Quick Start

Installation

Install LLMDataLens directly from PyPI:

pip install llm-data-lens

For development or to get the latest version from the repository:

  1. Clone the repository:

    git clone https://github.com/codingmindset/LLMDataLens.git
    cd LLMDataLens
    
  2. Install the package using Poetry:

    poetry install
    

Basic Usage

Here's a simple example to get you started:

from llmdatalens.evaluators import StructuredOutputEvaluator
from llmdatalens.core import LLMOutputData, GroundTruthData
from llmdatalens.core.metrics_registry import MetricNames

# Create an evaluator with specific metrics
evaluator = StructuredOutputEvaluator(
    metrics=[MetricNames.OverallAccuracy, MetricNames.AverageLatency],
    experiment_name="Invoice Processing Experiment"
)

# Add LLM output and ground truth data
llm_output = LLMOutputData(
    raw_output="Processed invoice: $100",
    structured_output={"invoice_amount": 100},
    metadata={
        "model_info": {"name": "GPT-3.5", "version": "1.0"},
        "prompt_info": {"text": "Extract invoice amount:"}
    }
)
ground_truth = GroundTruthData(
    data={"invoice_amount": 100}
)

evaluator.add_llm_output(llm_output, latency=0.5, confidence=0.9)
evaluator.add_ground_truth(ground_truth)

# Evaluate
result = evaluator.evaluate()

# Print results
print(result.metrics)

# Access experiment data
experiment = evaluator.experiment_manager.get_experiment(evaluator.experiment_id)
print(f"Experiment: {experiment.name}")
print(f"Number of runs: {len(experiment.runs)}")
print(f"Prompts used: {len(experiment.prompts)}")
print(f"Models used: {list(experiment.models.keys())}")
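
To make the idea of comparing structured output against ground truth concrete, here is a minimal sketch of a field-level accuracy check. It is written in plain Python and does not use LLMDataLens; the function name `field_accuracy` and the exact-match rule are assumptions for illustration, not the library's implementation of OverallAccuracy.

```python
from typing import Any, Dict


def field_accuracy(predicted: Dict[str, Any], expected: Dict[str, Any]) -> float:
    """Fraction of ground-truth fields whose predicted value matches exactly.

    Hypothetical helper for illustration; not part of LLMDataLens.
    """
    if not expected:
        # No ground-truth fields to check, so report 0.0 rather than divide by zero.
        return 0.0
    matches = sum(
        1
        for key, value in expected.items()
        if key in predicted and predicted[key] == value
    )
    return matches / len(expected)


predicted = {"invoice_amount": 100, "currency": "EUR"}
expected = {"invoice_amount": 100, "currency": "USD"}
print(field_accuracy(predicted, expected))  # 0.5
```

Exact matching is the simplest choice; real pipelines may need type coercion or tolerance for numeric fields.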

📊 Advanced Features

Custom Metrics

Create and register custom metrics easily:

from llmdatalens.core.metrics_registry import register_metric
from llmdatalens.core.enums import MetricField

@register_metric("CustomF1Score", field=MetricField.Accuracy, input_keys=["y_true", "y_pred"])
def calculate_custom_f1_score(y_true, y_pred):
    """This description will be shown in the metrics registry."""
    # Your custom F1 score calculation goes here
    pass
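
The placeholder body above could be filled with a calculation along the following lines. This is a sketch only: it assumes `y_true` and `y_pred` are equal-length sequences of binary labels (1 for the positive class), which the README does not specify, and the name `binary_f1` is hypothetical. It is shown without the `register_metric` decorator so that it runs on its own.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for binary labels, where 1 marks the positive class.

    Hypothetical helper for illustration; not part of LLMDataLens.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    denominator = 2 * tp + fp + fn
    # Return 0.0 when there are no true or predicted positives at all.
    return 2 * tp / denominator if denominator else 0.0


print(binary_f1([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.8
```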

Experiment Tracking

Track experiments, prompts, and model versions:

# Get prompt history
prompt_history = evaluator.experiment_manager.get_prompt_history(evaluator.experiment_id)

# Get model history
model_history = evaluator.experiment_manager.get_model_history(evaluator.experiment_id)

# Compare runs
for run in experiment.runs:
    print(f"Run {run.id}: {run.metrics}")
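
Once runs carry an identifier and a metrics mapping, as in the loop above, picking the best run is a small step further. The sketch below uses a hypothetical `RunRecord` dataclass as a stand-in for the library's run objects, and the metric keys are assumed to be plain strings; neither is confirmed by the LLMDataLens API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunRecord:
    """Hypothetical stand-in for a run: an identifier plus its metric values."""
    id: str
    metrics: Dict[str, float] = field(default_factory=dict)


def best_run(runs: List[RunRecord], metric: str, higher_is_better: bool = True) -> RunRecord:
    """Return the run with the best value for `metric`, skipping runs that lack it."""
    scored = [run for run in runs if metric in run.metrics]
    if not scored:
        raise ValueError(f"no run reports the metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda run: run.metrics[metric])


runs = [
    RunRecord("run-1", {"OverallAccuracy": 0.82, "AverageLatency": 0.50}),
    RunRecord("run-2", {"OverallAccuracy": 0.91, "AverageLatency": 0.75}),
]
print(best_run(runs, "OverallAccuracy").id)  # run-2
print(best_run(runs, "AverageLatency", higher_is_better=False).id)  # run-1
```

The `higher_is_better` flag matters because accuracy-style metrics should be maximized while latency-style metrics should be minimized.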

For more detailed examples, check the examples/ directory in the repository. (More examples will be added soon!)

📘 Documentation

(Coming soon!)

๐Ÿ› ๏ธ Project Structure

llmdatalens/
├── src/
│   └── llmdatalens/
│       ├── core/
│       │   ├── base_model.py
│       │   ├── enums.py
│       │   └── metrics_registry.py
│       ├── evaluators/
│       │   └── structured_output_evaluator.py
│       └── experiment/
│           ├── experiment_manager.py
│           └── models.py
├── tests/
│   ├── test_core/
│   ├── test_evaluators/
│   └── test_experiment/
├── examples/
├── docs/
├── pyproject.toml
└── README.md

๐Ÿค Contributing

We welcome contributions to LLMDataLens! Here's how you can help:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/AmazingFeature)
  3. Make your changes
  4. Commit your changes (git commit -m 'Add some AmazingFeature')
  5. Push to the branch (git push origin feature/AmazingFeature)
  6. Open a Pull Request

Please read our Contributing Guidelines for more details.

📄 License

LLMDataLens is released under the MIT License. See the LICENSE file for details.

📬 Contact

If you have any questions, suggestions, or just want to say hi, feel free to reach out:

๐Ÿ™ Acknowledgements

  • Thanks to all our contributors and users for their valuable feedback and support.
  • Special thanks to the open-source community for the amazing tools and libraries that made this project possible.

Built with ❤️ by Coding Mindset


Citing LLMDataLens

If you use LLMDataLens in your research, please cite it as follows:

@software{llmdatalens,
  title = {LLMDataLens: A Framework for Evaluating LLM-based Applications},
  author = {Elvin Gomez},
  year = {2024},
  url = {https://github.com/codingmindset/LLMDataLens.git},
}
