A Python package for calculating key metrics to assess LLM performance in various tasks, including extracting structured data.


LLMDataLens

LLMDataLens is a framework for evaluating LLM-based applications that produce structured output. It provides tools for measuring language-model performance across a range of metrics, with a focus on experiment tracking and reproducibility.

🌟 Features

Structured Output Evaluation: Assess LLM outputs against ground truth data with precision.
Customizable Metrics: Easily define and use custom metrics for comprehensive performance assessment.
Experiment Tracking: Built-in experiment management for reproducibility and comparison.
Prompt Versioning: Keep track of prompt evolution and its impact on model performance.
Model Version Tracking: Monitor performance across different model versions.
Flexible Integration: Seamlessly integrate with existing LLM pipelines and workflows.
Extensible Architecture: Add custom metrics, evaluators, and experiment trackers with ease.

🚀 Quick Start

Installation

Install LLMDataLens directly from PyPI:

```
pip install llm-data-lens
```

For development, or to get the latest version from the repository:

Clone the repository:
```
git clone https://github.com/codingmindset/LLMDataLens.git
cd LLMDataLens
```

Install the package using Poetry:
```
poetry install
```

Basic Usage

Here's a simple example to get you started:

```
from llmdatalens.evaluators import StructuredOutputEvaluator
from llmdatalens.core import LLMOutputData, GroundTruthData
from llmdatalens.core.metrics_registry import MetricNames

# Create an evaluator with specific metrics
evaluator = StructuredOutputEvaluator(
    metrics=[MetricNames.OverallAccuracy, MetricNames.AverageLatency],
    experiment_name="Invoice Processing Experiment"
)

# Add LLM output and ground truth data
llm_output = LLMOutputData(
    raw_output="Processed invoice: $100",
    structured_output={"invoice_amount": 100},
    metadata={
        "model_info": {"name": "GPT-3.5", "version": "1.0"},
        "prompt_info": {"text": "Extract invoice amount:"}
    }
)
ground_truth = GroundTruthData(
    data={"invoice_amount": 100}
)

evaluator.add_llm_output(llm_output, latency=0.5, confidence=0.9)
evaluator.add_ground_truth(ground_truth)

# Evaluate
result = evaluator.evaluate()

# Print results
print(result.metrics)

# Access experiment data
experiment = evaluator.experiment_manager.get_experiment(evaluator.experiment_id)
print(f"Experiment: {experiment.name}")
print(f"Number of runs: {len(experiment.runs)}")
print(f"Prompts used: {len(experiment.prompts)}")
print(f"Models used: {list(experiment.models.keys())}")
```
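The README does not spell out how `MetricNames.OverallAccuracy` or `MetricNames.AverageLatency` are computed. As a rough mental model only, here is a minimal sketch of one plausible definition. Accuracy is the share of ground-truth fields whose extracted value matches exactly, averaged over samples. Latency is the arithmetic mean of the recorded `latency` values. The functions `field_accuracy`, `overall_accuracy`, and `average_latency` are hypothetical and not part of LLMDataLens. The library's actual formulas may differ, for example in how missing or extra fields are scored.

```python
from statistics import mean
from typing import Any, Mapping, Sequence, Tuple

# Hypothetical helpers illustrating one possible metric definition;
# this is NOT LLMDataLens's actual implementation.


def field_accuracy(predicted: Mapping[str, Any], expected: Mapping[str, Any]) -> float:
    """Share of ground-truth fields whose predicted value matches exactly."""
    if not expected:
        raise ValueError("ground truth must contain at least one field")
    matches = sum(
        1 for key, value in expected.items() if key in predicted and predicted[key] == value
    )
    return matches / len(expected)


def overall_accuracy(pairs: Sequence[Tuple[Mapping[str, Any], Mapping[str, Any]]]) -> float:
    """Mean field accuracy over (predicted, expected) pairs."""
    return mean(field_accuracy(predicted, expected) for predicted, expected in pairs)


def average_latency(latencies: Sequence[float]) -> float:
    """Arithmetic mean of per-call latencies (same unit as the inputs)."""
    return mean(latencies)


pairs = [
    ({"invoice_amount": 100}, {"invoice_amount": 100}),  # 1 of 1 fields correct
    ({"invoice_amount": 90, "currency": "USD"}, {"invoice_amount": 100, "currency": "USD"}),  # 1 of 2
]
print(overall_accuracy(pairs))  # 0.75
print(average_latency([0.5, 1.5]))  # 1.0
```

Exact matching is the strictest choice. A real evaluator might normalize strings or allow a numeric tolerance before comparing values.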

📊 Advanced Features

Custom Metrics

Create and register custom metrics easily:

```
from llmdatalens.core.metrics_registry import register_metric
from llmdatalens.core.enums import MetricField

@register_metric("CustomF1Score", field=MetricField.Accuracy, input_keys=["y_true", "y_pred"])
def calculate_custom_f1_score(y_true, y_pred):
    """This description will be shown in the metrics registry."""
    # Your custom F1 score calculation goes here
    pass
```
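`calculate_custom_f1_score` above leaves the calculation as a placeholder. One way to fill it in is a plain-Python binary F1 score. This assumes `y_true` and `y_pred` arrive as equal-length sequences of labels; the README does not say what shape the registry passes. The helper `f1_score` and its `positive` parameter are illustrative and not part of LLMDataLens.

```python
from typing import Hashable, Sequence


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the `positive` label.

    Computed as 2*TP / (2*TP + FP + FN), which equals 2PR / (P + R).
    Returns 0.0 when neither y_true nor y_pred contains the positive label.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # missed a positive
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


print(f1_score([1, 0, 1, 1], [1, 1, 0, 1]))  # TP=2, FP=1, FN=1, so F1 = 4/6
```

Inside the registered function, the body would then reduce to `return f1_score(y_true, y_pred)`.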

Experiment Tracking

Track experiments, prompts, and model versions:

```
# Continues from the Basic Usage example (reuses `evaluator` and `experiment`)

# Get prompt history
prompt_history = evaluator.experiment_manager.get_prompt_history(evaluator.experiment_id)

# Get model history
model_history = evaluator.experiment_manager.get_model_history(evaluator.experiment_id)

# Compare runs
for run in experiment.runs:
    print(f"Run {run.id}: {run.metrics}")
```
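The loop above prints each run's metrics. To pick the best run for a single metric, you could use something like the sketch below. It rests on assumptions the README does not confirm. `Run` is a hypothetical stand-in for the library's run objects. Its `metrics` attribute is assumed to be a dict mapping metric names to numbers. The metric keys `"accuracy"` and `"latency"` and the run values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence


@dataclass
class Run:
    """Hypothetical stand-in for an experiment run.

    Assumes `metrics` maps metric names to numeric values; the real
    LLMDataLens run objects may be shaped differently.
    """

    id: str
    metrics: Dict[str, float] = field(default_factory=dict)


def best_run(runs: Sequence[Run], metric: str, higher_is_better: bool = True) -> Optional[Run]:
    """Return the run with the best value for `metric`, ignoring runs that lack it."""
    candidates = [run for run in runs if metric in run.metrics]
    if not candidates:
        return None
    choose = max if higher_is_better else min
    return choose(candidates, key=lambda run: run.metrics[metric])


# Illustrative values only.
runs = [
    Run("run-1", {"accuracy": 0.80, "latency": 0.50}),
    Run("run-2", {"accuracy": 0.92, "latency": 0.70}),
    Run("run-3", {"latency": 0.40}),
]
print(best_run(runs, "accuracy").id)  # run-2
print(best_run(runs, "latency", higher_is_better=False).id)  # run-3
```

The `higher_is_better` flag exists because some metrics improve upward (accuracy) and others downward (latency).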

For more detailed examples, check the examples/ directory in the repository. (More examples will be added soon!)

📘 Documentation

(Coming soon!)

🛠️ Project Structure

```
llmdatalens/
├── src/
│   └── llmdatalens/
│       ├── core/
│       │   ├── base_model.py
│       │   ├── enums.py
│       │   └── metrics_registry.py
│       ├── evaluators/
│       │   └── structured_output_evaluator.py
│       └── experiment/
│           ├── experiment_manager.py
│           └── models.py
├── tests/
│   ├── test_core/
│   ├── test_evaluators/
│   └── test_experiment/
├── examples/
├── docs/
├── pyproject.toml
└── README.md
```

🤝 Contributing

We welcome contributions to LLMDataLens! Here's how you can help:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/AmazingFeature`).
3. Make your changes.
4. Commit your changes (`git commit -m 'Add some AmazingFeature'`).
5. Push to the branch (`git push origin feature/AmazingFeature`).
6. Open a pull request.

Please read our Contributing Guidelines for more details.

📄 License

LLMDataLens is released under the MIT License. See the LICENSE file for details.

📬 Contact

If you have any questions, suggestions, or just want to say hi, feel free to reach out:

Email: elvin@codingmindset.io
X: @codingmindset
GitHub Issues: For bug reports and feature requests

🙏 Acknowledgements

Thanks to all our contributors and users for their valuable feedback and support.
Special thanks to the open-source community for the amazing tools and libraries that made this project possible.

Built with ❤️ by Coding Mindset

Citing LLMDataLens

If you use LLMDataLens in your research, please cite it as follows:

```
@software{llmdatalens,
  title = {LLMDataLens: A Framework for Evaluating LLM-based Applications},
  author = {Elvin Gomez},
  year = {2024},
  url = {https://github.com/codingmindset/LLMDataLens.git},
}
```

Project details


Release history

- 0.1.5: Sep 12, 2024
- 0.1.4: Sep 5, 2024 (this version)
- 0.1.3: Sep 3, 2024
- 0.1.2: Sep 3, 2024
- 0.1.1: Aug 30, 2024
- 0.1.0: Aug 30, 2024

Download files


Source Distribution

llm_data_lens-0.1.4.tar.gz (15.5 kB)

Uploaded Sep 5, 2024 Source

Built Distribution


llm_data_lens-0.1.4-py3-none-any.whl (17.6 kB)

Uploaded Sep 5, 2024 Python 3

File details

Details for the file llm_data_lens-0.1.4.tar.gz.

File metadata

Download URL: llm_data_lens-0.1.4.tar.gz
Upload date: Sep 5, 2024
Size: 15.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.10.5 Darwin/23.1.0

File hashes

Hashes for llm_data_lens-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`442290ddcfc48332bf01c26c126a62a83b6a119942cafa26320fdc6f85f63a54`
MD5	`a4ad87f2f962f22d06a644a5b4bb3cfe`
BLAKE2b-256	`6908926125641fc6cf331eb2491408342bec597fbc1b53025139360c7326b2ae`


File details

Details for the file llm_data_lens-0.1.4-py3-none-any.whl.

File metadata

Download URL: llm_data_lens-0.1.4-py3-none-any.whl
Upload date: Sep 5, 2024
Size: 17.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.10.5 Darwin/23.1.0

File hashes

Hashes for llm_data_lens-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ddcee7a0119b5462aba9fecadc4811ad8eb8b06c7ab5dfb22fc3ab8cb13faf2b`
MD5	`4a0f6ea55e4c55337183bf0c951038f4`
BLAKE2b-256	`641dd4cd8f292d098fbbf0ba8c52a2d45905ebfbf465e95b67a9c9095276cd4a`
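You can use the SHA256 digests listed above to check that a downloaded archive is intact. Here is a minimal sketch that uses only the standard library (`hashlib`, `hmac`). The helper names `sha256_of` and `verify_sha256` are illustrative. The script assumes the wheel named above sits in the current directory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    wheel = Path("llm_data_lens-0.1.4-py3-none-any.whl")
    if wheel.exists():
        expected = "ddcee7a0119b5462aba9fecadc4811ad8eb8b06c7ab5dfb22fc3ab8cb13faf2b"
        print(verify_sha256(wheel, expected))
```

pip can also perform this check itself in hash-checking mode. To use it, put `--hash=sha256:<digest>` after the requirement in a requirements file.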

