Advanced relevance scoring for LLM outputs

LlamaCalc

LlamaCalc is a relevance scoring tool that measures how well an answer addresses a question. It is designed for evaluating LLM-generated responses.

Features

  • Multi-Factor Scoring: Evaluates answer relevance based on proximity, concept coverage, conciseness, and logical flow
  • MLX Acceleration: Optimized for Apple Silicon using the MLX framework, with fallback to NumPy
  • Command-Line Interface: Interactive and colorful CLI powered by Rich
  • Batch Processing: Evaluate multiple question-answer pairs at once
  • Caching: In-memory and persistent disk caching for faster repeated evaluations
  • Customizable Weights: Fine-tune the importance of different scoring components
  • Python API: Easy integration into your Python applications

Installation

# Basic installation
pip install llamacalc

# With interactive UI features
# (quote the extras so shells such as zsh do not treat the brackets as a glob pattern)
pip install "llamacalc[ui]"

# With MLX acceleration (for Apple Silicon)
pip install "llamacalc[mlx]"

# With development dependencies
pip install "llamacalc[dev]"

Quick Start

Command Line

# Simple usage
llamacalc --question "What is Python?" --answer "Python is a high-level programming language known for its readability and versatility."

# Interactive mode
llamacalc --interactive

# Batch processing
llamacalc --batch-file qa_pairs.json --json --output-file results.json

# Custom weights
llamacalc -q "What is Python?" -a "Python is a programming language." --proximity-weight 0.4 --coverage-weight 0.3 --conciseness-weight 0.1 --logical-flow-weight 0.2

Python API

from llamacalc import calculate_relevance_score

# Calculate a score
result = calculate_relevance_score(
    question="What is Python?",
    answer="Python is a versatile programming language used in web development, data science, and AI."
)

# Display results
print(f"Total Score: {result.total_score:.2f}")
print(f"Proximity Score: {result.proximity_score:.2f}")
print(f"Coverage Score: {result.coverage_score:.2f}")
print(f"Conciseness Score: {result.conciseness_score:.2f}")
print(f"Logical Flow Score: {result.logical_flow_score:.2f}")

# Custom weights
weights = {
    "proximity": 0.4,
    "coverage": 0.3,
    "conciseness": 0.1,
    "logical_flow": 0.2
}
custom_result = calculate_relevance_score(
    question="What is Python?",
    answer="Python is a programming language.",
    weights=weights
)

Scoring Methodology

LlamaCalc evaluates relevance using four components:

  1. Proximity Score (default weight: 0.35): Measures how directly the answer addresses the question using vector similarity.

  2. Coverage Score (default weight: 0.30): Evaluates how well the answer covers key concepts from the question.

  3. Conciseness Score (default weight: 0.15): Assesses whether the answer is appropriately detailed without being too verbose or too brief.

  4. Logical Flow Score (default weight: 0.20): Analyzes the logical structure and coherence of the answer.
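To make "vector similarity" concrete, here is a minimal sketch of one common choice, cosine similarity over bag-of-words count vectors. This is an illustration only: the README does not say which vectors or which similarity measure LlamaCalc uses, and the `cosine_similarity` helper and its whitespace tokenization are assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    # Assumed tokenization for this sketch: lowercase, split on whitespace.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())

    # Dot product over the words the two texts share.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


related = cosine_similarity("what is python", "python is a language")
unrelated = cosine_similarity("what is python", "bananas are yellow")
print(f"related={related:.2f} unrelated={unrelated:.2f}")
```

An answer that shares vocabulary with the question scores above zero, while one with no overlap scores exactly zero. A real scorer would likely use richer vectors than raw word counts.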

The total score is a weighted sum of these components. Scores range from 0.0 to 1.0, with higher scores indicating better relevance.
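The weighted sum can be worked through by hand. The sketch below uses the default weights listed above with made-up component scores; `weighted_total` is a hypothetical helper written for this example, not a function exported by LlamaCalc.

```python
# Default weights as listed above; they sum to 1.0.
DEFAULT_WEIGHTS = {
    "proximity": 0.35,
    "coverage": 0.30,
    "conciseness": 0.15,
    "logical_flow": 0.20,
}


def weighted_total(components: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of component scores, each expected in [0.0, 1.0]."""
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical component scores for one answer.
example = {"proximity": 0.8, "coverage": 0.6, "conciseness": 0.9, "logical_flow": 0.7}
total = weighted_total(example)
# 0.35*0.8 + 0.30*0.6 + 0.15*0.9 + 0.20*0.7 = 0.28 + 0.18 + 0.135 + 0.14
print(f"Total: {total:.3f}")
```

Because the default weights sum to 1.0 and each component lies between 0.0 and 1.0, the total stays in the same range. The README does not state whether custom weights are normalized, so choosing weights that sum to 1.0 is the safe assumption.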

Advanced Usage

Batch Processing

Process multiple question-answer pairs:

from llamacalc import batch_calculate_relevance

qa_pairs = [
    ("What is Python?", "Python is a programming language."),
    ("How does a neural network work?", "Neural networks process data through layers of interconnected nodes."),
    # Add more pairs...
]

results = batch_calculate_relevance(qa_pairs, max_workers=4)

for result in results:
    print(f"Score: {result.total_score:.2f} - Q: {result.question[:30]}...")

Using Caching

from llamacalc import calculate_relevance_score
from llamacalc.cache import MemoryCache, DiskCache

# Example inputs to score and cache
question = "What is Python?"
answer = "Python is a programming language."

# Memory cache (faster, not persistent)
memory_cache = MemoryCache(max_size=1000, ttl=3600)  # 1 hour TTL

# Check cache first
cached_result = memory_cache.get(question, answer)
if cached_result:
    print("Cache hit!")
    result = cached_result
else:
    # Calculate and cache
    result = calculate_relevance_score(question, answer)
    memory_cache.put(result)

# Persistent disk cache
disk_cache = DiskCache(cache_dir="~/.my_app_cache", ttl=86400*7)  # 1 week TTL
disk_cache.put(result)
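For readers curious how a size-bounded cache with a time-to-live behaves, the following is a minimal, self-contained sketch. It is not LlamaCalc's implementation: the `TTLCache` class, its oldest-first eviction, and its `put(question, answer, value)` signature are assumptions for illustration (the library's `put` shown above takes only the result object).

```python
import time


class TTLCache:
    """Tiny in-memory cache keyed by (question, answer) with expiry."""

    def __init__(self, max_size=1000, ttl=3600, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}     # (question, answer) -> (stored_at, value)

    def get(self, question, answer):
        entry = self._store.get((question, answer))
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl:
            del self._store[(question, answer)]  # expired: evict and report a miss
            return None
        return value

    def put(self, question, answer, value):
        key = (question, answer)
        if key not in self._store and len(self._store) >= self.max_size:
            # Evict the oldest inserted entry (dicts preserve insertion order).
            del self._store[next(iter(self._store))]
        self._store[key] = (self._clock(), value)


cache = TTLCache(max_size=2, ttl=60)
cache.put("What is Python?", "A language.", 0.82)
print(cache.get("What is Python?", "A language."))     # the stored value
print(cache.get("What is Python?", "Something else"))  # a miss returns None
```

Keying on the exact question and answer strings means any change to either text is treated as a new entry, which matches the `get(question, answer)` lookup shown earlier.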

Performance Optimization

LlamaCalc automatically uses MLX for accelerated computation when it is installed. To check whether acceleration is active:

from llamacalc.core import HAS_MLX

print(f"Using MLX acceleration: {HAS_MLX}")
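A flag like this is typically set with an optional-import pattern. The sketch below shows the general idea under stated assumptions: it is not LlamaCalc's source, the `dot` helper is hypothetical, and the fallback here is plain Python rather than NumPy so that the example runs with no third-party packages.

```python
try:
    import mlx.core as mx  # optional dependency
    HAS_MLX = True
except ImportError:
    mx = None
    HAS_MLX = False


def dot(a, b):
    """Dot product of two equal-length sequences of floats."""
    if HAS_MLX:
        # MLX path: build arrays, multiply elementwise, reduce, return a Python scalar.
        return mx.sum(mx.array(a) * mx.array(b)).item()
    # Fallback path: plain Python, no third-party dependency.
    return sum(x * y for x, y in zip(a, b))


print(f"MLX available: {HAS_MLX}")
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Callers use `dot` the same way on either path, so the rest of the code does not need to know which backend is active.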

For large batches, consider adjusting the number of worker threads:

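from llamacalc import batch_calculate_relevance

# large_qa_batch is a list of (question, answer) tuples,
# in the same shape as qa_pairs in the Batch Processing example
# Use 8 worker threads for large batches
results = batch_calculate_relevance(large_qa_batch, max_workers=8)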
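To show what a `max_workers` setting controls, here is a small sketch of scoring pairs on a thread pool with the standard library's `concurrent.futures`. Whether LlamaCalc uses `ThreadPoolExecutor` internally is an assumption, and `score_pair` is a stand-in scorer invented for this example, not the library's scoring logic.

```python
from concurrent.futures import ThreadPoolExecutor


def score_pair(pair):
    """Stand-in scorer (hypothetical): fraction of question words found in the answer."""
    question, answer = pair
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


def score_batch(pairs, max_workers=4):
    """Score pairs on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_pair, pairs))


pairs = [
    ("what is python", "python is a language"),
    ("what is rust", "bananas are yellow"),
]
print(score_batch(pairs, max_workers=2))
```

In CPython, threads mainly help when the work waits on I/O or runs in native code that releases the global interpreter lock, so raising `max_workers` does not always speed up a batch. Timing a representative batch at a few settings is the most reliable way to pick a value.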

Contributing

Contributions are welcome! Please check our Contributing Guidelines for more details.

License

LlamaCalc is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • LlamaCalc is developed by the [LlamaSearch.AI](https://llamasearch.ai) team
  • Special thanks to the MLX team for their excellent framework
  • Logo design by LlamaSearch.AI
