
assert-eval

LLM-based summary quality evaluation.

Scores a summary against its source text for coverage, factual consistency, factual alignment, and topic preservation. No PyTorch, no BERT, no heavy dependencies.

Installation

pip install assert-eval

Quick Start

from assert_eval import evaluate_summary, LLMConfig

config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
)

results = evaluate_summary(
    full_text="Original long text goes here...",
    summary="Summary to evaluate goes here...",
    metrics=["coverage", "factual_consistency", "factual_alignment", "topic_preservation"],
    llm_config=config,
)

print(results)
# {'coverage': 0.85, 'factual_consistency': 0.92, 'factual_alignment': 0.88, 'topic_preservation': 0.90}

Available Metrics

Metric               Description
coverage             What % of source-document claims appear in the summary (recall/completeness)
factual_consistency  What % of summary claims are supported by the source (precision/accuracy)
factual_alignment    F1 score combining coverage and factual_consistency
topic_preservation   How well the main topics of the source are preserved in the summary
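Since factual_alignment is described as the F1 of coverage (recall) and factual_consistency (precision), the combination can be sketched as a harmonic mean. The helper below is illustrative only, not part of the library's API; with the Quick Start scores it reproduces the reported alignment of roughly 0.88.

```python
def f1(coverage: float, consistency: float) -> float:
    """Harmonic mean of coverage (recall) and factual_consistency (precision)."""
    if coverage + consistency == 0:
        return 0.0
    return 2 * coverage * consistency / (coverage + consistency)

# Using the Quick Start scores (coverage=0.85, factual_consistency=0.92):
print(round(f1(0.85, 0.92), 4))  # 0.8836, consistent with the 0.88 above
```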

Custom Evaluation Instructions

Tailor the LLM's evaluation criteria for your domain:

results = evaluate_summary(
    full_text=text,
    summary=summary,
    metrics=["coverage", "factual_consistency"],
    llm_config=config,
    custom_prompt_instructions={
        "coverage": "Apply strict standards. Only mark a claim as covered if it is clearly and explicitly represented.",
        "factual_consistency": "Flag any claim that adds detail not present in the original text.",
    },
)

Verbose Output

Pass verbose=True to include per-claim LLM reasoning in the results:

results = evaluate_summary(
    full_text=text,
    summary=summary,
    metrics=["coverage", "factual_consistency"],
    llm_config=config,
    verbose=True,
)
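A hedged sketch of consuming verbose results follows. The exact result structure with verbose=True is an assumption here (the library may shape it differently); the point is simply to read scores defensively whether an entry is a bare number or a dict carrying per-claim reasoning.

```python
def score_of(value):
    """Return the numeric score whether the entry is a bare float or a dict."""
    return value["score"] if isinstance(value, dict) else value

# mock_results is a stand-in for evaluate_summary(..., verbose=True) output;
# the keys "score" and "reasoning" are hypothetical, not documented API.
mock_results = {
    "coverage": {"score": 0.85, "reasoning": ["claim A covered", "claim B missing"]},
    "factual_consistency": 0.92,
}

for metric, value in mock_results.items():
    print(f"{metric}: {score_of(value)}")
```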

LLM Configuration

from assert_eval import LLMConfig

# AWS Bedrock (uses ~/.aws credentials by default)
config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
)

# AWS Bedrock with explicit credentials
config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
    api_key="your-aws-access-key-id",
    api_secret="your-aws-secret-access-key",
    aws_session_token="your-session-token",  # optional
)

# OpenAI
config = LLMConfig(
    provider="openai",
    model_id="gpt-4o",
    api_key="your-openai-api-key",
)

Supported Bedrock Model Families

Model Family      Example Model IDs
Amazon Nova       us.amazon.nova-pro-v1:0, amazon.nova-lite-v1:0
Anthropic Claude  anthropic.claude-3-sonnet-20240229-v1:0
Meta Llama        meta.llama3-70b-instruct-v1:0
Mistral AI        mistral.mistral-large-2402-v1:0
Cohere Command    cohere.command-r-plus-v1:0
AI21 Labs         ai21.jamba-1-5-large-v1:0

Proxy Configuration

# Single proxy
config = LLMConfig(
    provider="bedrock", model_id="us.amazon.nova-pro-v1:0", region="us-east-1",
    proxy_url="http://proxy.example.com:8080",
)

# Protocol-specific proxies
config = LLMConfig(
    provider="bedrock", model_id="us.amazon.nova-pro-v1:0", region="us-east-1",
    http_proxy="http://proxy.example.com:8080",
    https_proxy="http://proxy.example.com:8443",
)

# Authenticated proxy
config = LLMConfig(
    provider="bedrock", model_id="us.amazon.nova-pro-v1:0", region="us-east-1",
    proxy_url="http://username:password@proxy.example.com:8080",
)

Standard HTTP_PROXY / HTTPS_PROXY environment variables are also respected.
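As a sketch, the same proxies can be supplied through those environment variables before the config or client is constructed (the endpoints below are placeholders, matching the examples above):

```python
import os

# Placeholder proxy endpoints; set these before creating LLMConfig or
# making any LLM calls so the underlying HTTP client picks them up.
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8443"
```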

Dependencies

  • assert-core — shared LLM provider layer (AWS Bedrock, OpenAI)

Migrating from assert_llm_tools

assert-eval replaces the summary evaluation functionality of assert_llm_tools, which is now deprecated. The API is largely the same — swap the import:

# Before
from assert_llm_tools import evaluate_summary, LLMConfig

# After
from assert_eval import evaluate_summary, LLMConfig

License

MIT

Download files

Source Distribution

assert_eval-0.1.4.tar.gz (10.6 kB)

Built Distribution

assert_eval-0.1.4-py3-none-any.whl (11.8 kB)

File details

Details for the file assert_eval-0.1.4.tar.gz.

File metadata

  • Download URL: assert_eval-0.1.4.tar.gz
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Algorithm    Hash digest
SHA256       c32e65e3926408bfc1a23629041648c419c55fbec42347e8e2abecc8e893039f
MD5          b436dc6b72e2ec5b9f80f62f73149fe3
BLAKE2b-256  b41413eb33893b0babe3473fed84f428b6cfd030887a090a6f06a52d1aa8d1ec

Provenance

The following attestation bundles were made for assert_eval-0.1.4.tar.gz:

Publisher: publish-assert-eval.yml on charliedouglas/assert_llm_tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file assert_eval-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: assert_eval-0.1.4-py3-none-any.whl
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Algorithm    Hash digest
SHA256       6b760d685ceaf6c5e80fb8f5f87b2c9512b97a1e391f61e59aa2af83da6cd6ba
MD5          28633331523c35bf5e8e044c3f1db4e6
BLAKE2b-256  17591f68cf5842d723e08e6f8ab6c435103550bfc2755f543663744d6a95a230

Provenance

The following attestation bundles were made for assert_eval-0.1.4-py3-none-any.whl:

Publisher: publish-assert-eval.yml on charliedouglas/assert_llm_tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
