LLM-based summary quality evaluation

Project description

assert-eval
Scores a summary against its source text for coverage, factual consistency, factual alignment, and topic preservation. No PyTorch, no BERT, no heavy dependencies.

Installation

pip install assert-eval

Quick Start

from assert_eval import evaluate_summary, LLMConfig

config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
)

results = evaluate_summary(
    full_text="Original long text goes here...",
    summary="Summary to evaluate goes here...",
    metrics=["coverage", "factual_consistency", "factual_alignment", "topic_preservation"],
    llm_config=config,
)

print(results)
# {'coverage': 0.85, 'factual_consistency': 0.92, 'factual_alignment': 0.88, 'topic_preservation': 0.90}
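Because the metrics are plain floats, the returned dict drops straight into a quality gate. A minimal sketch (the `results` values and thresholds below are illustrative, hard-coded in place of a real `evaluate_summary` call):

```python
# Gate a summarization pipeline on evaluation scores.
# In practice `results` would come from evaluate_summary().
results = {
    "coverage": 0.85,
    "factual_consistency": 0.92,
    "factual_alignment": 0.88,
    "topic_preservation": 0.90,
}

# Minimum acceptable score per metric; metrics without an entry always pass.
THRESHOLDS = {"coverage": 0.80, "factual_consistency": 0.90}

failures = {m: s for m, s in results.items() if s < THRESHOLDS.get(m, 0.0)}
assert not failures, f"Summary failed quality gates: {failures}"
```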

Available Metrics

Metric               Description
coverage             What % of source document claims appear in the summary (recall/completeness)
factual_consistency  What % of summary claims are supported by the source (precision/accuracy)
factual_alignment    F1 score combining coverage and factual_consistency
topic_preservation   How well the main topics from the source are preserved in the summary
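Reading coverage as recall and factual_consistency as precision, factual_alignment is their harmonic mean (F1). This is a sketch of that standard formula, not the library's internal code:

```python
def f1(coverage: float, factual_consistency: float) -> float:
    """Harmonic mean (F1) of coverage (recall) and factual_consistency (precision)."""
    if coverage + factual_consistency == 0:
        return 0.0
    return 2 * coverage * factual_consistency / (coverage + factual_consistency)

# With the Quick Start scores: 2 * 0.85 * 0.92 / (0.85 + 0.92) ≈ 0.884,
# consistent with the reported factual_alignment of 0.88.
```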

Custom Evaluation Instructions

Tailor the LLM's evaluation criteria for your domain:

results = evaluate_summary(
    full_text=text,
    summary=summary,
    metrics=["coverage", "factual_consistency"],
    llm_config=config,
    custom_prompt_instructions={
        "coverage": "Apply strict standards. Only mark a claim as covered if it is clearly and explicitly represented.",
        "factual_consistency": "Flag any claim that adds detail not present in the original text.",
    },
)

Verbose Output

Pass verbose=True to include per-claim LLM reasoning in the results:

results = evaluate_summary(
    full_text=text,
    summary=summary,
    metrics=["coverage", "factual_consistency"],
    llm_config=config,
    verbose=True,
)

LLM Configuration

from assert_eval import LLMConfig

# AWS Bedrock (uses ~/.aws credentials by default)
config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
)

# AWS Bedrock with explicit credentials
config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
    api_key="your-aws-access-key-id",
    api_secret="your-aws-secret-access-key",
    aws_session_token="your-session-token",  # optional
)

# OpenAI
config = LLMConfig(
    provider="openai",
    model_id="gpt-4o",
    api_key="your-openai-api-key",
)

Supported Bedrock Model Families

Model Family      Example Model IDs
Amazon Nova       us.amazon.nova-pro-v1:0, amazon.nova-lite-v1:0
Anthropic Claude  anthropic.claude-3-sonnet-20240229-v1:0
Meta Llama        meta.llama3-70b-instruct-v1:0
Mistral AI        mistral.mistral-large-2402-v1:0
Cohere Command    cohere.command-r-plus-v1:0
AI21 Labs         ai21.jamba-1-5-large-v1:0

Proxy Configuration

# Single proxy
config = LLMConfig(
    provider="bedrock", model_id="us.amazon.nova-pro-v1:0", region="us-east-1",
    proxy_url="http://proxy.example.com:8080",
)

# Protocol-specific proxies
config = LLMConfig(
    provider="bedrock", model_id="us.amazon.nova-pro-v1:0", region="us-east-1",
    http_proxy="http://proxy.example.com:8080",
    https_proxy="http://proxy.example.com:8443",
)

# Authenticated proxy
config = LLMConfig(
    provider="bedrock", model_id="us.amazon.nova-pro-v1:0", region="us-east-1",
    proxy_url="http://username:password@proxy.example.com:8080",
)

Standard HTTP_PROXY / HTTPS_PROXY environment variables are also respected.
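If you prefer to configure the proxy once per process rather than per config, the environment variables can also be set from Python before constructing the config (proxy URLs below are illustrative):

```python
import os

# Equivalent to exporting the variables in your shell; HTTP clients that
# honor the standard proxy variables pick these up automatically.
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8443"
```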

Dependencies

  • assert-core — shared LLM provider layer (AWS Bedrock, OpenAI)

Migrating from assert_llm_tools

assert-eval replaces the summary evaluation functionality of assert_llm_tools, which is now deprecated. The API is largely the same; just swap the import:

# Before
from assert_llm_tools import evaluate_summary, LLMConfig

# After
from assert_eval import evaluate_summary, LLMConfig

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

assert_eval-0.1.3.tar.gz (10.6 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

assert_eval-0.1.3-py3-none-any.whl (11.8 kB)


File details

Details for the file assert_eval-0.1.3.tar.gz.

File metadata

  • Download URL: assert_eval-0.1.3.tar.gz
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for assert_eval-0.1.3.tar.gz
Algorithm Hash digest
SHA256 61d5f25c428952e2ca388fc0ce9ab8df10ce9ba685da98f8bc40cf1f49d0176b
MD5 dc8084eb81146e2c659f644a02689453
BLAKE2b-256 364248371e4961178977ee8bd9594d11c18da878b00037f1a6b3277ea610e00f

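To check a downloaded sdist against the published SHA256 digest, you can hash it locally with only the standard library. A minimal sketch (the file path is illustrative and assumes you have downloaded the archive):

```python
import hashlib

# SHA256 digest published above for assert_eval-0.1.3.tar.gz
EXPECTED_SHA256 = "61d5f25c428952e2ca388fc0ce9ab8df10ce9ba685da98f8bc40cf1f49d0176b"

def sha256_of(path: str) -> str:
    """Stream the file through SHA-256 so large files are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("assert_eval-0.1.3.tar.gz") should equal EXPECTED_SHA256
```

pip can also enforce this automatically via hash-checking mode (`pip install --require-hashes -r requirements.txt`).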

Provenance

The following attestation bundles were made for assert_eval-0.1.3.tar.gz:

Publisher: publish-assert-eval.yml on charliedouglas/assert_llm_tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file assert_eval-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: assert_eval-0.1.3-py3-none-any.whl
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for assert_eval-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 b9e7e2c5d6c412935b930002dbb60816568e779ede69513a9bc3a9059a0afd4f
MD5 3648d0ddee521e5616105f17ef2eac82
BLAKE2b-256 7c458bfa0c8c662ab9ff71a1fca3219cfe0710b043f717ae5c7e0d7d6420f96c


Provenance

The following attestation bundles were made for assert_eval-0.1.3-py3-none-any.whl:

Publisher: publish-assert-eval.yml on charliedouglas/assert_llm_tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
