Texts content similarity scoring plugin

A pytest plugin for semantic text similarity scoring using Large Language Models (LLMs).

It enables robust assertions over meaning, not surface text, making it ideal for validating LLM outputs, RAG systems, summaries, and other generated content.

The plugin evaluates similarity by prompting an LLM to extract and answer factual questions, producing Precision (Completeness), Recall (Correctness), and F1 scores.


Features

  • Semantic comparison beyond keyword matching

  • Standard IR metrics: F1, Precision, Recall

  • Azure OpenAI support via pytest configuration

  • Readable aliases: completeness ↔ precision, correctness ↔ recall

  • CI-friendly aggregation to reduce LLM variance


Requirements

  • Python >=3.10,<4.0

  • pytest >=8.4.2

  • Azure OpenAI subscription with a deployed model (e.g., GPT-4)


Installation

Install from PyPI:

pip install pytest-texts-score

Configuration

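Configuration is read from pytest.ini, and any value can be overridden with a CLI argument. Settings are listed below in their hyphenated form; the pytest.ini keys use underscores instead.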

Required settings

  • llm-api-key — Azure OpenAI API key

  • llm-endpoint — Azure OpenAI resource endpoint

  • llm-api-version — API version (e.g. 2024-05-01)

  • llm-deployment — Deployment name

  • llm-model — Model identifier (e.g. gpt-4)

Optional settings

  • llm-max-tokens — Maximum response tokens (default: 8192)

Example pytest.ini

[pytest]
llm_api_key = YOUR_API_KEY
llm_endpoint = https://your-resource.openai.azure.com/
llm_api_version = 2024-05-01
llm_deployment = your-deployment
llm_model = gpt-4
llm_max_tokens = 8192

Override any value via CLI:

pytest --llm-temperature=0.5

Usage

You can use the plugin either through direct imports or via the texts_score fixture.

Direct import

from pytest_texts_score import texts_expect_f1_equal

def test_similarity():
    expected = "The quick brown fox jumps over a dog."
    actual = "A fast brown fox leaps over a dog."

    # Assert that the semantic F1 score between the two texts is 1.0.
    texts_expect_f1_equal(expected, actual, 1.0)

Fixture-based usage

The texts_score fixture exposes all assertion helpers in a dictionary.

def test_similarity(texts_score):
    expected = "The quick brown fox jumps over a dog."
    actual = "A fast brown fox leaps over a dog."

    texts_score["expect_f1_equal"](expected, actual, 1.0)

Documentation

Documentation is available at the project documentation page.


Available Assertions

Metrics overview

  • Recall (Correctness): measures how much information from the expected text is present in the given text.

  • Precision (Completeness): measures how much information in the given text is supported by the expected text.

  • F1 score: harmonic mean of precision and recall.
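
As a quick illustration of how the three numbers relate, the sketch below computes F1 from precision and recall with the standard harmonic-mean formula. The helper name f1_from is hypothetical and not part of the plugin, and returning 0.0 when both inputs are zero is an assumption about how the undefined case might be handled.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper, not the plugin's API)."""
    if precision + recall == 0:
        # Assumption: treat the undefined 0/0 case as a score of 0.0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A text that is fully supported (precision 1.0) but covers only half
# of the expected information (recall 0.5) ends up with F1 of about 0.667.
print(round(f1_from(1.0, 0.5), 3))
```

Because the harmonic mean is pulled toward the smaller input, a high F1 requires both precision and recall to be high.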


Single-run assertions

These execute one LLM evaluation. *_equal variants are convenience wrappers around *_range.

▶ F1 score

  • texts_expect_f1_equal

  • texts_expect_f1_range

▶ Precision / Completeness

  • texts_expect_precision_equal

  • texts_expect_precision_range

  • texts_expect_completeness_equal (alias)

  • texts_expect_completeness_range (alias)

▶ Recall / Correctness

  • texts_expect_recall_equal

  • texts_expect_recall_range

  • texts_expect_correctness_equal (alias)

  • texts_expect_correctness_range (alias)
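
The relationship between the *_equal and *_range variants can be pictured with the small sketch below. It does not call the plugin: expect_range and expect_equal are hypothetical stand-ins that operate on an already computed score, and their parameter names and exact-match behavior are assumptions rather than the plugin's documented signatures.

```python
def expect_range(score: float, min_score: float, max_score: float) -> None:
    """Fail unless min_score <= score <= max_score (hypothetical stand-in)."""
    assert min_score <= score <= max_score, (
        f"score {score:.3f} outside [{min_score:.3f}, {max_score:.3f}]"
    )


def expect_equal(score: float, target: float) -> None:
    """An equality check expressed as a range whose two bounds coincide."""
    expect_range(score, target, target)


expect_range(0.92, 0.8, 1.0)  # passes: 0.92 lies inside [0.8, 1.0]
expect_equal(1.0, 1.0)        # passes: degenerate range [1.0, 1.0]
```

Since LLM-produced scores can vary between runs, a range check is usually less brittle than an exact match.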


Aggregated assertions

These perform multiple evaluations and aggregate the results. Recommended for CI/CD pipelines, where they dampen the effect of LLM nondeterminism.

Supported aggregations: min, max, median, mean / average.

▶ F1 score

  • texts_agg_f1_min

  • texts_agg_f1_max

  • texts_agg_f1_median

  • texts_agg_f1_mean

  • texts_agg_f1_average

▶ Precision / Completeness

  • texts_agg_precision_min

  • texts_agg_precision_max

  • texts_agg_precision_median

  • texts_agg_precision_mean

  • texts_agg_precision_average

  • texts_agg_completeness_min

  • texts_agg_completeness_max

  • texts_agg_completeness_median

  • texts_agg_completeness_mean

  • texts_agg_completeness_average

▶ Recall / Correctness

  • texts_agg_recall_min

  • texts_agg_recall_max

  • texts_agg_recall_median

  • texts_agg_recall_mean

  • texts_agg_recall_average

  • texts_agg_correctness_min

  • texts_agg_correctness_max

  • texts_agg_correctness_median

  • texts_agg_correctness_mean

  • texts_agg_correctness_average
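
To show why aggregation helps, here is a minimal sketch of the idea: evaluate the same pair of texts several times and assert on a summary statistic instead of a single noisy score. It does not use the plugin; aggregate_scores, fake_score, and the runs and how parameters are all hypothetical, and the plugin's own signatures may differ.

```python
import statistics
from typing import Callable

# Map each aggregation name listed above to a function that reduces a list of scores.
AGGREGATORS: dict[str, Callable[[list[float]], float]] = {
    "min": min,
    "max": max,
    "median": statistics.median,
    "mean": statistics.mean,
    "average": statistics.mean,  # alias of mean
}


def aggregate_scores(score_once: Callable[[], float], runs: int, how: str) -> float:
    """Call score_once() `runs` times and reduce the results (hypothetical helper)."""
    scores = [score_once() for _ in range(runs)]
    return AGGREGATORS[how](scores)


# Stand-in for an LLM evaluation that returns a slightly different score per run.
_samples = iter([0.90, 0.70, 0.95])


def fake_score() -> float:
    return next(_samples)


median_f1 = aggregate_scores(fake_score, runs=3, how="median")
assert median_f1 >= 0.8  # the single low run (0.70) does not fail the check
```

The choice of aggregation sets how strict the check is: min requires every run to clear the threshold, while median and mean tolerate an occasional outlier.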


License

Distributed under the terms of the MIT license.


Issues & Support

Please report bugs or feature requests via the GitHub issue tracker.

