Texts content similarity scoring plugin
A pytest plugin for semantic text similarity scoring using Large Language Models (LLMs).
It enables robust assertions over meaning, not surface text, making it ideal for validating LLM outputs, RAG systems, summaries, and other generated content.
The plugin evaluates similarity by prompting an LLM to extract and answer factual questions, producing Precision (Completeness), Recall (Correctness), and F1 scores.
Features
✔ Semantic comparison beyond keyword matching
✔ Standard IR metrics: F1, Precision, Recall
✔ Azure OpenAI support via pytest configuration
✔ Readable aliases: completeness ↔ precision, correctness ↔ recall
✔ CI-friendly aggregation to reduce LLM variance
Requirements
Python >=3.10,<4.0
pytest >=8.4.2
Azure OpenAI subscription with a deployed model (e.g., GPT-4)
Installation
Install from PyPI:
pip install pytest-texts-score
Configuration
Configure the plugin in pytest.ini; any setting can be overridden with CLI arguments.
Required settings
llm-api-key — Azure OpenAI API key
llm-endpoint — Azure OpenAI resource endpoint
llm-api-version — API version (e.g. 2024-05-01)
llm-deployment — Deployment name
llm-model — Model identifier (e.g. gpt-4)
Optional settings
llm-max-tokens — Maximum response tokens (default: 8192)
Example pytest.ini
[pytest]
llm_api_key = YOUR_API_KEY
llm_endpoint = https://your-resource.openai.azure.com/
llm_api_version = 2024-05-01
llm_deployment = your-deployment
llm_model = gpt-4
llm_max_tokens = 8192
Override any value via CLI:
pytest --llm-temperature=0.5
Usage
You can use the plugin either by direct imports or via the texts_score fixture.
Direct import
from pytest_texts_score import texts_expect_f1_equal

def test_similarity():
    expected = "The quick brown fox jumps over a dog."
    actual = "A fast brown fox leaps over a dog."
    # Assert that the F1 score between the two texts equals 1.0.
    texts_expect_f1_equal(expected, actual, 1.0)
Fixture-based usage
The texts_score fixture exposes all assertion helpers in a dictionary.
def test_similarity(texts_score):
    expected = "The quick brown fox jumps over a dog."
    actual = "A fast brown fox leaps over a dog."
    texts_score["expect_f1_equal"](expected, actual, 1.0)
Documentation
Documentation is available at the project's documentation page.
Available Assertions
Metrics overview
Recall (Correctness): Measures how much information from the expected text is present in the given text.
Precision (Completeness): Measures how much information in the given text is supported by the expected text.
F1 score: Harmonic mean of precision and recall.
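As a quick reference for how the F1 score combines the other two metrics, here is a minimal sketch of the harmonic mean. The function name `f1_score` and the zero-handling convention are assumptions for illustration; the plugin's internal implementation may differ.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Assumed convention: when both inputs are zero the result is 0.0,
    which avoids a division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller value, a text with perfect recall but precision of 0.5 scores about 0.667 rather than 0.75.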
Single-run assertions
These execute one LLM evaluation. *_equal variants are convenience wrappers around *_range.
▶ F1 score
texts_expect_f1_equal
texts_expect_f1_range
▶ Precision / Completeness
texts_expect_precision_equal
texts_expect_precision_range
texts_expect_completeness_equal (alias)
texts_expect_completeness_range (alias)
▶ Recall / Correctness
texts_expect_recall_equal
texts_expect_recall_range
texts_expect_correctness_equal (alias)
texts_expect_correctness_range (alias)
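To show what "convenience wrapper around *_range" can mean in practice, the sketch below expresses an equality check as a range whose lower and upper bounds coincide. Both helpers are hypothetical and operate on an already computed score; they are not the plugin's functions, and the real signatures (including any tolerance handling) may differ.

```python
def expect_score_in_range(score: float, min_score: float, max_score: float) -> None:
    """Hypothetical helper: fail unless min_score <= score <= max_score."""
    assert min_score <= score <= max_score, (
        f"score {score:.3f} outside [{min_score:.3f}, {max_score:.3f}]"
    )


def expect_score_equal(score: float, target: float) -> None:
    """Hypothetical helper: equality expressed as a degenerate range."""
    expect_score_in_range(score, target, target)
```

With this design, all comparison logic and failure messages live in the range check, and the equality variant stays a one-liner.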
Aggregated assertions
These perform multiple evaluations and aggregate the result. Recommended for CI/CD pipelines to reduce LLM nondeterminism.
Supported aggregations: min, max, median, mean / average.
▶ F1 score
texts_agg_f1_min
texts_agg_f1_max
texts_agg_f1_median
texts_agg_f1_mean
texts_agg_f1_average
▶ Precision / Completeness
texts_agg_precision_min
texts_agg_precision_max
texts_agg_precision_median
texts_agg_precision_mean
texts_agg_precision_average
texts_agg_completeness_min
texts_agg_completeness_max
texts_agg_completeness_median
texts_agg_completeness_mean
texts_agg_completeness_average
▶ Recall / Correctness
texts_agg_recall_min
texts_agg_recall_max
texts_agg_recall_median
texts_agg_recall_mean
texts_agg_recall_average
texts_agg_correctness_min
texts_agg_correctness_max
texts_agg_correctness_median
texts_agg_correctness_mean
texts_agg_correctness_average
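The idea behind the aggregated assertions, running the evaluation several times and reducing the scores to one number, can be sketched as follows. The names `aggregate_score`, `evaluate`, and `runs` are assumptions for illustration, and the fake evaluator stands in for a real LLM call; the plugin's own functions may take different arguments.

```python
import statistics
from typing import Callable

# Supported reductions, mirroring the list above; "average" is an alias of "mean".
AGGREGATORS: dict[str, Callable[[list[float]], float]] = {
    "min": min,
    "max": max,
    "median": statistics.median,
    "mean": statistics.mean,
    "average": statistics.mean,
}


def aggregate_score(evaluate: Callable[[], float], runs: int, how: str) -> float:
    """Hypothetical helper: call evaluate() `runs` times and reduce the scores."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    scores = [evaluate() for _ in range(runs)]
    return AGGREGATORS[how](scores)


# Fake evaluator returning preset scores in place of real LLM evaluations.
fake_scores = iter([0.8, 1.0, 0.9])
median = aggregate_score(lambda: next(fake_scores), runs=3, how="median")
```

Taking the median (or mean) of several runs makes a single unusually low or high LLM evaluation less likely to flip the test result.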
License
Distributed under the terms of the MIT license.
Issues & Support
Please report bugs or feature requests via the GitHub issue tracker.