
Package for RHI Metrics evaluation

Project description

Evaluation of Relation Hallucination in Abstractive Summarization:

Features

1. Relation Extraction for Varied Models

Comprehensive SVO Relation Extraction

  • Extracts Subject-Verb-Object (SVO) relations from:
    • Input Text (InputText)
    • Reference Summaries (ReferenceSummary)
    • Model-Generated Summaries for varied models.
  • Attaches a confidence score to each extracted relation so that low-confidence relations can be filtered out.

2. Computing Hallucination Factors

Advanced Hallucination Metrics

  • Evaluates hallucination tendencies in model-generated summaries based on:
    • Relation overlaps between InputText, ReferenceSummary, and model summaries.
  • Computes custom hallucination metrics dynamically for each model:
    • Precision
    • Recall
    • F1 Score
    • Six hallucination factors: Extractiveness Factor, Positive Hallucination, Negative Hallucination, Lost Hallucination, Lost Focus, and Over Focus
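Precision, recall, and F1 over relations can be illustrated with plain Python sets. This is a minimal sketch assuming exact tuple matching and the standard set-based definitions (precision = |R ∩ G| / |G|, recall = |R ∩ G| / |R|); `relation_prf` is a hypothetical helper, and RHI-Metrics may match or normalize relations differently.

```python
# Sketch: set-based precision, recall and F1 over SVO relation tuples.
# Assumption: relations match only when (subject, verb, object) are identical;
# this is not necessarily how RHI-Metrics compares relations.
from typing import Dict, Iterable, Tuple

Relation = Tuple[str, str, str]


def relation_prf(generated: Iterable[Relation], reference: Iterable[Relation]) -> Dict[str, float]:
    """Return precision, recall and F1 of generated relations against reference relations."""
    gen, ref = set(generated), set(reference)
    overlap = len(gen & ref)
    precision = overlap / len(gen) if gen else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}


# Toy relations for illustration only.
generated = [("Italy", "cut", "public sector pay"), ("Italy", "froze", "recruitment")]
reference = [("Italy", "cut", "public sector pay")]
print(relation_prf(generated, reference))
# {'precision': 0.5, 'recall': 1.0, 'f1_score': 0.6666666666666666}
```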

Relation Matching

  • Converts extracted SVO relations into structured tuples to facilitate direct comparison.
  • Measures hallucination factors based on relation alignment or mismatch.
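Relation matching can be sketched as normalizing each extracted relation into a tuple and comparing sets. The sketch assumes relations arrive as dicts with "subject", "verb", and "object" keys, as in the sample output of the relation extractor shown later on this page; the lowercase and whitespace normalization and the helper names `to_tuple` and `match_relations` are our own, hypothetical choices.

```python
# Sketch: convert extracted SVO dicts into normalized tuples and match them.
# Assumptions: dict keys follow the sample extractor output on this page;
# lowercasing and whitespace collapsing are illustrative, not the package's rules.
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def to_tuple(rel: Dict[str, str]) -> Relation:
    """Turn one relation dict into a (subject, verb, object) tuple."""
    return (_normalize(rel["subject"]), _normalize(rel["verb"]), _normalize(rel["object"]))


def match_relations(generated: List[Dict], source: List[Dict]) -> Tuple[Set[Relation], Set[Relation]]:
    """Split generated relations into those present in the source and those absent from it."""
    gen = {to_tuple(r) for r in generated}
    src = {to_tuple(r) for r in source}
    return gen & src, gen - src


source = [{"subject": "Italian government", "verb": "approved", "object": "austerity measures"}]
generated = [
    {"subject": "Italian  Government", "verb": "approved", "object": "austerity measures"},
    {"subject": "Italian government", "verb": "cut", "object": "spending by 20bn"},
]
matched, unmatched = match_relations(generated, source)
print(matched)    # {('italian government', 'approved', 'austerity measures')}
print(unmatched)  # {('italian government', 'cut', 'spending by 20bn')}
```

Relations in the second set have no counterpart in the source, which is the kind of mismatch the hallucination factors are built on.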

3. Additional Highlights

Multi-Model Compatibility

  • Easily integrates summaries from multiple models in a single dataset.
  • Provides comparative metrics for each model’s output.

Customizable Field Mapping

  • Allows flexible mapping of fields for datasets with varying structures.
  • Ensures compatibility with diverse data formats.

Efficient Processing

  • Handles both single-entry and batch processing.
  • Works on anything from a single sample to a full dataset.

Usage

This package is designed for researchers and developers working on abstractive summarization and hallucination analysis. It provides tools for multi-model evaluation and for relation extraction in text summarization models.

Installation

Please install our package using the following command:

pip install RHI-Metrics

Functionality 1:

Processing single entries and batches of data for relation hallucination evaluation in text summarization models:

Function name: process_text_and_compute_metrics

How to import?

from RHI_Metrics import process_text_and_compute_metrics

What features does this function provide?

  • Extract SVO Relations: Extracts Subject-Verb-Object relations with confidence scores from input text, reference summary, and model-generated summaries.
  • Compute ROUGE Scores: Computes ROUGE-1, ROUGE-2, and ROUGE-L scores for model-generated summaries against the reference summary.
  • Compute Hallucination Metrics: Computes hallucination-related metrics, including precision, recall, F1 score, and six factors (Extractiveness Factor, Positive Hallucination, Negative Hallucination, Lost Hallucination, Lost Focus, and Over Focus), by comparing generated and reference SVO relations.
  • Supports Single and Batch Processing: Can process individual entries or a list of entries in batch mode.
  • Customizable Field Mapping: Allows customization of field names for input text and reference summary.

Dependencies

The following dependencies need to be installed:

pip install nltk rouge-score numpy pandas scipy transformers matplotlib seaborn streamlit
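To confirm that the dependencies above are importable in the current environment, the standard library's `importlib.util.find_spec` can be used. This is our own sketch, not part of RHI_Metrics; `missing_modules` is a hypothetical helper. Note that the pip package rouge-score is imported as `rouge_score`, while the other packages import under their pip names.

```python
# Sketch: report which of the listed dependencies cannot be imported.
# find_spec returns None for a top-level module that is not installed.
import importlib.util
from typing import Iterable, List

REQUIRED_MODULES = ["nltk", "rouge_score", "numpy", "pandas", "scipy",
                    "transformers", "matplotlib", "seaborn", "streamlit"]


def missing_modules(modules: Iterable[str]) -> List[str]:
    """Return the names of modules that are not installed."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```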

How to Use?

Function: process_text_and_compute_metrics

This function processes JSON data to extract relations and compute ROUGE scores and hallucination metrics. It supports both single-entry and batch processing.

Function Signature:

def process_text_and_compute_metrics(
    data: Union[Dict, List[Dict]],
    batch_mode=True,
    field_mapping: Dict = None
) -> List[Dict]:

Arguments:

  • data (Union[Dict, List[Dict]]):
    • For batch processing (batch_mode=True), provide a list of dictionaries where each dictionary represents an entry.
    • For single-entry processing (batch_mode=False), provide a single dictionary.
  • batch_mode (bool, default=True):
    • If True, processes multiple entries.
    • If False, processes a single entry.
  • field_mapping (Dict, optional):
    • A dictionary to map custom field names for input text and reference summary.
      Example: {"input_text": "textInput", "reference_summary": "summaryReference"}.
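The `data` and `batch_mode` arguments must agree: a list of dictionaries with `batch_mode=True`, a single dictionary with `batch_mode=False`. The sketch below shows one way to normalize both cases into a list of entries before processing; `as_entries` is a hypothetical helper, not part of RHI_Metrics, and the package's own validation may behave differently.

```python
# Sketch: normalize `data` into a list of entry dicts according to `batch_mode`.
# `as_entries` is hypothetical; it only illustrates the argument contract above.
from typing import Dict, List, Union


def as_entries(data: Union[Dict, List[Dict]], batch_mode: bool = True) -> List[Dict]:
    """Return a list of entries, raising TypeError if `data` does not match `batch_mode`."""
    if batch_mode:
        if not isinstance(data, list) or not all(isinstance(e, dict) for e in data):
            raise TypeError("batch_mode=True expects a list of dictionaries")
        return data
    if not isinstance(data, dict):
        raise TypeError("batch_mode=False expects a single dictionary")
    return [data]


print(as_entries({"textInput": "Input.", "summaryReference": "Ref."}, batch_mode=False))
# [{'textInput': 'Input.', 'summaryReference': 'Ref.'}]
```

Wrapping a single entry in a list mirrors the Returns section below, which gives `List[Dict]` for both modes.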

Returns:

  • List[Dict]: A list of processed entries containing extracted relations, ROUGE scores, and hallucination metrics for each model.

Example

Single Entry Processing:

from RHI_Metrics import process_text_and_compute_metrics

data = {
    "textInput": "This is an input text.",
    "summaryReference": "This is a reference summary.",
    "facebook/bart-large-cnn": "Generated summary by BART.",
    "google/pegasus-xsum": "Generated summary by Pegasus."
}

field_mapping = {
    "input_text": "textInput",
    "reference_summary": "summaryReference"
}

result = process_text_and_compute_metrics(data, batch_mode=False, field_mapping=field_mapping)
print(result)

Batch Processing:

json_data = [
    {
        "textInput": "This is input text for the first entry.",
        "summaryReference": "This is the reference summary for the first entry.",
        "facebook/bart-large-cnn": "Generated summary by BART.",
    },
    {
        "textInput": "This is input text for the second entry.",
        "summaryReference": "This is the reference summary for the second entry.",
        "google/pegasus-xsum": "Generated summary by Pegasus.",
    }
]

field_mapping = {
    "input_text": "textInput",
    "reference_summary": "summaryReference"
}

result = process_text_and_compute_metrics(json_data, batch_mode=True, field_mapping=field_mapping)
print(result)

Customizing Field Names

If your data uses different field names for the input text and reference summary, pass a custom field mapping dictionary:

field_mapping = {
    "input_text": "customInputTextField",
    "reference_summary": "customReferenceSummaryField"
}
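In batch mode, a single `field_mapping` applies to every entry, so each entry must use the same custom field names. A quick pre-check, sketched below with a hypothetical helper `find_unmapped_entries` (not part of RHI_Metrics), lists the entries that lack the mapped fields before you call `process_text_and_compute_metrics`.

```python
# Sketch: find entries that do not contain the fields named in `field_mapping`.
# `find_unmapped_entries` is a hypothetical helper for illustration.
from typing import Dict, List


def find_unmapped_entries(entries: List[Dict], field_mapping: Dict[str, str]) -> Dict[int, List[str]]:
    """Map each entry index to the mapped field names it is missing."""
    problems = {}
    for i, entry in enumerate(entries):
        missing = [field for field in field_mapping.values() if field not in entry]
        if missing:
            problems[i] = missing
    return problems


entries = [
    {"customInputTextField": "Input text.", "customReferenceSummaryField": "Reference."},
    {"content": "Input text.", "ref_summary": "Reference."},
]
field_mapping = {"input_text": "customInputTextField",
                 "reference_summary": "customReferenceSummaryField"}
print(find_unmapped_entries(entries, field_mapping))
# {1: ['customInputTextField', 'customReferenceSummaryField']}
```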

Sample Data Format

{
  "0": {
    "Id": "10157432",
    "dataset": "xlsum",
    "InputText": "The announcement makes Italy the latest eurozone country to announce cuts in an effort to reduce the gap between spending and earnings. The UK and Danish governments also this week announced plans to curb spending. Italy will take measures to reduce public sector pay and will put a freeze on new recruitment. Public sector pensions and local government spending are also expected to be hit. Added to these, a clampdown on tax avoidance is also planned. The cuts are equal to some 1.6% of gross domestic product (GDP). Similar reductions in spending measures have already been announced by Greece, Spain and Portugal. Heavy price Some Italian workers have already been out protesting. In Rome, workers at the Italian Institute for the Professional Development of Vocational Training of Workers (Isfol) held protests against the cuts at their headquarters. One worker, Simone Casadei, said the public sector had already paid a heavy price. The sector of public research has already paid its toll and suffered cuts in the past, he said. So we are asking for our sector to be left out of the new budget cuts. He added that the money should be raised by getting tough on tax evasion. We also demand that the money needed to face this problems... is obtained through a tough action against tax evasion. The state cannot always take the money from the same sources, that is workers and pensioners. The government hopes to bring its deficit down to below 3% of GDP by 2012 - from 5.3% now - in order to help maintain the confidence of international investors and prevent the spread of a Greek-style debt crisis. Concerns over the level of Greece's debts have led to an effective boycott of Greek debt on the world markets, and have spilled over into deepening worries about the other weakest members of the 16-nation eurozone: Spain, Portugal and Italy.",
    "ReferenceSummary": "The Italian government has approved austerity measures worth 24 billion euros (\u00a320bn; $29bn) for the years 2011-2012.",
    "facebook/bart-large-cnn": "Italy will take measures to reduce public sector pay and will put a freeze on new recruitment. Public sector pensions and local government spending are also expected to be hit. The cuts are equal to some 1.6% of gross domestic product (GDP) Similar reductions in spending measures have already been announced by Greece, Spain and Portugal.",
    "google/pegasus-xsum": "The Italian government has announced plans to cut public sector spending by 20bn (\u00a314bn; $22bn) over the next two years.",
    "t5-large": "the cuts are equal to some 1.6% of gross domestic product (gdp) similar reductions in spending measures have already been announced by Greece, Spain and Portugal. the government hopes to bring its deficit down to below 3% of GDP by 2012.",
    "gpt-3.5-turbo": "Italy has announced public sector spending cuts of 1.6% of GDP in an attempt to reduce the gap between spending and earnings and to bring its deficit down to below 3% of GDP by 2012. Measures will include a reduction in public sector pay, a freeze on new recruitment, a clampdown on tax avoidance as well as cuts to public sector pensions and local government spending. Italy becomes the latest eurozone country to announce such cuts after similar moves made by Greece, Spain and Portugal. Although the government hopes to prevent the spread of a Greek-style debt crisis, the proposed austerity measures have already drawn protests from some public sector workers.",
    "RefSum": "The Italian government has approved austerity measures worth 24 billion euros (\u00a320bn; $29bn) for the years 2011-2012."
  }
}

Explanation of Fields:

  • Id: Unique identifier for the data sample.
  • dataset: The dataset source (e.g., xlsum).
  • InputText: The input article or passage used as the base for summarization.
  • ReferenceSummary: The gold-standard reference summary.
  • facebook/bart-large-cnn, google/pegasus-xsum, t5-large, gpt-3.5-turbo: Summaries generated by different models.
  • RefSum: Alternative representation of the reference summary (same as ReferenceSummary in this example).
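The sample format stores entries in a JSON object keyed by "0", "1", and so on, while batch mode expects a list. A minimal sketch for converting one into the other follows. It assumes that every key other than Id, dataset, InputText, ReferenceSummary, and RefSum holds a model-generated summary, as in the sample above; `load_entries` and `model_names` are hypothetical helpers.

```python
# Sketch: convert sample-format JSON into the list expected by batch mode.
# Assumption: keys outside METADATA_KEYS are model outputs, as in the sample above.
import json
from typing import Dict, List

METADATA_KEYS = {"Id", "dataset", "InputText", "ReferenceSummary", "RefSum"}


def load_entries(raw_json: str) -> List[Dict]:
    """Return the entries of a sample-format JSON string, ordered by numeric key."""
    data = json.loads(raw_json)
    return [data[key] for key in sorted(data, key=int)]


def model_names(entry: Dict) -> List[str]:
    """Return the keys of an entry that hold model-generated summaries."""
    return [key for key in entry if key not in METADATA_KEYS]


raw = ('{"0": {"Id": "1", "dataset": "xlsum", "InputText": "...", '
       '"ReferenceSummary": "...", "t5-large": "...", "RefSum": "..."}}')
entries = load_entries(raw)
print(model_names(entries[0]))  # ['t5-large']

# To read a file instead: with open(path, encoding="utf-8") as f: entries = load_entries(f.read())
field_mapping = {"input_text": "InputText", "reference_summary": "ReferenceSummary"}
# result = process_text_and_compute_metrics(entries, batch_mode=True, field_mapping=field_mapping)
```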

Computing Hallucination Metrics Separately Using calculate_hallucination_factors:

Function name: calculate_hallucination_factors

Function Parameters:

  • pred_words: Tokenized predicted (generated) summary words.
  • ref_words: Tokenized reference summary words.
  • inp_words: Tokenized input text words.
  • input_relations: SVO relations extracted from the input text.
  • ref_relations: SVO relations extracted from the reference summary.
  • model_relations: SVO relations extracted from the predicted summary.

Function Output: Returns a dictionary containing the following keys.

Metrics:

  • precision: Precision of the generated relations.
  • recall: Recall of the generated relations.
  • f1_score: F1 score for the generated relations.
  • ef (Extractiveness Factor): Measures overlap between input, reference, and generated relations.
  • ph (Positive Hallucination): Evaluates agreement between reference and generated relations.
  • of (Over Focus): Quantifies excessive focus on input relations.
  • nh (Negative Hallucination): Reflects spurious content in generated relations.
  • lf (Lost Focus): Indicates missing content from reference relations.
  • lh (Lost Hallucination): Highlights divergence between input and generated relations.
  • rhi (Relation Hallucination Index): A composite metric summarizing hallucination behavior.

Intersection Counts:

  • I_intersect_R_count: Overlap between input and reference relations.
  • I_intersect_G_count: Overlap between input and generated relations.
  • R_intersect_G_count: Overlap between reference and generated relations.
  • I_intersect_R_intersect_G_count: Triple overlap between input, reference, and generated relations.

Set Lengths:

  • lenI: Size of the input relations set.
  • lenR: Size of the reference relations set.
  • lenG: Size of the generated relations set.
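The intersection counts and set lengths above can be reproduced with Python set operations. This sketch assumes relations are compared as exact tuples after de-duplication; the package's own matching may be looser, and `relation_overlaps` is a hypothetical helper. It does not attempt to reproduce ef, ph, of, nh, lf, lh, or rhi, whose formulas are not given here.

```python
# Sketch: intersection counts and set lengths for input (I), reference (R)
# and generated (G) relations, using exact tuple matching.
from typing import Dict, Iterable, Tuple

Relation = Tuple[str, str, str]


def relation_overlaps(input_relations: Iterable[Relation],
                      ref_relations: Iterable[Relation],
                      model_relations: Iterable[Relation]) -> Dict[str, int]:
    """Return the overlap counts and set sizes named in the output description."""
    inp, ref, gen = set(input_relations), set(ref_relations), set(model_relations)
    return {
        "I_intersect_R_count": len(inp & ref),
        "I_intersect_G_count": len(inp & gen),
        "R_intersect_G_count": len(ref & gen),
        "I_intersect_R_intersect_G_count": len(inp & ref & gen),
        "lenI": len(inp),
        "lenR": len(ref),
        "lenG": len(gen),
    }


counts = relation_overlaps(
    input_relations=[("Eiffel Tower", "is", "landmark")],
    ref_relations=[("Eiffel Tower", "is", "famous landmark")],
    model_relations=[("Eiffel Tower", "is", "landmark")],
)
print(counts)
```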

from RHI_Metrics import calculate_hallucination_factors
# Tokenized predicted summary, reference summary and input text
pred_words = ["The", "Eiffel", "Tower", "is", "a", "landmark"]
ref_words = ["Eiffel", "Tower", "is", "a", "famous", "landmark"]
inp_words = ["The", "Eiffel", "Tower", "is", "a", "famous", "site"]

# SVO relations as (subject, verb, object) tuples
input_relations = [("Eiffel Tower", "is", "landmark")]
ref_relations = [("Eiffel Tower", "is", "famous landmark")]
model_relations = [("Eiffel Tower", "is", "landmark")]

metrics = calculate_hallucination_factors(pred_words, ref_words, inp_words, input_relations, ref_relations, model_relations)

print(metrics)
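The example above passes hand-tokenized word lists. The page does not say which tokenizer the package expects, so the sketch below assumes a simple regular expression that keeps word characters and drops punctuation; `tokenize` is a hypothetical helper. NLTK (already a dependency) offers `nltk.word_tokenize`, which keeps punctuation as separate tokens and requires the punkt data to be downloaded.

```python
# Sketch: build word lists such as pred_words from raw text.
# Assumption: punctuation is dropped; the package may tokenize differently.
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)


pred_words = tokenize("The Eiffel Tower is a landmark.")
print(pred_words)  # ['The', 'Eiffel', 'Tower', 'is', 'a', 'landmark']
```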

Extracting Relations for Any Given Text

This package includes a function that extracts Subject-Verb-Object (SVO) relations, with confidence scores, from any text:

Function Name:

extract_relations_and_svo_with_confidence_score(text, confidence_threshold=0.5)

Parameters:

  • text: The input text for which you want to extract SVO relations.
  • confidence_threshold: (Optional, default = 0.5) The minimum confidence score required for a relation to be included in the output.

Usage:

Pass your input text and the desired confidence threshold to the function to extract relations.
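If you have already extracted relations and want a stricter cut-off, you can re-filter them without re-running extraction. This sketch assumes each relation is a dict with a "confidence" key, as in the sample output in the example below, and treats the threshold as inclusive (>=); the package may use a strict comparison. `filter_by_confidence` is a hypothetical helper.

```python
# Sketch: keep only relations whose confidence meets a threshold (inclusive).
from typing import Dict, List


def filter_by_confidence(relations: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep relations whose "confidence" value is at least `threshold`."""
    return [rel for rel in relations if rel.get("confidence", 0.0) >= threshold]


relations = [
    {"subject": "Italian government", "verb": "approved", "object": "austerity measures", "confidence": 0.8},
    {"subject": "austerity measures", "verb": "worth", "object": "24 billion euros", "confidence": 0.75},
]
strict = filter_by_confidence(relations, threshold=0.78)
print([(r["subject"], r["verb"], r["object"]) for r in strict])
# [('Italian government', 'approved', 'austerity measures')]
```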

Example:

from RHI_Metrics import extract_relations_and_svo_with_confidence_score

# Input text
text = "The Italian government has approved austerity measures worth 24 billion euros for the years 2011-2012."

# Extract relations with a confidence threshold of 0.6
relations = extract_relations_and_svo_with_confidence_score(text, confidence_threshold=0.6)

# Print the extracted relations
print(relations)

Sample output:

```
[
{"subject": "Italian government", "verb": "approved", "object": "austerity measures", "confidence": 0.8},
{"subject": "austerity measures", "verb": "worth", "object": "24 billion euros", "confidence": 0.75}
]
```

Function for ROUGE Scores:

If you only need ROUGE scores for your data, use compute_all_rouge_scores(predicted, reference), passing the generated summary as predicted and the reference text as reference:

from RHI_Metrics import compute_all_rouge_scores

scores = compute_all_rouge_scores(predicted, reference)
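To show what ROUGE-1, ROUGE-2, and ROUGE-L measure, here is a from-scratch sketch of their F1 scores. It is not the package's implementation: it assumes lowercase whitespace tokenization and no stemming, so its numbers can differ from those of compute_all_rouge_scores or the rouge-score library. `rouge_n` and `rouge_l` are hypothetical helpers.

```python
# Sketch: ROUGE-N F1 from clipped n-gram overlap and ROUGE-L F1 from the
# longest common subsequence (LCS). Tokenization: lowercase, split on whitespace.
from collections import Counter
from typing import List


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap: int, pred_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / pred_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(predicted: str, reference: str, n: int) -> float:
    pred = _ngrams(predicted.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # Counter & keeps the minimum count (clipping)
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def rouge_l(predicted: str, reference: str) -> float:
    pred, ref = predicted.lower().split(), reference.lower().split()
    # lcs[i][j] = LCS length of pred[:i] and ref[:j]
    lcs = [[0] * (len(ref) + 1) for _ in range(len(pred) + 1)]
    for i, p in enumerate(pred, 1):
        for j, r in enumerate(ref, 1):
            lcs[i][j] = lcs[i - 1][j - 1] + 1 if p == r else max(lcs[i - 1][j], lcs[i][j - 1])
    return _f1(lcs[-1][-1], len(pred), len(ref))


pred = "the government cut public sector pay"
ref = "the government will cut public sector pay"
print(round(rouge_n(pred, ref, 1), 3), round(rouge_n(pred, ref, 2), 3), round(rouge_l(pred, ref), 3))
# 0.923 0.727 0.923
```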

Conclusion

This package serves as a comprehensive tool for evaluating relation hallucination in abstractive summarization models. By focusing on Subject-Verb-Object (SVO) relations, it provides detailed metrics to measure and compare model behavior, helping researchers identify and mitigate hallucination issues in generated summaries.

The flexibility of batch processing, customizable field mappings, and compatibility with multiple models makes it suitable for handling diverse datasets and research requirements. With features like precision metrics, hallucination factors, and multi-model analysis, this package offers valuable insights for improving the reliability and quality of abstractive text summarization systems.

Contributions, suggestions, and enhancements are always welcome. Together, we can refine and expand the capabilities of this package for the benefit of the research community.


Download files


Source Distribution

rhi_metrics-0.0.2.tar.gz (7.8 kB)

Built Distribution


RHI_Metrics-0.0.2-py3-none-any.whl (7.0 kB)

File details

Details for the file rhi_metrics-0.0.2.tar.gz.

File metadata

  • Download URL: rhi_metrics-0.0.2.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.1

File hashes

Hashes for rhi_metrics-0.0.2.tar.gz
Algorithm Hash digest
SHA256 4a3cbf0bf8cc3ee20e6868ede350911778f91ae1446dcda718125df209a2f57b
MD5 e3a0f62dc8a4c2e5093f52ccbd31b2fd
BLAKE2b-256 b98bcf87b6d9a13b265127ee30746b94a59e21a7efbac763776f971c86d75ff7


File details

Details for the file RHI_Metrics-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: RHI_Metrics-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.1

File hashes

Hashes for RHI_Metrics-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6fc4a23053ea4b1f17b768a4d5654de9022743cffece928a0eee60bb498f5a1a
MD5 d88867d179c4a57541a923b7d66b72b8
BLAKE2b-256 8343342c827e7d437543ee302284a19960ad094f0a55a3d7a5e11c98011ae14e

