
A factuality evaluation metric for evaluating plain language summaries using question answering

Project description


PlainQAFact

PlainQAFact is a retrieval-augmented, question-answering (QA)-based factuality evaluation framework for assessing the factuality of biomedical plain language summaries. PlainFact is a high-quality, human-annotated dataset with fine-grained explanation (i.e., added information) annotations.
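
The QA-based idea can be illustrated with a toy sketch (this is not the actual PlainQAFact pipeline, which uses a learned classifier, an LLM answer extractor, a fine-tuned QG model, and retrieval): derive a question-answer pair from the summary, answer the question against the source, and score the agreement between the two answers.

```python
# Toy illustration of QA-based factuality scoring (not the actual
# PlainQAFact models): an answer is extracted from the summary, the
# corresponding question is answered against the source, and the two
# answers are compared with token-level F1.
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two answer strings."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

# Hypothetical outputs of the answer-extraction and QA steps:
answer_from_summary = "aspirin reduces heart attack risk"
answer_from_source = "aspirin decreased heart attack incidence"

score = token_f1(answer_from_summary, answer_from_source)
print(f"factuality score: {score:.2f}")  # → factuality score: 0.60
```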

News

  • (2025.03.11) PlainFact is now available on 🤗 Hugging Face: PlainFact for sentence-level data and PlainFact-summary for summary-level data.
  • (2025.03.02) Pre-embedded vector bases of Textbooks and StatPearls can be downloaded here.
  • (2025.03.01) 🚨🚨🚨 PlainQAFact is now on PyPI! Simply use pip install plainqafact to load our pipeline!
  • (2025.02.24) Our PlainFact dataset can be downloaded here: PlainFact, including sentence-level and summary-level granularities.
    • Target_Sentence: The plain language sentence/summary.
    • Original_Abstract: The scientific abstract corresponding to each sentence/summary.
    • External: Whether the sentence includes information that is not explicitly present in the scientific abstract. (yes: explanation, no: simplification)
    • We will release the full version of PlainFact soon (including Category and Relation information). Stay tuned!
  • (2025.02.24) Our fine-tuned Question Generation model is available on 🤗 Hugging Face: QG model (or download it here)
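
Given the fields above, PlainFact-style rows can be split into explanation and simplification subsets by the External flag. A minimal sketch, assuming the column names listed above (shown on an inline toy table rather than the real file):

```python
# Split PlainFact-style rows into explanation vs. simplification
# subsets using the External flag (toy inline data; the real dataset
# uses the same column names).
import csv
import io

toy_csv = """Target_Sentence,Original_Abstract,External
"Aspirin lowers heart attack risk.","Daily aspirin decreased myocardial infarction incidence.",no
"This matters because heart disease is a leading cause of death.","Daily aspirin decreased myocardial infarction incidence.",yes
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))
explanations = [r for r in rows if r["External"] == "yes"]
simplifications = [r for r in rows if r["External"] == "no"]
print(len(explanations), len(simplifications))  # → 1 1
```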

NOTE: This repo relies heavily on QAFactEval, QAEval, and MedRAG.

Overall Framework

Model Downloading

In PlainQAFact, we use a pre-trained classifier to distinguish simplification from explanation sentences, Llama 3.1 8B Instruct for answer extraction, a fine-tuned QG model for question generation, and the original question-answering model from QAFactEval.

Download the pre-trained QA model and our pre-trained classifier by running bash download_question_answering.sh.

Quickstart

conda create -n plainqafact python=3.9
pip install plainqafact

After installation, make sure you have initialized git-lfs as required by MedRAG. Then you can use PlainQAFact directly:

from plainqafact import PlainQAFact

metric = PlainQAFact(
    cuda_device=0,
    classifier_type='learned',
    classifier_path='models/learned_classifier',
    llm_model_path='meta-llama/Llama-3.1-8B-Instruct',
    question_generation_model_path='uzw/bart-large-question-generation',
    qa_answering_model_dir='models/answering',
    knowledge_base='combined', # retrieve from both Textbooks and StatPearls KBs
    scoring_batch_size=1,
    answer_selection_strategy='llm-keywords'
)

# choice 1: interactively evaluate summaries
# summaries:
target_sentences = [
    "The study shows aspirin reduces heart attack risk.",
    "Patients with high blood pressure should exercise regularly."
]
# scientific abstracts:
abstracts = [
    "A comprehensive clinical trial demonstrated that daily aspirin administration significantly decreased the incidence of myocardial infarction in high-risk patients.",
    "Research indicates that regular physical activity is an effective intervention for managing hypertension in adult patients."
]

results = metric.evaluate(target_sentences, abstracts)

print(f"Explanation score (mean: {results['external_mean']:.4f}):", results['external_scores'])
print(f"Simplification score (mean: {results['internal_mean']:.4f}):", results['internal_scores'])
print(f"PlainQAFact score: {results['overall_mean']:.4f}")

Or you can evaluate a data file directly:

from plainqafact import PlainQAFact

metric = PlainQAFact(
    cuda_device=0,
    classifier_type='learned',
    classifier_path='models/learned_classifier',
    llm_model_path='meta-llama/Llama-3.1-8B-Instruct',
    question_generation_model_path='uzw/bart-large-question-generation',
    qa_answering_model_dir='models/answering',
    knowledge_base='combined',
    scoring_batch_size=1,
    answer_selection_strategy='llm-keywords',
    target_sentence_col='Target_Sentence', # name of your summary's key (column)
    abstract_col='Original_Abstract', # name of your abstract's key (column)
    input_file_format='csv' # your input file format
)

# choice 2: directly evaluate a data file:
results = metric.evaluate_all(input_file='your_data.csv')

print(f"Explanation score (mean: {results['external_mean']:.4f}):", results['external_scores'])
print(f"Simplification score (mean: {results['internal_mean']:.4f}):", results['internal_scores'])
print(f"PlainQAFact score: {results['overall_mean']:.4f}")

Option 2: Install from source

Installation

  • First, create a new conda env: conda create -n plainqafact python=3.9 and clone our repo.
  • cd PlainQAFact
  • Follow the instructions in MedRAG to install PyTorch and other required packages.
  • Then, run the following command:
    conda install git
    pip install -r requirements.txt
    
  • Finally, install the old tokenizer package through:
    pip install transformers_old_tokenizer-3.1.0-py3-none-any.whl
    

Running through our PlainFact dataset

Before running the following command, please download the question answering and learned classifier models as described in the instructions above.

# classifier_type options: 'learned', 'llama', 'gpt'
# knowledge_base options: textbooks, statpearls, pubmed, wikipedia, combined
# answer_selection_strategy options: 'llm-keywords', 'gpt-keywords', 'none'
python3 run.py \
    --cuda_device 0 \
    --classifier_type learned \
    --input_file data/summary_level.csv \
    --classifier_path path/to/learned_classifier \
    --llm_model_path meta-llama/Llama-3.1-8B-Instruct \
    --question_generation_model_path uzw/bart-large-question-generation \
    --qa_answering_model_dir models/answering \
    --knowledge_base combined \
    --answer_selection_strategy llm-keywords

Running through your own data

Please modify Lines 17-19 of the default_config.py file to indicate the heading/key names of your dataset. We currently support .json, .txt, and .csv files.
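
As a sketch of what those settings look like (the actual variable names in default_config.py may differ; check Lines 17-19 of your copy):

```python
# Hypothetical sketch of the column-name settings in default_config.py
# (the real variable names in the file may differ):
target_sentence_col = "Target_Sentence"   # key holding the plain-language summary
abstract_col = "Original_Abstract"        # key holding the scientific abstract
input_file_format = "json"                # one of: "json", "txt", "csv"
```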

python3 run.py \
    --cuda_device 0 \
    --classifier_type learned \
    --input_file your_own_data.json \
    --input_file_format json \
    --classifier_path path/to/learned_classifier \
    --llm_model_path meta-llama/Llama-3.1-8B-Instruct \
    --question_generation_model_path uzw/bart-large-question-generation \
    --qa_answering_model_dir models/answering \
    --knowledge_base textbooks \
    --answer_selection_strategy llm-keywords

Easily replace the pre-trained classifier with OpenAI models or your own

We provide options to easily replace our pre-trained classifier, which is tailored for biomedical plain language summarization, for other tasks. Simply set --classifier_type to gpt and provide your OpenAI API key on Line 26 of the default_config.py file to run PlainQAFact.

python3 run.py \
    --cuda_device 0 \
    --classifier_type gpt \
    --input_file your_own_data.json \
    --input_file_format json \
    --llm_model_path meta-llama/Llama-3.1-8B-Instruct \
    --question_generation_model_path uzw/bart-large-question-generation \
    --qa_answering_model_dir models/answering \
    --knowledge_base textbooks \
    --answer_selection_strategy llm-keywords
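
The decision the classifier makes can be illustrated with a toy heuristic (a stand-in only; the shipped options are a learned model, Llama, or GPT): a sentence whose content words all appear in the abstract is treated as simplification, otherwise as explanation.

```python
# Toy stand-in for the simplification/explanation classifier: label a
# sentence "explanation" if it introduces content words absent from the
# abstract. The real classifier is a learned model (or Llama/GPT).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and", "that"}

def classify(sentence: str, abstract: str) -> str:
    abstract_words = set(abstract.lower().split())
    content = {w for w in sentence.lower().rstrip(".").split() if w not in STOPWORDS}
    new_words = content - abstract_words
    return "explanation" if new_words else "simplification"

print(classify("aspirin lowers risk", "aspirin lowers risk in patients"))
# → simplification
print(classify("this matters because heart disease kills", "aspirin lowers risk"))
# → explanation
```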

Using other Knowledge Bases for retrieval

Currently, we only experiment with two KBs: Textbooks and StatPearls. You may want to use customized KBs for more accurate retrieval. In PlainQAFact, we combine both Textbooks and StatPearls and concatenate the retrieved passages with the scientific abstracts. Set --knowledge_base to combined to reproduce our results.
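
The retrieval-and-concatenate step can be sketched as follows (the real pipeline uses MedRAG's dense retrieval over the pre-embedded Textbooks/StatPearls vector bases; the word-overlap scorer here is only a stand-in):

```python
# Toy sketch of the retrieval step: rank KB passages by word overlap
# with the query, then concatenate the top hits with the abstract.
def retrieve(query: str, passages: list, k: int = 1) -> list:
    """Rank passages by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    score = lambda p: len(q & set(p.lower().replace(".", "").split()))
    return sorted(passages, key=score, reverse=True)[:k]

textbooks = ["Aspirin inhibits platelet aggregation.",
             "Hypertension raises stroke risk."]
statpearls = ["Myocardial infarction is commonly called a heart attack."]
combined_kb = textbooks + statpearls  # knowledge_base='combined'

# Retrieved passages are concatenated with the scientific abstract:
abstract = "Daily aspirin decreased myocardial infarction incidence."
top_passages = retrieve("heart attack aspirin", combined_kb, k=2)
context = abstract + " " + " ".join(top_passages)
```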

NOTE: Using the Llama 3.1 8B model for both classification and answer extraction takes over 40 GB of GPU memory. We recommend using our pre-trained classifier or OpenAI models for classification if GPU memory is limited.

Citation Information

For the use of PlainQAFact and PlainFact benchmark, please cite:

@misc{you2025plainqafactretrievalaugmentedfactualconsistency,
      title={PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization}, 
      author={Zhiwen You and Yue Guo},
      year={2025},
      eprint={2503.08890},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.08890}, 
}

Contact Information

If you have any questions, please email zhiweny2@illinois.edu.
