A factuality evaluation metric for plain language summaries using question answering
PlainQAFact
PlainQAFact is a retrieval-augmented, question-answering (QA)-based factuality evaluation framework for assessing the factuality of biomedical plain language summaries. PlainFact is a high-quality, human-annotated dataset with fine-grained explanation (i.e., added information) annotations.
News
- (2025.03.11) PlainFact is now available on 🤗 Hugging Face: PlainFact for sentence-level data and PlainFact-summary for summary-level data.
- (2025.03.02) Pre-embedded vector bases of Textbooks and StatPearls can be downloaded here.
- (2025.03.01) 🚨🚨🚨 PlainQAFact is now on PyPI! Simply use pip install plainqafact to load our pipeline!
- (2025.02.24) Our PlainFact dataset can be downloaded here: PlainFact, including sentence-level and summary-level granularities.
  - Target_Sentence: The plain language sentence/summary.
  - Original_Abstract: The scientific abstract corresponding to each sentence/summary.
  - External: Whether the sentence includes information not explicitly present in the scientific abstract (yes: explanation, no: simplification).
  - We will release the full version of PlainFact soon (including Category and Relation information). Stay tuned!
- (2025.02.24) Our fine-tuned Question Generation model is available on 🤗 Hugging Face: QG model (or download it here).
NOTE: This repo relies heavily on QAFactEval, QAEval, and MedRAG.
Overall Framework
Model Downloading
In PlainQAFact, we use a pre-trained classifier to distinguish simplification from explanation sentences, Llama 3.1 8B Instruct for answer extraction, a fine-tuned QG model, and the original question answering model from QAFactEval.
Download the pre-trained QA model and our pre-trained classifier with bash download_question_answering.sh.
Quickstart
conda create -n plainqafact python=3.9
pip install plainqafact
After installation, make sure you have initialized git-lfs, as required by MedRAG. Then you can use PlainQAFact directly:
from plainqafact import PlainQAFact
metric = PlainQAFact(
cuda_device=0,
classifier_type='learned',
classifier_path='models/learned_classifier',
llm_model_path='meta-llama/Llama-3.1-8B-Instruct',
question_generation_model_path='uzw/bart-large-question-generation',
qa_answering_model_dir='models/answering',
knowledge_base='combined', # retrieve from both Textbooks and StatPearls KBs
scoring_batch_size=1,
answer_selection_strategy='llm-keywords'
)
# choice 1: interactively evaluate summaries
# summaries:
target_sentences = [
"The study shows aspirin reduces heart attack risk.",
"Patients with high blood pressure should exercise regularly."
]
# scientific abstracts:
abstracts = [
"A comprehensive clinical trial demonstrated that daily aspirin administration significantly decreased the incidence of myocardial infarction in high-risk patients.",
"Research indicates that regular physical activity is an effective intervention for managing hypertension in adult patients."
]
results = metric.evaluate(target_sentences, abstracts)
print(f"Explanation score (mean: {results['external_mean']:.4f}):", results['external_scores'])
print(f"Simplification score (mean: {results['internal_mean']:.4f}):", results['internal_scores'])
print(f"PlainQAFact score: {results['overall_mean']:.4f}")
Or you can evaluate a data file through:
from plainqafact import PlainQAFact
metric = PlainQAFact(
cuda_device=0,
classifier_type='learned',
classifier_path='models/learned_classifier',
llm_model_path='meta-llama/Llama-3.1-8B-Instruct',
question_generation_model_path='uzw/bart-large-question-generation',
qa_answering_model_dir='models/answering',
knowledge_base='combined',
scoring_batch_size=1,
answer_selection_strategy='llm-keywords',
target_sentence_col='Target_Sentence', # name of your summary's key (column)
abstract_col='Original_Abstract', # name of your abstract's key (column)
input_file_format='csv' # your input file format
)
# choice 2: directly evaluate a data file:
results = metric.evaluate_all(input_file='your_data.csv')
print(f"Explanation score (mean: {results['external_mean']:.4f}):", results['external_scores'])
print(f"Simplification score (mean: {results['internal_mean']:.4f}):", results['internal_scores'])
print(f"PlainQAFact score: {results['overall_mean']:.4f}")
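The evaluate_all call above expects the input file to contain the columns named by target_sentence_col and abstract_col. As a sketch, a minimal your_data.csv with those default column names can be built with the standard library (the rows below are illustrative, taken from the choice 1 example):

```python
import csv

# Illustrative rows; column names match the defaults shown above
# (target_sentence_col='Target_Sentence', abstract_col='Original_Abstract').
rows = [
    {
        "Target_Sentence": "The study shows aspirin reduces heart attack risk.",
        "Original_Abstract": (
            "A comprehensive clinical trial demonstrated that daily aspirin "
            "administration significantly decreased the incidence of myocardial "
            "infarction in high-risk patients."
        ),
    },
]

with open("your_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Target_Sentence", "Original_Abstract"])
    writer.writeheader()
    writer.writerows(rows)
```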
Option 2: Install from source
Installation
- First, create a new conda env with conda create -n plainqafact python=3.9, then clone our repo and cd PlainQAFact.
- Follow the instructions in MedRAG to install PyTorch and other required packages.
- Then, run the following command:
conda install git
pip install -r requirements.txt
- Finally, install the old tokenizer package through:
pip install transformers_old_tokenizer-3.1.0-py3-none-any.whl
Running through our PlainFact dataset
Before running the following command, please download the question answering and learned classifier models following the instructions above.
# --classifier_type: 'learned', 'llama', or 'gpt'
# --input_file: path of the input dataset
# --classifier_path: path of the classifier
# --llm_model_path: path of the answer extractor
# --question_generation_model_path: path of the question generation model
# --qa_answering_model_dir: path of the question answering model
# --knowledge_base: knowledge bases for retrieval; options: textbooks, statpearls, pubmed, wikipedia, combined
# --answer_selection_strategy: 'llm-keywords', 'gpt-keywords', or 'none'
python3 run.py \
    --cuda_device 0 \
    --classifier_type learned \
    --input_file data/summary_level.csv \
    --classifier_path path/to/learned_classifier \
    --llm_model_path meta-llama/Llama-3.1-8B-Instruct \
    --question_generation_model_path uzw/bart-large-question-generation \
    --qa_answering_model_dir models/answering \
    --knowledge_base combined \
    --answer_selection_strategy llm-keywords
Running through your own data
Please modify Lines 17-19 of the default_config.py file to indicate the heading/key names of your dataset. We currently support .json, .txt, and .csv files.
python3 run.py \
--cuda_device 0 \
--classifier_type learned \
--input_file your_own_data.json \
--input_file_format json \
--classifier_path path/to/learned_classifier \
--llm_model_path meta-llama/Llama-3.1-8B-Instruct \
--question_generation_model_path uzw/bart-large-question-generation \
--qa_answering_model_dir models/answering \
--knowledge_base textbooks \
--answer_selection_strategy llm-keywords
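The command above reads your_own_data.json. The exact JSON schema is not spelled out here, so as an assumption consistent with the configurable key names, a list of records keyed by Target_Sentence and Original_Abstract would look like:

```python
import json

# Hypothetical input records; the key names follow the defaults configured
# in default_config.py (Target_Sentence / Original_Abstract).
records = [
    {
        "Target_Sentence": "Patients with high blood pressure should exercise regularly.",
        "Original_Abstract": (
            "Research indicates that regular physical activity is an effective "
            "intervention for managing hypertension in adult patients."
        ),
    },
]

with open("your_own_data.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)
```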
Easily replace the pre-trained classifier with OpenAI models or your own
We provide options to easily swap our pre-trained classifier, which is tailored for biomedical plain language summarization, for one suited to other tasks. Simply set --classifier_type to gpt and provide your OpenAI API key on Line 26 of the default_config.py file to run PlainQAFact.
python3 run.py \
--cuda_device 0 \
--classifier_type gpt \
--input_file your_own_data.json \
--input_file_format json \
--llm_model_path meta-llama/Llama-3.1-8B-Instruct \
--question_generation_model_path uzw/bart-large-question-generation \
--qa_answering_model_dir models/answering \
--knowledge_base textbooks \
--answer_selection_strategy llm-keywords
Using other Knowledge Bases for retrieval
Currently, we only experiment with two KBs: Textbooks and StatPearls. You may want to use your own customized KBs for more accurate retrieval. In PlainQAFact, we combine both Textbooks and StatPearls and concatenate the retrieved passages with the scientific abstracts. Set --knowledge_base to combined to reproduce our results.
NOTE: Using the Llama 3.1 8B model for both classification and answer extraction takes over 40 GB of GPU memory. We recommend using our pre-trained classifier or OpenAI models for classification if GPU memory is limited.
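The 40 GB figure is consistent with a back-of-the-envelope estimate: an 8B-parameter model in 16-bit precision needs roughly 16 GB for weights alone, so two loaded copies (classifier plus answer extractor) already account for about 32 GB before activations, the KV cache, and the QG/QA models. A rough sketch of that arithmetic:

```python
# Back-of-the-envelope GPU memory estimate (illustrative, not measured).
params = 8e9          # Llama 3.1 8B parameter count
bytes_per_param = 2   # fp16/bf16 weights

weights_gb = params * bytes_per_param / 1e9  # one loaded copy: ~16 GB
two_copies_gb = 2 * weights_gb               # classifier + answer extractor

print(f"weights alone: ~{two_copies_gb:.0f} GB")  # ~32 GB before activations
```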
Citation Information
For the use of PlainQAFact and PlainFact benchmark, please cite:
@misc{you2025plainqafactretrievalaugmentedfactualconsistency,
title={PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization},
author={Zhiwen You and Yue Guo},
year={2025},
eprint={2503.08890},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.08890},
}
Contact Information
If you have any questions, please email zhiweny2@illinois.edu.