An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence
pyAutoSummarizer
Citation
PEREIRA, V., DE LIMA PORTO, R.C., FIGUEIRA, L.A.A., FERREIRA, R.A.C.A. (2026). Unveiling pyAutoSummarizer: An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence. In: DA HORA, H., PORTER, A.L., CHIAVETTA, D., ZHANG, Y. (eds) Technology Mining. Springer, Cham. https://doi.org/10.1007/978-3-032-10849-4_2
Introduction
pyAutoSummarizer is a Python library for text summarization, covering both extractive and abstractive approaches, and providing a comprehensive suite of evaluation metrics — from classic n-gram overlap to modern semantic and faithfulness measures.
Summarization Methods
Extractive — identifies and returns the most important sentences from the original text:
| Method | Description |
|---|---|
| TextRank | Graph-based ranking using sentence embeddings and cosine similarity |
| LexRank | Graph-based ranking using TF-IDF cosine similarity |
| LSA | Latent Semantic Analysis via SVD on embeddings or TF-IDF matrix |
| KL-Sum | Selects sentences that minimise KL-divergence from the full document distribution |
Abstractive methods generate new text that captures the meaning of the source:
| Method | Description |
|---|---|
| BART | facebook/bart-large-cnn abstractive model (deep learning) |
| T5 | t5-base abstractive model (deep learning) |
| PEGASUS | google/pegasus-xsum model fine-tuned for abstractive summarization |
| chatGPT | OpenAI gpt-4o-mini (or any chat model) via the OpenAI API |
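The graph-based idea behind the LexRank entry can be illustrated with a minimal, standard-library-only sketch. This is not the library's implementation: the function names (`tfidf_vectors`, `rank_sentences`, `top_sentences`), the smoothed IDF formula, the damping factor, and the iteration count are all assumptions chosen for illustration. Sentences become TF-IDF vectors, pairwise cosine similarity defines a weighted graph, and a PageRank-style power iteration scores each sentence.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (term -> weight dict) per sentence."""
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(counts)
    # Iterating a Counter yields each distinct term once, giving document frequency.
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed IDF (an assumption), so terms present everywhere keep some weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalise; an isolated sentence spreads its weight uniformly.
    transition = []
    for row in sim:
        total = sum(row)
        transition.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * transition[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores


def top_sentences(sentences, k=2):
    """Return the k best-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no vocabulary with the rest of the text receives only the baseline score, so it ranks last. Published LexRank variants also threshold the similarity matrix, which this sketch omits.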
Text Pre-processing
The library provides a flexible pre-processing pipeline:
- Lowercasing, accent removal, special character removal, number removal
- Custom word removal
- Stopword removal across 27 languages: Arabic, Bengali, Bulgarian, Chinese, Czech, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Marathi, Persian, Polish, Portuguese-br, Romanian, Russian, Slovak, Spanish, Swedish, Thai, and Ukrainian
- Sentence segmentation by punctuation, word count, or character count
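The pre-processing steps listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: the keyword names mirror those used in the Quick Start call, but `split_sentences`, `clean_sentence`, the order of the steps, and the tiny `STOPWORDS_EN` set are hypothetical.

```python
import re
import unicodedata

# Tiny illustrative stopword set (an assumption); real lists are far longer.
STOPWORDS_EN = {"a", "an", "and", "the", "of", "in", "is", "it", "to"}


def split_sentences(text):
    """Segment text on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def clean_sentence(sentence, lowercase=True, rmv_accents=True,
                   rmv_special_chars=True, rmv_numbers=True,
                   stop_words=STOPWORDS_EN):
    """Apply the optional cleaning steps in a fixed order."""
    if lowercase:
        sentence = sentence.lower()
    if rmv_accents:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", sentence)
        sentence = "".join(c for c in decomposed if not unicodedata.combining(c))
    if rmv_numbers:
        sentence = re.sub(r"\d+", " ", sentence)
    if rmv_special_chars:
        # Replace anything that is neither a word character nor whitespace.
        sentence = re.sub(r"[^\w\s]", " ", sentence)
    tokens = [t for t in sentence.split() if t.lower() not in stop_words]
    return " ".join(tokens)
```

For example, `clean_sentence("The café opened in 2019!")` yields `"cafe opened"` with the defaults shown here.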
Evaluation Metrics
Classic Metrics (reference-based, lexical)
| Metric | Method | Returns |
|---|---|---|
| ROUGE-N | rouge_N(generated, reference, n=1) | F1, Precision, Recall |
| ROUGE-L | rouge_L(generated, reference) | F1, Precision, Recall |
| ROUGE-S | rouge_S(generated, reference, skip_distance=4) | F1, Precision, Recall |
| BLEU | bleu(generated, reference, n=4) | Score |
| METEOR | meteor(generated, reference) | Score |
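To make the n-gram overlap behind ROUGE-N concrete, here is a minimal sketch that returns values in the same F1, Precision, Recall order as the table. The function name `rouge_n_sketch` and the whitespace tokenisation are assumptions for illustration; the library's `rouge_N` may tokenise and normalise differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_sketch(generated, reference, n=1):
    """Return (f1, precision, recall) from clipped n-gram overlap."""
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((gen & ref).values())
    precision = overlap / sum(gen.values()) if gen else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall
```

With `generated = "the cat sat on the mat"` and `reference = "the cat lay on the mat"`, five of six unigrams match, so precision and recall are both 5/6; for bigrams (`n=2`), three of five match.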
Semantic Metric (reference-based)
| Metric | Method | Returns | Notes |
|---|---|---|---|
| BERTScore | bert_score(generated, reference, model_type='roberta-large') | F1, Precision, Recall | Requires pip install bert-score. Captures paraphrasing that ROUGE misses by comparing contextualised token embeddings. |
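The greedy matching step that BERTScore applies to token embeddings can be shown with hand-made vectors. This toy sketch is an assumption-laden illustration: `greedy_match_scores` is a hypothetical name, the 2-D vectors stand in for contextual embeddings from a real model, and refinements such as IDF weighting and baseline rescaling are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (f1, precision, recall) from greedy best-match similarities."""
    # Precision: each candidate token is matched to its closest reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: each reference token is matched to its closest candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall
```

Because matching uses vector similarity rather than exact string equality, a paraphrased token whose embedding is close to a reference token still contributes to the score.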
Faithfulness / Factual Consistency Metrics (source-based, no reference needed)
These metrics check whether the summary is factually consistent with the source document, detecting hallucinations that lexical metrics cannot see.
| Metric | Method | Returns | Notes |
|---|---|---|---|
| SummaC | summa_c(generated, nli_model='cross-encoder/nli-deberta-v3-small') | Score ∈ [0, 1] | Self-contained NLI-based faithfulness scorer using HuggingFace transformers. No extra install needed. |
| AlignScore | align_score(generated, model='AlignScore-base') | Score ∈ [0, 1] | Requires pip install pyAutoSummarizer[faithfulness] and python -m spacy download en_core_web_sm. Based on Zha et al., ACL 2023. |
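One common way to turn sentence-level NLI scores into a single faithfulness score is to take, for each summary sentence, its best support among the source sentences and then average. The sketch below shows only that aggregation. It is not the library's `summa_c`: `faithfulness_sketch` is a hypothetical name, and `toy_entailment` is a token-overlap stand-in for a real NLI model's entailment probability.

```python
def toy_entailment(premise, hypothesis):
    """Stand-in for an NLI model: share of hypothesis tokens found in the premise."""
    premise_tokens = set(premise.lower().split())
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0
    return sum(t in premise_tokens for t in hyp_tokens) / len(hyp_tokens)


def faithfulness_sketch(source_sentences, summary_sentences, scorer=toy_entailment):
    """Average, over summary sentences, of the best support found in the source."""
    if not source_sentences or not summary_sentences:
        return 0.0
    per_sentence = [
        max(scorer(src, hyp) for src in source_sentences)
        for hyp in summary_sentences
    ]
    return sum(per_sentence) / len(per_sentence)
```

A summary sentence with no supporting source sentence pulls the average down, which is how unsupported (hallucinated) content shows up in the score. Swapping `scorer` for a function backed by a real NLI model would keep the aggregation unchanged.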
LLM-as-Judge Metric
| Metric | Method | Returns | Notes |
|---|---|---|---|
| G-Eval | g_eval(generated, api_key, model='gpt-4o-mini', dimensions=['coherence','consistency','fluency','relevance']) | dict {dimension: int 1–5} | Uses an OpenAI chat model to score the summary across four quality dimensions. Based on Liu et al., 2023. Requires an OpenAI API key. |
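The LLM-as-judge flow can be sketched without any network call: build one prompt per dimension, send it to a model, and parse an integer rating from the reply. Everything here is hypothetical and for illustration only: the prompt wording, the helper names, and `ask_model`, which stands for any callable that takes a prompt string and returns the model's text reply. The published G-Eval method also uses evaluation steps and token-probability weighting, which this sketch omits.

```python
import re

DIMENSIONS = ["coherence", "consistency", "fluency", "relevance"]


def build_prompt(source, summary, dimension):
    """Compose a single-dimension rating prompt (wording is illustrative)."""
    return (
        f"Rate the {dimension} of the summary with one integer.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n\n"
        "Answer with the integer only."
    )


def parse_rating(reply):
    """Return the first standalone digit in the rating range, or None."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


def judge(source, summary, ask_model, dimensions=DIMENSIONS):
    """Collect one parsed rating per dimension using the supplied callable."""
    return {
        d: parse_rating(ask_model(build_prompt(source, summary, d)))
        for d in dimensions
    }
```

Passing a stub such as `lambda prompt: "4"` as `ask_model` lets the parsing and dictionary shape be tested offline before wiring in a real chat client. Note that `parse_rating` takes the first matching digit, so replies that restate the scale before the score would need stricter parsing.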
Installation
Core install (extractive/abstractive methods + lexical/BERTScore metrics)
pip install pyAutoSummarizer
With faithfulness metrics (AlignScore)
pip install "pyAutoSummarizer[faithfulness]"
python -m spacy download en_core_web_sm
Requirements: Python ≥ 3.9
Quick Start
from pyAutoSummarizer.base import psr
text = """
Your long text goes here. It can be multiple paragraphs.
The library will pre-process it, split it into sentences,
and summarize it using any of the available methods.
"""
# Initialise — pre-processes the text
s = psr.summarization(text, stop_words=['en'], lowercase=True,
                      rmv_accents=True, rmv_special_chars=True, rmv_numbers=True)
# --- Extractive summarization ---
rank = s.summ_text_rank() # TextRank
summary = s.show_summary(rank, n=3) # top-3 sentences
print(summary)
# --- Abstractive summarization ---
summary = s.summ_abst_chatgpt(api_key='YOUR_KEY', model='gpt-4o-mini')
# --- Evaluation (classic) ---
reference = "A human-written reference summary goes here."  # placeholder reference text
f1, p, r = s.rouge_N(summary, reference, n=1)
bleu_s = s.bleu(summary, reference)
# --- Evaluation (semantic) ---
f1, p, r = s.bert_score(summary, reference)
# --- Evaluation (faithfulness — no reference needed) ---
faith_sc = s.summa_c(summary) # SummaC (built-in NLI)
align_sc = s.align_score(summary) # AlignScore (requires [faithfulness] extra)
# --- Evaluation (LLM-as-judge) ---
scores = s.g_eval(summary, api_key='YOUR_KEY')
# {'coherence': 4, 'consistency': 5, 'fluency': 5, 'relevance': 4}
Colab Demos
Extractive Summarization
Abstractive Summarization
- chatGPT — requires an OpenAI API key
- PEGASUS
Related Projects
- pyBibX — A Bibliometric and Scientometric Python Library Powered with Artificial Intelligence Tools