
An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence


pyAutoSummarizer

pyAutoSummarizer — An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence.

Citation

PEREIRA, V., DE LIMA PORTO, R.C., FIGUEIRA, L.A.A., FERREIRA, R.A.C.A. (2026). Unveiling pyAutoSummarizer: An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence. In: DA HORA, H., PORTER, A.L., CHIAVETTA, D., ZHANG, Y. (eds) Technology Mining. Springer, Cham. https://doi.org/10.1007/978-3-032-10849-4_2

Introduction

pyAutoSummarizer is a Python library for text summarization, covering both extractive and abstractive approaches, and providing a comprehensive suite of evaluation metrics — from classic n-gram overlap to modern semantic and faithfulness measures.

Summarization Methods

Extractive methods identify and return the most important sentences of the original text:

| Method | Description |
|---|---|
| TextRank | Graph-based ranking using sentence embeddings and cosine similarity |
| LexRank | Graph-based ranking using TF-IDF cosine similarity |
| LSA | Latent Semantic Analysis via SVD on embeddings or a TF-IDF matrix |
| KL-Sum | Selects sentences that minimise the KL divergence from the full document's word distribution |

Abstractive methods generate new text that captures the meaning of the source:

| Method | Description |
|---|---|
| BART | facebook/bart-large-cnn abstractive model (deep learning) |
| T5 | t5-base abstractive model (deep learning) |
| PEGASUS | google/pegasus-xsum model fine-tuned for abstractive summarization |
| chatGPT | OpenAI gpt-4o-mini (or any chat model) via the OpenAI API |
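The graph-based extractive rankers listed above (TextRank, LexRank) share one idea: treat sentences as nodes, weight the edges by pairwise similarity, and score the nodes with a PageRank-style power iteration. The following is a minimal, stdlib-only sketch of that idea, assuming raw term-frequency cosine similarity (a simplification of LexRank's TF-IDF weighting and of TextRank's embeddings) and a damping factor of 0.85. It is not pyAutoSummarizer's implementation, and the names `graph_rank` and `top_n` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple stand-in for real pre-processing."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def graph_rank(sentences: list[str], damping: float = 0.85, iters: int = 100) -> list[float]:
    """Score sentences by PageRank over a cosine-similarity graph (hypothetical helper)."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Weighted adjacency matrix; self-loops are excluded.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            # Neighbour j passes on its score in proportion to edge weight w[j][i].
            # Sentences with no similar neighbours (out_sum == 0) pass nothing on.
            incoming = sum(w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j])
            new.append((1 - damping) / n + damping * incoming)
        scores = new
    return scores


def top_n(sentences: list[str], n: int = 2) -> list[str]:
    """Return the n highest-scoring sentences, kept in their original document order."""
    scores = graph_rank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no vocabulary with the rest of the document (for example an off-topic remark) receives only the baseline score `(1 - damping) / n` and is therefore never picked ahead of well-connected sentences.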

Text Pre-processing

The library provides a flexible pre-processing pipeline:

  • Lowercasing, accent removal, special character removal, number removal
  • Custom word removal
  • Stopword removal across 27 languages: Arabic, Bengali, Bulgarian, Chinese, Czech, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Marathi, Persian, Polish, Portuguese-br, Romanian, Russian, Slovak, Spanish, Swedish, Thai, and Ukrainian
  • Sentence segmentation by punctuation, word count, or character count
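The steps above can be illustrated with a minimal, stdlib-only sketch. It assumes a tiny hypothetical English stopword set and a punctuation-based sentence splitter; the function names mirror the spirit of the options shown in the Quick Start below, but the library's own stopword lists, defaults, and order of operations may differ.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS_EN = {"the", "a", "an", "is", "of", "and", "on", "in"}


def split_sentences(text: str) -> list[str]:
    """Segment on sentence-final punctuation (., !, ?) followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def preprocess(sentence: str, lowercase: bool = True, rmv_accents: bool = True,
               rmv_special_chars: bool = True, rmv_numbers: bool = True,
               custom_words=(), stopwords=STOPWORDS_EN) -> str:
    """Apply the optional cleaning steps, then drop stopwords and custom words."""
    if lowercase:
        sentence = sentence.lower()
    if rmv_accents:
        # Decompose characters (é -> e + combining accent) and drop the combining marks.
        sentence = "".join(c for c in unicodedata.normalize("NFKD", sentence)
                           if not unicodedata.combining(c))
    if rmv_numbers:
        sentence = re.sub(r"\d+", " ", sentence)
    if rmv_special_chars:
        sentence = re.sub(r"[^\w\s]", " ", sentence)
    drop = set(stopwords) | set(custom_words)
    return " ".join(w for w in sentence.split() if w not in drop)
```

For example, `preprocess("The café opened in 1998!")` yields `"cafe opened"`: the text is lowercased, the accent and the number are stripped, and the stopwords "the" and "in" are removed.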

Evaluation Metrics

Classic Metrics (reference-based, lexical)

| Metric | Method | Returns |
|---|---|---|
| ROUGE-N | rouge_N(generated, reference, n=1) | F1, Precision, Recall |
| ROUGE-L | rouge_L(generated, reference) | F1, Precision, Recall |
| ROUGE-S | rouge_S(generated, reference, skip_distance=4) | F1, Precision, Recall |
| BLEU | bleu(generated, reference, n=4) | Score |
| METEOR | meteor(generated, reference) | Score |
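To build intuition for what the lexical scores measure, here is a minimal sketch of ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence). It assumes lowercased whitespace tokenization with no stemming, so pyAutoSummarizer's own `rouge_N` and `rouge_L` may produce different numbers; the sketch only mirrors their (F1, Precision, Recall) return order from the table.

```python
from collections import Counter


def _ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _prf(overlap: float, gen_total: int, ref_total: int) -> tuple[float, float, float]:
    """Turn an overlap count into (F1, precision, recall)."""
    p = overlap / gen_total if gen_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return f1, p, r


def rouge_n(generated: str, reference: str, n: int = 1) -> tuple[float, float, float]:
    g = _ngrams(generated.lower().split(), n)
    r = _ngrams(reference.lower().split(), n)
    overlap = sum((g & r).values())  # Counter '&' keeps minimum counts, i.e. clipped matches
    return _prf(overlap, sum(g.values()), sum(r.values()))


def rouge_l(generated: str, reference: str) -> tuple[float, float, float]:
    a, b = generated.lower().split(), reference.lower().split()
    # Classic O(len(a) * len(b)) dynamic programme for the LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return _prf(dp[-1][-1], len(a), len(b))
```

For "the cat sat on the mat" against "the cat lay on the mat", five of six unigrams match (ROUGE-1 F1 = 5/6), three of five bigrams match (ROUGE-2 F1 = 0.6), and the LCS "the cat on the mat" also gives ROUGE-L F1 = 5/6.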

Semantic Metric (reference-based)

| Metric | Method | Returns | Notes |
|---|---|---|---|
| BERTScore | bert_score(generated, reference, model_type='roberta-large') | F1, Precision, Recall | Requires pip install bert-score. Captures paraphrases that ROUGE misses by comparing contextualised token embeddings. |
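The core of BERTScore is greedy matching: each token in one text is paired with its most similar token (by cosine similarity of contextual embeddings) in the other. The sketch below shows only that step, using hand-made toy vectors in place of model embeddings and assuming every vector is non-zero. It omits the optional IDF weighting and baseline rescaling offered by the bert-score package and is not how pyAutoSummarizer calls that package; `greedy_bertscore` is a hypothetical name.

```python
import math


def cos(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def greedy_bertscore(cand: list[list[float]], ref: list[list[float]]) -> tuple[float, float, float]:
    """(F1, precision, recall) from token embeddings via greedy max-cosine matching."""
    # Precision: each candidate token is matched to its best reference token.
    precision = sum(max(cos(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: each reference token is matched to its best candidate token.
    recall = sum(max(cos(r, c) for c in cand) for r in ref) / len(ref)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, precision, recall
```

Because matching is done on embeddings rather than surface strings, a candidate token such as "purchased" can score highly against a reference "bought" when their contextual vectors are close, which is exactly what n-gram overlap cannot reward.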

Faithfulness / Factual Consistency Metrics (source-based, no reference needed)

These metrics check whether the summary is factually consistent with the source document, detecting hallucinations that lexical metrics cannot see.

| Metric | Method | Returns | Notes |
|---|---|---|---|
| SummaC | summa_c(generated, nli_model='cross-encoder/nli-deberta-v3-small') | Score ∈ [0, 1] | Self-contained NLI-based faithfulness scorer using HuggingFace transformers. No extra install needed. |
| AlignScore | align_score(generated, model='AlignScore-base') | Score ∈ [0, 1] | Requires pip install pyAutoSummarizer[faithfulness] and python -m spacy download en_core_web_sm. Based on Zha et al., ACL 2023. |
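NLI-based faithfulness scorers in the SummaC family score each summary sentence against the source sentence that supports it best, then average over summary sentences. The sketch below shows that aggregation under two loud assumptions: `toy_entailment` is a hypothetical word-overlap stand-in for a real NLI model such as cross-encoder/nli-deberta-v3-small, and the per-pair score is an entailment probability (SummaC's zero-shot variant uses entailment minus contradiction), which keeps the result in [0, 1]. It is not the code behind `summa_c`.

```python
import re
from typing import Callable


def sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: share of hypothesis words found in the premise."""
    prem = set(re.findall(r"\w+", premise.lower()))
    hyp = re.findall(r"\w+", hypothesis.lower())
    return sum(w in prem for w in hyp) / len(hyp) if hyp else 0.0


def summac_style_score(source: str, summary: str,
                       entail: Callable[[str, str], float] = toy_entailment) -> float:
    """Max over source sentences for each summary sentence, then mean over summary sentences."""
    src, summ = sentences(source), sentences(summary)
    per_sentence = [max(entail(s, h) for s in src) for h in summ]
    return sum(per_sentence) / len(per_sentence)
```

With the source "The plant opened in 2019. It employs 300 people.", the faithful summary "The plant opened in 2019." scores 1.0, while appending the unsupported claim "It employs 900 robots." drops the score to 0.75, because that sentence finds only partial support anywhere in the source.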

LLM-as-Judge Metric

| Metric | Method | Returns | Notes |
|---|---|---|---|
| G-Eval | g_eval(generated, api_key, model='gpt-4o-mini', dimensions=['coherence','consistency','fluency','relevance']) | dict {dimension: int 1–5} | Uses an OpenAI chat model to score the summary across four quality dimensions. Based on Liu et al., 2023. Requires an OpenAI API key. |
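An LLM-as-judge metric comes down to two local steps around the API call: building a prompt that asks for one integer per dimension, and parsing the reply defensively. The sketch below covers only those two steps and makes no API call. The prompt wording and the "dimension: score" reply format are assumptions, not the prompts `g_eval` actually sends; note also that the original G-Eval method weights scores by the probabilities of the output score tokens, which this sketch does not do.

```python
import re

DIMENSIONS = ["coherence", "consistency", "fluency", "relevance"]


def build_prompt(source: str, summary: str, dimensions=DIMENSIONS) -> str:
    """Hypothetical judge prompt asking for one 1-5 integer per dimension."""
    lines = "\n".join(f"{d}: <1-5>" for d in dimensions)
    return (f"Source document:\n{source}\n\nSummary:\n{summary}\n\n"
            "Rate the summary on each dimension from 1 (worst) to 5 (best). "
            f"Answer exactly in this format:\n{lines}")


def parse_scores(reply: str, dimensions=DIMENSIONS) -> dict:
    """Extract 'dimension: N' pairs; missing dimensions map to None, values are clamped to 1-5."""
    scores = {}
    for d in dimensions:
        m = re.search(rf"{re.escape(d)}\s*[:=]\s*(\d+)", reply, flags=re.IGNORECASE)
        scores[d] = min(5, max(1, int(m.group(1)))) if m else None
    return scores
```

Returning `None` for a dimension the model skipped, rather than a default score, keeps a malformed reply from silently looking like a real judgement.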

Installation

Core install (extractive/abstractive methods + lexical metrics; BERTScore additionally requires pip install bert-score)

pip install pyAutoSummarizer

With faithfulness metrics (AlignScore)

pip install "pyAutoSummarizer[faithfulness]"
python -m spacy download en_core_web_sm

Requirements: Python ≥ 3.9
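Because BERTScore and AlignScore depend on optional packages, it can help to check what is importable before calling those metrics. A minimal sketch using `importlib.util.find_spec`; the mapping below is an assumption that covers only the packages named in these install notes (bert-score imports as `bert_score`, spaCy as `spacy`, and the downloaded `en_core_web_sm` model installs as an importable package), not AlignScore's own dependencies.

```python
import importlib.util

# Assumed mapping from metric to the top-level modules it needs.
OPTIONAL_DEPS = {
    "bert_score": ["bert_score"],
    "align_score": ["spacy", "en_core_web_sm"],
}


def missing_optional_deps(deps: dict = OPTIONAL_DEPS) -> dict[str, list[str]]:
    """Map each metric to the modules it needs that cannot be found (empty dict = all present)."""
    report = {}
    for metric, modules in deps.items():
        missing = [m for m in modules if importlib.util.find_spec(m) is None]
        if missing:
            report[metric] = missing
    return report


if __name__ == "__main__":
    print(missing_optional_deps() or "All optional metric dependencies are installed.")
```

`find_spec` only locates a module without importing it, so the check is cheap even when heavy dependencies such as spaCy are present.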

Quick Start

from pyAutoSummarizer.base import psr

text = """
Your long text goes here. It can be multiple paragraphs.
The library will pre-process it, split it into sentences,
and summarize it using any of the available methods.
"""

# Initialise — pre-processes the text
s = psr.summarization(text, stop_words=['en'], lowercase=True,
                      rmv_accents=True, rmv_special_chars=True, rmv_numbers=True)

# --- Extractive summarization ---
rank    = s.summ_text_rank()          # TextRank
summary = s.show_summary(rank, n=3)   # top-3 sentences
print(summary)

# --- Abstractive summarization ---
summary = s.summ_abst_chatgpt(api_key='YOUR_KEY', model='gpt-4o-mini')

# --- Evaluation (classic) ---
reference = "A human-written reference summary goes here."  # required by reference-based metrics
f1, p, r = s.rouge_N(summary, reference, n=1)
bleu_s   = s.bleu(summary, reference)

# --- Evaluation (semantic) ---
f1, p, r = s.bert_score(summary, reference)

# --- Evaluation (faithfulness — no reference needed) ---
faith_sc = s.summa_c(summary)    # SummaC (built-in NLI)
align_sc = s.align_score(summary) # AlignScore (requires [faithfulness] extra)

# --- Evaluation (LLM-as-judge) ---
scores   = s.g_eval(summary, api_key='YOUR_KEY')
# {'coherence': 4, 'consistency': 5, 'fluency': 5, 'relevance': 4}

Colab Demos

Extractive Summarization

Abstractive Summarization

Related Projects

  • pyBibX — A Bibliometric and Scientometric Python Library Powered with Artificial Intelligence Tools



Download files

Source Distribution

pyautosummarizer-1.2.0.tar.gz (55.9 kB)

Built Distribution

pyautosummarizer-1.2.0-py3-none-any.whl (53.6 kB)

File details

Details for the file pyautosummarizer-1.2.0.tar.gz.

File metadata

  • Download URL: pyautosummarizer-1.2.0.tar.gz
  • Size: 55.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.9

File hashes

Hashes for pyautosummarizer-1.2.0.tar.gz
Algorithm Hash digest
SHA256 3751d6cba51b35b69f14b1aa98cc416f261467aa4adbda8198ae6a97ec42e1a2
MD5 1c25f81cb3a8e4d00260ae411def1e17
BLAKE2b-256 1101f5d84471c74f1d0cf62bf2dbd20d6c9653f02d933ca207c847e60755c50b
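A downloaded archive can be checked against the published SHA256 digest with the standard library's `hashlib`. A minimal sketch: `EXPECTED_SDIST_SHA256` is the source-distribution digest listed above, and the local file path is simply wherever you saved the download (the wheel has its own digest, listed below).

```python
import hashlib
from pathlib import Path

EXPECTED_SDIST_SHA256 = "3751d6cba51b35b69f14b1aa98cc416f261467aa4adbda8198ae6a97ec42e1a2"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Usage: `verify("pyautosummarizer-1.2.0.tar.gz")` returns `True` only for an unmodified copy of the 1.2.0 source distribution.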


File details

Details for the file pyautosummarizer-1.2.0-py3-none-any.whl.

File hashes

Hashes for pyautosummarizer-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1dc731bf31a0d37c3f6b39a05c8fc63acb316a313f7f6fa8f8d91709818b8832
MD5 e42b44b1c769ec15bc6da12334a8b0cc
BLAKE2b-256 b7724c01b45fac7a11ecb963bf1ac9c12b15e6381ac8e0f9d92ca960b7951ee3

