An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence

Project description

pyAutoSummarizer

Introduction

pyAutoSummarizer is a Python library for text summarization, a core task in NLP (Natural Language Processing). It implements several summarization algorithms, both extractive and abstractive. Extractive summarization identifies key sentences or phrases in the original text and extracts them to form the summary; the techniques pyAutoSummarizer uses include TextRank, LexRank, LSA (Latent Semantic Analysis), and KL-Sum. On the deep learning side, pyAutoSummarizer incorporates BART (Bidirectional and Auto-Regressive Transformers) and the T5 (Text-to-Text Transfer Transformer) model, which is known for its versatility across language tasks, including summarization. For abstractive summarization, pyAutoSummarizer also uses PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) and OpenAI's GPT (Generative Pre-trained Transformer), specifically the ChatGPT model. Unlike extractive techniques, abstractive summarization generates new sentences, producing a summary that preserves the essence of the original text but may not use its exact wording.
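To make the extractive idea concrete, here is a minimal TextRank-style sketch in plain Python. It is an illustration only and not pyAutoSummarizer's implementation or API: the function names (`split_sentences`, `text_rank`, `summarize`), the word-overlap similarity, and the fixed iteration count are all assumptions chosen for brevity.

```python
import math
import re


def split_sentences(text):
    """Split text on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"\w+", sentence.lower())


def similarity(words_a, words_b):
    """Shared-word count normalized by the log of both sentence lengths."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator (log(1) == 0)
    overlap = len(set(words_a) & set(words_b))
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def text_rank(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score, proportional
            # to the weight of edge (j, i) among all of j's edges.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def summarize(text, n_sentences=2):
    """Return the top-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = text_rank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:n_sentences])]
```

Calling `summarize(text, n_sentences=3)` returns three sentences copied verbatim from `text`, which is the defining property of an extractive summary. A sentence that shares no words with any other receives only the baseline score `1 - damping`.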

pyAutoSummarizer also provides preprocessing capabilities that prepare text for high-quality summarization. For text normalization, the library offers a range of cleaning and standardization features. It can convert text to lowercase, ensuring uniformity across the data. It can also remove accents, special characters, and numbers, which helps reduce noise in the text, and it can remove custom words, letting users tailor preprocessing to their needs. pyAutoSummarizer supports stopword removal in many languages, including Arabic, Bengali, Bulgarian, Chinese, Czech, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Marathi, Persian, Polish, Portuguese (Brazil), Romanian, Russian, Slovak, Spanish, Swedish, Thai, and Ukrainian. The library is also flexible in sentence segmentation: sentences can be split based on punctuation, character count, or word count.
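The preprocessing steps described above can be sketched with the Python standard library alone. This is a hedged illustration, not the library's code: `clean_text`, `split_by_word_count`, their parameters, and the tiny `STOPWORDS_EN` set are hypothetical names and data introduced for this example.

```python
import re
import unicodedata

# Tiny illustrative stopword set (an assumption, not the library's list).
STOPWORDS_EN = {"a", "an", "the", "is", "of", "and", "to", "in"}


def remove_accents(text):
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_text(text, lowercase=True, accents=True, numbers=True,
               special=True, stopwords=STOPWORDS_EN, custom_words=()):
    """Apply each normalization step that is switched on, in order."""
    if lowercase:
        text = text.lower()
    if accents:
        text = remove_accents(text)
    if numbers:
        text = re.sub(r"\d+", " ", text)
    if special:
        # Replace anything that is neither a word character nor whitespace.
        text = re.sub(r"[^\w\s]", " ", text)
    blocked = set(stopwords) | {w.lower() for w in custom_words}
    words = [w for w in text.split() if w.lower() not in blocked]
    return " ".join(words)


def split_by_word_count(text, n_words):
    """Segment text into chunks of at most n_words words."""
    words = text.split()
    return [" ".join(words[i:i + n_words]) for i in range(0, len(words), n_words)]
```

For example, `clean_text("The Café has 3 tables, and a view!")` yields `"cafe has tables view"`, and passing `custom_words=["view"]` would drop the last word as well.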

To evaluate the quality of the generated summaries, pyAutoSummarizer integrates the ROUGE-N, ROUGE-L, and ROUGE-S metrics, which measure the overlap of n-grams, the longest common subsequence, and skip-bigrams, respectively, between the generated summary and a reference summary. It also provides BLEU (Bilingual Evaluation Understudy) and METEOR (Metric for Evaluation of Translation with Explicit ORdering).
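As a rough illustration of what these overlap metrics compute, the sketch below implements ROUGE-N and ROUGE-L on whitespace tokens and reports the F1 variant. It is not the library's implementation: the function names are hypothetical, and real evaluations usually add stemming, multiple references, and separate precision and recall reporting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches, n_candidate, n_reference):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if matches == 0 or n_candidate == 0 or n_reference == 0:
        return 0.0
    precision = matches / n_candidate
    recall = matches / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    matches = sum((cand & ref).values())  # Counter '&' keeps minimum counts
    return f1(matches, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

With the candidate "the cat sat on the mat" and the reference "the cat lay on the mat", five of six unigrams match, three of five bigrams match, and the longest common subsequence has five tokens, so ROUGE-1 and ROUGE-L are both 5/6 and ROUGE-2 is 3/5.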

Usage

Install

pip install pyAutoSummarizer

Try it in Colab:

Extractive Summarization

Example 01: TextRank (Colab Demo)
Example 02: LexRank (Colab Demo)
Example 03: LSA (Colab Demo)
Example 04: KL-Sum (Colab Demo)
Example 05: BART (Deep Learning) (Colab Demo)
Example 06: T5 (Deep Learning) (Colab Demo)

Abstractive Summarization

Example 01: ChatGPT (Deep Learning) (Colab Demo). Requires an OpenAI API key (https://platform.openai.com/account/api-keys).
Example 02: PEGASUS (Deep Learning) (Colab Demo)

Others

pyBibX - A Bibliometric and Scientometric Python Library Powered with Artificial Intelligence Tools

Project details

This version

1.1.8

Dec 3, 2023
