A text summarization tool using GloVe embeddings and the PageRank algorithm

Text Summarizer

A Python-based text summarization tool that uses GloVe word embeddings and the PageRank algorithm to generate extractive summaries of documents.

Features

  • Extractive Summarization: Uses sentence similarity and PageRank to identify the most important sentences
  • GloVe Embeddings: Leverages pre-trained GloVe word vectors for semantic similarity calculation
  • Multiple Input Methods: Support for single documents, CSV files, or interactive creation
  • GUI Interface: User-friendly Tkinter-based graphical interface
  • Command Line Interface: Scriptable command-line tool for automation
  • Batch Processing: Process multiple documents at once

Installation

Prerequisites

  • Python 3.8 or higher
  • Required packages (automatically installed): pandas, numpy, nltk, scikit-learn, networkx

Install from PyPI

pip install text-summarizer-aweebtaku

Install from Source

  1. Clone the repository:
git clone https://github.com/AWeebTaku/Summarizer.git
cd Summarizer
  2. Install the package:
pip install -e .

Upgrade Package

To upgrade to the latest version with new features:

pip install --upgrade text-summarizer-aweebtaku

Create Desktop Shortcuts (Windows)

After installation, create desktop shortcuts for easy access:

Option 1: Automatic (Recommended)

text-summarizer-shortcuts

This will create desktop shortcuts for both GUI and CLI versions.

Option 2: Manual

Run the included batch file:

create_shortcuts.bat

Download GloVe Embeddings

No manual download required! The package will automatically download GloVe embeddings (100d, ~400MB) on first use and cache them in your home directory (~/.text_summarizer/).

If you prefer to use your own GloVe file, you can specify the path:

summarizer = TextSummarizer(glove_path='path/to/your/glove.6B.100d.txt')
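
For reference, a GloVe text file stores one word per line, followed by its space-separated vector components. The sketch below shows one way such a file could be parsed into a dictionary. It is an illustration only, not the package's own loader: `load_glove` is a hypothetical helper, and the two-line in-memory sample stands in for a real file such as glove.6B.100d.txt.

```python
import io


def load_glove(lines):
    """Parse GloVe-formatted text lines into a {word: [float, ...]} dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # First token is the word, the rest are vector components.
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a real GloVe file (3 dimensions instead of 100).
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
embeddings = load_glove(sample)
print(len(embeddings), len(embeddings["cat"]))
```

With a real file, the same function could be called on an open file handle, for example `load_glove(open(path, encoding="utf-8"))`.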

Usage

Console Scripts

After installation, you can use these commands from anywhere:

# Upgrade to the latest version
pip install --upgrade text-summarizer-aweebtaku

# Launch the graphical user interface
text-summarizer-gui

# Use the command line interface
text-summarizer-aweebtaku --help

# Create desktop shortcuts (Windows only)
text-summarizer-shortcuts

Command Line Interface

# Summarize a CSV file
text-summarizer-aweebtaku --csv-file data/tennis.csv --article-id 1

# Interactive mode
text-summarizer-aweebtaku

Graphical User Interface

# Launch GUI (easiest way)
text-summarizer-aweebtaku --gui

# Or use the dedicated GUI command
text-summarizer-gui

Python API

from text_summarizer import TextSummarizer

# Initialize summarizer (automatic GloVe download)
summarizer = TextSummarizer(num_sentences=3)

# Simple text summarization
text = "Your long text here..."
summary = summarizer.summarize_text(text)
print(summary)

# Advanced usage with DataFrame
import pandas as pd
df = pd.DataFrame([{'article_id': 1, 'article_text': text}])
scored_sentences = summarizer.run_summarization(df)
article_text, summary = summarizer.summarize_article(scored_sentences, 1, df)

Data Format

Input data should be in CSV format with columns:

  • article_id: Unique identifier for each document
  • article_text: The full text of the document

Example:

article_id,article_text
1,"This is the first article. It contains multiple sentences..."
2,"This is the second article. It also has several sentences..."
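
Before passing a file to the tool, it can help to confirm that it has the two expected columns. The following is a minimal sketch using only the standard library. `read_articles` is a hypothetical helper, not part of the package, and converting `article_id` to an integer is an assumption based on the example above.

```python
import csv
import io

REQUIRED_COLUMNS = {"article_id", "article_text"}


def read_articles(handle):
    """Return a list of (article_id, article_text) tuples from a CSV handle."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Assumption: article_id values are integers, as in the example above.
    return [(int(row["article_id"]), row["article_text"]) for row in reader]


# In-memory sample; quoting keeps commas inside article_text intact.
sample_csv = io.StringIO(
    "article_id,article_text\n"
    '1,"First article. It has two sentences, and a comma."\n'
    '2,"Second article."\n'
)
articles = read_articles(sample_csv)
print(articles[0])
```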

Algorithm

The summarization process follows these steps:

  1. Sentence Tokenization: Split documents into individual sentences
  2. Text Cleaning: Remove punctuation, convert to lowercase, remove stopwords
  3. Sentence Vectorization: Convert sentences to vectors using GloVe embeddings
  4. Similarity Calculation: Compute cosine similarity between all sentence pairs
  5. PageRank Scoring: Apply PageRank algorithm to identify important sentences
  6. Summary Extraction: Select top-ranked sentences in original order
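
The six steps above can be sketched in plain Python as follows. This is a dependency-free illustration, not the package's implementation: the package lists nltk, scikit-learn, and networkx as requirements and presumably relies on them for tokenization, cosine similarity, and PageRank. All names here (`TOY_VECTORS`, `STOPWORDS`, `summarize`, and the helpers) are hypothetical, and the 3-dimensional toy vectors stand in for real GloVe embeddings.

```python
import math
import re

# Hypothetical toy vectors standing in for 100-dimensional GloVe embeddings.
TOY_VECTORS = {
    "tennis": [1.0, 0.0, 0.0], "players": [0.9, 0.1, 0.0],
    "match": [0.8, 0.3, 0.0], "train": [0.2, 0.9, 0.0],
    "daily": [0.1, 0.8, 0.0], "long": [0.3, 0.7, 0.0],
    "won": [0.6, 0.4, 0.0], "rain": [0.0, 0.0, 1.0],
    "garden": [0.0, 0.0, 1.0],
}
STOPWORDS = {"the", "a", "is", "was", "on", "in", "and", "of"}
TEXT = ("Tennis players train daily. The tennis match was long. "
        "Rain fell on the garden. Players won the match.")


def split_sentences(text):
    # Step 1: naive split on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def clean(sentence):
    # Step 2: lowercase, drop punctuation, remove stopwords.
    words = re.sub(r"[^a-z\s]", " ", sentence.lower()).split()
    return [w for w in words if w not in STOPWORDS]


def sentence_vector(words, vectors, dim):
    # Step 3: average the vectors of known words (zero vector if none).
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    # Step 4: cosine similarity, defined as 0 when either vector is zero.
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=100):
    # Step 5: weighted PageRank by power iteration.
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        # Sentences with no similar neighbours spread their score uniformly.
        dangling = sum(scores[j] for j in range(n) if out_sums[j] == 0)
        scores = [
            (1 - damping) / n + damping * (dangling / n + sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n) if out_sums[j] > 0))
            for i in range(n)
        ]
    return scores


def summarize(text, vectors, num_sentences=2):
    sentences = split_sentences(text)
    dim = len(next(iter(vectors.values())))
    vecs = [sentence_vector(clean(s), vectors, dim) for s in sentences]
    n = len(sentences)
    # Negative similarities are clamped to 0 so edge weights stay valid.
    weights = [[max(0.0, cosine(vecs[i], vecs[j])) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    # Step 6: emit the top-ranked sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))


print(summarize(TEXT, TOY_VECTORS, num_sentences=2))
```

In this toy input, the sentence about rain shares no direction with the tennis sentences, so it has no edges in the similarity graph and receives the lowest score. That is the behaviour the ranking step is meant to produce: sentences similar to many others rank higher.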

Configuration

  • glove_path: Path to GloVe embeddings file (default: 'glove.6B.100d.txt/glove.6B.100d.txt')
  • num_sentences: Number of sentences in summary (default: 5)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use this tool in your research, please cite:

@software{text_summarizer,
  title = {Text Summarizer},
  author = {Aditya Chaurasiya},
  url = {https://github.com/AWeebTaku/Summarizer},
  year = {2026}
}
