A text summarization tool using GloVe embeddings and the PageRank algorithm
Project description
Text Summarizer
A Python-based text summarization tool that uses GloVe word embeddings and the PageRank algorithm to generate extractive summaries of documents.
Features
- Extractive Summarization: Uses sentence similarity and PageRank to identify the most important sentences
- GloVe Embeddings: Leverages pre-trained GloVe word vectors for semantic similarity calculation
- Multiple Input Methods: Supports single documents, CSV files, or interactive document entry
- GUI: User-friendly Tkinter-based graphical interface
- Command Line Interface: Scriptable command-line tool for automation
- Batch Processing: Process multiple documents at once
Installation
Prerequisites
- Python 3.8 or higher
- Required packages (automatically installed): pandas, numpy, nltk, scikit-learn, networkx
Install from PyPI
pip install text-summarizer-aweebtaku
Install from Source
- Clone the repository:
git clone https://github.com/AWeebTaku/Summarizer.git
cd Summarizer
- Install the package:
pip install -e .
Upgrade Package
To upgrade to the latest version with new features:
pip install --upgrade text-summarizer-aweebtaku
Create Desktop Shortcuts (Windows)
After installation, create desktop shortcuts for easy access:
Option 1: Automatic (Recommended)
text-summarizer-shortcuts
This will create desktop shortcuts for both GUI and CLI versions.
Option 2: Manual. Run the included batch file:
create_shortcuts.bat
Download GloVe Embeddings
No manual download required! The package will automatically download GloVe embeddings (100d, ~400MB) on first use and cache them in your home directory (~/.text_summarizer/).
If you prefer to use your own GloVe file, you can specify the path:
from text_summarizer import TextSummarizer
summarizer = TextSummarizer(glove_path='path/to/your/glove.6B.100d.txt')
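A GloVe file such as glove.6B.100d.txt is plain text: each line holds a word followed by its vector components, separated by spaces. The sketch below shows one way to read such a file into a dictionary. `parse_glove_lines` and `load_glove` are hypothetical helpers written for illustration; they are not functions exported by this package, whose own loader may differ.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text-format lines: a token followed by its vector components."""
    embeddings: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        embeddings[word] = [float(v) for v in values]
    return embeddings


def load_glove(path: str) -> Dict[str, List[float]]:
    """Read a GloVe file (e.g. glove.6B.100d.txt) from disk, line by line."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove_lines(handle)
```

Streaming the file line by line avoids holding the raw text in memory alongside the parsed vectors, which matters for a file of several hundred megabytes.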
Usage
Console Scripts
After installation, you can use these commands from anywhere:
# Upgrade to the latest version
pip install --upgrade text-summarizer-aweebtaku
# Launch the graphical user interface
text-summarizer-gui
# Use the command line interface
text-summarizer-aweebtaku --help
# Create desktop shortcuts (Windows only)
text-summarizer-shortcuts
Command Line Interface
# Summarize a CSV file
text-summarizer-aweebtaku --csv-file data/tennis.csv --article-id 1
# Interactive mode
text-summarizer-aweebtaku
Graphical User Interface
# Launch GUI (easiest way)
text-summarizer-aweebtaku --gui
# Or use the dedicated GUI command
text-summarizer-gui
Python API
from text_summarizer import TextSummarizer
# Initialize summarizer (automatic GloVe download)
summarizer = TextSummarizer(num_sentences=3)
# Simple text summarization
text = "Your long text here..."
summary = summarizer.summarize_text(text)
print(summary)
# Advanced usage with DataFrame
import pandas as pd
df = pd.DataFrame([{'article_id': 1, 'article_text': text}])
scored_sentences = summarizer.run_summarization(df)
article_text, summary = summarizer.summarize_article(scored_sentences, 1, df)
Data Format
Input data should be in CSV format with columns:
- article_id: Unique identifier for each document
- article_text: The full text of the document
Example:
article_id,article_text
1,"This is the first article. It contains multiple sentences..."
2,"This is the second article. It also has several sentences..."
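The expected layout can be produced and checked with Python's standard csv module. This is a minimal sketch: `write_articles` and `read_articles` are hypothetical helpers, and treating article_id values as integers follows the example above rather than a documented requirement. Letting the csv writer handle quoting means article text containing commas, quotes, or newlines is escaped correctly.

```python
import csv
import io
from typing import Iterable, List, Tuple


def write_articles(rows: Iterable[Tuple[int, str]]) -> str:
    """Serialize (article_id, article_text) pairs under the expected header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes fields containing commas, quotes or newlines
    writer.writerow(["article_id", "article_text"])
    for article_id, text in rows:
        writer.writerow([article_id, text])
    return buffer.getvalue()


def read_articles(csv_text: str) -> List[Tuple[int, str]]:
    """Parse CSV text, failing loudly if a required column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"article_id", "article_text"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return [(int(row["article_id"]), row["article_text"]) for row in reader]
```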
Algorithm
The summarization process follows these steps:
- Sentence Tokenization: Split documents into individual sentences
- Text Cleaning: Remove punctuation, convert to lowercase, remove stopwords
- Sentence Vectorization: Convert sentences to vectors using GloVe embeddings
- Similarity Calculation: Compute cosine similarity between all sentence pairs
- PageRank Scoring: Apply PageRank algorithm to identify important sentences
- Summary Extraction: Select the top-ranked sentences and output them in their original document order
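The first five steps can be sketched in dependency-free Python to show how they fit together. This is an illustration under stated assumptions, not the package's implementation: the regex sentence splitter, the tiny `STOPWORDS` set, and the hand-written power-iteration PageRank stand in for whatever the package does with nltk and networkx (its listed dependencies), and clipping negative cosine similarities to zero is a choice made here to keep edge weights non-negative. All function names are hypothetical.

```python
import math
import re
from typing import Dict, List, Tuple

# Tiny illustrative stopword list; a real pipeline would use a full list (e.g. NLTK's).
STOPWORDS = {"a", "an", "the", "is", "are", "it", "of", "and", "to", "in"}


def split_sentences(text: str) -> List[str]:
    """Naive tokenizer: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def clean(sentence: str) -> List[str]:
    """Lowercase, replace non-letters with spaces, and drop stopwords."""
    words = re.sub(r"[^a-z\s]", " ", sentence.lower()).split()
    return [w for w in words if w not in STOPWORDS]


def sentence_vector(words: List[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of known words; zero vector if none are known."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pagerank(weights: List[List[float]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """Weighted PageRank by power iteration; weights[i][j] is the weight of edge i -> j."""
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in weights]
    for _ in range(max_iter):
        # Nodes with no outgoing weight spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if out_weight[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(scores[i] * weights[i][j] / out_weight[i]
                           for i in range(n) if out_weight[i] > 0)
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def score_sentences(text: str, embeddings: Dict[str, List[float]],
                    dim: int) -> Tuple[List[str], List[float]]:
    """Tokenize, clean, vectorize, build the similarity graph, and rank with PageRank."""
    sentences = split_sentences(text)
    vectors = [sentence_vector(clean(s), embeddings, dim) for s in sentences]
    n = len(sentences)
    similarity = [[0.0 if i == j else max(0.0, cosine(vectors[i], vectors[j]))
                   for j in range(n)] for i in range(n)]
    return sentences, pagerank(similarity)
```

Sentences that are similar to many other sentences accumulate score through the graph, which is the intuition behind using PageRank to pick extractive summary sentences.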
Configuration
- glove_path: Path to the GloVe embeddings file (default: 'glove.6B.100d.txt/glove.6B.100d.txt')
- num_sentences: Number of sentences in the summary (default: 5)
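The num_sentences setting governs the final extraction step. A minimal sketch of that step, assuming a score has already been computed for each sentence: `extract_summary` is a hypothetical helper whose default of 5 mirrors the documented default. When a document has fewer sentences than requested, this sketch returns all of them; how the package handles that case is not documented here.

```python
from typing import Sequence


def extract_summary(sentences: Sequence[str], scores: Sequence[float],
                    num_sentences: int = 5) -> str:
    """Keep the num_sentences highest-scoring sentences, emitted in document order."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    k = max(0, min(num_sentences, len(sentences)))
    # Rank indices by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:k])  # restore original document order
    return " ".join(sentences[i] for i in chosen)
```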
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Citation
If you use this tool in your research, please cite:
@software{text_summarizer,
title = {Text Summarizer},
author = {Aditya Chaurasiya},
url = {https://github.com/AWeebTaku/Summarizer},
year = {2026}
}
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file text_summarizer_aweebtaku-1.3.1.tar.gz.
File metadata
- Download URL: text_summarizer_aweebtaku-1.3.1.tar.gz
- Upload date:
- Size: 21.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79c8e37a83a7b09603c4a4a03bf4744f755ffbb6dfa88b54801b14e30b3e44d5 |
| MD5 | 8acd32b80c5f1507b7f5e1b81f033d24 |
| BLAKE2b-256 | 575f33ce6d13c5490aeb904dc3745ca40029ac15c9527e6035d9bc905c4c9736 |
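The published digests can be used to check a downloaded archive before installing it. A minimal sketch using Python's standard hashlib module; `file_sha256` and `verify` are hypothetical helper names, not part of this package.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA256 hex digest without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return file_sha256(path) == expected_hex.strip().lower()
```

For example, `verify("text_summarizer_aweebtaku-1.3.1.tar.gz", "79c8e37a83a7b09603c4a4a03bf4744f755ffbb6dfa88b54801b14e30b3e44d5")` should return True for an unmodified copy of the source distribution.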
File details
Details for the file text_summarizer_aweebtaku-1.3.1-py3-none-any.whl.
File metadata
- Download URL: text_summarizer_aweebtaku-1.3.1-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53ee76a55073bb6fdc6a4392813eb696d283eaac0604fa8816daa376d7e97edd |
| MD5 | 300a59b99ee7e7b79ddf7e58038f2428 |
| BLAKE2b-256 | c3da62adc47e24a5ae5bb7d96e1db3c6d532d47a303b1953e6d839f41e242d20 |