Python package for extractive text summarization using various embeddings and methods.
MultiExtractiveSummarizer
MultiExtractiveSummarizer is a Python package for extractive text summarization. It combines sentence embeddings with sentence ranking algorithms to select the most informative sentences of a text document. The package currently supports SBERT and TF-IDF embeddings, and ranks sentences with LexRank or K-means clustering. Future updates will add further embedding methods (Word2Vec, GloVe, and BERT embeddings) and other sentence ranking algorithms such as TextRank and KLA.
Installation
You can install MultiExtractiveSummarizer from PyPI using pip:
pip install MultiExtractiveSummarizer
Description
Extractive Summarization
Extractive summarization involves selecting sentences from a document to create a summary that retains the most important information. Unlike abstractive summarization, which generates new sentences, extractive summarization works by identifying and extracting existing sentences.
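The selection step described above can be illustrated with a minimal, standard-library-only sketch. It assumes every sentence already has an importance score (from LexRank, clustering, or any other method). `split_sentences` and `extract_summary` are hypothetical helpers, not part of the MultiExtractiveSummarizer API, and the regex-based sentence splitter is deliberately naive.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def extract_summary(sentences, scores, k):
    """Return the k highest-scoring sentences, in their original order."""
    # Rank sentence indices by score, highest first, and keep the top k.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the chosen indices so the summary follows document order.
    return [sentences[i] for i in sorted(top)]


sentences = split_sentences("Cats sleep a lot. Dogs bark! Birds sing?")
print(extract_summary(sentences, [0.2, 0.9, 0.5], k=2))  # ['Dogs bark!', 'Birds sing?']
```

Restoring document order after picking the top-k sentences keeps the summary readable, because extractive methods reuse the original sentences verbatim rather than rewriting them.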
Embedding Methods
- SBERT (Sentence-BERT): SBERT is a modification of BERT that uses Siamese and triplet networks to derive semantically meaningful sentence embeddings.
- TF-IDF (Term Frequency-Inverse Document Frequency): TF-IDF is a numerical statistic that reflects the importance of a word in a document relative to a corpus.
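As an illustration of the TF-IDF option, the sketch below treats each sentence as its own "document" and builds one TF-IDF vector per sentence, using relative term frequency and the unsmoothed weighting idf = log(N / df). This weighting scheme is an assumption for illustration; the package, or a library it relies on, may use a smoothed or normalized variant. `tfidf_vectors` and `cosine` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """Return (vocabulary, one TF-IDF vector per sentence)."""
    docs = [tokenize(s) for s in sentences]
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(w for doc in docs for w in set(doc))
    # Unsmoothed inverse document frequency; a term in every sentence gets 0.
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty sentences
        vectors.append([counts[w] / length * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Terms that occur in every sentence receive an idf of zero, so they contribute nothing to the similarity between sentences.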
Sentence Ranking Methods
- LexRank: LexRank is a graph-based algorithm for computing sentence importance based on eigenvector centrality in a similarity graph.
- K-means Clustering: K-means is a clustering algorithm that partitions sentences into k clusters, and representative sentences from each cluster are selected for the summary.
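A compact sketch of LexRank, assuming sentence vectors (for example TF-IDF vectors) are already available: build a cosine-similarity graph, drop edges below a threshold, row-normalize it into a Markov chain, and run power iteration with PageRank-style damping. The resulting stationary distribution is each sentence's centrality score. The threshold of 0.1, the damping factor of 0.85, and the name `lexrank` are illustrative assumptions, not the package's documented settings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexrank(vectors, threshold=0.1, damping=0.85, max_iter=100, tol=1e-8):
    """Score sentences by damped eigenvector centrality in a similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    # Similarity graph: cosine weights, with weak edges (< threshold) removed.
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    sim = [[w if w >= threshold else 0.0 for w in row] for row in sim]
    # Row-normalise into a stochastic matrix; empty rows jump uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([w / total for w in row] if total else [1.0 / n] * n)
    # Power iteration: repeatedly redistribute score along graph edges.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```

For instance, with the vectors `[1, 0]`, `[1, 1]`, and `[0, 1]`, the middle sentence, which is similar to both of the others, receives the highest score, and the scores sum to 1.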
Features
- Flexible Embedding Methods: Choose between SBERT and TF-IDF for embedding sentences.
- Multiple Sentence Ranking Algorithms: Use LexRank or K-means clustering to rank sentences and create summaries.
- Modular and Extensible: Designed to easily incorporate new embedding methods and ranking algorithms.
Usage
Basic Usage
Here's an example of how to use the MultiExtractiveSummarizer package to create a summary of a text document:
from MultiExtractiveSummarizer import MultiExtractiveSummarizer
# Initialize the summarizer
summarizer = MultiExtractiveSummarizer(embedding_method='sbert', summarization_method='lexrank')
# Example text document
text = """
Your text document goes here...
"""
# Generate a summary with a fixed number of sentences
summary = summarizer.summarize(text, num_sentences=5)
print("Summary:")
print(summary)
# Generate a summary sized as a fraction (here 50%) of the original text
summary = summarizer.summarize(text, ratio=0.5)
print("Summary:")
print(summary)
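The `ratio` argument sizes the summary relative to the input. One plausible interpretation, sketched below, converts the ratio into a sentence count by rounding half up and clamping to at least one sentence. This is an assumption: the package may measure length differently or round another way, and `sentences_for_ratio` is a hypothetical helper.

```python
import math


def sentences_for_ratio(total_sentences, ratio):
    """Hypothetical: convert a summary ratio into a number of sentences."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the interval (0, 1]")
    if total_sentences == 0:
        return 0
    # Round half up (Python's built-in round() would turn 2.5 into 2).
    count = math.floor(ratio * total_sentences + 0.5)
    # Keep at least one sentence, and never more than the text contains.
    return max(1, min(total_sentences, count))
```

Under this interpretation, `ratio=0.5` on a 7-sentence text yields 4 sentences.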
Advanced Usage
To use a different configuration, pass another embedding method and sentence ranking algorithm to the constructor. For example, to combine TF-IDF embeddings with K-means clustering:
from MultiExtractiveSummarizer import MultiExtractiveSummarizer
# Initialize the summarizer with TF-IDF and K-means
summarizer = MultiExtractiveSummarizer(embedding_method='tfidf', summarization_method='kmeans')
# Example text document
text = """
Your long text document goes here...
"""
# Generate the summary
summary = summarizer.summarize(text, num_sentences=5)
print("Summary:")
print(summary)
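The K-means strategy from the Description can be sketched as follows, assuming sentence vectors are already computed: cluster them into k groups (k being the desired number of summary sentences), then keep the sentence closest to each centroid, in document order. Random initialization with a fixed seed, Euclidean distance, and the name `kmeans_select` are illustrative assumptions; the package's own implementation may differ.

```python
import math
import random


def kmeans_select(vectors, k, max_iter=50, seed=0):
    """Cluster sentence vectors with k-means; return one index per cluster."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct sentence vectors (raises if k > len).
    centroids = [list(vectors[i]) for i in rng.sample(range(len(vectors)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every sentence to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Representative of each cluster: the member closest to its centroid.
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members, key=lambda i: math.dist(vectors[i], centroids[c])))
    # Report indices in document order so the summary reads naturally.
    return sorted(chosen)
```

Picking the member nearest each centroid, rather than the centroid itself, guarantees that every selected item is an actual sentence from the document, as extractive summarization requires.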
Future Work
I plan to expand the capabilities of the MultiExtractiveSummarizer package by including:
- Additional embedding methods: Word2Vec, GloVe, and BERT embeddings.
- New sentence ranking algorithms: TextRank, KLA, and others.
Stay tuned for updates and new features!
Contributing
We welcome contributions from the community. If you have suggestions or would like to contribute, please fork the repository and create a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file multiextractivesummarizer-0.1.1.tar.gz.
File metadata
- Download URL: multiextractivesummarizer-0.1.1.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 01e781287b9178bdd70787a847a3793f92a04dca614491fbc5223884ca71ca8e
MD5 | 3862f026cfaf73c7f3ca6836290afd73
BLAKE2b-256 | 9168e0229db0e570f7140d77a533c466816437d08bdad986853a33300318f082
File details
Details for the file MultiExtractiveSummarizer-0.1.1-py3-none-any.whl.
File metadata
- Download URL: MultiExtractiveSummarizer-0.1.1-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 68b2b70ed877a929763ff9e34ea9e6a0f0653aefcc239bfd1f3fc3b89a7e2150
MD5 | fa128c6acf41aa7c1fa260b94db773e0
BLAKE2b-256 | 584e50113211c3d5615a043b7dd7b992af3702eb6d89ea888b480e2823a21800
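To check a downloaded file against the SHA256 digest listed above, hashing it in chunks with Python's standard `hashlib` module is sufficient. `sha256_of` is a hypothetical helper name, and the path below is a placeholder for wherever the file was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for the source distribution.
EXPECTED = "01e781287b9178bdd70787a847a3793f92a04dca614491fbc5223884ca71ca8e"
# Example (adjust the path to the downloaded file):
# print(sha256_of("multiextractivesummarizer-0.1.1.tar.gz") == EXPECTED)
```

Reading in chunks keeps memory use constant regardless of file size; pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same comparison automatically at install time.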