Python package for extractive text summarization using various embeddings and methods.

MultiExtractiveSummarizer

MultiExtractiveSummarizer is a Python package for extractive text summarization. It embeds the sentences of a document and ranks them to select the ones that form the summary. The package currently supports SBERT and TF-IDF embeddings, with sentence ranking by LexRank or K-means clustering. Planned updates include additional embedding methods (Word2Vec, GloVe, and BERT embeddings) and further sentence ranking algorithms such as TextRank and KLA.

Installation

You can install MultiExtractiveSummarizer from PyPI using pip:

pip install MultiExtractiveSummarizer

Description

Extractive Summarization

Extractive summarization involves selecting sentences from a document to create a summary that retains the most important information. Unlike abstractive summarization, which generates new sentences, extractive summarization works by identifying and extracting existing sentences.
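
To make the idea concrete, here is a minimal sketch of extractive selection using only the standard library. It is not how this package works internally: the word-frequency scoring, the naive sentence splitter, and the names `split_sentences` and `extract_summary` are all assumptions chosen for illustration. The point it shows is that the summary consists only of sentences copied from the input, returned in their original order.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def extract_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them in document order.

    Illustrative scoring only: a sentence's score is the average
    document-wide frequency of its words.
    """
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    counts = Counter(word for words in tokenized for word in words)
    scores = [
        sum(counts[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]
    # Rank sentence indices by score, keep the best, then restore order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]
```

Calling `extract_summary(text, 2)` returns a list of two sentences taken verbatim from `text`; no new wording is generated.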

Embedding Methods

  1. SBERT (Sentence-BERT): SBERT is a modification of BERT that uses Siamese and triplet networks to derive semantically meaningful sentence embeddings.
  2. TF-IDF (Term Frequency-Inverse Document Frequency): TF-IDF is a numerical statistic that reflects the importance of a word in a document relative to a corpus.
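
As a rough illustration of the TF-IDF option, the sketch below builds one vector per sentence, treating each sentence as a "document" and the full text as the corpus. This is an assumption about how sentence-level TF-IDF can be set up, not a description of the package's internals; the function name `tfidf_vectors` is hypothetical, and the unsmoothed formula `idf = log(N / df)` is one of several common variants.

```python
import math
import re
from collections import Counter


def tfidf_vectors(sentences):
    """Return (vocabulary, vectors) with one TF-IDF vector per sentence."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    vocab = sorted({word for words in tokenized for word in words})
    n = len(sentences)
    # Document frequency: in how many sentences does each word occur?
    df = Counter(word for words in tokenized for word in set(words))
    idf = {word: math.log(n / df[word]) for word in vocab}
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        total = len(words) or 1  # avoid division by zero for empty sentences
        vectors.append([tf[word] / total * idf[word] for word in vocab])
    return vocab, vectors
```

With this variant, a word that appears in every sentence gets a weight of zero, so only words that distinguish sentences from one another contribute to the vectors.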

Sentence Ranking Methods

  1. LexRank: LexRank is a graph-based algorithm for computing sentence importance based on eigenvector centrality in a similarity graph.
  2. K-means Clustering: K-means partitions the sentence embeddings into k clusters. A representative sentence from each cluster is then selected for the summary.
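
The graph-based ranking in item 1 can be sketched as follows. This is a simplified, continuous (unthresholded) LexRank-style computation written for illustration, not the package's implementation: the names `cosine` and `lexrank_scores`, the damping factor of 0.85, and the clamping of negative similarities to zero are assumptions. It takes any list of sentence vectors, builds a cosine-similarity graph, and runs power iteration to approximate the centrality of each sentence.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexrank_scores(vectors, damping=0.85, iterations=100, tol=1e-8):
    """Return one centrality score per vector; scores sum to 1."""
    n = len(vectors)
    if n == 0:
        return []
    # Similarity graph without self-loops; negative similarities are clamped.
    sim = [
        [max(0.0, cosine(vectors[i], vectors[j])) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    # Row-normalize into transition probabilities; isolated rows become uniform.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores
```

Sentences that are similar to many other sentences receive higher scores, and the top-scoring ones would be kept for the summary.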

Features

  • Flexible Embedding Methods: Choose between SBERT and TF-IDF for embedding sentences.
  • Multiple Sentence Ranking Algorithms: Use LexRank or K-means clustering to rank sentences and create summaries.
  • Modular and Extensible: Designed to easily incorporate new embedding methods and ranking algorithms.

Usage

Basic Usage

Here's an example of how to use the MultiExtractiveSummarizer package to create a summary of a text document.

from MultiExtractiveSummarizer import MultiExtractiveSummarizer

# Initialize the summarizer
summarizer = MultiExtractiveSummarizer(embedding_method='sbert', summarization_method='lexrank')

# Example text document
text = """
Your text document goes here...
"""

# Generate a summary with a fixed number of sentences
summary = summarizer.summarize(text, num_sentences=5)

print("Summary:")
print(summary)

# Generate a summary sized as a ratio of the original text
summary = summarizer.summarize(text, ratio=0.5)

print("Summary:")
print(summary)
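
The README does not state how `ratio` is converted into a sentence count. One plausible interpretation, shown here purely as an assumption, is to multiply the number of sentences by the ratio and round up, keeping at least one sentence. The helper `sentences_for_ratio` is hypothetical and is not part of the package.

```python
import math


def sentences_for_ratio(total_sentences, ratio):
    """Hypothetical helper: how many sentences a given ratio would keep."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the interval (0, 1]")
    if total_sentences <= 0:
        return 0
    # Round up so short documents still yield at least one sentence.
    return max(1, min(total_sentences, math.ceil(total_sentences * ratio)))
```

Under this interpretation, a 10-sentence document with `ratio=0.5` would produce a 5-sentence summary. The package's actual rounding may differ.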

Advanced Usage

You can combine a different embedding method and sentence ranking algorithm through the constructor arguments. The following example uses TF-IDF embeddings with K-means clustering.

from MultiExtractiveSummarizer import MultiExtractiveSummarizer

# Initialize the summarizer with TF-IDF and K-means
summarizer = MultiExtractiveSummarizer(embedding_method='tfidf', summarization_method='kmeans')

# Example text document
text = """
Your long text document goes here...
"""

# Generate the summary
summary = summarizer.summarize(text, num_sentences=5)

print("Summary:")
print(summary)
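
To give a sense of what cluster-based selection involves, here is a small standard-library sketch. It is an illustration under stated assumptions and not the package's code: the function name `kmeans_select`, the deterministic evenly spaced initialization, and the choice of "closest to the centroid" as the representative sentence are all hypothetical.

```python
import math


def _dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans_select(vectors, k, iterations=50):
    """Cluster the vectors and return sorted indices of the vectors
    closest to each centroid."""
    n = len(vectors)
    k = min(k, n)
    if k == 0:
        return []
    # Deterministic initialization: evenly spaced input vectors.
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: _dist(v, centroids[c]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members.
        updated = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # One representative per centroid; duplicates collapse in the set.
    chosen = {min(range(n), key=lambda i: _dist(vectors[i], c)) for c in centroids}
    return sorted(chosen)
```

The returned indices are sorted so the selected sentences can be emitted in document order. If two centroids share the same nearest sentence, fewer than k indices are returned.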

Future Work

Planned additions to the MultiExtractiveSummarizer package include:

  • Additional embedding methods: Word2Vec, GloVe, and BERT embeddings.
  • New sentence ranking algorithms: TextRank, KLA, and others.

Stay tuned for updates and new features!

Contributing

We welcome contributions from the community. If you have suggestions or would like to contribute, please fork the repository and create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
