Extractive text summarization using centroid distance
Install
python3 -m venv venv
source venv/bin/activate
pip install -U pip # optional but recommended
pip install centroid_summarizer
Usage
import centroid_summarizer
from nltk.tokenize import word_tokenize, sent_tokenize
# Courtesy officeipsum.com
text = "Just do what you think. I trust you. The hair is just too polarising. Low resolution? It looks ok on my screen. This turned out different than I described. Will royalties in the company do instead of cash? Appeal to the client. Sue the vice president, is there a way we can make the page feel more introductory without being cheesy? So can my website be in English? Try a more powerful colour. Concept is bang on, but can we look at a better execution? I really like the colour but can you change it, can we try some other colours maybe? I have an awesome idea for a startup, and I need you to build it for me."
raw = sent_tokenize(text)
clean = list(centroid_summarizer.simple_clean(text))
cbs = centroid_summarizer.CentroidBOWSummarizer()
print(list(cbs.summarize(
    raw,
    clean,
    limit=20
)))
from gensim.models import Word2Vec
# Word2Vec expects each sentence as a list of tokens, not a raw string
sentences = [word_tokenize(_) for _ in clean]
model = Word2Vec(min_count=1)
model.build_vocab(sentences)
model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)
cws = centroid_summarizer.CentroidWordEmbeddingsSummarizer(model)
print(list(cws.summarize(
    raw,
    clean,
    limit=20
)))
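For readers who want to see the idea behind the bag-of-words variant, here is a minimal sketch in plain Python. It is not the package's implementation: the names `cosine`, `centroid_summary` and `word_budget` are hypothetical, and treating the limit as a word budget is an assumption made for illustration only. Each cleaned sentence becomes a term-count vector, the centroid is the sum of those vectors, and sentences are picked in order of cosine similarity to the centroid until the budget is used up.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid_summary(raw_sentences, clean_sentences, word_budget):
    """Pick the sentences closest to the centroid, within a word budget (assumed meaning)."""
    vectors = [Counter(s.split()) for s in clean_sentences]
    # The centroid is the sum of all sentence vectors.
    centroid = Counter()
    for vector in vectors:
        centroid.update(vector)
    # Rank sentence indices by similarity to the centroid, best first.
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    chosen, used = [], 0
    for i in ranked:
        length = len(raw_sentences[i].split())
        if used + length <= word_budget:
            chosen.append(i)
            used += length
    # Return the chosen sentences in their original document order.
    return [raw_sentences[i] for i in sorted(chosen)]


raw_demo = ["Cats chase mice.", "Dogs chase cats.", "Stocks fell today."]
clean_demo = ["cats chase mice", "dogs chase cats", "stocks fell today"]
print(centroid_summary(raw_demo, clean_demo, word_budget=6))
# ['Cats chase mice.', 'Dogs chase cats.']
```

The off-topic third sentence shares no terms with the other two, so it scores lowest and is left out once the budget is spent.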
About
This package is derived from the original implementation by the authors of the paper "Centroid-based Text Summarization through Compositionality of Word Embeddings", accepted at the MultiLing Workshop at EACL 2017: http://www.aclweb.org/anthology/W17-1003
Original author: Gaetano Rossiello gaetano.rossiello@ibm.com
Tutorial
The method is described in this step-by-step guide: A Better Approach to Text Summarization
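As a rough illustration of the word-embedding variant, the sketch below represents each sentence as the sum of its word vectors and ranks sentences by cosine similarity to a centroid vector. This is a simplified reading, not the paper's or the package's exact procedure: here the centroid is simply the sum of all sentence vectors, with no filtering of centroid words, and the toy embeddings and the names `add_vectors`, `dense_cosine` and `rank_by_embedding_centroid` are hypothetical.

```python
import math


def add_vectors(vectors, dim):
    """Element-wise sum of a list of dense vectors of length dim."""
    total = [0.0] * dim
    for vector in vectors:
        for k, value in enumerate(vector):
            total[k] += value
    return total


def dense_cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_embedding_centroid(token_lists, embeddings, dim):
    """Return sentence indices ordered by similarity to the centroid, best first."""
    # Each sentence vector is the sum of the embeddings of its known tokens.
    sentence_vectors = [
        add_vectors([embeddings[t] for t in tokens if t in embeddings], dim)
        for tokens in token_lists
    ]
    # Simplification: the centroid is the sum of all sentence vectors.
    centroid = add_vectors(sentence_vectors, dim)
    scores = [dense_cosine(v, centroid) for v in sentence_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional embeddings, made up for this example.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "stock": [0.0, 1.0]}
toy_tokens = [["cat", "dog"], ["stock"], ["cat"]]
print(rank_by_embedding_centroid(toy_tokens, toy_embeddings, dim=2))
# [0, 2, 1]
```

In practice the vectors would come from a trained model such as the `Word2Vec` instance in the Usage section rather than from a hand-written dictionary.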
Citation
@inproceedings{DBLP:conf/acl-multiling/RossielloBS17,
author = {Gaetano Rossiello and
Pierpaolo Basile and
Giovanni Semeraro},
title = {Centroid-based Text Summarization through Compositionality of Word
Embeddings},
booktitle = {MultiLing at EACL},
pages = {12--21},
publisher = {Association for Computational Linguistics},
year = {2017}
}