A streamlined and approximate implementation of the LexRank algorithm for rapid text summarization.
FastLexRank
LexRank for large scale data
The original LexRank implementation uses the power method to compute the eigenvector associated with eigenvalue 1. In the foundational paper, Erkan and Radev [1] showed that the normalized similarity matrix is stochastic and that the power method therefore converges.
However, a key challenge with the original LexRank algorithm is its dependence on the power method, which often requires multiple iterations to converge. For a large corpus of n sentences, each iteration multiplies the n × n similarity matrix by the current score vector, and this repeated multiplication can become a bottleneck, slowing the computation considerably.
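As a point of reference, the power-method baseline can be sketched in a few lines of plain Python. This is a minimal illustration, not the original implementation: the function name `power_method_lexrank`, the parameters `tol` and `max_iter`, and the toy matrix `sim` are all hypothetical, and the sketch omits the similarity threshold and damping factor discussed by Erkan and Radev [1].

```python
def power_method_lexrank(sim, tol=1e-8, max_iter=1000):
    """Approximate the stationary distribution of a row-normalized
    similarity matrix by power iteration (illustrative sketch only)."""
    n = len(sim)
    # Row-normalize so every row sums to 1, giving a row-stochastic matrix.
    P = []
    for row in sim:
        row_sum = sum(row)
        P.append([v / row_sum for v in row])

    # Start from the uniform distribution over sentences.
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # One iteration: p_next = P^T p, which costs O(n^2) operations.
        p_next = [sum(P[i][j] * p[i] for i in range(n)) for j in range(n)]
        # Stop once no score changes by more than tol.
        if max(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


# Hypothetical pairwise similarities for three sentences.
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
scores = power_method_lexrank(sim)
```

Every pass through the loop touches all n² entries of the matrix, which is the cost that grows quickly with corpus size. For a symmetric similarity matrix such as this one, the stationary distribution is proportional to the row sums of the matrix, so the second sentence receives the highest score here.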
To address this issue, we introduce an approximate approach that efficiently computes a score for each sentence while retaining the essential characteristic of relative centrality. The modified method is significantly faster than the power-method computation and still delivers reliable results.
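The text above does not spell out the approximation, so the following is only one way such a shortcut could work, not necessarily what the package does. It assumes sentences are represented by embedding vectors, that similarity is the cosine between them, and that similarities are non-negative. Under those assumptions, the row sums of the similarity matrix can be obtained without ever building the matrix, because the sum of dot products with every vector equals a single dot product with the sum of all vectors. The names `normalize`, `approx_centrality`, and `embeddings` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def approx_centrality(embeddings):
    """Score each sentence by its total cosine similarity to all sentences,
    computed in O(n * d) without forming the n x n similarity matrix.
    Assumption: similarities are non-negative, so the total is positive."""
    unit = [normalize(v) for v in embeddings]
    dim = len(unit[0])
    # Sum of all unit vectors, one coordinate at a time.
    total = [sum(v[k] for v in unit) for k in range(dim)]
    # Dot product with the summed vector equals the row sum of the
    # cosine-similarity matrix for that sentence.
    raw = [sum(a * b for a, b in zip(v, total)) for v in unit]
    # Rescale so the scores sum to 1, like a probability distribution.
    s = sum(raw)
    return [r / s for r in raw]


# Hypothetical 2-dimensional sentence embeddings.
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
scores = approx_centrality(embeddings)
```

The middle vector lies between the other two, so it receives the highest score. This sketch needs a single pass over the embeddings, whereas the power method needs the full matrix and several passes over it.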
Reference
[1] Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22, 457-479.