Skip to main content

Module for automatic summarization of text documents and HTML pages.

Project description

Automatic text summarizer

image

Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains simple evaluation framework for text summaries. Implemented summarization methods:

Here are some other summarizers:

Installation

Make sure you have Python 2.7/3.3+ and pip (Windows, Linux) installed. Run simply (preferred way):

$ [sudo] pip install sumy

Or for the fresh version:

$ [sudo] pip install git+git://github.com/miso-belica/sumy.git

Usage

Sumy contains command line utility for quick summarization of documents.

$ sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization # what's summarization?
$ sumy luhn --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy edmundson --language=czech --length=3% --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy --help # for more info

Various evaluation methods for some summarization method can be executed by commands below:

$ sumy_eval lex-rank reference_summary.txt --url=http://en.wikipedia.org/wiki/Automatic_summarization
$ sumy_eval lsa reference_summary.txt --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy_eval edmundson reference_summary.txt --language=czech --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy_eval --help # for more info

Python API

Or you can use sumy like a library in your project. Create file sumy_example.py (don't name it sumy.py) with the code below to test it.

# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


LANGUAGE = "english"
SENTENCES_COUNT = 10


if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Automatic_summarization"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sumy-0.8.1.tar.gz (58.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sumy-0.8.1-py2.py3-none-any.whl (83.9 kB view details)

Uploaded Python 2Python 3

File details

Details for the file sumy-0.8.1.tar.gz.

File metadata

  • Download URL: sumy-0.8.1.tar.gz
  • Upload date:
  • Size: 58.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for sumy-0.8.1.tar.gz
Algorithm Hash digest
SHA256 899155ed9e50ac128167ac9e4f91ecd38493a32fdeca8213f93e9df135c9e0c7
MD5 c7c6d6f3a8bee65f8f7e5e20242cc707
BLAKE2b-256 37315427d7fb71e1e9e7bca27ffbfe3256b8669e27da120acb606ed66ab70eee

See more details on using hashes here.

File details

Details for the file sumy-0.8.1-py2.py3-none-any.whl.

File metadata

  • Download URL: sumy-0.8.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 83.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for sumy-0.8.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 ddcf78f283534a7c0964f23a35a27b28eaaa9f512d6ac3169daae19be2bd9432
MD5 a1eb7b0c9a69d7836bcad2c0708e95bb
BLAKE2b-256 61208abf92617ec80a2ebaec8dc1646a790fc9656a4a4377ddb9f0cc90bc9326

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page