Open source implementation of Summly

Project description


This package summarizes text: it reduces an article to several sentences that retain the main idea of the text.
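
To make the idea of "reducing an article to several sentences" concrete, here is a minimal sketch of a frequency-based extractive summarizer. It is an illustration only and is not the algorithm wanish uses; the function name `summarize` and the `max_sentences` parameter are hypothetical, and the scoring rule (average frequency of words longer than three letters) is an assumption chosen for brevity.

```python
import re
from collections import Counter


def summarize(text, max_sentences=5):
    """Keep the highest-scoring sentences, in their original order.

    Hypothetical helper for illustration; not wanish's implementation.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count words longer than three letters as a crude stop-word filter.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Rank sentence indices by score, keep the best ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(article_text, max_sentences=2)` returns at most two sentences; a text that already has fewer sentences than the limit is returned unchanged apart from whitespace normalization.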

In addition, the package extracts the following from the document:

  1. Canonical URL of the article
  2. Title of the article
  3. URL of the image characterizing this article
  4. Clean HTML of the document, stripped of excessive information (headers, footers, navigation, advertisements, etc.) and formed based on structured data of



easy_install wanish
pip install wanish


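from wanish import Wanish

# document_url is a placeholder for the address of the article to process
wanish = Wanish()
wanish.perform_url(document_url)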

# getting doc's source canonical url
url = wanish.url
# getting document's title
title = wanish.title
# getting url of related image if document has it
image_url = wanish.image_url
# getting two-letter code of the document's language (en, de, es...)
language_code = wanish.language
# getting a clean html page of a document with article
clean_html = wanish.clean_html
# getting a short summarized description of the article reduced to several sentences (5 by default)
description = wanish.description
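
The attribute reads above can be gathered in one place. The following is a minimal sketch and not part of wanish itself: `article_fields` and `fetch_article` are hypothetical helper names, and the only wanish features assumed are the `Wanish(url=...)` constructor and the attributes listed in this description. The import is done lazily so that `article_fields` can be tried with a stand-in object when wanish is not installed.

```python
# Attribute names taken from the usage example in this description.
FIELDS = ("url", "title", "image_url", "language", "clean_html", "description")


def article_fields(doc):
    """Collect the documented attributes of a processed document into a dict.

    Attributes that are missing on `doc` are reported as None.
    """
    return {name: getattr(doc, name, None) for name in FIELDS}


def fetch_article(document_url):
    """Process `document_url` with wanish and return its fields as a dict.

    Hypothetical convenience wrapper; requires wanish and network access.
    """
    from wanish import Wanish  # imported lazily, only needed for real fetches

    return article_fields(Wanish(url=document_url))
```

A dict is convenient here because it can be serialized or logged directly, while the `Wanish` object itself stays an implementation detail of `fetch_article`.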

Available keyword arguments for the Wanish() class (all are optional):

wanish = Wanish(url=document_url,
                positive_keywords=["main", "story"],
                negative_keywords=["banner", "adv", "similar", "top-ad"],
                headers={'user-agent': 'test-purposes/0.0.1'})
  • url: URL of the document to process. If set, the constructor automatically calls self.perform_url(url) after initialization. Default is None.
  • positive_keywords: A list of positive search patterns for classes and ids, for example: ["main", "story"]. Default is None.
  • negative_keywords: A list of negative search patterns for classes and ids, for example: ["banner", "adv", "similar", "top-ad"]. Default is None.
  • summary_sentences_qty: Maximum number of sentences in the summarized text of the document. Default is 5.
  • headers: Dict of additional custom headers for the GET request that fetches the article's web page. Default is None.
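
To illustrate what "search patterns in classes and ids" could mean in practice, here is a small sketch of keyword-based scoring of an HTML element. It is an assumption about the general technique, not wanish's actual matching code: the function name `keyword_score`, the plain substring match, and the +1/-1 weights are all hypothetical.

```python
def keyword_score(class_attr, id_attr, positive_keywords=None, negative_keywords=None):
    """Score an element by keywords found in its class and id attributes.

    Hypothetical illustration: each positive keyword found adds 1,
    each negative keyword found subtracts 1.
    """
    # Search class and id together, case-insensitively.
    haystack = f"{class_attr or ''} {id_attr or ''}".lower()

    score = 0
    for keyword in positive_keywords or []:
        if keyword.lower() in haystack:
            score += 1
    for keyword in negative_keywords or []:
        if keyword.lower() in haystack:
            score -= 1
    return score
```

With the keyword lists from the example above, an element with class "main-story" scores positively and is a candidate for article content, while one with class "sidebar top-ad" scores negatively and is a candidate for removal.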

Files for wanish, version 0.6.3: wanish-0.6.3.tar.gz (source distribution, 1.9 MB)
