
Open source implementation of Summly

Project description

About

This package summarizes text: it reduces an article to a few sentences that retain the main idea of the text.
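To illustrate what reducing an article to a few sentences can look like, here is a minimal sketch of a naive frequency-based extractive summarizer. It is not the algorithm used by wanish (which this page does not describe); the function name `summarize` and its `max_sentences` parameter are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter


def summarize(text, max_sentences=5):
    """Naive extractive summary (illustration only, not wanish's method).

    Each sentence is scored by the average corpus frequency of its words;
    the top-scoring sentences are returned in their original order.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best ones, restore text order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Restoring the original sentence order after ranking keeps the shortened text readable as a passage rather than as a list of disconnected highlights.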

In addition, the package extracts the following from the document:

  1. Canonical URL of the article

  2. Title of the article

  3. URL of the image characterizing this article

  4. A clean HTML version of the document, stripped of extraneous content (headers, footers, navigation, advertisements, etc.) and based on schema.org structured data

DEMO

Installation

easy_install wanish
or
pip install wanish

Usage

from wanish import Wanish

# document_url is the address of the article to process
wanish = Wanish()
wanish.perform_url(document_url)

# canonical URL of the source document
url = wanish.url
# document title
title = wanish.title
# URL of the related image, if the document has one
image_url = wanish.image_url
# two-letter code of the document's language (en, de, es...)
language_code = wanish.language
# clean HTML page containing the article
clean_html = wanish.clean_html
# short summary of the article, reduced to several sentences (5 by default)
description = wanish.description
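If the extracted values need to be stored or serialized, they can be copied into a plain dictionary. The sketch below is one possible way to do that; the helper `to_record` and the `FIELDS` tuple are hypothetical names introduced here, and a stand-in object is used so the example runs without network access or the package installed. In real use, pass a Wanish instance after `perform_url()` has been called.

```python
from types import SimpleNamespace

# Attribute names listed in the usage section above.
FIELDS = ("url", "title", "image_url", "language", "clean_html", "description")


def to_record(result):
    """Copy the documented attributes into a dict; missing ones become None."""
    return {name: getattr(result, name, None) for name in FIELDS}


# Stand-in for a processed Wanish object (assumption: placeholder values only).
stand_in = SimpleNamespace(
    url="https://example.com/article",
    title="Example article",
    language="en",
    description="A short summary.",
)

record = to_record(stand_in)
```

Using `getattr` with a default means that an attribute the object does not define simply shows up as `None` in the resulting dictionary.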

Available keyword arguments for the Wanish class:

wanish = Wanish(url=document_url,
                positive_keywords=["main", "story"],
                negative_keywords=["banner", "adv", "similar", "top-ad"],
                summary_sentences_qty=5)

  • url: The URL of a document, passed to the constructor. If set, self.perform_url(url) is called automatically after initialization.

  • positive_keywords: A list of positive search patterns matched against element classes and ids, for example: ["main", "story"]

  • negative_keywords: A list of negative search patterns matched against element classes and ids, for example: ["banner", "adv", "similar", "top-ad"]

  • summary_sentences_qty: Maximum number of sentences in the summary of the document. Defaults to 5.
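How the package weighs these keywords internally is not described here. As a purely hypothetical illustration of the idea, the sketch below scores an element's combined class and id string by adding one point for each positive pattern it contains and subtracting one for each negative pattern. The function `keyword_score` is an assumption made for this example and is not part of the wanish API.

```python
def keyword_score(class_and_id, positive_keywords, negative_keywords):
    """Hypothetical scoring: +1 per positive pattern found, -1 per negative one.

    `class_and_id` is the element's class and id attributes joined into one string.
    Matching is a case-insensitive substring test.
    """
    haystack = class_and_id.lower()
    score = 0
    for keyword in positive_keywords:
        if keyword.lower() in haystack:
            score += 1
    for keyword in negative_keywords:
        if keyword.lower() in haystack:
            score -= 1
    return score


positive = ["main", "story"]
negative = ["banner", "adv", "similar", "top-ad"]

article_score = keyword_score("main-story content", positive, negative)
ad_score = keyword_score("top-ad banner", positive, negative)
```

Under this scheme an element whose attributes look like article content ends up with a higher score than one that looks like an advertisement, which is the kind of distinction the two keyword lists are meant to express.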

Special Thanks

http://www.apache.org/licenses/LICENSE-2.0

Download files

Source Distribution

wanish-0.3.1.tar.gz (1.9 MB)

Uploaded Source

File details

Details for the file wanish-0.3.1.tar.gz.

File metadata

  • Download URL: wanish-0.3.1.tar.gz
  • Upload date:
  • Size: 1.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for wanish-0.3.1.tar.gz
Algorithm Hash digest
SHA256 b2bf6b7cb468fbec1cce61418cd4034a8f388254289533e830e9f96b7884f92a
MD5 a1ae91e4f2e6c762636802733d4b41d7
BLAKE2b-256 da0d5d2abafa8eb625f596c3ca8c0991f41d6014347987e6c5412eb0e3a06ea1
