Fast Python port of arc90's readability tool

Project description

This code is under the Apache License 2.0. http://www.apache.org/licenses/LICENSE-2.0

This is a Python port of a Ruby port of arc90’s readability project:

http://lab.arc90.com/experiments/readability/

In short: given an HTML document, it pulls out the main body text and cleans it up. It can also clean up the title, based on the latest readability.js code.

Based on:

Usage:

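from readability.readability import Document
import urllib

# Example URL; substitute the page you want to extract.
url = 'http://pypi.python.org/pypi/readability-lxml'
# urllib.urlopen is the Python 2 API (urllib.request.urlopen on Python 3).
html = urllib.urlopen(url).read()
readable_article = Document(html).summary()      # cleaned-up main body
readable_title = Document(html).short_title()    # cleaned-up title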
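
The snippet above uses the Python 2 form of urllib. A minimal Python 3 sketch of the same flow follows, assuming a release of the package that runs on Python 3. The helper names fetch_html and extract_article, the timeout value, and the document_cls parameter are illustrative and not part of the library; only Document, summary() and short_title() come from the usage shown above.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it to text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_article(html, document_cls=None):
    """Return (title, body_html) for an HTML string (hypothetical helper).

    document_cls is an assumption added for this sketch: it lets a caller
    substitute another class with the same summary()/short_title() methods.
    By default the library's Document is imported lazily.
    """
    if document_cls is None:
        from readability.readability import Document as document_cls
    # Mirror the usage above: one Document per call.
    title = document_cls(html).short_title()
    body_html = document_cls(html).summary()
    return title, body_html
```

With the package installed, extract_article(fetch_html(url)) would return the cleaned title and the cleaned main body for the page at url.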

Command-line usage:

python -m readability.readability -u http://pypi.python.org/pypi/readability-lxml

Source Distribution

readability-lxml-0.2.1.zip (13.5 kB)

Hashes for readability-lxml-0.2.1.zip
Algorithm    Hash digest
SHA256       f36c00469b30bc0260c36fef267a65bf8ef37690b14cc2a4f30c923156b26a9e
MD5          6be4137f67619fcf7e0e60b749688561
BLAKE2b-256  acebf50632e2801176789380322d7cdce6a5c6b4c0bcc5b82dbba0acb275c0c7
