htmllaundry

Simple HTML cleanup utilities

Maintainers

alert goibhniu pilz reinhardt thet thomasw wichert

Project description

Introduction

This package contains several handy Python functions for cleaning up HTML markup and performing other common changes. The cleanup is strict enough to clean HTML pasted from MS Word or Apple Pages. The package also contains integration code for z3c.form that provides fields which automatically sanitize HTML on save.

The implementation is based on the Cleaner class from lxml.

Cleanup routines

All cleanup routines can be invoked through the single sanitize function. This function takes a string as input and returns a cleaned-up version of that string. Here is a simple example:

>>> from htmllaundry import sanitize
>>> sanitize('Hello, <em>world</em>')
'<p>Hello, <em>world</em></p>'

The sanitize function takes an optional second parameter, an lxml Cleaner instance, which selects a different set of filtering rules. htmllaundry includes three cleaners:

htmllaundry.cleaners.DocumentCleaner is the default cleaner. It allows most safe tags while stripping out inline styles and insecure markup.
htmllaundry.cleaners.LineCleaner is a stricter cleaner which only allows a few inline elements. This is useful in places where you only want to accept single-line input, for example in document titles.
htmllaundry.cleaners.CommentCleaner only allows a very limited set of HTML elements and is designed for user-provided comments. It also forces all external links to open in a new browser window.

If you want to go all the way, you can also use strip_markup to strip all markup from your input:

>>> from htmllaundry import strip_markup
>>> strip_markup('Hello, <em>world</em>')
'Hello, world'
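
To make the idea of stripping all markup concrete, here is a minimal standard-library sketch that keeps only text nodes and drops every tag. It is not htmllaundry's implementation. The names strip_tags and _TextCollector are hypothetical, and the sketch gives no special treatment to script or style contents.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect only the text nodes of a document, discarding every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are ignored.
        self.parts.append(data)


def strip_tags(markup):
    """Return markup with all tags removed (illustrative stand-in only)."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


print(strip_tags("Hello, <em>world</em>"))  # Hello, world
```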

z3c.form integration

If you want to use the z3c.form integration you should use the z3cform extra for this package:

install_requires=[
    ...
    "htmllaundry[z3cform]",
    ...
],

In addition you will need to load the ZCML. In your configure.zcml add a line like this:

<include package="htmllaundry" />

You can then use the HtmlText field type in your schemas. For example:

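from zope import schema
from zope.i18nmessageid import MessageFactory
from zope.interface import Interface

from htmllaundry.z3cform import HtmlText

# Message factory for translatable titles; "mypackage" is a placeholder domain.
_ = MessageFactory("mypackage")


class IDocument(Interface):
    title = schema.TextLine(
        title=_(u"Title"),
        required=True)

    description = HtmlText(
        title=_(u"Description"),
        required=True)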

Please note that using HtmlText will not automatically give you a WYSIWYG widget.

Changelog

2.2 (2020-01-28)

Use @implementer and @adapter class decorators for Python 3 compatibility [ale-rt]

2.1 - May 10, 2016

Do not remove empty <a> tags that could be used as anchors.
When removing empty tags, allow defining additional tags that are considered OK to be empty.

2.0 - December 7, 2012

When wrapping unwrapped text do not create separate wrappers for inline elements.
Use PEP8 naming for all functions. The old names for public methods will continue to work for backwards compatibility.
Add support for Python 3.

1.10 - May 17, 2011

Add option to sanitize to specify a different wrap element or skip wrapping completely.

1.9 - April 27, 2011

Add MANIFEST.in to facilitate releases not made from subversion.
Fix all cleaners to strip JavaScript. This fixes issue 1.

1.8 - November 30, 2010

Remove the hardcoded link target enforcement from sanitize. This makes it possible to use the new link_target cleaner option.

1.7 - November 30, 2010

Make forcing of target attributes on external links configurable via a new link_target option in the cleaners. Only enable this option for the CommentCleaner.

1.6 - November 18, 2010

Correct whitespace test for wrapping bare text as well.

1.5 - November 18, 2010

Correct whitespace checks to handle all Unicode whitespace. This fixes problems with \xA0 (or &nbsp; in HTML-speak) being treated as text.

1.4 - August 3, 2010

Small code cleanup.
Strip leading breaks.

1.3 - July 30, 2010

Strip all top-level br elements. Breaks are fine in block-level elements, but should not be used to add vertical spacing between block elements.

1.2 - February 15, 2010

Fix a typo in the documentation.
Strip trailing breaks.

1.1 - February 5, 2010

Add a simple StripMarkup method.
Add ZCML necessary for z3c.form integration.

1.0 - February 5, 2010

First release

Download files

Source Distribution

htmllaundry-2.2.tar.gz (10.8 kB view details)

Uploaded Jan 28, 2020 Source

File details

Details for the file htmllaundry-2.2.tar.gz.

File metadata

Download URL: htmllaundry-2.2.tar.gz
Upload date: Jan 28, 2020
Size: 10.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/2.7.17

File hashes

Hashes for htmllaundry-2.2.tar.gz
Algorithm	Hash digest
SHA256	`9124f067d3c06ef2613e2cc246b2fde2299802280a8b0e60dc504137085f0334`
MD5	`7783edc1b67ab1d0627cf77a09c825a8`
BLAKE2b-256	`36f5c7acf875d41a2a742396df3654416444cc5e0e5c262d1c51a68c9842a24c`
