Simple HTML cleanup utilities

Introduction

This package contains several handy Python functions to clean up HTML markup or perform other common changes. The cleanup is strict enough to clean HTML pasted from MS Word or Apple Pages. The package also contains integration code for z3c.form, providing fields that automatically sanitize HTML on save.

The implementation is based on the Cleaner class from lxml.

Cleanup routines

All cleanup routines can be invoked through the single sanitize function. This function takes a string as input and returns a cleaned-up version of that string. Here is a simple example:

>>> from htmllaundry import sanitize
>>> sanitize('Hello, <em>world</em>')
'<p>Hello, <em>world</em></p>'

The sanitize function takes an extra optional parameter, an lxml Cleaner instance, which can be used to apply different filtering rules. htmllaundry includes three cleaners:

  • htmllaundry.cleaners.DocumentCleaner, which is the default cleaner. This cleaner will allow most safe tags, while stripping out inline styles and insecure markup.

  • htmllaundry.cleaners.LineCleaner is a stricter cleaner which only allows a few inline elements. This is useful in places where you only want to accept single-line input, for example in document titles.

  • htmllaundry.cleaners.CommentCleaner only allows a very limited set of HTML elements, and is designed for user-provided comments.
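As a rough sketch of how a cleaner might be selected per kind of input: the helper name sanitize_as and the CLEANER_FOR mapping are hypothetical and not part of htmllaundry. The sketch also assumes the cleaner can be passed as the second positional argument to sanitize.

```python
# Hypothetical mapping from a kind of input to one of the three cleaners
# listed above. The keys are made up for this example.
CLEANER_FOR = {
    "document": "DocumentCleaner",
    "title": "LineCleaner",
    "comment": "CommentCleaner",
}


def sanitize_as(kind, markup):
    """Sanitize markup with the cleaner registered for this kind of input."""
    if kind not in CLEANER_FOR:
        raise ValueError("unknown input kind: %r" % (kind,))

    # Imported here so the mapping can be inspected without htmllaundry
    # being installed.
    from htmllaundry import cleaners, sanitize

    cleaner = getattr(cleaners, CLEANER_FOR[kind])
    # Assumption: the optional cleaner is the second positional argument.
    return sanitize(markup, cleaner)
```

With htmllaundry installed, sanitize_as("title", "Hello, <em>world</em>") would run the input through LineCleaner, and sanitize_as("comment", ...) through CommentCleaner.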

If you want to go all the way you can also use StripMarkup to strip all markup from your input:

>>> from htmllaundry import StripMarkup
>>> StripMarkup('Hello, <em>world</em>')
'Hello, world'
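To illustrate what stripping all markup amounts to, here is a minimal standard-library sketch. It is not how htmllaundry implements StripMarkup (the package builds on lxml), and the names strip_all_markup and _TextCollector are hypothetical. It simply keeps every text node, including the contents of script or style elements, so it should not be treated as a sanitizer for untrusted input.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes and ignore every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_all_markup(markup):
    """Return only the text content of an HTML fragment."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)


print(strip_all_markup("Hello, <em>world</em>"))  # prints: Hello, world
```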

z3c.form integration

To use the z3c.form integration, depend on this package with its z3cform extra:

install_requires=[
    ...
    "htmllaundry [z3cform]",
    ...
    ],

In addition you will need to load the ZCML. In your configure.zcml add a line like this:

<include package="htmllaundry" />

You can then use the HtmlText field type in your schemas. For example:

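from zope.interface import Interface
from zope import schema
from zope.i18nmessageid import MessageFactory
from htmllaundry.z3cform import HtmlText

# "mypackage" is a placeholder i18n domain; use your own package's domain.
_ = MessageFactory("mypackage")

class IDocument(Interface):
    title = schema.TextLine(
        title=_(u"Title"),
        required=True)

    description = HtmlText(
        title=_(u"Description"),
        required=True)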

Please note that using HtmlText will not automatically give you a WYSIWYG widget.

Changelog

1.2 - February 15, 2010

  • Fix a typo in the documentation.

  • Strip trailing breaks.

1.1 - February 5, 2010

  • Add a simple StripMarkup method.

  • Add ZCML necessary for z3c.form integration.

1.0 - February 5, 2010

  • First release

