New Python wrapper for tidylib
A new Python wrapper for tidylib, which converts slightly invalid HTML/XHTML markup into valid markup. For example, it corrects unescaped ampersands, some unclosed tags, missing elements, and missing attributes. Tidylib is highly configurable: it can output HTML or XHTML and perform other functions, such as converting named entities to numeric entities (named entities work only with an HTML or XHTML doctype, whereas numeric entities also work in generic XML data).
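To illustrate why converting named entities to numeric ones can matter, here is a minimal sketch that uses only Python's standard-library XML parser, not tidylib itself. The helper name `parses_as_xml` and the sample markup are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a generic XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The same paragraph written with a named and with a numeric entity.
named = "<p>caf&eacute;</p>"    # &eacute; is defined only by the HTML/XHTML DTDs
numeric = "<p>caf&#233;</p>"    # &#233; is a plain character reference

print(parses_as_xml(named))     # the named entity is undefined without a doctype
print(parses_as_xml(numeric))   # the numeric reference needs no doctype
```

Without an HTML or XHTML doctype the parser has no definition for `&eacute;`, so only the numeric form is accepted. This is the situation the `numeric-entities` option is meant to help with.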
Note 1: The package is now hosted directly on PyPI, so downloading it with easy_install should work.
Note 2: Unfortunately, neither this library, nor uTidyLib, nor a barebones test case seems to work with the prepackaged tidy.dll on Windows. Until this is fixed, this is a Linux/BSD/OS X/Cygwin library.
Trivial example of use:
    from tidylib import tidy_document

    # tidy_document returns the corrected markup and a string of warnings/errors
    document, errors = tidy_document(
        '''<p>fõo <img src="bar.jpg">''',
        options={'numeric-entities': 1},
    )
    print(document)
    print(errors)
For documentation, see the pytidylib project page.