
New Python wrapper for tidylib

Project description

A new Python wrapper for tidylib that converts slightly invalid HTML/XHTML markup into valid markup. For example, it corrects unescaped ampersands, some unclosed tags, missing elements, and missing attributes. Tidylib is highly configurable: it can output HTML or XHTML, and it can perform other functions such as converting named entities to numeric entities (named entities work only with an HTML or XHTML doctype; numeric entities work in generic XML data).

Note 1: Now hosted directly on PyPI, so the download should work with easy_install.

Note 2: Unfortunately, neither this library, nor uTidyLib, nor a barebones test case seems to work with the prepackaged tidy.dll on Windows. Until this is fixed, this is a Linux/BSD/OS X/Cygwin library.

Trivial example of use:

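from tidylib import tidy_document

# tidy_document returns the corrected markup and tidy's error/warning report.
document, errors = tidy_document('''<p>f&otilde;o <img src="bar.jpg">''',
    options={'numeric-entities': 1})  # emit numeric entities instead of named ones
print(document)
print(errors)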
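
Since the description mentions choosing between HTML and XHTML output as well as entity conversion, the following is a minimal sketch of how those choices might be collected into an options dict. The helper names `build_tidy_options` and `tidy_markup` are hypothetical and not part of pytidylib; the option names `output-xhtml`, `output-html`, and `numeric-entities` are assumed to be passed through unchanged to the underlying tidy library.

```python
def build_tidy_options(xhtml=True, numeric_entities=True):
    """Return a dict of tidy option names mapped to 0/1 flags.

    Hypothetical helper for illustration; not part of pytidylib.
    """
    return {
        "output-xhtml": 1 if xhtml else 0,
        "output-html": 0 if xhtml else 1,
        "numeric-entities": 1 if numeric_entities else 0,
    }


def tidy_markup(markup, **kwargs):
    """Tidy `markup` and return (document, errors), as tidy_document does.

    Hypothetical convenience wrapper; keyword arguments are forwarded
    to build_tidy_options.
    """
    # Imported lazily so build_tidy_options can be used (and tested)
    # on a machine where the tidy shared library is not installed.
    from tidylib import tidy_document
    return tidy_document(markup, options=build_tidy_options(**kwargs))
```

With tidylib installed, `tidy_markup('<p>f&otilde;o <img src="bar.jpg">', xhtml=False)` would request HTML rather than XHTML output while still converting named entities to numeric ones.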

For documentation, see the pytidylib project page.


Files for pytidylib, version 0.1.2:
pytidylib-0.1.2.tar.gz (149.3 kB), source distribution
