New Python wrapper for tidylib

Project description

A new Python wrapper for tidylib, which converts slightly invalid HTML/XHTML markup into valid markup. For example, it corrects unescaped ampersands, some unclosed tags, missing elements, and missing attributes. Tidylib is highly configurable: it can output HTML or XHTML, and it can perform other functions such as converting named entities to numeric entities. (Named entities work only with an HTML or XHTML doctype; numeric entities work in generic XML data.)

Note 1: Now hosted directly on PyPI, so the download should work with easy_install.

Note 2: Unfortunately, neither this library, nor uTidyLib, nor a barebones test case seems to work with the prepackaged tidy.dll on Windows. Until this is fixed, this is a Linux/BSD/OS X/Cygwin library.

Trivial example of use:

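from tidylib import tidy_document

# Tidy a fragment containing a named entity and an unclosed <p> tag.
# The 'numeric-entities' option asks tidylib to emit numeric entities.
document, errors = tidy_document('''<p>f&otilde;o <img src="bar.jpg">''',
    options={'numeric-entities': 1})
print(document)  # the corrected markup
print(errors)    # the warnings and errors tidylib reported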
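
To illustrate why the numeric-entities option matters, here is a minimal sketch that uses only the Python 3 standard library, not tidylib. The helper name named_to_numeric is hypothetical and this is not how tidylib performs the conversion; it only shows that a generic XML parser accepts the numeric form of an entity and rejects the named form when no doctype defines it.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities predefined by XML itself; they need no doctype.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text):
    """Replace HTML named entities (e.g. &otilde;) with numeric ones.

    Hypothetical helper for illustration only; unknown names and the
    XML-predefined entities are left untouched.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


named = "<p>f&otilde;o</p>"
numeric = named_to_numeric(named)
print(numeric)

# A generic XML parser accepts the numeric form...
print(ET.fromstring(numeric).text)

# ...but rejects the named form, because nothing defines &otilde;.
try:
    ET.fromstring(named)
except ET.ParseError as exc:
    print("named entity rejected:", exc)
```

In practice tidylib does this conversion for you when numeric-entities is enabled; the sketch is only meant to make the doctype dependency concrete.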

For documentation, see the pytidylib project page.

Download files

Source Distribution

pytidylib-0.1.2.tar.gz (149.3 kB)
