
Python wrapper for HTML Tidy (tidylib) on Python 2 and 3


`PyTidyLib`_ is a Python package that wraps the `HTML Tidy`_ library. This
allows you, from Python code, to "fix" invalid (X)HTML markup. Some of the
library's many capabilities include:

* Clean up unclosed tags and unescaped characters such as ampersands
* Output HTML 4 or XHTML, strict or transitional, and add missing doctypes
* Convert named entities to numeric entities, which can then be used in XML
  documents without an HTML doctype
* Clean up HTML from programs such as Word (to an extent)
* Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
  which some (X)HTML indenting code overlooks
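
To make the entity conversion concrete, here is a minimal standard-library sketch (Python 3) of what "named to numeric" means. It is not how HTML Tidy implements the feature, and ``named_to_numeric`` is a hypothetical helper written only for illustration; it assumes that the entities XML predefines should be left named.

```python
import re
from html.entities import name2codepoint

# Assumption: entities that XML itself predefines can stay in named form.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &otilde; but not numeric ones like &#245;
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup):
    """Replace known HTML named entities with decimal numeric references."""
    def replace(match):
        name = match.group(1)
        # Leave XML-predefined and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_RE.sub(replace, markup)


print(named_to_numeric("<p>f&otilde;o &amp; b&auml;r</p>"))
# prints: <p>f&#245;o &amp; b&#228;r</p>
```

The numeric form is safe in plain XML because it needs no DTD to resolve, whereas ``&otilde;`` is only defined by the HTML doctypes.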

Changes
=======

* 0.3.2: Initialization bug fix

* 0.3.1: find_library support while still allowing a list of library names

* 0.3.0: Refactored to use Tidy and PersistentTidy classes while keeping the
  functional interface (which will lazily create a global Tidy() object) for
  backward compatibility. You can now pass a list of library names and base
  options when instantiating Tidy. The keep_doc argument is now deprecated
  and does nothing; use PersistentTidy.

* 0.2.4: Bugfix for a strange memory allocation corner case in Tidy.

* 0.2.3: Python 3 support (2 + 3 cross compatible) with passing Tox tests.

Small example of use
====================

The following code cleans up an invalid HTML document and sets an option::

    from tidylib import tidy_document

    # 'numeric-entities' asks Tidy to emit numeric rather than named entities
    document, errors = tidy_document(
        '''<p>f&otilde;o <img src="bar.jpg">''',
        options={'numeric-entities': 1})
    print(document)
    print(errors)
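
When only a snippet of markup needs cleaning, the package's ``tidy_fragment`` function returns a body fragment instead of a full document. The sketch below is hedged: ``tidy_snippet`` is a hypothetical wrapper, not part of PyTidyLib, and it assumes that a missing package or a missing HTML Tidy shared library surfaces as ``ImportError`` or ``OSError``.

```python
def tidy_snippet(markup):
    """Return (fragment, errors), or None when PyTidyLib or libtidy is missing.

    Hypothetical convenience wrapper around tidylib.tidy_fragment.
    """
    try:
        from tidylib import tidy_fragment
        return tidy_fragment(markup, options={'numeric-entities': 1})
    except (ImportError, OSError):
        # Assumption: an absent package or shared library raises one of these.
        return None


result = tidy_snippet('<p>f&otilde;o <img src="bar.jpg">')
if result is None:
    print("PyTidyLib or the HTML Tidy shared library is not available")
else:
    fragment, errors = result
    print(fragment)
    print(errors)
```

Returning ``None`` keeps the example runnable on machines without HTML Tidy installed; real code may prefer to let the exception propagate.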

Docs
====

Documentation is shipped with the source distribution and is available at
the `PyTidyLib`_ web page.

.. _`HTML Tidy`: http://tidy.sourceforge.net/
.. _`PyTidyLib`: http://countergram.com/open-source/pytidylib/

