Python wrapper for HTML Tidy (tidylib) on Python 2 and 3

`PyTidyLib`_ is a Python package that wraps the `HTML Tidy`_ library. This
allows you, from Python code, to "fix" invalid (X)HTML markup. Some of the
library's many capabilities include:

* Clean up unclosed tags and unescaped characters such as ampersands
* Output HTML 4 or XHTML, strict or transitional, and add missing doctypes
* Convert named entities to numeric entities, which can then be used in XML
  documents without an HTML doctype
* Clean up HTML from programs such as Word (to an extent)
* Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
  which some (X)HTML indenting code overlooks
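
To illustrate what converting named entities to numeric entities means, here is a minimal pure-Python sketch built on the standard library's ``html.entities.name2codepoint`` table. It is not how HTML Tidy performs the conversion, and the function name ``named_to_numeric`` is hypothetical. Leaving ``&amp;``, ``&lt;``, ``&gt;`` and ``&quot;`` in named form is an assumption modelled on how Tidy describes its ``numeric-entities`` option.

```python
import re
from html.entities import name2codepoint

# Assumption: these stay named, as they are also valid in plain XML.
_KEEP_NAMED = {"amp", "lt", "gt", "quot"}
# Matches a named entity reference such as "&otilde;" or "&frac12;".
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Rewrite known HTML named entities as decimal character references."""

    def repl(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in _KEEP_NAMED or name not in name2codepoint:
            return match.group(0)  # unknown or XML-safe: leave untouched
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(repl, markup)


print(named_to_numeric("<p>f&otilde;o &amp; b&auml;r</p>"))
# → <p>f&#245;o &amp; b&#228;r</p>
```

The numeric form works in any XML document, because it does not depend on an HTML DTD to define the entity names.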

Changes
=======

* 0.3.2: Initialization bug fix
* 0.3.1: ``find_library`` support, while still allowing a list of library names
* 0.3.0: Refactored to use ``Tidy`` and ``PersistentTidy`` classes while keeping
  the functional interface (which lazily creates a global ``Tidy()`` object) for
  backward compatibility. You can now pass a list of library names and base
  options when instantiating ``Tidy``. The ``keep_doc`` argument is now
  deprecated and does nothing; use ``PersistentTidy`` instead.
* 0.2.4: Bug fix for an unusual memory allocation corner case in Tidy
* 0.2.3: Python 3 support (cross-compatible with Python 2 and 3), with passing
  Tox tests
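
The 0.3.0 and 0.3.1 entries describe two mechanisms: resolving the shared library either through ``ctypes.util.find_library`` or from a caller-supplied list of names, and a functional interface that lazily creates one shared object. Below is a minimal stdlib-only sketch of both patterns, not PyTidyLib's actual source. ``SketchTidy``, ``candidate_names``, ``get_global_tidy`` and ``FALLBACK_NAMES`` are hypothetical, and the ``loader`` parameter exists only so the logic can be exercised without libtidy installed.

```python
import ctypes
import ctypes.util

# Hypothetical fallback names for illustration; not the package's real list.
FALLBACK_NAMES = ["libtidy.so", "libtidy.dylib", "tidy.dll"]


def candidate_names(lib_names=None, finder=ctypes.util.find_library):
    """Normalize lib_names into an ordered list of library names to try."""
    if lib_names is None:
        found = finder("tidy")  # a platform-specific name, or None
        return [found] if found else list(FALLBACK_NAMES)
    if isinstance(lib_names, str):
        return [lib_names]  # a single name is accepted as well as a list
    return list(lib_names)


class SketchTidy:
    """Hypothetical stand-in for a Tidy wrapper object."""

    def __init__(self, lib_names=None, loader=ctypes.CDLL):
        self.lib = None
        names = candidate_names(lib_names)
        for name in names:
            try:
                self.lib = loader(name)
                break  # first name that loads wins
            except OSError:
                continue  # try the next candidate name
        if self.lib is None:
            raise OSError("Could not load libtidy using any of: " + ", ".join(names))


_global_tidy = None


def get_global_tidy(**kwargs):
    """Create the shared SketchTidy on first use, then keep reusing it."""
    global _global_tidy
    if _global_tidy is None:
        _global_tidy = SketchTidy(**kwargs)
    return _global_tidy
```

Keeping one shared object means the shared library is loaded once rather than on every call. A caller that wants its own, separately configured instance can construct one directly instead of going through ``get_global_tidy()``.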

Small example of use
====================

The following code cleans up an invalid HTML document and sets an option::

    from tidylib import tidy_document
    document, errors = tidy_document('''<p>f&otilde;o <img src="bar.jpg">''',
                                     options={'numeric-entities': 1})
    print(document)
    print(errors)
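
The ``errors`` value is a plain string of Tidy's report. If you want it in structured form, a minimal sketch, assuming each positioned message follows Tidy's ``line N column M - Level: text`` layout, could parse it with a regular expression. ``TidyMessage`` and ``parse_tidy_errors`` are hypothetical helpers, and the sample string below is illustrative, not output captured from the example above.

```python
import re
from typing import List, NamedTuple


class TidyMessage(NamedTuple):
    line: int
    column: int
    level: str  # e.g. "Warning" or "Error"
    text: str


# Assumed layout of a positioned Tidy message.
_MESSAGE_RE = re.compile(r"^line (\d+) column (\d+) - (\w+): (.*)$")


def parse_tidy_errors(errors: str) -> List[TidyMessage]:
    """Collect positioned messages; lines without a position are skipped."""
    messages = []
    for raw in errors.splitlines():
        match = _MESSAGE_RE.match(raw)
        if match:
            line, column, level, text = match.groups()
            messages.append(TidyMessage(int(line), int(column), level, text))
    return messages


sample = (
    "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
    'line 1 column 15 - Warning: <img> lacks "alt" attribute\n'
    "Info: Document content looks like HTML 4.01 Strict\n"
)
for msg in parse_tidy_errors(sample):
    print(msg.line, msg.column, msg.level, msg.text)
```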

Docs
====

Documentation is shipped with the source distribution and is available at
the `PyTidyLib`_ web page.

.. _`HTML Tidy`: http://tidy.sourceforge.net/
.. _`PyTidyLib`: http://countergram.com/open-source/pytidylib/