Convert HTML to plain text

Project description

htmltextconvert renders HTML to plain text, for example to autogenerate plain text versions of HTML emails or to index HTML documents for search.

It differs from other packages in these ways:

  • Pure Python, no dependencies

  • High-quality, well-tested code

  • Permissive license (Apache)

  • Renders HTML to text suitable for a text/plain email body. It does not aim to convert to a structured text format like Markdown, but to give a readable text-only representation of the rendered HTML.

Usage:

>>> import htmltextconvert
>>> print(
...     htmltextconvert.html_to_text(
...         """
...         <p>This is a paragraph.</p>
...         <p>This is another paragraph.</p>
...         """
...     )
... )
This is a paragraph.

This is another paragraph.
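
For the email use case mentioned above, the following is a minimal sketch of attaching the rendered text as the text/plain alternative of an HTML message, using the standard library's email.message.EmailMessage. The function name build_message and its to_text parameter are hypothetical names introduced for this illustration; the only htmltextconvert call assumed is html_to_text, as shown in the usage example.

```python
from email.message import EmailMessage


def build_message(subject, sender, recipient, html_body, to_text=None):
    """Build a multipart/alternative email with text and HTML parts.

    `to_text` is a hypothetical hook for this sketch: any callable that
    maps an HTML string to plain text. When omitted, it falls back to
    htmltextconvert.html_to_text (requires the package to be installed).
    """
    if to_text is None:
        from htmltextconvert import html_to_text as to_text

    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain text part goes first; mail clients prefer the last
    # alternative they are able to display.
    msg.set_content(to_text(html_body))
    msg.add_alternative(html_body, subtype="html")
    return msg
```

With the package installed, calling build_message("Hello", "sender@example.com", "recipient@example.com", "<p>This is a paragraph.</p>") should return a message whose first part is the rendered text and whose second part is the original HTML. The addresses here are placeholders.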

htmltextconvert handles the following HTML constructs:

  • Character entity references (&name;, &#nnnn;, &#xhhhh;)

  • Unordered lists (<ul>)

  • Ordered lists (<ol>)

  • Paragraphs (<p>)

  • Block quotes (<blockquote>)

  • Linebreaks (<br>)

  • Links (<a href="…">)

  • Bold (<strong>)

  • Italic (<em>)

  • Code (<code>)
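
For the search indexing use case, the rendered text can be tokenized into a simple inverted index. The sketch below is one possible approach, not part of htmltextconvert itself: build_index, its to_text parameter, and the word-splitting rule are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(documents, to_text=None):
    """Map each lowercase word to the set of document ids containing it.

    `documents` maps a document id to its HTML source. `to_text` is a
    hypothetical hook for this sketch; when omitted, it falls back to
    htmltextconvert.html_to_text (requires the package to be installed).
    """
    if to_text is None:
        from htmltextconvert import html_to_text as to_text

    index = defaultdict(set)
    for doc_id, html in documents.items():
        # Index the rendered text, so tag and attribute names never
        # end up as search terms.
        for word in re.findall(r"\w+", to_text(html).lower()):
            index[word].add(doc_id)
    return dict(index)
```

Looking up a term is then a dictionary access, for example build_index(docs).get("paragraph", set()).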

Download files

Source Distribution

htmltextconvert-0.1.2.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

htmltextconvert-0.1.2-py3-none-any.whl (9.9 kB)

Uploaded Python 3
