
Convert HTML to plain text

Project description

htmltextconvert renders HTML to plain text, for example to autogenerate a plain text version of an HTML email, or to index HTML documents for search.
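For the search-indexing use case, a small inverted index can be built on top of any HTML-to-text function. This is a minimal sketch and not part of htmltextconvert: `build_index` is a hypothetical helper, and the converter is passed in as `to_text` (for example `htmltextconvert.html_to_text`, shown under Usage) so the indexing logic does not depend on how the text is rendered.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple


def build_index(
    documents: Iterable[Tuple[str, str]],
    to_text: Callable[[str], str],
) -> Dict[str, Set[str]]:
    """Map each lowercase term to the set of document ids containing it.

    `documents` yields (doc_id, html) pairs; `to_text` is any function
    that converts an HTML string to plain text (hypothetical wiring:
    e.g. htmltextconvert.html_to_text).
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, html in documents:
        text = to_text(html)
        # Naive tokenizer: runs of letters/digits (underscores excluded),
        # case-folded so "World" and "world" share one entry.
        for term in re.findall(r"[^\W_]+", text.lower()):
            index[term].add(doc_id)
    return dict(index)
```

The pattern `[^\W_]+` matches runs of letters and digits but excludes underscores, so punctuation-like markup in the rendered text does not become part of a term.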

It differs from other packages in these ways:

  • Pure Python, no dependencies
  • High quality, well tested code
  • Permissive license (Apache)
  • Renders HTML to text suitable for a text/plain email body: rather than converting to a structured text format like Markdown, it aims to give a readable, text-only representation of the rendered HTML.

Usage:

>>> import htmltextconvert
>>> print(
...     htmltextconvert.html_to_text(
...         """
...         <p>This is a paragraph.</p>
...         <p>This is another paragraph.</p>
...         """
...     )
... )
This is a paragraph.

This is another paragraph.
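For the HTML-email use case, the converted text can serve as the text/plain alternative of a multipart message. A minimal sketch using the standard library's `email.message.EmailMessage`; `make_email` is a hypothetical helper, and the converter is passed in as `to_text` so that `htmltextconvert.html_to_text` (or any other function) can be plugged in:

```python
from email.message import EmailMessage
from typing import Callable


def make_email(
    subject: str,
    sender: str,
    recipient: str,
    html_body: str,
    to_text: Callable[[str], str],
) -> EmailMessage:
    """Build a multipart/alternative message with plain-text and HTML parts."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # text/plain part first: RFC 2046 orders alternatives from least to
    # most preferred, so HTML-capable clients will display the HTML part.
    msg.set_content(to_text(html_body))
    # Converts the message to multipart/alternative and appends text/html.
    msg.add_alternative(html_body, subtype="html")
    return msg
```

For example, `make_email("Welcome", "noreply@example.com", "user@example.com", html, htmltextconvert.html_to_text)` returns a message that can be sent with `smtplib.SMTP.send_message`.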

htmltextconvert handles the following HTML constructs:

  • Character entity references (&name;, &#nnnn;, &#xhhhh;)
  • Unordered lists (<ul>)
  • Ordered lists (<ol>)
  • Paragraphs (<p>)
  • Block quotes (<blockquote>)
  • Line breaks (<br>)
  • Links (<a href="…">)
  • Bold (<strong>)
  • Italic (<em>)
  • Code (<code>)
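To illustrate what rendering these constructs involves, here is a minimal sketch built on the standard library's `html.parser.HTMLParser`. It is not htmltextconvert's implementation: `PlainTextRenderer` and `render` are hypothetical names, it covers only a subset of the list above (entities, paragraphs, line breaks, lists and links), and its output conventions (`* ` bullets, `1. ` numbering, the link target in parentheses after the link text) are assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class PlainTextRenderer(HTMLParser):
    """Toy renderer for <p>, <br>, <ul>/<ol>/<li> and <a href> (a subset)."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes &name;, &#nnnn; and &#xhhhh;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []
        # Stack of open lists: None for <ul>, a running counter for <ol>.
        self.lists: List[Optional[int]] = []
        self.href: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "p":
            self.parts.append("\n\n")
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "ul":
            self.lists.append(None)
        elif tag == "ol":
            self.lists.append(0)
        elif tag == "li" and self.lists:
            indent = "  " * (len(self.lists) - 1)  # nested lists are indented
            if self.lists[-1] is None:
                marker = "* "
            else:
                self.lists[-1] += 1
                marker = f"{self.lists[-1]}. "
            self.parts.append("\n" + indent + marker)
        elif tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag: str) -> None:
        if tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            self.parts.append("\n")
        elif tag == "a" and self.href:
            # Assumed convention: show the link target after the link text.
            self.parts.append(f" ({self.href})")
            self.href = None

    def handle_data(self, data: str) -> None:
        # Collapse runs of source whitespace to one space, as a browser would.
        text = re.sub(r"\s+", " ", data)
        # Drop leading spaces at the start of a line or after a space.
        if not self.parts or self.parts[-1].endswith((" ", "\n")):
            text = text.lstrip(" ")
        if text:
            self.parts.append(text)

    def text(self) -> str:
        lines = [line.rstrip() for line in "".join(self.parts).split("\n")]
        # Allow at most one blank line between blocks.
        return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip("\n")


def render(html: str) -> str:
    renderer = PlainTextRenderer()
    renderer.feed(html)
    renderer.close()  # flush any buffered text
    return renderer.text()
```

Passing `convert_charrefs=True` delegates entity decoding to the parser, so `&amp;`, `&#169;` and `&#xa9;` arrive in `handle_data` already decoded; tags the sketch does not handle (such as `<strong>` or `<code>`) simply pass their text through unchanged.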

Download files


Files for htmltextconvert, version 0.1.2

  Filename                                 Size     File type   Python version
  htmltextconvert-0.1.2-py3-none-any.whl   9.9 kB   Wheel       py3
  htmltextconvert-0.1.2.tar.gz             6.0 kB   Source      None
