Turn HTML into equivalent Markdown-structured text.

html2text

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text [filename [encoding]]

Option             Description
--version          Show program's version number and exit
-h, --help         Show this help message and exit
--ignore-links     Don't include any formatting for links
--escape-all       Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
--reference-links  Use reference-style links instead of inline links to create Markdown
--mark-code        Mark preformatted and code blocks with [code]...[/code]

For a complete list of options, see the docs.

Or you can use it from within Python:

>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
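
To make the tag-to-Markdown mapping in the example above concrete, here is a toy sketch built on the standard library's `html.parser`. It is not html2text's implementation: `ToyMarkdownConverter` and its `convert` method are hypothetical names invented for illustration, and the sketch handles only `<strong>`, `<b>`, `<em>`, `<i>`, and `<a href>`. Its `ignore_links` attribute imitates the behaviour the `--ignore-links` option describes.

```python
from html.parser import HTMLParser


class ToyMarkdownConverter(HTMLParser):
    """Hypothetical toy converter; NOT html2text's actual code."""

    # Markdown delimiters for the few inline tags this sketch understands.
    INLINE_MARKERS = {"strong": "**", "b": "**", "em": "_", "i": "_"}

    def __init__(self, ignore_links=False):
        super().__init__(convert_charrefs=True)
        self.ignore_links = ignore_links
        self._out = []    # output fragments, joined at the end
        self._hrefs = []  # stack of href values for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE_MARKERS:
            self._out.append(self.INLINE_MARKERS[tag])
        elif tag == "a":
            self._hrefs.append(dict(attrs).get("href"))
            if not self.ignore_links:
                self._out.append("[")

    def handle_endtag(self, tag):
        if tag in self.INLINE_MARKERS:
            self._out.append(self.INLINE_MARKERS[tag])
        elif tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if not self.ignore_links:
                self._out.append(f"]({href or ''})")

    def handle_data(self, data):
        # Text passes through unchanged; unknown tags such as <p> are dropped.
        self._out.append(data)

    def convert(self, html):
        """Convert one HTML fragment and return the Markdown-like text."""
        self._out = []
        self._hrefs = []
        self.reset()
        self.feed(html)
        self.close()
        return "".join(self._out).strip()


converter = ToyMarkdownConverter()
print(converter.convert("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
# prints: **Zed's** dead baby, _Zed's_ dead.

converter.ignore_links = True
print(converter.convert("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
# prints: Hello, world!
```

The real library covers far more (block elements, lists, tables, escaping, line wrapping), so treat this only as a picture of the general idea.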

Or with some configuration options:

>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!

>>> # Don't ignore links anymore; I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
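
The command-line options listed earlier appear to correspond to attributes on the `HTML2Text` object, in the same way `ignore_links` does above. The sketch below records that correspondence as a plain dictionary. The attribute names `escape_snob`, `inline_links`, and `mark_code` are my understanding of the library and should be checked against the docs; `FLAG_TO_ATTRIBUTE` and `apply_flags` are hypothetical helpers, not part of html2text. A `SimpleNamespace` stands in for the converter so the sketch runs without the package installed.

```python
from types import SimpleNamespace

# Assumed mapping from CLI flag to (attribute name, value to set).
# Verify the attribute names against the html2text documentation.
FLAG_TO_ATTRIBUTE = {
    "--ignore-links": ("ignore_links", True),
    "--escape-all": ("escape_snob", True),
    "--reference-links": ("inline_links", False),
    "--mark-code": ("mark_code", True),
}


def apply_flags(converter, flags):
    """Set the attribute matching each CLI-style flag on `converter`."""
    for flag in flags:
        if flag not in FLAG_TO_ATTRIBUTE:
            raise ValueError(f"unsupported flag: {flag}")
        name, value = FLAG_TO_ATTRIBUTE[flag]
        setattr(converter, name, value)
    return converter


# Stand-in object; in real use you would pass html2text.HTML2Text() instead.
converter = SimpleNamespace(
    ignore_links=False, escape_snob=False, inline_links=True, mark_code=False
)
apply_flags(converter, ["--ignore-links", "--reference-links"])
print(converter.ignore_links, converter.inline_links)
```

Keeping the mapping in one table makes it easy to see that `--reference-links` is the odd one out: it switches an attribute off rather than on.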

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to install

html2text is available on PyPI: https://pypi.org/project/html2text/

$ pip install html2text
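
After installing, one way to confirm that the package is visible to your interpreter is to query its metadata with the standard library. This is a small sketch, not something the project prescribes; `installed_version` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("html2text")
if version is None:
    print("html2text is not installed in this environment")
else:
    print(f"html2text {version} is installed")
```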

Development

How to run unit tests

$ tox

To see the coverage results:

$ coverage html

then open the ./htmlcov/index.html file in your browser.

Code Quality & Pre Commit

The CI runs several linting steps, including:

  • mypy
  • Flake8
  • Black

To make sure the code passes the CI linting steps, run:

$ tox -e pre-commit

Documentation

Documentation lives here
