Turn HTML into equivalent Markdown-structured text.

Project description

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text [(filename|url) [encoding]]

Option               Description
--version            Show program’s version number and exit
-h, --help           Show this help message and exit
--ignore-links       Don’t include any formatting for links
--escape-all         Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
--reference-links    Use reference links instead of inline links to create markdown
--mark-code          Mark preformatted and code blocks with [code]…[/code]

For a complete list of options, see the docs.
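
When the command-line tool is driven from another script, the options above go in front of the filename or URL. The sketch below shows one way to assemble such a call with Python's subprocess module. The helper names build_command and run_html2text are hypothetical, the code assumes an html2text executable is on the PATH, and it uses only the flags listed above.

```python
import subprocess


def build_command(source, encoding=None, ignore_links=False,
                  escape_all=False, reference_links=False, mark_code=False):
    """Return the argument list for: html2text [options] source [encoding]."""
    cmd = ["html2text"]
    if ignore_links:
        cmd.append("--ignore-links")
    if escape_all:
        cmd.append("--escape-all")
    if reference_links:
        cmd.append("--reference-links")
    if mark_code:
        cmd.append("--mark-code")
    cmd.append(source)            # filename or URL
    if encoding:
        cmd.append(encoding)      # optional second positional argument
    return cmd


def run_html2text(source, **options):
    """Run the CLI and return its standard output as text.

    Assumes the html2text executable is installed and on the PATH.
    """
    result = subprocess.run(build_command(source, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, build_command("page.html", encoding="utf-8", ignore_links=True) yields the list for `html2text --ignore-links page.html utf-8`.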

Or you can use it from within Python:

>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.

Or with some configuration options:

>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Hello, world!

>>> # Don't ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Hello, [world](http://earth.google.com/)!
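
The same pattern should extend to the other command-line options. The helper below is a sketch and not part of html2text: configure is a hypothetical name, and apart from ignore_links (shown above) the attribute names escape_snob, inline_links, and mark_code are assumptions about how the library exposes --escape-all, --reference-links, and --mark-code. Check them against the docs for your installed version.

```python
def configure(converter, ignore_links=False, escape_all=False,
              reference_links=False, mark_code=False):
    """Apply command-line-style settings to an HTML2Text-like object.

    `converter` is expected to be an html2text.HTML2Text() instance.
    """
    converter.ignore_links = ignore_links          # attribute shown in the example above
    converter.escape_snob = escape_all             # assumed counterpart of --escape-all
    converter.inline_links = not reference_links   # assumed: False selects reference links
    converter.mark_code = mark_code                # assumed counterpart of --mark-code
    return converter
```

With html2text installed, a call such as configure(html2text.HTML2Text(), reference_links=True) followed by h.handle(...) would be expected to emit reference-style links instead of inline ones.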

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to install

html2text is available on PyPI: https://pypi.python.org/pypi/html2text

$ pip install html2text

How to run unit tests

PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v

To see the coverage results:

coverage combine
coverage html

Then open ./htmlcov/index.html in your browser.

Documentation

Documentation lives here

Download files

Source Distribution

html2text-2018.1.9.tar.gz (52.2 kB)

Uploaded Source

Built Distribution

html2text-2018.1.9-py3-none-any.whl (21.1 kB)

Uploaded Python 3

File details

Details for the file html2text-2018.1.9.tar.gz.

File metadata

  • Download URL: html2text-2018.1.9.tar.gz
  • Upload date:
  • Size: 52.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for html2text-2018.1.9.tar.gz
Algorithm      Hash digest
SHA256         627514fb30e7566b37be6900df26c2c78a030cc9e6211bda604d8181233bcdd4
MD5            db43de61793d431618bd0b298f9f7410
BLAKE2b-256    dd79f8387c4e82275a7b540e0b948d261a636eb5aedd1d23be8ca05fbf605726
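
A downloaded copy of html2text-2018.1.9.tar.gz can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module. The function names and the local file path are assumptions for illustration. Only the digest value comes from this page.

```python
import hashlib

# SHA256 digest published above for html2text-2018.1.9.tar.gz
PUBLISHED_SHA256 = "627514fb30e7566b37be6900df26c2c78a030cc9e6211bda604d8181233bcdd4"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=PUBLISHED_SHA256):
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of(path) == expected.strip().lower()
```

Calling matches_published_digest("html2text-2018.1.9.tar.gz") (a hypothetical local path) returns True only if the file's digest equals the published value.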

See more details on using hashes here.

File details

Details for the file html2text-2018.1.9-py3-none-any.whl.

File hashes

Hashes for html2text-2018.1.9-py3-none-any.whl
Algorithm      Hash digest
SHA256         490db40fe5b2cd79c461cf56be4d39eb8ca68191ae41ba3ba79f6cb05b7dd662
MD5            551a1ab41c39a77ebfb7f4f39f3bfa15
BLAKE2b-256    1620de2b458ef434713053dd83209a03a5431ebe0527c8e14d9ae7838ff67d8a
