
Turn HTML into equivalent Markdown-structured text.


html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text.py [(filename|url) [encoding]]

Options:
--version

show program’s version number and exit

-h, --help

show this help message and exit

--ignore-links

don’t include any formatting for links

--ignore-images

don’t include any formatting for images

-g, --google-doc

convert an html-exported Google Document

-d, --dash-unordered-list

use a dash rather than a star for unordered list items

-b BODY_WIDTH, --body-width=BODY_WIDTH

number of characters per output line, 0 for no wrap

-i LIST_INDENT, --google-list-indent=LIST_INDENT

number of pixels Google indents nested lists

-s, --hide-strikethrough

hide strike-through text; only relevant when -g is specified as well
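As a rough illustration of how the usage line and the options above fit together, the sketch below assembles an argument list for the script. The helper name `build_html2text_args` and its keyword arguments are hypothetical and not part of html2text; only the flags themselves come from the option list above.

```python
def build_html2text_args(source, encoding=None, ignore_links=False,
                         ignore_images=False, dash_unordered_list=False,
                         body_width=None):
    """Build an argv list for html2text.py (hypothetical helper).

    `source` is a filename or URL; `encoding` is the optional second
    positional argument from the usage line.
    """
    args = ["html2text.py"]
    if ignore_links:
        args.append("--ignore-links")
    if ignore_images:
        args.append("--ignore-images")
    if dash_unordered_list:
        args.append("--dash-unordered-list")
    if body_width is not None:
        # 0 means "no wrap" according to the option description
        args.append("--body-width=%d" % body_width)
    args.append(source)
    if encoding is not None:
        args.append(encoding)
    return args


if __name__ == "__main__":
    # Prints the command line that would be run; nothing is executed here.
    print(" ".join(build_html2text_args("page.html", encoding="utf-8",
                                        ignore_links=True, body_width=0)))
```

The resulting list could be passed to `subprocess.run` if the script is on the path; that step is left out here because it depends on how the package was installed.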

Or you can use it from within Python:

import html2text
print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

import html2text
h = html2text.HTML2Text()
h.ignore_links = True
print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!</p>"))
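Several options can be set on the same converter object. The following is a minimal sketch, not the project's documented API: `ignore_links` and `handle` appear in the example above, while the `ignore_images` and `body_width` attributes are assumed to mirror the `--ignore-images` and `--body-width` command-line options. The wrapper name `to_markdown` is hypothetical.

```python
try:
    import html2text  # the package described in this README
except ImportError:
    # Keeps this sketch importable where the package is not installed.
    html2text = None


def to_markdown(html, ignore_links=False, ignore_images=False, body_width=0):
    """Convert an HTML string to Markdown text (hypothetical wrapper)."""
    if html2text is None:
        raise RuntimeError("html2text is not installed")
    h = html2text.HTML2Text()
    h.ignore_links = ignore_links    # shown in the README example
    h.ignore_images = ignore_images  # assumed attribute, mirrors --ignore-images
    h.body_width = body_width        # assumed attribute, mirrors --body-width (0 = no wrap)
    return h.handle(html)
```

Calling `to_markdown("<p>Hello, <a href='http://earth.google.com/'>world</a>!</p>", ignore_links=True)` should then return the paragraph text without any link markup.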

_Originally written by Aaron Swartz. This code is distributed under the GPLv3._

## How to do a release

  1. Update the version in html2text.py

  2. Update the version in setup.py

  3. Run python setup.py sdist upload

## How to run unit tests

cd test/
python run_tests.py

