Turn HTML into equivalent Markdown-structured text.
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: html2text.py [(filename|url) [encoding]]
- Options:
- --version
show program’s version number and exit
- -h, --help
show this help message and exit
- --ignore-links
don’t include any formatting for links
- --ignore-images
don’t include any formatting for images
- -g, --google-doc
convert an html-exported Google Document
- -d, --dash-unordered-list
use a dash rather than a star for unordered list items
- -b BODY_WIDTH, --body-width=BODY_WIDTH
number of characters per output line, 0 for no wrap
- -i LIST_INDENT, --google-list-indent=LIST_INDENT
number of pixels Google indents nested lists
- -s, --hide-strikethrough
hide strike-through text; only relevant when -g is specified as well
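When the script is driven from another program, the options above can be assembled into an argument list. The following is a minimal sketch, not part of html2text itself: the helper names `build_html2text_command` and `run_html2text` are hypothetical, the script is assumed to live at `html2text.py` in the working directory, and it is assumed to run under the current interpreter. Only flags listed above are used.

```python
import subprocess
import sys


def build_html2text_command(source, encoding=None, body_width=None,
                            ignore_links=False, ignore_images=False,
                            dash_unordered_list=False,
                            script="html2text.py"):
    """Assemble an argv list for html2text.py (hypothetical helper).

    `source` is a filename or URL; `script` is an assumed path to the script.
    """
    command = [sys.executable, script]
    if ignore_links:
        command.append("--ignore-links")
    if ignore_images:
        command.append("--ignore-images")
    if dash_unordered_list:
        command.append("--dash-unordered-list")
    if body_width is not None:
        # 0 means "no wrap", per the option description above.
        command.append("--body-width=%d" % body_width)
    command.append(source)
    if encoding is not None:
        # The encoding is the optional second positional argument.
        command.append(encoding)
    return command


def run_html2text(*args, **kwargs):
    """Run the script and return whatever it wrote to stdout, as raw bytes."""
    return subprocess.check_output(build_html2text_command(*args, **kwargs))
```

For example, `build_html2text_command("page.html", encoding="utf-8", body_width=0, ignore_links=True)` yields the interpreter path followed by `html2text.py --ignore-links --body-width=0 page.html utf-8`. Building a list rather than a shell string avoids quoting problems when the source is a URL.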
Or you can use it from within Python:
import html2text
print(html2text.html2text("<p>Hello, world.</p>"))
Or with some configuration options:
import html2text
h = html2text.HTML2Text()
h.ignore_links = True
print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
_Originally written by Aaron Swartz. This code is distributed under the GPLv3._
## How to do a release
Update the version in html2text.py
Update the version in setup.py
Run python setup.py sdist upload
## How to run unit tests
cd test/
python run_tests.py
Source Distribution
Hashes for djangoplicity-html2text-3.200.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 95f8d19b85f8e95d9cfd5ef5949b47fb9cd162bdeb257526b339b369e24c4dbe
MD5 | 43cf8f20ddf56fce490d425886d38bf2
BLAKE2b-256 | 827adfcedbfced86eb1f897b51ec8f6c25801b80952b7faa6b7bc5714c5e168d