html2markdown·PyPI

Conservatively convert html to markdown

Project description

Experimental

Purpose: Converts HTML to Markdown while preserving any HTML markup that Markdown cannot express. The goal is to generate Markdown that can be converted back into HTML. This is the major difference between html2markdown and html2text: the latter does not claim to be reversible.

Usage example

import html2markdown
# print() with a single argument works on both Python 2 and Python 3
print(html2markdown.convert('<h2>Test</h2><pre><code>Here is some code</code></pre>'))

Output:

## Test

    Here is some code

Information and caveats

The content of block-type tags other than <p> (such as <div>) is not converted into Markdown.

The content of inline-type tags, e.g. <span>, is converted to Markdown.

Input: <div>this is stuff. <strong>stuff</strong></div>

Result: <div>this is stuff. <strong>stuff</strong></div>

Input: <p>this is stuff. <strong>stuff</strong></p>

Result: this is stuff. __stuff__ (surrounded by a newline on either side)

Input: <span style="text-decoration:line-through;">strike <strong>through</strong> some text</span> here

Result: <span style="text-decoration:line-through;">strike __through__ some text</span> here
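
The three cases above amount to a per-tag decision. The following is a minimal sketch of that decision, not html2markdown's actual implementation: the tag sets and the names `MARKDOWN_TAGS`, `BLOCK_TAGS` and `treatment` are hypothetical and chosen only for illustration.

```python
# Illustrative only: these tag sets are assumptions, not html2markdown's internals.
MARKDOWN_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
                 "strong", "em", "a", "pre", "code"}
BLOCK_TAGS = {"div", "table", "section", "article", "form"}


def treatment(tag):
    """Return how a tag would be handled under the rules described above."""
    tag = tag.lower()
    if tag in MARKDOWN_TAGS:
        # The tag has a Markdown equivalent, e.g. <p> or <strong>.
        return "convert tag and content"
    if tag in BLOCK_TAGS:
        # Unsupported block-type tag, e.g. <div>: left untouched.
        return "keep tag and content as html"
    # Anything else is treated as inline in this sketch, e.g. <span>.
    return "keep tag, convert content"


for name in ("p", "div", "span"):
    print(name, "->", treatment(name))
```

Under these assumptions, `p`, `div` and `span` fall into the three different branches, which matches the three Input/Result pairs above.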

Except in unprocessed block-type tags, formatting characters are escaped

Input: <p>**escape me?**</p> (in html, we would use <strong> here)

Result: \*\*escape me?\*\*

Input: <span>**escape me?**</span>

Result: <span>\*\*escape me?\*\*</span>

Input: <div>**escape me?**</div>

Result: <div>**escape me?**</div> (block-type)
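
The escaping shown above can be approximated with a single regular-expression substitution. This is a sketch under stated assumptions: the helper name `escape_markdown` is hypothetical, and the set of escaped characters (backslash, backtick, `*`, `_`) is a guess; the library may escape a different set.

```python
import re

# Assumed set of characters to escape; the library's actual set may differ.
_MD_SPECIAL = re.compile(r"([\\`*_])")


def escape_markdown(text):
    """Backslash-escape characters that Markdown would read as formatting."""
    return _MD_SPECIAL.sub(r"\\\1", text)


print(escape_markdown("**escape me?**"))  # prints \*\*escape me?\*\*
```

Text inside an unprocessed block-type tag such as <div> would simply bypass a helper like this one.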

Attributes not supported by Markdown are kept

Example: <a href="http://myaddress" title="click me"><strong>link</strong></a>

Result: [__link__](http://myaddress "click me")

Example: <a onclick="javascript:dostuff()" href="http://myaddress" title="click me"><strong>link</strong></a>

Result: <a onclick="javascript:dostuff()" href="http://myaddress" title="click me">__link__</a> (the attribute onclick is not supported, so the tag is left alone)
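
One way to picture the rule for links: an <a> tag can become a Markdown link only if every attribute it carries is one that Markdown can express. The sketch below (Python 3, standard library only) checks that condition. The names `MARKDOWN_LINK_ATTRS` and `link_is_convertible` are hypothetical and not part of html2markdown, and the `href` requirement is inferred from the 0.1.7 change note about <a> tags without an href attribute.

```python
from html.parser import HTMLParser

# Attributes a Markdown link can express: [text](href "title").
MARKDOWN_LINK_ATTRS = {"href", "title"}


class _FirstLinkAttrs(HTMLParser):
    """Collect the attribute names of the first <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.attrs is None:
            self.attrs = {name for name, _ in attrs}


def link_is_convertible(html):
    """True if the first <a> has an href and only Markdown-expressible attributes."""
    parser = _FirstLinkAttrs()
    parser.feed(html)
    attrs = parser.attrs
    return attrs is not None and "href" in attrs and attrs <= MARKDOWN_LINK_ATTRS


print(link_is_convertible('<a href="http://myaddress" title="click me">link</a>'))
print(link_is_convertible('<a onclick="javascript:dostuff()" href="http://myaddress">link</a>'))
```

The first call reports a convertible link; the second does not, because `onclick` has no Markdown counterpart.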

Limitations

Tables are kept as html.

Changes

0.1.7:

Improved handling of inline tags.
Fix: Ignore <a> tags without an href attribute.
Improved escaping.

0.1.6: Added tests and support for Python versions below 2.7.

0.1.5: Fix Unicode issue in Python 3.

0.1.0: First version.

Project details

Release history

0.1.7: Feb 9, 2019 (this version)
0.1.6.post0: Dec 1, 2018
0.1.6: Jun 6, 2018
0.1.5: Feb 27, 2017
0.1.2: Feb 6, 2017
0.1.1: Jan 22, 2017

