A Python HTML to Markdown parser

Project description

An HTML-to-Markdown parser written in Python. It uses no third-party tools and is an experimental demo.

Do not use it in production; this is only a trial demo project.

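Notes:

Cannot parse multi-level (nested) HTML
The input must be a single node
The project currently targets only the tensorflow-docs project
A few custom tags are not recognized, but it works for most everyday cases
Python's official html.parser documentation: https://docs.python.org/zh-cn/3.7/library/html.parser.html
html.parser source code: https://github.com/python/cpython/blob/3.7/Lib/html/parser.py
Prettified (indented) HTML is not supported; the content must be compact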
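
Because the input has to be compact, prettified HTML needs its inter-tag whitespace removed before it is handed to a parser. The following is a minimal sketch of one way to do that; `compact_html` is a hypothetical helper and not part of pyhtmd, and whether its output is accepted by pyhtmd in every case is an assumption.

```python
import re


def compact_html(html: str) -> str:
    """Hypothetical helper (not part of pyhtmd): drop whitespace between tags."""
    # Remove any run of whitespace, including newlines, that sits
    # between the '>' of one tag and the '<' of the next tag.
    html = re.sub(r">\s+<", "><", html)
    # Trim whitespace around the whole document.
    return html.strip()


pretty = """
<ul>
    <li>one</li>
    <li>two</li>
</ul>
"""
print(compact_html(pretty))  # <ul><li>one</li><li>two</li></ul>
```

Text inside an element is left untouched, so the contents of a `<code>` block keep their spacing; only whitespace that lies directly between two tags is removed.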

parser

single html node element
infinite html list node element
img
head html node element
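
To illustrate how flat nodes such as these can be converted with nothing but the standard library's `html.parser`, here is a minimal sketch. `TinyMarkdown` and `to_markdown` are hypothetical names used only for illustration; this is not pyhtmd's implementation, and the `img` flag merely mimics the idea of switching image rendering off.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class TinyMarkdown(HTMLParser):
    """Hypothetical converter for single, non-nested nodes (not pyhtmd's code)."""

    def __init__(self, img=True):
        super().__init__()
        self.img = img      # when False, <img> tags are skipped
        self.out = []       # collected Markdown fragments

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "code":
            self.out.append("`")
        elif tag in HEADINGS:
            # <h2> becomes "## ", <h3> becomes "### ", and so on.
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "img" and self.img:
            alt = attrs.get("alt") or ""
            src = attrs.get("src") or ""
            self.out.append("![{}]({})".format(alt, src))

    def handle_endtag(self, tag):
        if tag == "code":
            self.out.append("`")

    def handle_data(self, data):
        # Surrounding whitespace is dropped, which suits single nodes only.
        self.out.append(data.strip())


def to_markdown(html, img=True):
    parser = TinyMarkdown(img=img)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(to_markdown("<code> Hello, world ! </code>"))  # `Hello, world !`
print(to_markdown("<h2>Title</h2>"))                 # ## Title
print(to_markdown('<img src="a.png" alt="logo">'))   # ![logo](a.png)
```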

todo (in progress)

The core question: how should code that is stuck together be split?
It ultimately has to be split, but exactly how?
table

bug

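Resolved: the list algorithm problem:
- The previous list tag algorithm could not parse this kind of structure:
- because it assumed that the closing tags follow the ul tags in sequential order
- Core algorithm 1: compute the level of each opening tag
- Core algorithm 2: from the index of each opening tag on the left, compute the sequence of matching closing-tag indices on the right. I gave it a flashy name: the "marked reverse-order odd-even mutual exclusion algorithm"
- I worked out both algorithms myself; the first took two days, the second one to two weeks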
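
The two goals above (a level for every opening tag, and the closing tag that belongs to it) can be sketched with an ordinary stack. This is a swapped-in, conventional technique shown for illustration, not the author's "marked reverse-order odd-even mutual exclusion algorithm"; the function name `match_list_tags` is hypothetical, and the sketch does not check that a `<ul>` is closed by `</ul>` rather than `</ol>`.

```python
import re

# Matches opening and closing <ul>/<ol> tags; group 1 is "/" for a closing tag.
TAG_RE = re.compile(r"<(/?)(?:ul|ol)\b[^>]*>")


def match_list_tags(html):
    """Return sorted (open_index, close_index, level) tuples for each list."""
    stack = []   # string offsets of opening tags that are still unclosed
    pairs = []
    for m in TAG_RE.finditer(html):
        if m.group(1) != "/":
            stack.append(m.start())
        else:
            if not stack:
                raise ValueError("closing list tag without an opening tag")
            open_at = stack.pop()
            # After the pop, len(stack) is the number of enclosing lists,
            # so the outermost list gets level 1.
            pairs.append((open_at, m.start(), len(stack) + 1))
    if stack:
        raise ValueError("opening list tag without a closing tag")
    return sorted(pairs)


html = "<ul><li>a<ul><li>b</li></ul></li></ul>"
print([level for _, _, level in match_list_tags(html)])  # [1, 2]
```

Each tuple gives the string offset of an opening tag, the offset of its matching closing tag, and the nesting level, which is enough to decide how far each list item should be indented in the Markdown output.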

install

pip install pyhtmd

usage

from pyhtmd import Pyhtmd
html = "<code> Hello, world ! by Pyhtmd. </code>"
md = Pyhtmd(html)
content = md.markdown()
print(content)  # `Hello, world ! by Pyhtmd.`

API

Pyhtmd(html, language="", img=True)

language: string; the code language (js, python, java, etc.)
img: Boolean, default True; set it to False to skip image rendering

from pyhtmd import Pyhtmd
html = "<pre><code>import time\n print(time.time()) </code></pre>"
md = Pyhtmd(html, language="python")
content = md.markdown()
print(content)  # presumably a Markdown code block tagged "python"; the exact output is not shown here


Release history

1.0.2 (this version): Jan 1, 2020
1.0.1: Dec 19, 2019
1.0.0: Dec 19, 2019
0.1.4: Nov 23, 2019
0.1.2: Nov 4, 2019
0.1.1: Oct 31, 2019
0.1.0: Oct 31, 2019

Download files


Source Distribution

pyhtmd-1.0.2.tar.gz (11.3 kB)

Uploaded Jan 1, 2020 Source

File details

Details for the file pyhtmd-1.0.2.tar.gz.

File metadata

Download URL: pyhtmd-1.0.2.tar.gz
Upload date: Jan 1, 2020
Size: 11.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for pyhtmd-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`71c3dbd0e59aeee318117e209d117a46560277eaf6b32aed3f2e231f216c916b`
MD5	`3d68bf9d40e60ec6a50d65a928dbc4df`
BLAKE2b-256	`76ef0955ad1bb076ad2ba2dad27f4c4e8568740568b59fa6cd45414ccd98b91f`

