Transform your HTML into clean, easy-to-read markdown with pyhtml2md.
pyhtml2md
pyhtml2md provides a way to use the html2md C++ library in Python. html2md is a fast and reliable library for converting HTML content into markdown.
Installation
You can install using pip:
pip3 install pyhtml2md
Basic usage
Here is an example of how to use pyhtml2md to convert HTML to markdown:
import pyhtml2md
markdown = pyhtml2md.convert("<h1>Hello, world!</h1>")
print(markdown)
The convert function takes an HTML string as input and returns a markdown string.
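Since convert maps a string to a string, it can be wrapped to work on files. The following is a minimal sketch, not part of pyhtml2md: the helper name convert_html_file, the UTF-8 encoding and the .md output suffix are all assumptions made for illustration.

```python
from pathlib import Path


def convert_html_file(src, dst=None):
    """Convert one HTML file to markdown (hypothetical helper, not in pyhtml2md)."""
    # Imported inside the function so the helper can be defined even on a
    # machine where the package has not been installed yet.
    import pyhtml2md

    src = Path(src)
    # Assumption: write the result next to the source with a .md suffix.
    dst = Path(dst) if dst is not None else src.with_suffix(".md")

    html = src.read_text(encoding="utf-8")  # assumption: input is UTF-8
    dst.write_text(pyhtml2md.convert(html), encoding="utf-8")
    return dst
```

Calling convert_html_file("page.html") would then write page.md in the same directory and return its path.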
Advanced usage
pyhtml2md provides an Options class to customize the generation process.
All available options are described in the html2md C++ documentation.
Here is an example:
import pyhtml2md
options = pyhtml2md.Options()
options.splitLines = False
converter = pyhtml2md.Converter("<h1>Hello Python!</h1>", options)
markdown = converter.convert()
print(markdown)
print(converter.ok())
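The Converter calls above can be folded into a small helper that fails loudly when ok() reports a problem. This is a sketch under assumptions: the function name html_to_markdown and the choice of ValueError are hypothetical, and ok() is read here as returning a falsy value when the input could not be processed cleanly, which should be confirmed against the C++ documentation.

```python
def html_to_markdown(html, split_lines=False):
    """Convert HTML to markdown and raise if the converter reports a problem.

    Hypothetical wrapper; it uses only Options, splitLines, Converter,
    convert() and ok(), as shown in the example above.
    """
    # Imported inside the function so the helper can be defined even on a
    # machine where the package has not been installed yet.
    import pyhtml2md

    options = pyhtml2md.Options()
    options.splitLines = split_lines

    converter = pyhtml2md.Converter(html, options)
    markdown = converter.convert()

    # Assumption: ok() is falsy when the HTML was not processed cleanly.
    if not converter.ok():
        raise ValueError("pyhtml2md reported a problem with the input HTML")
    return markdown
```

Raising an exception rather than returning a flag keeps a bad conversion from being silently written to disk; callers that prefer a best-effort result can catch ValueError instead.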
Supported Tags
pyhtml2md supports the following HTML tags:
Tag | Description | Comment |
---|---|---|
a | Anchor or link | Supports the href, name and title attributes. |
b | Bold | |
blockquote | Indented paragraph | |
br | Line break | |
cite | Inline citation | Same as i. |
code | Code | |
dd | Definition data | |
del | Strikethrough | |
dfn | Definition | Same as i. |
div | Document division | |
em | Emphasized | Same as i. |
h1 | Level 1 heading | |
h2 | Level 2 heading | |
h3 | Level 3 heading | |
h4 | Level 4 heading | |
h5 | Level 5 heading | |
h6 | Level 6 heading | |
head | Document header | Ignored. |
hr | Horizontal line | |
i | Italic | |
img | Image | Supports the src, alt and title attributes. |
li | List item | |
meta | Meta-information | Ignored. |
ol | Ordered list | |
p | Paragraph | |
pre | Preformatted text | Works only with code. |
s | Strikethrough | Same as del. |
span | Grouped elements | Does nothing. |
strong | Strong | Same as b. |
table | Table | Tables are formatted! |
tbody | Table body | Does nothing. |
td | Table data cell | Uses align from th. |
tfoot | Table footer | Does nothing. |
th | Table header cell | Supports the align attribute. |
thead | Table header | Does nothing. |
title | Document title | Same as h1. |
tr | Table row | |
u | Underlined | Uses HTML. |
ul | Unordered list | |
License
pyhtml2md is licensed under the MIT License (MIT).
Download files
Source Distribution
File details
Details for the file pyhtml2md-1.6.0.tar.gz.
File metadata
- Download URL: pyhtml2md-1.6.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 47a1b173ca49610457e438dfea57f89f16dbe6cbd26ca1ee0b5fd2b61f5fe60c |
MD5 | 81c3d333a84ffe1254cd1046a59dd2fd |
BLAKE2b-256 | 9c6d16cd30465700df4797d778420154eb114c6df5138ae144c9946bed1a98cb |
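A downloaded source distribution can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library; the function names sha256_of and verify, and the 64 KiB chunk size, are illustrative choices rather than anything provided by pyhtml2md or PyPI.

```python
import hashlib

# SHA256 digest published for pyhtml2md-1.6.0.tar.gz (copied from the table above).
EXPECTED_SHA256 = "47a1b173ca49610457e438dfea57f89f16dbe6cbd26ca1ee0b5fd2b61f5fe60c"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, verify("pyhtml2md-1.6.0.tar.gz") should return True only if the local file is byte-for-byte identical to the published archive.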