Convert HTML to markdown
html_to_markdown
This library is a refactored and modernized fork of markdownify, supporting Python 3.9 and above.
Differences from markdownify
- The refactored codebase uses a strict functional approach - no classes are involved.
- There is full typing with strict MyPy adherence, and a py.typed file is included.
- The convert_to_markdown function accepts a pre-configured BeautifulSoup instance instead of an HTML string.
- Releases follow standard semver. Version v1.0.0 was branched from markdownify's v0.13.1, after which the two projects' version numbers are no longer aligned.
Installation
pip install html_to_markdown
Usage
Convert an HTML string to Markdown:
from html_to_markdown import convert_to_markdown
convert_to_markdown('<b>Yay</b> <a href="http://github.com">GitHub</a>') # > '**Yay** [GitHub](http://github.com)'
Or pass a pre-configured instance of BeautifulSoup:
from bs4 import BeautifulSoup
from html_to_markdown import convert_to_markdown
soup = BeautifulSoup('<b>Yay</b> <a href="http://github.com">GitHub</a>', 'lxml') # lxml requires an extra dependency.
convert_to_markdown(soup) # > '**Yay** [GitHub](http://github.com)'
Options
The convert_to_markdown function accepts the following kwargs:
- autolinks (bool): Automatically convert valid URLs into Markdown links. Defaults to True.
- bullets (str): A string of characters to use for bullet points in lists. Defaults to '*+-'.
- code_language (str): Default language identifier for fenced code blocks. Defaults to an empty string.
- code_language_callback (Callable[[Any], str] | None): Function to dynamically determine the language for code blocks.
- convert (Iterable[str] | None): A list of tag names to convert to Markdown. If None, all supported tags are converted.
- default_title (bool): Use the default title when converting certain elements (e.g., links). Defaults to False.
- escape_asterisks (bool): Escape asterisks (*) to prevent unintended Markdown formatting. Defaults to True.
- escape_misc (bool): Escape miscellaneous characters to prevent conflicts in Markdown. Defaults to True.
- escape_underscores (bool): Escape underscores (_) to prevent unintended italic formatting. Defaults to True.
- heading_style (Literal["underlined", "atx", "atx_closed"]): The style to use for Markdown headings. Defaults to "underlined".
- keep_inline_images_in (Iterable[str] | None): Tags in which inline images should be preserved. Defaults to None.
- newline_style (Literal["spaces", "backslash"]): Style for handling newlines in text content. Defaults to "spaces".
- strip (Iterable[str] | None): Tags to strip from the output. Defaults to None.
- strong_em_symbol (Literal["*", "_"]): Symbol to use for strong/emphasized text. Defaults to "*".
- sub_symbol (str): Custom symbol for subscript text. Defaults to an empty string.
- sup_symbol (str): Custom symbol for superscript text. Defaults to an empty string.
- wrap (bool): Wrap text to the specified width. Defaults to False.
- wrap_width (int): The number of characters at which to wrap text. Defaults to 80.
- convert_as_inline (bool): Treat the content as inline elements (no block elements like paragraphs). Defaults to False.
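As a rough illustration of how a few of these kwargs might be combined, the sketch below passes heading_style, bullets and code_language_callback in one call. The helper names language_from_class and to_markdown are hypothetical, not part of the library. It also assumes that the callback receives a BeautifulSoup-like element whose get("class") returns a list of class names, and that the language is encoded as a "language-xxx" class on that element; neither detail is confirmed by the option list above.

```python
from typing import Any


def language_from_class(el: Any) -> str:
    """Return the language named by a `language-xxx` class, or "" if none.

    Assumption: `el` behaves like a BeautifulSoup Tag, so `el.get("class")`
    returns a list of class names (or None when the attribute is absent).
    """
    classes = el.get("class") or []
    if isinstance(classes, str):  # tolerate a plain space-separated string
        classes = classes.split()
    for name in classes:
        if name.startswith("language-"):
            return name.removeprefix("language-")
    return ""


def to_markdown(html: str) -> str:
    """Convert `html` using a few non-default options from the list above."""
    # Imported here so the callback above can be exercised on its own.
    from html_to_markdown import convert_to_markdown

    return convert_to_markdown(
        html,
        heading_style="atx",  # one of the documented heading styles
        bullets="-",  # string of characters to use for bullet points
        code_language_callback=language_from_class,
    )
```

Keeping the callback as a small standalone function makes it easy to check against sample elements before wiring it into a conversion.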
CLI
For compatibility with the original markdownify, a CLI is provided. Convert a file with:
html_to_markdown example.html > example.md
Or pipe input from stdin:
cat example.html | html_to_markdown > example.md
Use html_to_markdown -h to see all available options. They are the same as those listed above and take the same arguments.
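Because the CLI reads HTML from stdin and writes Markdown to stdout, it can also be driven from a script. The following is a minimal sketch of that, assuming the html_to_markdown executable is on PATH; the helper name run_cli and its command parameter are hypothetical and exist only so the command can be swapped out.

```python
import subprocess
from typing import Sequence


def run_cli(html: str, command: Sequence[str] = ("html_to_markdown",)) -> str:
    """Pipe `html` to the command on stdin and return what it writes to stdout."""
    result = subprocess.run(
        list(command),
        input=html,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, run_cli(open("example.html").read()) would return the same text that the piped shell command above writes to example.md.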