Convert HTML to markdown

html_to_markdown

This library is a refactored and modernized fork of markdownify, supporting Python 3.9 and above.

Differences from markdownify

  • The refactored codebase uses a strict functional approach - no classes are involved.
  • The code is fully typed, passes strict MyPy checks, and includes a py.typed file.
  • The convert_to_markdown function accepts a pre-configured BeautifulSoup instance in place of an HTML string.
  • Releases follow standard semver. Version v1.0.0 was branched from markdownify's v0.13.1, and from that point the two projects' version numbers are no longer aligned.

Installation

pip install html_to_markdown

Usage

Convert an HTML string to Markdown:

from html_to_markdown import convert_to_markdown

convert_to_markdown('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'

Or pass a pre-configured instance of BeautifulSoup:

from bs4 import BeautifulSoup
from html_to_markdown import convert_to_markdown

soup = BeautifulSoup('<b>Yay</b> <a href="http://github.com">GitHub</a>', 'lxml')  # lxml requires an extra dependency.

convert_to_markdown(soup)  # > '**Yay** [GitHub](http://github.com)'

Options

The convert_to_markdown function accepts the following kwargs:

  • autolinks (bool): Automatically convert valid URLs into Markdown links. Defaults to True.
  • bullets (str): A string of characters to use for bullet points in lists. Defaults to '*+-'.
  • code_language (str): Default language identifier for fenced code blocks. Defaults to an empty string.
  • code_language_callback (Callable[[Any], str] | None): Function to dynamically determine the language for code blocks.
  • convert (Iterable[str] | None): A list of tag names to convert to Markdown. If None, all supported tags are converted.
  • default_title (bool): Use a link's href as its title when no title is given. Defaults to False.
  • escape_asterisks (bool): Escape asterisks (*) to prevent unintended Markdown formatting. Defaults to True.
  • escape_misc (bool): Escape miscellaneous characters to prevent conflicts in Markdown. Defaults to True.
  • escape_underscores (bool): Escape underscores (_) to prevent unintended italic formatting. Defaults to True.
  • heading_style (Literal["underlined", "atx", "atx_closed"]): The style to use for Markdown headings. Defaults to "underlined".
  • keep_inline_images_in (Iterable[str] | None): Tags in which inline images should be preserved. Defaults to None.
  • newline_style (Literal["spaces", "backslash"]): Style for handling newlines in text content. Defaults to "spaces".
  • strip (Iterable[str] | None): Tags to strip from the output. Defaults to None.
  • strong_em_symbol (Literal["*", "_"]): Symbol to use for strong/emphasized text. Defaults to "*".
  • sub_symbol (str): Custom symbol for subscript text. Defaults to an empty string.
  • sup_symbol (str): Custom symbol for superscript text. Defaults to an empty string.
  • wrap (bool): Wrap text to the specified width. Defaults to False.
  • wrap_width (int): The number of characters at which to wrap text. Defaults to 80.
  • convert_as_inline (bool): Treat the content as inline elements (no block elements like paragraphs). Defaults to False.
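
The options above are passed as keyword arguments. The following is a minimal sketch, not taken from the project's documentation: the OPTIONS dictionary and the convert_with_options helper are hypothetical names, and the chosen values are arbitrary. Only convert_to_markdown and the option names come from the list above.

```python
from typing import Any

# Hypothetical selection of options; every key is a kwarg from the list above,
# the values are arbitrary choices made for illustration.
OPTIONS: dict[str, Any] = {
    "heading_style": "atx",        # one of "underlined", "atx", "atx_closed"
    "newline_style": "backslash",  # one of "spaces", "backslash"
    "strip": ["a"],                # tags to strip from the output
    "wrap": True,                  # enable wrapping ...
    "wrap_width": 72,              # ... at 72 characters instead of the default 80
}


def convert_with_options(html: str) -> str:
    """Convert an HTML string using the keyword arguments in OPTIONS."""
    # Imported inside the function (an assumption made for this sketch) so the
    # module can be loaded even where html_to_markdown is not installed.
    from html_to_markdown import convert_to_markdown

    return convert_to_markdown(html, **OPTIONS)
```

Keeping the options in one dictionary makes it easy to reuse the same configuration across several calls; passing the kwargs directly to convert_to_markdown works just as well.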

CLI

For compatibility with the original markdownify, a CLI is provided. Use html_to_markdown example.html > example.md or pipe input from stdin:

cat example.html | html_to_markdown > example.md

Use html_to_markdown -h to see all available options. They are the same as listed above and take the same arguments.
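
The stdin pipe shown above can also be driven from Python. This is a rough sketch under stated assumptions: convert_via_cli is a hypothetical helper, it assumes the html_to_markdown executable is on PATH, and it relies only on the stdin behaviour described above, passing no other flags.

```python
import subprocess
from typing import Sequence


def convert_via_cli(html: str, command: Sequence[str] = ("html_to_markdown",)) -> str:
    """Pipe an HTML string to the CLI on stdin and return what it writes to stdout.

    `command` is a parameter of this sketch (not of the library) so that a
    different executable path, or a stand-in for testing, can be substituted.
    """
    result = subprocess.run(
        list(command),
        input=html,           # sent to the process on stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For in-process use, calling convert_to_markdown directly avoids the cost of spawning a process; the subprocess route is mainly useful when the conversion has to run in a separate environment.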

