
Convert HTML to markdown.



pip install markdownify


Convert some HTML to Markdown:

from markdownify import markdownify as md
md('<b>Yay</b> <a href="">GitHub</a>')  # > '**Yay** [GitHub]('

Specify tags to exclude (blacklist):

from markdownify import markdownify as md
md('<b>Yay</b> <a href="">GitHub</a>', strip=['a'])  # > '**Yay** GitHub'

...or specify the tags you want to include (whitelist):

from markdownify import markdownify as md
md('<b>Yay</b> <a href="">GitHub</a>', convert=['b'])  # > '**Yay** GitHub'


Markdownify supports the following options:

strip
A list of tags to strip (blacklist). This option can’t be used with the convert option.
convert
A list of tags to convert (whitelist). This option can’t be used with the strip option.
autolinks
A boolean indicating whether the “automatic link” style should be used when an <a> tag’s contents match its href. Defaults to True.
default_title
A boolean to enable setting the title of a link to its href, if no title is given. Defaults to False.
heading_style
Defines how headings should be converted. Accepted values are ATX, ATX_CLOSED, SETEXT, and UNDERLINED (which is an alias for SETEXT). Defaults to UNDERLINED.
bullets
An iterable (string, list, or tuple) of bullet styles to be used. If the iterable only contains one item, it will be used regardless of how deeply lists are nested. Otherwise, the bullet will alternate based on nesting level. Defaults to '*+-'.
strong_em_symbol
In markdown, both * and _ are used to encode strong or emphasized text. Either symbol can be chosen with the values ASTERISK (default) or UNDERSCORE respectively.
sub_symbol, sup_symbol
Define the characters that surround <sub> and <sup> text. Both default to an empty string, because this is non-standard behavior. Could be something like ~ and ^ to result in ~sub~ and ^sup^.
newline_style
Defines how line breaks (<br>) are marked in markdown. The default value, SPACES, uses the usual two trailing spaces and a newline, while BACKSLASH converts a line break to \\n (a backslash and a newline). While the latter convention is non-standard, it is commonly preferred and supported by many interpreters.
code_language
Defines the language that should be assumed for all <pre> sections. Useful if all code on a page is in the same programming language and should be annotated with ```python or similar. Defaults to '' (empty string) and can be any string.
escape_underscores
If set to False, do not escape _ to \_ in text. Defaults to True.

Options may be specified as kwargs to the markdownify function, or as a nested Options class in MarkdownConverter subclasses.

Creating Custom Converters

If you have a special use case that calls for a special conversion, you can always inherit from MarkdownConverter and override the method you want to change:

from markdownify import MarkdownConverter

class ImageBlockConverter(MarkdownConverter):
    """Create a custom MarkdownConverter that adds two newlines after an image."""
    def convert_img(self, el, text, convert_as_inline):
        return super().convert_img(el, text, convert_as_inline) + '\n\n'

# Create shorthand method for conversion
def md(html, **options):
    return ImageBlockConverter(**options).convert(html)


To run tests:

python setup.py test

To lint:

python setup.py lint

Files for markdownify, version 0.10.2
Filename                             Size     File type  Python version
markdownify-0.10.2.tar.gz            12.9 kB  Source     None
markdownify-0.10.2-py3-none-any.whl  13.3 kB  Wheel      py3
