Convert HTML to Markdown.

Project description

tomd

tomd is a small Python library that takes an HTML string and returns the Markdown that produced it (or an approximation, when the HTML wasn't originally Markdown). Handy for archiving articles, scraped blog posts, or anything else where you'd rather work with plain text than DOM trees.

Install

pip install tomd

Requires Python 3.10+.

Quickstart

import tomd

tomd.convert("<h1>Hello, world!</h1>")
# => '\n# Hello, world!\n'

Or via the class:

from tomd import Tomd

Tomd("<h1>Hello, world!</h1>").markdown
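
As the tomd.convert example shows, the returned string can begin and end with newlines. When archiving many documents you may want tidier files. The sketch below is a hypothetical post-processing helper (tidy_markdown is an illustrative name, not part of tomd) that collapses runs of blank lines and normalizes the ends of the text. It would also collapse blank lines inside fenced code blocks, so treat it as a starting point rather than a drop-in.

```python
import re


def tidy_markdown(md: str) -> str:
    """Hypothetical post-processing helper; not part of tomd's API."""
    # Collapse three or more consecutive newlines (two or more blank lines)
    # into a single blank line.
    md = re.sub(r"\n{3,}", "\n\n", md)
    # Strip leading/trailing whitespace and end with exactly one newline.
    return md.strip() + "\n"


print(repr(tidy_markdown("\n# Hello, world!\n")))
# '# Hello, world!\n'
```

In practice you would wrap the library call, e.g. tidy_markdown(tomd.convert(html)).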

What it supports

| Markdown | HTML |
|---|---|
| Headings | <h1> to <h6> |
| Bold / italic | <b>, <strong>, <i>, <em> |
| Inline code | <code> |
| Strikethrough | <del> |
| Links | <a href="https://..."> |
| Images | <img src="..." alt="..."/> |
| Unordered lists | <ul><li>...</li></ul> |
| Ordered lists | <ol><li>...</li></ol> |
| Blockquotes | <blockquote>...</blockquote> |
| Horizontal rule | <hr/> |
| Code blocks | <pre><code>...</code></pre> |
| Tables | <table><thead>...<tbody>...</table> |
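
To make the tag-to-syntax mapping in the table above concrete, here is a toy converter for headings, links, and the inline rows, built on the standard library's html.parser. It is a hypothetical sketch (MiniMarkdown, mini_convert, and INLINE are illustrative names) and not how tomd is implemented; it ignores lists, tables, images, code blocks, and escaping.

```python
from html.parser import HTMLParser

# Hypothetical mapping that mirrors a few inline rows of the table.
INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*", "code": "`", "del": "~~"}


def _heading_level(tag: str) -> int:
    """Return 1-6 for <h1>..<h6>, else 0."""
    if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
        return int(tag[1])
    return 0


class MiniMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter for a small subset of tags (not tomd's code)."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self.hrefs: list[str] = []  # stack of pending link targets

    def handle_starttag(self, tag, attrs):
        level = _heading_level(tag)
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif level:
            self.out.append("\n" + "#" * level + " ")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")
        elif tag == "p":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif _heading_level(tag):
            self.out.append("\n")
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")
        elif tag == "p":
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)


def mini_convert(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(repr(mini_convert("<h2>Title</h2>")))
# '\n## Title\n'
```

The design choice worth noting is the link stack: the href is only known at the opening tag but is emitted after the link text, so it has to be held until the matching close.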

If your HTML can't be cleanly expressed in Markdown (nested layouts, floating divs, etc.), the output will lose some structure. tomd targets the round-trip case (Markdown → HTML → Markdown), not arbitrary HTML.

Example

from tomd import Tomd

html = """
<h1>Heading</h1>
<p><b>bold</b> and <i>italic</i> and <a href="https://example.com">a link</a></p>
<ul>
  <li>one</li>
  <li>two</li>
</ul>
<blockquote>a quote</blockquote>
"""

print(Tomd(html).markdown)

Output:

# Heading

**bold** and *italic* and [a link](https://example.com)

- one
- two

> a quote
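
The list and blockquote lines in that output follow simple per-line rules. The helpers below (md_list and md_blockquote are hypothetical names, not tomd functions) sketch those rules for already-extracted text, including ordered lists and multi-line quotes, which the example doesn't exercise. tomd's actual output may differ in details such as spacing or list markers.

```python
def md_list(items: list[str], ordered: bool = False) -> str:
    """Render list items as Markdown list lines (hypothetical helper)."""
    lines = []
    for number, item in enumerate(items, start=1):
        # <ol> items get "1.", "2.", ...; <ul> items get "-".
        marker = f"{number}." if ordered else "-"
        lines.append(f"{marker} {item}")
    return "\n".join(lines)


def md_blockquote(text: str) -> str:
    """Prefix every line with '>' so multi-line quotes stay inside the quote."""
    return "\n".join(f"> {line}" if line else ">" for line in text.splitlines())


print(md_list(["one", "two"]))
print(md_blockquote("a quote"))
```

Blank lines inside a quote get a bare ">" so the quote is not split into two separate blockquotes.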

Development

git clone https://github.com/elliotgao2/tomd.git
cd tomd
uv sync             # install deps into .venv
uv run pytest       # run tests
uv run ruff check . # lint

We use uv for packaging and ruff for lint + format. Install the pre-commit hooks:

uv run pre-commit install

Roadmap

  • Headings, emphasis, links, images, lists, blockquotes, tables, HR, code blocks
  • Nested lists
  • CLI (tomd < input.html > output.md)
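
The planned CLI is a stdin-to-stdout filter. A filter of that shape can be approximated with a thin wrapper around tomd.convert. This is a hypothetical sketch (run_filter and main are illustrative names, not a tomd command); the converter is passed in as a parameter so the wrapper can be exercised without tomd installed.

```python
import io
import sys
from typing import Callable, TextIO


def run_filter(convert: Callable[[str], str], src: TextIO, dst: TextIO) -> None:
    """Read all HTML from src, convert it, and write the Markdown to dst."""
    dst.write(convert(src.read()))


def main() -> None:
    # Assumes tomd is installed; tomd.convert is the function from Quickstart.
    import tomd

    run_filter(tomd.convert, sys.stdin, sys.stdout)


# Quick check with a stand-in converter (str.upper) instead of tomd.
demo_out = io.StringIO()
run_filter(str.upper, io.StringIO("<p>hi</p>"), demo_out)
print(demo_out.getvalue())
```

Saved as a script that calls main() under an if __name__ == "__main__": guard, it could be used as python tomd_filter.py < input.html > output.md (the script name is hypothetical).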

Contributing

Pull requests are welcome. For non-trivial changes, please open an issue first to discuss. Before submitting, make sure the tests (uv run pytest) and the linter (uv run ruff check .) both pass.

License

MIT © Elliot Gao

Download files

Source Distribution

tomd-1.1.0.tar.gz (12.1 kB)

Built Distribution

tomd-1.1.0-py3-none-any.whl (7.4 kB)

File details

Details for the file tomd-1.1.0.tar.gz.

File metadata

  • Download URL: tomd-1.1.0.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.9

File hashes

Hashes for tomd-1.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c810e6cd370ceebedee547cdcd776382787126ac0dc0afaa3ba0b78891b45256 |
| MD5 | b6c633d987cfb4aaec4414ed3d10acdc |
| BLAKE2b-256 | 3040a8e3e2eb9e2046bfc342baa445077454921aaebfd9e7ad2c9c67ee3b0c68 |
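
To check a downloaded file against the published digests, hash it locally and compare. A minimal sketch using the standard library's hashlib follows (sha256_of and matches are illustrative names). For installs, pip's hash-checking mode (--require-hashes, with --hash entries in a requirements file) performs the same comparison automatically.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example: check a downloaded sdist against its published SHA256 value.
# matches(Path("tomd-1.1.0.tar.gz"),
#         "c810e6cd370ceebedee547cdcd776382787126ac0dc0afaa3ba0b78891b45256")
```

Reading in chunks keeps memory use constant regardless of file size.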

File details

Details for the file tomd-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: tomd-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.9

File hashes

Hashes for tomd-1.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | de66beb25670e49055402927d3d36abfea144fd597fcfb793acec57c36b8eb70 |
| MD5 | c36c2e80150c45b5706afae48760e30e |
| BLAKE2b-256 | 3f626da8a58520e3c7bbfe286993e1075fe2c8b1d81b22ee07981c32adc96325 |
