tomd

Convert HTML to Markdown.

tomd is a small Python library that takes an HTML string and returns the Markdown that produced it (or an approximation, when the HTML wasn't originally Markdown). Handy for archiving articles, scraped blog posts, or anything else where you'd rather work with plain text than DOM trees.

Install

pip install tomd

Requires Python 3.10+.

Quickstart

import tomd

tomd.convert("<h1>Hello, world!</h1>")
# => '\n# Hello, world!\n'

Or via the class:

from tomd import Tomd

Tomd("<h1>Hello, world!</h1>").markdown
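
For the archiving use case mentioned in the introduction, a small batch helper can be sketched on top of the Quickstart call. This is only an illustration: `convert_tree` and its parameters are hypothetical names, not part of tomd. The converter is passed in as a plain callable, so with tomd installed you would pass `tomd.convert`.

```python
from pathlib import Path
from typing import Callable


def convert_tree(src: Path, dst: Path, convert: Callable[[str], str]) -> list[Path]:
    """Convert every *.html file under src into a .md file under dst.

    `convert` is any HTML -> Markdown callable (hypothetical parameter);
    with tomd installed, pass `tomd.convert` from the Quickstart.
    """
    written = []
    for html_path in sorted(src.rglob("*.html")):
        # Mirror the source layout, swapping the extension for .md.
        md_path = dst / html_path.relative_to(src).with_suffix(".md")
        md_path.parent.mkdir(parents=True, exist_ok=True)
        html = html_path.read_text(encoding="utf-8")
        md_path.write_text(convert(html), encoding="utf-8")
        written.append(md_path)
    return written
```

Usage would look like `convert_tree(Path("scraped"), Path("archive"), tomd.convert)`, where both directory names are placeholders.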

What it supports

Markdown          HTML
Headings          <h1> to <h6>
Bold / italic     <b>, <strong>, <i>, <em>
Inline code       <code>
Strikethrough     <del>
Links             <a href="https://...">
Images            <img src="..." alt="..."/>
Unordered lists   <ul><li>...</li></ul>
Ordered lists     <ol><li>...</li></ol>
Blockquotes       <blockquote>...</blockquote>
Horizontal rule   <hr/>
Code blocks       <pre><code>...</code></pre>
Tables            <table><thead>...<tbody>...</table>

If your HTML can't be cleanly expressed in Markdown (nested layouts, floating divs, etc.), the output will lose some structure: tomd targets the Markdown → HTML → Markdown round trip, not arbitrary HTML.
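
To see why layout-only HTML loses structure, here is a minimal sketch of tag-by-tag conversion built on the standard library's `html.parser`. It is not tomd's implementation, which this README does not describe. `MiniConverter`, `to_markdown`, and the two small tag maps are hypothetical names chosen for illustration, and they cover only a few of the tags in the table above.

```python
from html.parser import HTMLParser

# Hypothetical maps, for illustration only (not tomd's internals).
# Inline tags -> the Markdown marker wrapped around their text.
INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*", "code": "`", "del": "~~"}
# Block tags -> the prefix that opens the block.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "blockquote": "> "}


class MiniConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in BLOCK_PREFIX:
            self.out.append(BLOCK_PREFIX[tag])
        # Any other tag (div, span, ...) has no Markdown form and is dropped.

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in BLOCK_PREFIX or tag == "p":
            self.out.append("\n")  # close the block with a newline

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html: str) -> str:
    parser = MiniConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(repr(to_markdown("<h1>Hi</h1><p><b>bold</b> and <i>it</i></p>")))
# '# Hi\n**bold** and *it*\n'
print(repr(to_markdown("<div><div>left</div><div>right</div></div>")))
# 'leftright'
```

The second call shows the limitation: the nested `<div>` layout has no Markdown counterpart, so only the text survives.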

Example

from tomd import Tomd

html = """
<h1>Heading</h1>
<p><b>bold</b> and <i>italic</i> and <a href="https://example.com">a link</a></p>
<ul>
  <li>one</li>
  <li>two</li>
</ul>
<blockquote>a quote</blockquote>
"""

print(Tomd(html).markdown)

Output:

# Heading

**bold** and *italic* and [a link](https://example.com)

- one
- two

> a quote

Development

git clone https://github.com/elliotgao2/tomd.git
cd tomd
uv sync             # install deps into .venv
uv run pytest       # run tests
uv run ruff check . # lint

We use uv for packaging and ruff for linting and formatting. To install the pre-commit hooks:

uv run pre-commit install

Roadmap

  • Headings, emphasis, links, images, lists, blockquotes, tables, HR, code blocks
  • Nested lists
  • CLI (tomd < input.html > output.md)

Contributing

Pull requests are welcome. For non-trivial changes, please open an issue first to discuss. Before submitting, make sure the test and lint commands from the Development section (uv run pytest and uv run ruff check .) both pass.

License

MIT © Elliot Gao

Download files

Source Distribution

tomd-1.0.0.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

tomd-1.0.0-py3-none-any.whl (4.3 kB)

Uploaded Python 3

File details

Details for the file tomd-1.0.0.tar.gz.

File metadata

  • Download URL: tomd-1.0.0.tar.gz
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.9

File hashes

Hashes for tomd-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        e3a9a3dcf14217d713f411b3d039824936774ff9d2a4eba790fe6e8463c78a6a
MD5           54a3bc0de224d2649ab84a93913e2b36
BLAKE2b-256   d0058ef82162f082fe61ba6915f0e9c2d241559b7a81a1034f3db4dcbd05e957

File details

Details for the file tomd-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: tomd-1.0.0-py3-none-any.whl
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.9

File hashes

Hashes for tomd-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        876ffa42df7ec1651f8ce49bb76a179596a355cde9f85c58710073df96b85fe2
MD5           309c765e6c0b755c47053b6842806f25
BLAKE2b-256   cc36f882b0064aff69c31f31a06e80a458e4df35b2ca9601548caedaf76a97bf
