tomd
Convert HTML to Markdown.
tomd is a small Python library that takes an HTML string and returns the
Markdown that produced it (or an approximation, when the HTML wasn't
originally Markdown). Handy for archiving articles, scraped blog posts, or
anything else where you'd rather work with plain text than DOM trees.
Install
pip install tomd
Requires Python 3.10+.
Quickstart
import tomd
tomd.convert("<h1>Hello, world!</h1>")
# => '\n# Hello, world!\n'
Or via the class:
from tomd import Tomd
Tomd("<h1>Hello, world!</h1>").markdown
What it supports
| Markdown | HTML |
|---|---|
| Headings | <h1>–<h6> |
| Bold / italic | <b>, <strong>, <i>, <em> |
| Inline code | <code> |
| Strikethrough | <del> |
| Links | <a href="https://..."> |
| Images | <img src="..." alt="..."/> |
| Unordered lists | <ul><li>...</li></ul> |
| Ordered lists | <ol><li>...</li></ol> |
| Blockquotes | <blockquote>...</blockquote> |
| Horizontal rule | <hr/> |
| Code blocks | <pre><code>...</code></pre> |
| Tables | <table><thead>...<tbody>...</table> |
If your HTML can't be cleanly expressed in Markdown (nested layouts,
floating divs, etc.), the output will lose some structure — tomd aims at
the round-trip Markdown → HTML → Markdown case, not arbitrary HTML.
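To make the tag-to-Markdown mapping in the table concrete, here is a minimal sketch built only on the standard library's html.parser. It is not tomd's implementation: MiniConverter and mini_convert are hypothetical names, and the sketch covers only headings, paragraphs, links, and a few inline tags.

```python
from html.parser import HTMLParser

# Inline tags mapped to the Markdown delimiter that wraps their content.
INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*", "code": "`", "del": "~~"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MiniConverter(HTMLParser):
    """Toy converter (hypothetical, not tomd): a handful of tags only."""

    def __init__(self):
        super().__init__()
        self.parts = []   # Markdown fragments, joined at the end
        self.href = None  # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.parts.append(INLINE[tag])
        elif tag in HEADINGS:
            # <h3> becomes "### ": one "#" per heading level.
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.parts.append(INLINE[tag])
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None
        elif tag == "p" or tag in HEADINGS:
            # Block-level elements end the current line.
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def mini_convert(html):
    """Convert a small HTML fragment to Markdown (illustrative only)."""
    parser = MiniConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(mini_convert('<h2>Title</h2><p><b>bold</b> and <a href="https://example.com">a link</a></p>'))
# ## Title
# **bold** and [a link](https://example.com)
```

Anything outside this small tag set (a floated div, for instance) passes through as bare text, which shows on a small scale why layout-heavy HTML loses structure.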
Example
from tomd import Tomd
html = """
<h1>Heading</h1>
<p><b>bold</b> and <i>italic</i> and <a href="https://example.com">a link</a></p>
<ul>
<li>one</li>
<li>two</li>
</ul>
<blockquote>a quote</blockquote>
"""
print(Tomd(html).markdown)
# Heading
**bold** and *italic* and [a link](https://example.com)
- one
- two
> a quote
Development
git clone https://github.com/elliotgao2/tomd.git
cd tomd
uv sync # install deps into .venv
uv run pytest # run tests
uv run ruff check . # lint
We use uv for packaging and ruff for lint + format. Install the pre-commit hooks:
uv run pre-commit install
Roadmap
- Headings, emphasis, links, images, lists, blockquotes, tables, HR, code blocks
- Nested lists
- CLI (tomd < input.html > output.md)
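Until the CLI lands, the stdin-to-stdout filter it describes could be sketched roughly as follows. This is an assumption about how such a command might be wired, not the planned implementation: run_filter and main are hypothetical names, and only tomd.convert comes from the Quickstart above.

```python
import sys
from typing import Callable, Optional, TextIO


def run_filter(
    convert: Callable[[str], str],
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
) -> int:
    """Read all of stdin, convert it, and write the result to stdout."""
    if stdin is None:
        stdin = sys.stdin
    if stdout is None:
        stdout = sys.stdout
    stdout.write(convert(stdin.read()))
    return 0  # process exit code


def main() -> int:
    # Imported lazily so run_filter stays usable without tomd installed.
    import tomd

    return run_filter(tomd.convert)
```

Saved as a script that ends with raise SystemExit(main()), this could be run as python script.py < input.html > output.md. Passing the converter in as a parameter keeps the I/O plumbing testable on its own.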
Contributing
Pull requests are welcome. For non-trivial changes, please open an issue
first to discuss. Make sure uv run pytest and uv run ruff check . pass
before submitting.
License
MIT © Elliot Gao