A pure Python HTML5 parser that just works.

These details have been verified by PyPI

Maintainers

EmilStenstrom

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

JustHTML

JustHTML is a pure Python HTML5 parser that just works. It parses HTML and returns a DOM tree that you can traverse and manipulate.

Why JustHTML?

1. ✅ Correctness: 100% Spec Compliant

JustHTML is built to be correct. It implements the official WHATWG HTML5 specification exactly (tree builder and tokenizer), including all the complex error-handling rules that browsers use.

- Verified Compliance: Passes all 8,500+ tests in the official html5lib-tests suite, which browser vendors also use (see /tests/).
- 100% Coverage: Every line and branch of code is covered by integration tests.
- Fuzz Tested: Has parsed 3 million randomized, broken HTML documents to check that it does not crash or hang (see fuzz.py).
- Living Standard: Tracks the WHATWG living standard, not a snapshot from 2012.
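
The fuzzing idea can be sketched in a few lines. This is not the project's fuzz.py, only a minimal illustration: `FRAGMENTS`, `random_broken_html`, and `fuzz` are hypothetical names, and `parse` is assumed to be any callable that accepts an HTML string. It detects crashes (exceptions) only, not hangs.

```python
import random

# Hypothetical building blocks; a real fuzzer would use a much richer set.
FRAGMENTS = ["<div", "<p>", "</p>", "</div>", "<b", ">", "<!--", "-->",
             "<table>", "<td>", "&amp", "&#x", "='", '"', "text", " ", "</"]


def random_broken_html(rng, max_parts=20):
    """Concatenate random fragments into a (usually malformed) document."""
    count = rng.randint(1, max_parts)
    return "".join(rng.choice(FRAGMENTS) for _ in range(count))


def fuzz(parse, iterations=1000, seed=0):
    """Feed random documents to `parse`; return the inputs that raised."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        doc = random_broken_html(rng)
        try:
            parse(doc)
        except Exception as exc:  # any exception counts as a crash
            failures.append((doc, exc))
    return failures
```

With the library installed, passing the `JustHTML` class as `parse` would exercise the real parser. A fixed `seed` makes any failing input reproducible.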

2. 🐍 Pure Python with zero dependencies

JustHTML has zero dependencies. It's pure Python.

- Easy Installation: No C extensions to compile and no system libraries (like libxml2) required. Works on PyPy, WASM (Pyodide), and anywhere Python runs.
- No dependency upgrade hassle: Some libraries depend on a large set of other libraries, all of which need regular upgrades to avoid security issues.
- Debuggable: It's just Python code. You can step through it with a debugger to understand exactly how your HTML is being parsed.
- Returns plain Python objects: Other parsers return lxml or etree trees, which means you have another API to learn. JustHTML returns a set of nested objects you can iterate over. Simple.

3. ⚡ Fast enough™ Performance

If you need to parse terabytes of data, use a C or Rust parser (like html5ever). They are 10x-20x faster (see benchmarks.py).

But for most use cases, JustHTML is fast enough. It parses the Wikipedia homepage in ~0.1s. It is the fastest pure-Python HTML5 parser available, outperforming html5lib and BeautifulSoup.
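
A rough way to check timing claims like these on your own machine is to time a single parse several times and keep the best run. The sketch below uses only the standard library. `best_parse_time` is a hypothetical helper, and `str.upper` is a stand-in workload so the snippet runs without any third-party package.

```python
import timeit


def best_parse_time(parse, html, repeat=5):
    """Return the fastest of `repeat` single-run timings, in seconds."""
    timings = timeit.repeat(lambda: parse(html), number=1, repeat=repeat)
    return min(timings)


# Stand-in workload; swap in the real parser and a saved page to measure it.
sample = "<p>Hello, <b>world</b>!</p>" * 1000
elapsed = best_parse_time(str.upper, sample)
print(f"best of 5: {elapsed:.6f}s")
```

With the library installed, `best_parse_time(JustHTML, page_source)` would time the real parser, assuming `page_source` holds the HTML of a saved page as a string.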

Comparison to other parsers

| Parser | Spec Compliant? | Pure Python? | Speed | Notes |
| --- | --- | --- | --- | --- |
| JustHTML | ✅ Yes | ✅ Yes | ⚡ Fast | The sweet spot. Correct, easy to install, and fast enough. |
| `html.parser` | ❌ No | ✅ Yes | ⚡ Fast | Standard library. Chokes on malformed HTML. |
| `lxml` | ❌ No | ❌ No | 🚀 Very Fast | C-based. Fast but not spec-compliant (different output than browsers). |
| `html5lib` | ✅ Yes | ✅ Yes | 🐢 Slow | The reference implementation. Very correct but very slow. |
| `BeautifulSoup` | N/A | N/A | 🐢 Slow | Wrapper around other parsers. Slower and more memory hungry than the underlying parser. |
| `gumbo` / `html5ever` | ✅ Yes | ❌ No | 🚀 Very Fast | C/Rust based. Fast and correct, but requires compiling extensions. |
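
To make the "Spec Compliant?" column concrete: under the HTML5 tree-construction rules, a second `<p>` start tag implicitly closes a `<p>` that is still open. The standard library's `html.parser` is an event-based tokenizer and applies no such rule, as the sketch below illustrates. `EventRecorder` is a name made up for this example.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record the raw events html.parser emits, without building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


recorder = EventRecorder()
recorder.feed("<p>one<p>two")
recorder.close()
# Two start events and no end events: nothing closes the first <p>.
print(recorder.events)
```

A spec-compliant parser would instead produce two sibling `p` elements, which is what browsers do with the same input.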

Installation

pip install justhtml

Example usage

Python API

from justhtml import JustHTML

html = "<html><body><div id='main'><p>Hello, <b>world</b>!</p></div></body></html>"
doc = JustHTML(html)

# 1. Traverse the tree
# The tree is made of SimpleDomNode objects.
# Each node has .name, .attrs, .children, and .parent
root = doc.root              # #document
html_node = root.children[0] # html
body = html_node.children[1] # body (children[0] is head)
div = body.children[0]       # div

print(f"Tag: {div.name}")
print(f"Attributes: {div.attrs}")

# 2. Pretty-print HTML
# You can serialize any node back to HTML
print(div.to_html())
# Output:
# <div id="main">
#   <p>
#     Hello,
#     <b>world</b>
#     !
#   </p>
# </div>
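
Because every node exposes `.name` and `.children`, a recursive walk needs only those two attributes. The sketch below is an illustration and not part of the JustHTML API: `iter_nodes` and `find_all` are hypothetical helpers, and `FakeNode` is a stand-in so the snippet runs without the library. It also assumes that leaf nodes have an empty or missing `.children`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Stand-in with the same .name/.attrs/.children shape described above."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def iter_nodes(node):
    """Yield `node` and all of its descendants in document order."""
    yield node
    # Assumption: leaf nodes may lack .children or set it to None/empty.
    for child in getattr(node, "children", None) or []:
        yield from iter_nodes(child)


def find_all(node, name):
    """Return the node itself and every descendant whose .name matches."""
    return [n for n in iter_nodes(node) if n.name == name]


tree = FakeNode("div", {"id": "main"}, [
    FakeNode("p", children=[FakeNode("b"), FakeNode("b")]),
])
print([n.name for n in iter_nodes(tree)])  # ['div', 'p', 'b', 'b']
print(len(find_all(tree, "b")))            # 2
```

The same helpers should work on a parsed document if its nodes behave as described, for example `find_all(doc.root, "p")`.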

Command Line Interface

You can also use JustHTML from the command line to pretty-print HTML files:

# Parse a file
python -m justhtml index.html

# Parse from stdin (great for piping)
curl -s https://example.com | python -m justhtml -
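
The same CLI can be driven from a Python script through `subprocess`. This is a sketch under the assumption that the CLI writes the pretty-printed HTML to stdout. `build_command` and `pretty_print` are hypothetical helper names, and `pretty_print` only works where justhtml is installed.

```python
import subprocess
import sys


def build_command(source="-"):
    """Build the argv for `python -m justhtml <source>` ('-' means stdin)."""
    return [sys.executable, "-m", "justhtml", source]


def pretty_print(html):
    """Pipe an HTML string through the CLI and return its stdout.

    Requires justhtml to be installed in the current environment.
    """
    result = subprocess.run(
        build_command("-"),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Using `sys.executable` instead of a bare `python` keeps the call inside the interpreter (and virtual environment) that is running the script.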

Develop locally and run the tests

Clone the repository:

git clone git@github.com:EmilStenstrom/justhtml.git
cd justhtml

Install the library locally (there are no dependencies!):
```
pip install -e .
```
Run the tests:
```
python run_tests.py
```
For verbose output showing diffs on failures:
```
python run_tests.py -v
```
Run the benchmarks:
```
python benchmark.py
```

License

MIT. Free for commercial and non-commercial use.

Release history

1.20.0

May 14, 2026

1.19.0

May 9, 2026

1.18.0

May 4, 2026

1.17.0

Apr 19, 2026

1.16.0

Apr 12, 2026

1.15.0

Apr 9, 2026

1.14.0

Apr 5, 2026

1.13.0

Mar 21, 2026

1.12.0

Mar 17, 2026

1.11.0

Mar 15, 2026

1.10.0

Mar 15, 2026

1.9.1

Mar 10, 2026

1.9.0

Mar 8, 2026

1.8.0

Mar 5, 2026

1.7.0

Feb 8, 2026

1.6.0

Feb 6, 2026

1.5.0

Feb 1, 2026

1.4.0

Jan 29, 2026

1.3.0

Jan 28, 2026

1.2.0

Jan 25, 2026

1.1.0

Jan 24, 2026

1.0.0

Jan 20, 2026

0.40.0

Jan 19, 2026

0.39.0

Jan 18, 2026

0.38.0

Jan 18, 2026

0.37.0

Jan 18, 2026

0.36.0

Jan 17, 2026

0.35.0

Jan 11, 2026

0.34.0

Jan 10, 2026

0.33.0

Jan 10, 2026

0.32.0

Jan 10, 2026

0.31.0

Jan 9, 2026

0.30.0

Jan 3, 2026

0.29.0

Jan 3, 2026

0.28.0

Jan 3, 2026

0.27.0

Jan 3, 2026

0.26.0

Jan 2, 2026

0.25.0

Jan 1, 2026

0.24.0

Jan 1, 2026

0.23.0

Dec 30, 2025

0.22.0

Dec 28, 2025

0.21.0

Dec 28, 2025

0.20.0

Dec 28, 2025

0.19.0

Dec 28, 2025

0.18.0

Dec 21, 2025

0.17.0

Dec 20, 2025

0.16.0

Dec 18, 2025

0.15.0

Dec 18, 2025

0.14.0

Dec 17, 2025

0.13.1

Dec 17, 2025

0.13.0

Dec 16, 2025

0.12.0

Dec 15, 2025

0.11.0

Dec 15, 2025

0.10.0

Dec 14, 2025

0.9.0

Dec 14, 2025

0.8.0

Dec 13, 2025

0.7.0

Dec 13, 2025

0.6.0

Dec 7, 2025

0.5.2

Dec 7, 2025

0.5.1

Dec 7, 2025

0.5.0

Dec 7, 2025

0.4.0

Dec 6, 2025

0.3.0

Dec 1, 2025

0.2.0

Dec 1, 2025

This version

0.1.0

Nov 30, 2025

Download files

Source Distribution

justhtml-0.1.0.tar.gz (82.2 kB view details)

Uploaded Nov 30, 2025 Source

Built Distribution

justhtml-0.1.0-py3-none-any.whl (48.0 kB view details)

Uploaded Nov 30, 2025 Python 3

File details

Details for the file justhtml-0.1.0.tar.gz.

File metadata

Download URL: justhtml-0.1.0.tar.gz
Upload date: Nov 30, 2025
Size: 82.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for justhtml-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`52167a0a932aa15683e703f837fa9602f2eeb846cd581b23316509046c104d3e`
MD5	`c3a7d78bf39c091ee83c1397d88500dc`
BLAKE2b-256	`786d934823e10a9ab7d40b427a1a5fbc92fab47db8c0c3a9c84afef1d0384f4b`
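
A downloaded file can be checked against the published SHA256 digest with the standard library's `hashlib`. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `matches_published_hash("justhtml-0.1.0.tar.gz", "52167a0a932aa15683e703f837fa9602f2eeb846cd581b23316509046c104d3e")` should return `True` for an intact download.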

Provenance

The following attestation bundles were made for justhtml-0.1.0.tar.gz:

Publisher: publish.yml on EmilStenstrom/justhtml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: justhtml-0.1.0.tar.gz
- Subject digest: 52167a0a932aa15683e703f837fa9602f2eeb846cd581b23316509046c104d3e
- Sigstore transparency entry: 731938629
- Sigstore integration time: Nov 30, 2025
Source repository:
- Permalink: EmilStenstrom/justhtml@8bc80c5baa29a53a417db7a0db56a0ab5fa92104
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/EmilStenstrom
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8bc80c5baa29a53a417db7a0db56a0ab5fa92104
- Trigger Event: release

File details

Details for the file justhtml-0.1.0-py3-none-any.whl.

File metadata

Download URL: justhtml-0.1.0-py3-none-any.whl
Upload date: Nov 30, 2025
Size: 48.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for justhtml-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b5589bbd454085614dffdaebe3f86860188c9bfff0c82a36ee8f5b5044cd4f12`
MD5	`11a93366b00fdedf1d3cee94653e8852`
BLAKE2b-256	`1ef1b0d7db941db5f4557eeecd05bfa86c0c730fec84386c09ceed31d47f4a66`

Provenance

The following attestation bundles were made for justhtml-0.1.0-py3-none-any.whl:

Publisher: publish.yml on EmilStenstrom/justhtml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: justhtml-0.1.0-py3-none-any.whl
- Subject digest: b5589bbd454085614dffdaebe3f86860188c9bfff0c82a36ee8f5b5044cd4f12
- Sigstore transparency entry: 731938630
- Sigstore integration time: Nov 30, 2025
Source repository:
- Permalink: EmilStenstrom/justhtml@8bc80c5baa29a53a417db7a0db56a0ab5fa92104
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/EmilStenstrom
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8bc80c5baa29a53a417db7a0db56a0ab5fa92104
- Trigger Event: release
