A pure Python HTML5 parser that just works.

Maintainers

EmilStenstrom

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

JustHTML

HTML from the real web is messy. It is often malformed, user-supplied, scraped from unknown pages, or headed for a browser where small parsing differences can become security bugs.

JustHTML gives Python projects one small dependency for the common HTML jobs:

- parse HTML like a browser, including broken markup
- sanitize untrusted HTML by default
- query with CSS selectors
- transform, serialize, extract text, or convert to Markdown
- run anywhere Python runs, with no C extension and no system package to install

pip install justhtml

Requires Python 3.10 or later.

JustHTML turns messy, unsafe HTML into a sanitized, queryable DOM, then serializes it to text, Markdown, or HTML.

Why Use It?

Most Python HTML libraries optimize for one part of the problem.

html.parser is built in, but it is not HTML5-compliant. BeautifulSoup is convenient, but its behavior depends heavily on the parser underneath. lxml and C/Rust-backed parsers are fast, but usually leave sanitization as a separate concern. html5lib and Bleach shaped the Python ecosystem, but neither is an obvious foundation for new projects today.
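
The html.parser point can be shown with the standard library alone. In this minimal sketch, `EventLog` and `tokenize` are hypothetical helpers written for the example. html.parser reports the tokens in `<p>One<p>Two` but applies no HTML5 tree-construction rules. As a result, nothing signals that the first paragraph ends where the second begins. An HTML5 parser, like a browser, closes the first `<p>` implicitly.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record the start-tag, end-tag, and text events html.parser emits."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tokenize(markup: str) -> list[tuple[str, str]]:
    parser = EventLog()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.events


print(tokenize("<p>One<p>Two"))
# [('start', 'p'), ('text', 'One'), ('start', 'p'), ('text', 'Two')]
```

No `('end', 'p')` event appears. Code built on html.parser has to reimplement the HTML5 rules itself to get the tree a browser would build.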

JustHTML is for applications that want a boring, inspectable, pure-Python default:

- Correct parsing: browser-style HTML5 recovery, tested against the official html5lib fixtures.
- Safe by default: JustHTML(html) sanitizes before you query or serialize.
- One DOM: parse once, then sanitize, query, transform, serialize, extract text, or produce Markdown.
- Easy deployment: zero runtime dependencies, no compiler, works on PyPy and Pyodide.
- Honest tradeoff: if you are parsing terabytes of trusted HTML, use a C/Rust parser. If you need reliable handling of untrusted or malformed HTML inside a Python app, use JustHTML.

Quick Start

from justhtml import JustHTML

doc = JustHTML(
    "<p>Hello<script>alert(1)</script> "
    "<a href='javascript:alert(1)'>bad</a> "
    "<a href='https://example.com'>ok</a></p>",
    fragment=True,
)

print(doc.to_html(pretty=False))
# => <p>Hello <a>bad</a> <a href="https://example.com">ok</a></p>

Sanitization is enabled by default. Disable it only for trusted input:

doc = JustHTML("<main><p class='intro'>Hello</p></main>", sanitize=False)
intro = doc.query_one("p.intro")

print(intro.to_text())
# => Hello
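
The Quick Start output drops `href='javascript:alert(1)'` but keeps the `https:` link. The idea behind that kind of URL policy can be sketched with the standard library. This is not JustHTML's implementation or its configuration API. `ALLOWED_SCHEMES` and `is_allowed_href` are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration only; not JustHTML's default policy.
ALLOWED_SCHEMES = frozenset({"http", "https", "mailto"})

# Characters browsers trim from both ends of a URL: C0 controls and space.
_C0_AND_SPACE = "".join(chr(code) for code in range(0x21))


def is_allowed_href(value: str) -> bool:
    """Return True if an href is relative or uses an allowlisted scheme."""
    # Browsers drop ASCII tab and newline anywhere in a URL...
    cleaned = "".join(ch for ch in value if ch not in "\t\n\r")
    # ...and trim C0 control characters and spaces from both ends.
    cleaned = cleaned.strip(_C0_AND_SPACE)
    scheme = urlsplit(cleaned).scheme.lower()
    # An empty scheme means a relative URL such as "/about" or "#top".
    return scheme == "" or scheme in ALLOWED_SCHEMES


for href in ["https://example.com", "/about", "javascript:alert(1)", " java\tscript:alert(1)"]:
    print(repr(href), is_allowed_href(href))
```

Normalizing before the check matters. Browsers ignore tabs and newlines inside URLs, so a naive `startswith("javascript:")` test would miss `java\tscript:`.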

What You Can Do

from justhtml import JustHTML, Linkify, SetAttrs, Unwrap

doc = JustHTML(
    "<p>Hello <span>world</span> example.com</p>",
    fragment=True,
    sanitize=False,
    transforms=[
        Unwrap("span"),
        Linkify(),
        SetAttrs("a", rel="nofollow"),
    ],
)

print(doc.to_html(pretty=False))
# => <p>Hello world <a href="http://example.com" rel="nofollow">example.com</a></p>
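
The output above is consistent with the transforms running in list order. The `<a>` that `Linkify()` creates also gets `rel="nofollow"` from the later `SetAttrs`. The toy sketch below models that ordered pipeline on a hypothetical mini-tree. `Element`, `unwrap`, `set_attrs`, `apply_all`, and `render` are illustrative stand-ins, not JustHTML's classes or internals.

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class Element:
    """A hypothetical minimal element node: tag, attributes, children."""

    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Union[Element, str]] = field(default_factory=list)


Transform = Callable[[Element], None]


def unwrap(tag: str) -> Transform:
    """Replace each <tag> element with its own children."""

    def run(node: Element) -> None:
        kept: list[Union[Element, str]] = []
        for child in node.children:
            if isinstance(child, Element):
                run(child)  # clean the subtree first
                if child.tag == tag:
                    kept.extend(child.children)
                    continue
            kept.append(child)
        node.children = kept

    return run


def set_attrs(tag: str, **attrs: str) -> Transform:
    """Set the given attributes on every <tag> element."""

    def run(node: Element) -> None:
        if node.tag == tag:
            node.attrs.update(attrs)
        for child in node.children:
            if isinstance(child, Element):
                run(child)

    return run


def apply_all(root: Element, transforms: list[Transform]) -> Element:
    # Order matters: each transform sees the tree left by the previous one.
    for transform in transforms:
        transform(root)
    return root


def render(node: Union[Element, str]) -> str:
    if isinstance(node, str):
        return html.escape(node, quote=False)
    attrs = "".join(f' {k}="{html.escape(v)}"' for k, v in node.attrs.items())
    inner = "".join(render(child) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


doc = Element("p", children=[
    "Hello ",
    Element("span", children=["world"]),
    " ",
    Element("a", {"href": "https://example.com"}, ["example.com"]),
])
apply_all(doc, [unwrap("span"), set_attrs("a", rel="nofollow")])
print(render(doc))
# <p>Hello world <a href="https://example.com" rel="nofollow">example.com</a></p>
```

Modeling each transform as a plain callable keeps the pipeline easy to reorder and test one step at a time.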

JustHTML includes:

- CSS selectors: query() and query_one()
- Sanitization: allowlisted HTML cleaning, URL policies, inline CSS controls
- Transforms: unwrap, drop, edit attributes, linkify, compose cleanup pipelines
- Text output: to_text() and Markdown generation
- Builder API: construct nodes directly from Python
- Streaming: process large inputs incrementally
- Bleach migration guide: move existing sanitizer code to JustHTML policies
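
The streaming API itself is not shown in this README. As a stdlib-only illustration of incremental processing (not JustHTML's API), `html.parser.HTMLParser.feed()` accepts input in arbitrary chunks. It buffers any tag split across a chunk boundary, so a large document can be read piece by piece instead of loaded whole. `TextCollector` and `collect` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Iterable


class TextCollector(HTMLParser):
    """Collect start tags and text as chunks arrive."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def collect(chunks: Iterable[str]) -> tuple[list[str], str]:
    parser = TextCollector()
    for chunk in chunks:  # e.g. successive reads from a file or socket
        parser.feed(chunk)
    parser.close()
    return parser.tags, "".join(parser.text)


# The "</p>" end tag is split across two chunks; the parser buffers it.
tags, text = collect(["<p>Hel", "lo</", "p><p>world</p>"])
print(tags, repr(text))
# ['p', 'p'] 'Helloworld'
```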

Command Line

# Pretty-print an HTML file
justhtml index.html

# Parse from stdin
curl -s https://example.com | justhtml -

# Extract text from selected nodes
justhtml index.html --selector "main p" --format text

# Convert selected HTML to Markdown
justhtml index.html --selector "article" --format markdown

Correctness

JustHTML is tested against the official html5lib tokenizer, tree-construction, serializer, and encoding fixtures, plus project-specific sanitizer, selector, transform, CLI, and regression tests.

The current test suite has 10,257 passing tests and 100% line and branch coverage. See Correctness Testing for details.
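
The html5lib tree-construction fixtures use a plain-text `.dat` format. Each case starts with `#data`, followed by sections such as `#errors` and `#document`. Below is a minimal sketch of splitting such text into cases. The `parse_dat` helper and the sample case are hand-written for illustration. They are not taken from the fixture suite or from JustHTML's own test runner.

```python
KNOWN_SECTIONS = {
    "#data", "#errors", "#new-errors", "#document",
    "#document-fragment", "#script-on", "#script-off",
}


def parse_dat(text: str) -> list[dict[str, str]]:
    """Split fixture text into one {section: body} dict per test case."""
    cases: list[dict[str, list[str]]] = []
    section = ""
    for line in text.splitlines():
        if line == "#data":
            cases.append({})  # every case begins with #data
        if line in KNOWN_SECTIONS and cases:
            section = line[1:]
            cases[-1][section] = []
        elif cases:
            cases[-1][section].append(line)
    # A blank line separates cases, so trim trailing newlines from each body.
    return [
        {name: "\n".join(lines).rstrip("\n") for name, lines in case.items()}
        for case in cases
    ]


SAMPLE = """#data
<p>One<p>Two
#errors
(1,3): expected-doctype-but-got-start-tag
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"
|     <p>
|       "Two"
"""

case = parse_dat(SAMPLE)[0]
print(case["data"])
print(case["document"])
```

The `document` body is the expected tree in the fixtures' indented `| ` notation. Here the two `<p>` elements are siblings, as a browser would build them. A test runner compares such a dump against the tree its parser produces.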

Documentation

Security

JustHTML sanitizes by default, but output safety still depends on where you put it. HTML body output is not automatically safe inside JavaScript, CSS, URL attributes, or other contexts.
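
To make the context point concrete: escaping that is correct for HTML text is the wrong tool inside an inline `<script>`. Below is a minimal stdlib sketch with two hypothetical helpers, `for_html_text` and `for_inline_script`. Neither is part of JustHTML.

```python
import html
import json


def for_html_text(value: str) -> str:
    """Escape &, <, > and quotes for element content or a quoted attribute."""
    return html.escape(value, quote=True)


def for_inline_script(value: object) -> str:
    """Encode a value as a JavaScript literal to embed in <script>...</script>."""
    # json.dumps produces a valid JS literal; the default ensure_ascii=True
    # also escapes U+2028/U+2029, which older JS engines reject in strings.
    encoded = json.dumps(value)
    # Escape characters that could close the script element or open an HTML
    # comment, using JSON \uXXXX escapes that decode back to the same text.
    return (
        encoded.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


payload = "</script><script>alert(1)</script>"
print(for_html_text(payload))
print(f"<script>const name = {for_inline_script(payload)};</script>")
```

Each output context needs its own encoder. Sanitized HTML belongs in HTML body positions, not inside scripts, styles, or URL attributes.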

For the supported-version policy and vulnerability reporting, see SECURITY.md.

License

MIT. Free to use for commercial and non-commercial projects.

Release history

- 3.1.0 (Jul 2, 2026)
- 3.0.0 (Jun 21, 2026)
- 2.4.1 (Jun 21, 2026)
- 2.4.0 (Jun 20, 2026)
- 2.3.0 (Jun 12, 2026)
- 2.2.0 (Jun 7, 2026), this version
- 2.1.0 (Jun 6, 2026)
- 2.0.0 (May 24, 2026)
- 1.23.0 (May 24, 2026)
- 1.22.0 (May 22, 2026)
- 1.21.0 (May 15, 2026)
- 1.20.0 (May 14, 2026)
- 1.19.0 (May 9, 2026)
- 1.18.0 (May 4, 2026)
- 1.17.0 (Apr 19, 2026)
- 1.16.0 (Apr 12, 2026)
- 1.15.0 (Apr 9, 2026)
- 1.14.0 (Apr 5, 2026)
- 1.13.0 (Mar 21, 2026)
- 1.12.0 (Mar 17, 2026)
- 1.11.0 (Mar 15, 2026)
- 1.10.0 (Mar 15, 2026)
- 1.9.1 (Mar 10, 2026)
- 1.9.0 (Mar 8, 2026)
- 1.8.0 (Mar 5, 2026)
- 1.7.0 (Feb 8, 2026)
- 1.6.0 (Feb 6, 2026)
- 1.5.0 (Feb 1, 2026)
- 1.4.0 (Jan 29, 2026)
- 1.3.0 (Jan 28, 2026)
- 1.2.0 (Jan 25, 2026)
- 1.1.0 (Jan 24, 2026)
- 1.0.0 (Jan 20, 2026)
- 0.40.0 (Jan 19, 2026)
- 0.39.0 (Jan 18, 2026)
- 0.38.0 (Jan 18, 2026)
- 0.37.0 (Jan 18, 2026)
- 0.36.0 (Jan 17, 2026)
- 0.35.0 (Jan 11, 2026)
- 0.34.0 (Jan 10, 2026)
- 0.33.0 (Jan 10, 2026)
- 0.32.0 (Jan 10, 2026)
- 0.31.0 (Jan 9, 2026)
- 0.30.0 (Jan 3, 2026)
- 0.29.0 (Jan 3, 2026)
- 0.28.0 (Jan 3, 2026)
- 0.27.0 (Jan 3, 2026)
- 0.26.0 (Jan 2, 2026)
- 0.25.0 (Jan 1, 2026)
- 0.24.0 (Jan 1, 2026)
- 0.23.0 (Dec 30, 2025)
- 0.22.0 (Dec 28, 2025)
- 0.21.0 (Dec 28, 2025)
- 0.20.0 (Dec 28, 2025)
- 0.19.0 (Dec 28, 2025)
- 0.18.0 (Dec 21, 2025)
- 0.17.0 (Dec 20, 2025)
- 0.16.0 (Dec 18, 2025)
- 0.15.0 (Dec 18, 2025)
- 0.14.0 (Dec 17, 2025)
- 0.13.1 (Dec 17, 2025)
- 0.13.0 (Dec 16, 2025)
- 0.12.0 (Dec 15, 2025)
- 0.11.0 (Dec 15, 2025)
- 0.10.0 (Dec 14, 2025)
- 0.9.0 (Dec 14, 2025)
- 0.8.0 (Dec 13, 2025)
- 0.7.0 (Dec 13, 2025)
- 0.6.0 (Dec 7, 2025)
- 0.5.2 (Dec 7, 2025)
- 0.5.1 (Dec 7, 2025)
- 0.5.0 (Dec 7, 2025)
- 0.4.0 (Dec 6, 2025)
- 0.3.0 (Dec 1, 2025)
- 0.2.0 (Dec 1, 2025)
- 0.1.0 (Nov 30, 2025)

Download files

Source Distribution

justhtml-2.2.0.tar.gz (885.3 kB)

Uploaded Jun 7, 2026 Source

Built Distribution

justhtml-2.2.0-py3-none-any.whl (156.3 kB)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file justhtml-2.2.0.tar.gz.

File metadata

Download URL: justhtml-2.2.0.tar.gz
Upload date: Jun 7, 2026
Size: 885.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for justhtml-2.2.0.tar.gz
Algorithm	Hash digest
SHA256	`219f163245e456060ad472cd70c33b4fe544521bb7d9947d85fa59cdf65a87d9`
MD5	`7d9f08c709c01cb1fc23fa5540de142e`
BLAKE2b-256	`4319266d12809f7241de4914a100b641b04642593fbef6723b912c0ffca0a418`
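
To check a downloaded file against the SHA256 digest listed above, the standard library is enough. This is a minimal sketch. The `sha256_of` and `verify` helpers and the local file path are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Digest published above for justhtml-2.2.0.tar.gz.
EXPECTED_SHA256 = "219f163245e456060ad472cd70c33b4fe544521bb7d9947d85fa59cdf65a87d9"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so it is never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase; normalize the expected value the same way.
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("justhtml-2.2.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```

pip can enforce the same check at install time with hash-checking mode. Put a requirements line such as `justhtml==2.2.0 --hash=sha256:219f163245e456060ad472cd70c33b4fe544521bb7d9947d85fa59cdf65a87d9` in a requirements file, then install with `pip install --require-hashes -r requirements.txt`.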

Provenance

The following attestation bundles were made for justhtml-2.2.0.tar.gz:

Publisher: publish.yml on EmilStenstrom/justhtml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: justhtml-2.2.0.tar.gz
- Subject digest: 219f163245e456060ad472cd70c33b4fe544521bb7d9947d85fa59cdf65a87d9
- Sigstore transparency entry: 1744355564
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: EmilStenstrom/justhtml@fd78e99fe52bd563fc81a11dd6515596649e1130
- Branch / Tag: refs/tags/v2.2.0
- Owner: https://github.com/EmilStenstrom
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fd78e99fe52bd563fc81a11dd6515596649e1130
- Trigger Event: release

File details

Details for the file justhtml-2.2.0-py3-none-any.whl.

File metadata

Download URL: justhtml-2.2.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 156.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for justhtml-2.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`41227bf28cb7bedc6d00bf69d0e06fc72d224f4ea5fa1623c2eb52da5e4d4b81`
MD5	`9efa149e5abc8d8293f182c4e0d56f7c`
BLAKE2b-256	`fa0d2ff282f09816de373e3a37042a7c2bce3ec38c95a781805f24cb0d688de5`

Provenance

The following attestation bundles were made for justhtml-2.2.0-py3-none-any.whl:

Publisher: publish.yml on EmilStenstrom/justhtml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: justhtml-2.2.0-py3-none-any.whl
- Subject digest: 41227bf28cb7bedc6d00bf69d0e06fc72d224f4ea5fa1623c2eb52da5e4d4b81
- Sigstore transparency entry: 1744355683
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: EmilStenstrom/justhtml@fd78e99fe52bd563fc81a11dd6515596649e1130
- Branch / Tag: refs/tags/v2.2.0
- Owner: https://github.com/EmilStenstrom
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fd78e99fe52bd563fc81a11dd6515596649e1130
- Trigger Event: release
