
Pure Python implementation of the WHATWG URL Standard.


pywhatwgurl - Python WHATWG URL Parser


Pure Python implementation of the WHATWG URL Standard. The goal is a small, spec-faithful library for parsing, serializing, and manipulating URLs.

pywhatwgurl is listed as a complete Python implementation in the official WHATWG URL repository.

Status

  • 100% WHATWG URL Standard conformance for core URL parsing
  • Full URL and URLSearchParams API implementation
  • See Conformance for IDNA limitations

Installation

Requires Python 3.10+.

pip install pywhatwgurl

Quickstart

from pywhatwgurl import URL

url = URL("https://user:pass@example.com:8080/path?q=1#frag")

url.hostname   # 'example.com'
url.port       # '8080'
url.pathname   # '/path'
url.search     # '?q=1'
url.hash       # '#frag'
str(url)       # 'https://user:pass@example.com:8080/path?q=1#frag'
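
For readers coming from the standard library, the sketch below parses the same URL with urllib.parse.urlsplit. It does not use pywhatwgurl at all; it is only meant to show how the WHATWG-style getters above differ in shape: port is a string rather than an int, and search and hash keep their leading ? and #.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://user:pass@example.com:8080/path?q=1#frag")

# Standard-library view of the same URL, for comparison only.
stdlib_view = {
    "hostname": parts.hostname,   # 'example.com'
    "port": parts.port,           # 8080 (an int, not the string '8080')
    "path": parts.path,           # '/path'
    "query": parts.query,         # 'q=1' (no leading '?')
    "fragment": parts.fragment,   # 'frag' (no leading '#')
}
print(stdlib_view)
```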

URLSearchParams mirrors the browser API, with snake_case method names such as get_all:

from pywhatwgurl import URLSearchParams

params = URLSearchParams("a=1&b=2&a=3")
params.get("a")        # '1'
params.get_all("a")    # ('1', '3')
params.set("b", "42")
str(params)            # 'a=1&b=42&a=3'
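
To see why set("b", "42") yields 'a=1&b=42&a=3', here is a minimal standard-library sketch of the list-of-pairs semantics the URL Standard describes for get, get-all, and set. The helper set_param is hypothetical and is not pywhatwgurl's implementation; it only imitates the observable behavior on plain tuples.

```python
from urllib.parse import parse_qsl, urlencode


def set_param(pairs, name, value):
    """Replace the first pair named `name`, drop later ones, append if absent."""
    result = []
    replaced = False
    for key, old in pairs:
        if key != name:
            result.append((key, old))
        elif not replaced:
            result.append((key, value))
            replaced = True
        # Any later pair with the same name is dropped.
    if not replaced:
        result.append((name, value))
    return result


pairs = parse_qsl("a=1&b=2&a=3", keep_blank_values=True)
first_a = next(v for k, v in pairs if k == "a")   # analogous to get("a")
all_a = tuple(v for k, v in pairs if k == "a")    # analogous to get_all("a")
updated = urlencode(set_param(pairs, "b", "42"))  # analogous to set + str
print(first_a, all_a, updated)
```

Order is preserved throughout, which is why the second a=3 pair stays after b in the serialized result.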

For full API details, see the documentation.

Development

To set up a local development environment, use uv (recommended) or pip:

# Install dev dependencies (run from a clone of the repository)
uv sync --dev

# Run the test suite
uv run pytest

If you prefer pip:

python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install pytest ruff mypy pre-commit interrogate pip-audit cyclonedx-bom
pytest

Conformance

This library targets 100% conformance with the WHATWG URL Standard for core URL parsing, validated against the official Web Platform Tests.

Test Suite                   Status
URL parsing (urltestdata)    ✅ 873/873 (100%)
URL setters                  ✅ 274/274 (100%)
URL setters stripping        ✅ 144/144 (100%)
URLSearchParams              ✅ 139/139 (100%)
Percent-encoding             ✅ 7/7 (100%)
IDNA/ToASCII                 ⚠️ xfail (see below)

IDNA Limitations

IDNA tests are marked as expected failures (xfail) because the Python idna library follows stricter RFC 5891/5892 rules than the WHATWG URL Standard's lenient UTS46 processing.

Why not fix these?

  1. No Python WHATWG-compliant IDNA implementation exists
  2. Even non-Python implementations with custom IDNA handling still fall short of full compliance
  3. Real-world domains work correctly — failures are obscure edge cases

For details, see tests/conformance/README.md.

Supply Chain Security

  • SBOM: A Software Bill of Materials (CycloneDX JSON) is automatically generated for every release and attached to the GitHub Release.
  • Audit: All runtime dependencies are scanned for known vulnerabilities using pip-audit in our CI pipeline.

Roadmap

  • ✅ Implement URL parsing/serialization per WHATWG URL Standard
  • ✅ Validate against the official URL test suite (100% conformance)
  • Ship a minimal, typed API suitable for frameworks and tooling

WPT URL test data

To pull down the pinned WPT URL resources, use util/wpt_url_test_data.py:

python util/wpt_url_test_data.py download \
  --dest_dir tests/conformance/data \
  --commit <WPT_URL_COMMIT>

The script downloads the WPT URL JSON resources (including the urltestdata, setters, percent-encoding, toascii, and IDNA fixtures), preserves comments, validates schemas, and writes a metadata file recording the pinned commit. The scheduled workflow .github/workflows/fetch_test_data.yml runs the same script through the composite action and is keyed on the pinned commit. To use a different pin, adjust WPT_URL_COMMIT/WPT_TEST_DATA_PATH in the workflow env.

Updating the pinned WPT commit

  • A helper workflow, .github/workflows/update_wpt_url.yml, runs on a weekly schedule and can also be triggered manually. It fetches the latest url/ commit from WPT, bumps all pins, refreshes the fixtures, and opens a PR.
  • To bump manually without the bot:
    1. Get the latest url/ commit (quote the URL so the shell does not interpret ? and &): NEW_COMMIT=$(curl -s "https://api.github.com/repos/web-platform-tests/wpt/commits?path=url&per_page=1" | jq -r '.[0].sha')
    2. Set WPT_URL_COMMIT to that value in .github/workflows/main.yml and .github/workflows/fetch_test_data.yml, and update the defaults in .github/workflows/actions/fetch_wpt_url_test_data/action.yml and util/wpt_url_test_data.py to match.
    3. Refresh fixtures: python util/wpt_url_test_data.py download --dest_dir tests/conformance/data --commit "$NEW_COMMIT"
    4. Commit the workflow and data changes together.
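
If curl and jq are not available, step 1 can be approximated in Python with only the standard library. This is a rough sketch, not part of the repository: the function names commits_url, latest_sha, and fetch_latest_sha are made up for illustration, and the network call is left commented out so the block can be read or imported without contacting GitHub.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.github.com/repos/web-platform-tests/wpt/commits"


def commits_url(path="url", per_page=1):
    """Build the query URL for the newest commits touching `path`."""
    return f"{API}?{urlencode({'path': path, 'per_page': per_page})}"


def latest_sha(payload):
    """Extract the newest commit SHA from the JSON text returned by the API."""
    commits = json.loads(payload)
    if not commits:
        raise ValueError("no commits returned")
    return commits[0]["sha"]


def fetch_latest_sha():
    """Perform the request; requires network access."""
    request = Request(commits_url(), headers={"Accept": "application/vnd.github+json"})
    with urlopen(request, timeout=30) as response:
        return latest_sha(response.read().decode("utf-8"))


# Uncomment to query GitHub (unauthenticated requests are rate limited):
# print(fetch_latest_sha())
```

The printed SHA plays the role of NEW_COMMIT in the steps above.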

Building

To build the package distribution (wheel and sdist):

uv build

The artifacts will be generated in the dist/ directory.

Versioning is dynamic and derived from Git tags (e.g., 0.1.0 or 0.1.dev1+...).
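
Because the version comes from Git tags at build time rather than from a hard-coded string, one way to check which version ended up in an installed build is to read the distribution metadata. The helper name installed_version below is an illustrative assumption; importlib.metadata itself is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version recorded in the installed metadata, or None.
print(installed_version("pywhatwgurl"))
```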

Documentation

To build and serve the documentation locally:

# Install docs dependencies
uv sync --group docs

# Serve locally (with live reload)
uv run mkdocs serve

# Build static site
uv run mkdocs build

Documentation is automatically deployed to GitHub Pages when changes are pushed to master.



Download files


Source Distribution

pywhatwgurl-0.1.1.tar.gz (155.4 kB)

Uploaded Source

Built Distribution


pywhatwgurl-0.1.1-py3-none-any.whl (26.6 kB)

Uploaded Python 3

File details

Details for the file pywhatwgurl-0.1.1.tar.gz.

File metadata

  • Download URL: pywhatwgurl-0.1.1.tar.gz
  • Upload date:
  • Size: 155.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pywhatwgurl-0.1.1.tar.gz
Algorithm Hash digest
SHA256 65c85da35367511c12a4dd87fecaf08aa3ae564259055b37b4ae53429289fcf6
MD5 ecc1f799dd46921ab25dd86ca3d16a51
BLAKE2b-256 c0d2ce0fffb9eb66ea2f88d20d7c3841b017d25559b6b617bce566811fe0bb48
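
A downloaded file can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch: the helper names sha256_of and matches are illustrative, and the local file path in the commented-out call assumes the sdist was saved in the current directory.

```python
import hashlib

# SHA256 digest listed above for pywhatwgurl-0.1.1.tar.gz.
EXPECTED_SHA256 = "65c85da35367511c12a4dd87fecaf08aa3ae564259055b37b4ae53429289fcf6"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Example (assumes the file exists locally):
# print(matches("pywhatwgurl-0.1.1.tar.gz"))
```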


Provenance

The following attestation bundles were made for pywhatwgurl-0.1.1.tar.gz:

Publisher: release.yml on pywhatwgurl/pywhatwgurl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pywhatwgurl-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pywhatwgurl-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 26.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pywhatwgurl-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d67072d3f702f899e6e5de074e7fa14b5c9c5c85f616c6016c4eef9c94113d5f
MD5 c39fc2958b4f21fd2800ef87a75a01ac
BLAKE2b-256 002f6e14535f7532c836c1200ea53785570d3c37128719ff62915d0971e00598


Provenance

The following attestation bundles were made for pywhatwgurl-0.1.1-py3-none-any.whl:

Publisher: release.yml on pywhatwgurl/pywhatwgurl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
