WHATWG URLPattern for Python. 100% specification strict, pure Python, optimized and yarl-compatible.

yarlpattern

WHATWG URLPattern for Python — 100% conformance to the upstream WPT corpus: 469 / 469 cases passing across all five test suites, the same files Chromium, Safari, and Firefox validate against.

Pure Python on top of yarl — immutable pattern objects, component properties named after their URL counterparts, zero non-Python dependencies. The pattern is the API: compile once, then ask .test(url) or .exec(url) from anywhere a yarl.URL lives.

from yarlpattern import URLPattern

# Multi-tenant API: the subdomain identifies the tenant, the path
# captures the API version and the resource tail — all extracted in
# one match call.
pat = URLPattern({
    "hostname": ":tenant.myapp.com",
    "pathname": "/api/v:version/*",
})

result = pat.exec("https://acme.myapp.com/api/v2/users/42")
result.hostname["groups"]["tenant"]    # 'acme'
result.pathname["groups"]["version"]   # '2'
result.pathname["groups"]["0"]         # 'users/42'

pat.test("https://foo.example.com/api/v2/users")  # False — wrong host
pat.test("https://acme.myapp.com/api/users")      # False — no version

That's the differentiator. Flask-style :id routers match the path component in isolation; URLPattern matches across protocol, hostname, port, path, and search at once, returning structured named groups per component.

Conformance

469 / 469 upstream Web Platform Tests pass (100%) across every WHATWG URLPattern test suite — the same files Chromium, Safari, Firefox, Ada, and rust-urlpattern validate against. The WPT corpus is SHA-pinned by scripts/fetch_references.sh to commit dd54691 (2026-05-11) so the pass count is reproducible at any future date.

Suite                                 Source                                                               Status
urlpattern.any.js                     WPT  ·  urlpatterntestdata.json                                      ✅   366 / 366
urlpattern-constructor.any.js         WPT (inline)                                                         ✅   4 / 4
urlpattern-hasregexpgroups.any.js     WPT  ·  urlpattern-hasregexpgroups-tests.js                          ✅   55 / 55
urlpattern-compare.tentative.any.js   WPT  ·  urlpattern-compare-test-data.json                            ✅   25 / 25
urlpattern-generate.tentative.any.js  WPT  ·  urlpattern-generate-test-data.json                           ✅   19 / 19
yarlpattern unit tests                this repo  ·  tokenizer / parser / parts / regex / engine / pattern  ✅   130 / 130
Total                                                                                                      ✅   599 / 599

Full per-case conformance report (regenerate via just compliance-report)  ·  Documented deviations and stricter-than-yarl rules

What we get right that's easy to miss

The 100% number is the headline. Equally load-bearing — and easy to skip past — are the per-component canonicalisation rules the WHATWG URLPattern spec quietly requires. yarlpattern enforces all of them; a stdlib-only port that goes through urllib.parse cannot:

  • WHATWG URL parsing end-to-end via yarl, not urllib.parse (which is not WHATWG-conformant).
  • IDNA2008 / UTS46 hostname canonicalization via the third-party idna package, not Python's stdlib idna codec (which is IDNA2003 and not spec-compliant for modern IDN labels).
  • Strict port parsing: "8080xyz" is rejected, as the WHATWG URL parser's port state requires; webhook-validation patterns that constrain on exact ports stay robust against junk suffixes.
  • Case-preserving %XX passthrough in pattern literals — caf%c3%a9 round-trips as itself, where yarl would normalise to uppercase (WPT cases 146 / 148 pin this).
  • U+FFFD substitution for unpaired surrogates before UTF-8 percent-encoding, where yarl silently drops them (WPT case 157).
  • Hostname-pattern truncation at ? / # / / / \, matching browser engine behaviour for hostnames that were pasted from full URLs.
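
Three of the rules above can be illustrated with the standard library alone. The sketch below is not yarlpattern's implementation: parse_port_strict and percent_encode_lossy are hypothetical helpers written for this example, and they only approximate the behaviour the bullets describe.

```python
from urllib.parse import quote, urlsplit

# 1. Python's built-in "idna" codec follows IDNA2003, which folds the
#    German sharp s into "ss" instead of keeping it as a distinct label.
stdlib_idna = "straße.de".encode("idna")
assert stdlib_idna == b"strasse.de"

# 2. urllib.parse accepts a junk port suffix when splitting and only
#    raises later, when the .port attribute is read.
parts = urlsplit("https://example.com:8080xyz/hook")
assert parts.netloc == "example.com:8080xyz"
try:
    parts.port
    port_rejected_lazily = False
except ValueError:
    port_rejected_lazily = True


def parse_port_strict(text: str) -> int:
    """Hypothetical helper: accept ASCII digits only, within 0..65535.

    Simplification: an empty string is rejected here as well.
    """
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError(f"invalid port: {text!r}")
    port = int(text)
    if port > 65535:
        raise ValueError(f"port out of range: {text!r}")
    return port


def percent_encode_lossy(text: str) -> str:
    """Hypothetical helper: swap lone surrogates for U+FFFD, then
    UTF-8 percent-encode every character that is not unreserved."""
    cleaned = "".join(
        "\ufffd" if 0xD800 <= ord(ch) <= 0xDFFF else ch for ch in text
    )
    return quote(cleaned, safe="")


# A lone surrogate cannot be UTF-8 encoded, so it becomes U+FFFD first.
encoded = percent_encode_lossy("a\ud83d")
assert encoded == "a%EF%BF%BD"
```

The point of the sketch is the failure mode, not the helpers: a stdlib-only pipeline either canonicalises differently (IDNA2003) or defers validation (the port), which is the gap the bullets describe.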

Stdlib-only mode. Under stdlib re without the [regex] extra, conformance on urlpattern.any.js is 364 / 366 (99.5%). The two outlier patterns — [a&&b] (intersection) and [a--b] (difference) from the JS v-flag — require Matthew Barnett's regex package; they're marked xfail with an install hint when it's absent. pip install yarlpattern[regex] activates them.
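
To see why the intersection pattern needs a different backend, the following stdlib-only sketch compiles [a&&b] with re. Under JS v-flag semantics the class is the intersection of "a" and "b", which is empty; stdlib re instead reads it as a plain set containing "a", "&", and "b" (and emits a FutureWarning about a possible set intersection). This illustrates the gap only; it does not show how yarlpattern detects or marks these cases.

```python
import re
import warnings

# Silence the FutureWarning that stdlib re raises for "&&" inside a set.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    literal_set = re.compile(r"[a&&b]")

# stdlib re treats the class as the literal characters a, &, and b,
# so all three match, unlike an (empty) v-flag intersection.
matches_ampersand = literal_set.fullmatch("&") is not None
matches_a = literal_set.fullmatch("a") is not None
matches_b = literal_set.fullmatch("b") is not None
```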

API surface

Every stable and tentative method in the WHATWG URLPattern IDL is implemented: URLPattern(input | string, baseURL?, options?), test, exec, all eight component properties, has_regexp_groups, URLPattern.compare_component, and the tentative generate(component, groups). The IDL camelCase spellings (hasRegExpGroups, compareComponent) are kept as aliases so code ported verbatim from the spec or browser JS reads identically. See SPEC_DEVIATIONS.md for the intentional Python-flavour choices.

How this differs from aiohttp.web.UrlDispatcher

aiohttp.web.UrlDispatcher is a mature path-router shaped around web-request dispatch. yarlpattern is a predicate: it matches across all eight URL components (not just the path), works standalone (no server context required), and uses the same WHATWG pattern syntax browsers, Deno, Bun, and Cloudflare Workers all implement.

Use UrlDispatcher if you're building an aiohttp service. Use yarlpattern if you're matching URLs outside a server context, need to constrain on hostname / port / scheme alongside path, or want patterns that match what browsers do.

Full comparison

How this differs from yarl

yarl is a URL parser / builder; yarlpattern is a URLPattern matcher. They're complementary — yarlpattern depends on yarl for URL parsing and IDNA hostname encoding, accepts yarl.URL directly in .test(...) and .exec(...) calls (no str() round-trip), and uses WHATWG component names (protocol / hostname / pathname / search / hash) rather than yarl's (scheme / host / path / query / fragment).

Where the WHATWG URLPattern spec is stricter than yarl, yarlpattern enforces the spec — see the Conformance section above and SPEC_DEVIATIONS.md.

Component-name mapping for muscle-memory porting:

yarl               yarlpattern    WHATWG / browser JS
scheme             protocol       protocol
user               username       username
host               hostname       hostname
path               pathname       pathname
query (MultiDict)  search (str)   search
fragment           hash           hash
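
When porting code that already holds components under yarl's names, the table above can be applied mechanically. The sketch below is a hypothetical helper, not part of yarlpattern: it only renames dictionary keys and assumes the values are already strings (yarl's query is a MultiDict, so it would need to be serialised to a string separately).

```python
# Key mapping taken from the component-name table.
YARL_TO_WHATWG = {
    "scheme": "protocol",
    "user": "username",
    "host": "hostname",
    "path": "pathname",
    "query": "search",
    "fragment": "hash",
}


def rename_components(parts: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: rename yarl-style keys to WHATWG names.

    Keys that are not in the mapping are passed through unchanged.
    """
    return {YARL_TO_WHATWG.get(name, name): value for name, value in parts.items()}


whatwg_parts = rename_components(
    {"host": "acme.myapp.com", "path": "/api/v2/users/42"}
)
# whatwg_parts now uses "hostname" and "pathname" as its keys.
```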

Full comparison, including the WPT cases that pin down each strictness rule, the with_* ergonomics, and the encoding philosophy yarlpattern shares with the rest of aio-libs.

Install

pip install yarlpattern            # stdlib re backend
pip install 'yarlpattern[regex]'   # full 100% conformance — see Conformance § above

Bring your own regex engine

The matcher's regex backend is pluggable behind a @runtime_checkable Protocol. Two adapters ship in-tree — stdlib re (always available; default fallback) and regex (auto-detected when yarlpattern[regex] is installed; closes the [a&&b] / [a--b] gap).

Selection priority: explicit engine= argument › URLPATTERN_REGEX_ENGINE env var › auto-probe (prefers regex when importable, falls back to re). See src/yarlpattern/_regex_engine/protocols.py for the Protocol definitions; a future PyO3-backed engine slots in as one new adapter module.
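
The selection order described above can be sketched as follows. Everything here is illustrative: RegexEngine, StdlibReEngine, and choose_engine_name are hypothetical names, the real Protocol definitions live in src/yarlpattern/_regex_engine/protocols.py and may look different, and the engine names "re" and "regex" as env var values are an assumption.

```python
import importlib.util
import os
import re
from typing import Any, Mapping, Optional, Protocol, runtime_checkable


@runtime_checkable
class RegexEngine(Protocol):
    """Hypothetical protocol: anything that can compile a pattern."""

    def compile(self, pattern: str) -> Any: ...


class StdlibReEngine:
    """Hypothetical adapter over the standard library's re module."""

    def compile(self, pattern: str) -> Any:
        return re.compile(pattern)


def choose_engine_name(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve an engine name: explicit argument, then env var, then probe."""
    env = os.environ if environ is None else environ
    if explicit:
        return explicit
    from_env = env.get("URLPATTERN_REGEX_ENGINE")
    if from_env:
        return from_env
    # Auto-probe: prefer the third-party regex package when importable.
    return "regex" if importlib.util.find_spec("regex") is not None else "re"


# Structural check: the adapter satisfies the protocol without inheriting.
adapter_ok = isinstance(StdlibReEngine(), RegexEngine)
```

Because the Protocol is runtime-checkable, an adapter needs no base class; any object with a compatible compile method passes the isinstance check, which is what lets a new backend arrive as a single module.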

Quick start

uv sync --all-groups
uv run pytest                  # full test suite
just check                     # lint + types + tests (requires `just`)

from yarlpattern import URLPattern

# Dict form, fully wildcarded except path
api = URLPattern({"pathname": "/api/v:version/users/:id(\\d+)"})
api.test({"pathname": "/api/v2/users/42"})              # True
api.exec({"pathname": "/api/v2/users/42"}).pathname     # {'input': '...', 'groups': {'version': '2', 'id': '42'}}

# String form with base URL
route = URLPattern("/posts/:slug", "https://blog.example.com")
route.test("https://blog.example.com/posts/hello")      # True

# Match a full URL against the constructed pattern
pat = URLPattern("https://*.shop.example/products/:sku")
pat.test("https://eu.shop.example/products/SKU-991")    # True
