Python implementation of the WHATWG URL Standard
urlstd
urlstd is a Python implementation of the WHATWG URL Living Standard.
This library provides the URL class, the URLSearchParams class, and low-level APIs that comply with the URL specification.
Supported APIs
- class urlstd.parse.URL(url: str, base: Optional[str | URL] = None)
  - canParse: classmethod can_parse(url: str, base: Optional[str | URL] = None) -> bool
  - stringifier: __str__() -> str
  - href: readonly property href: str
  - origin: readonly property origin: str
  - protocol: property protocol: str
  - username: property username: str
  - password: property password: str
  - host: property host: str
  - hostname: property hostname: str
  - port: property port: str
  - pathname: property pathname: str
  - search: property search: str
  - searchParams: readonly property search_params: URLSearchParams
  - hash: property hash: str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URL, exclude_fragments: bool = False) -> bool
- class urlstd.parse.URLSearchParams(init: Optional[str | Sequence[Sequence[str | int | float]] | dict[str, str | int | float] | URLRecord | URLSearchParams] = None)
  - size: __len__() -> int
  - append: append(name: str, value: str | int | float) -> None
  - delete: delete(name: str, value: Optional[str | int | float] = None) -> None
  - get: get(name: str) -> str | None
  - getAll: get_all(name: str) -> tuple[str, ...]
  - has: has(name: str, value: Optional[str | int | float] = None) -> bool
  - set: set(name: str, value: str | int | float) -> None
  - sort: sort() -> None
  - iterable<USVString, USVString>: __iter__() -> Iterator[tuple[str, str]]
  - stringifier: __str__() -> str
Low-level APIs
- urlstd.parse.parse_url(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> URLRecord
- class urlstd.parse.BasicURLParser
  - classmethod parse(urlstring: str, base: Optional[URLRecord] = None, encoding: str = "utf-8", url: Optional[URLRecord] = None, state_override: Optional[URLParserState] = None) -> URLRecord
- class urlstd.parse.URLRecord
  - scheme: property scheme: str = ""
  - username: property username: str = ""
  - password: property password: str = ""
  - host: property host: Optional[str | int | tuple[int, ...]] = None
  - port: property port: Optional[int] = None
  - path: property path: list[str] | str = []
  - query: property query: Optional[str] = None
  - fragment: property fragment: Optional[str] = None
  - origin: readonly property origin: Origin | None
  - is special: is_special() -> bool
  - is not special: is_not_special() -> bool
  - includes credentials: includes_credentials() -> bool
  - has an opaque path: has_opaque_path() -> bool
  - cannot have a username/password/port: cannot_have_username_password_port() -> bool
  - URL serializer: serialize_url(exclude_fragment: bool = False) -> str
  - host serializer: serialize_host() -> str
  - URL path serializer: serialize_path() -> str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URLRecord, exclude_fragments: bool = False) -> bool
Hosts (domains and IP addresses)
- class urlstd.parse.IDNA
  - domain to ASCII: classmethod domain_to_ascii(domain: str, be_strict: bool = False) -> str
  - domain to Unicode: classmethod domain_to_unicode(domain: str, be_strict: bool = False) -> str
- class urlstd.parse.Host
  - host parser: classmethod parse(host: str, is_not_special: bool = False) -> str | int | tuple[int, ...]
  - host serializer: classmethod serialize(host: str | int | Sequence[int]) -> str
- urlstd.parse.string_percent_decode(s: str) -> bytes
- urlstd.parse.string_percent_encode(s: str, safe: str, encoding: str = "utf-8", space_as_plus: bool = False) -> str
application/x-www-form-urlencoded parser
- urlstd.parse.parse_qsl(query: bytes) -> list[tuple[str, str]]
application/x-www-form-urlencoded serializer
- urlstd.parse.urlencode(query: Sequence[tuple[str, str]], encoding: str = "utf-8") -> str
Validation
- class urlstd.parse.HostValidator
  - valid host string: classmethod is_valid(host: str) -> bool
  - valid domain string: classmethod is_valid_domain(domain: str) -> bool
  - valid IPv4-address string: classmethod is_valid_ipv4_address(address: str) -> bool
  - valid IPv6-address string: classmethod is_valid_ipv6_address(address: str) -> bool
- class urlstd.parse.URLValidator
  - valid URL string: classmethod is_valid(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> bool
  - valid URL-scheme string: classmethod is_valid_url_scheme(value: str) -> bool
Compatibility with standard library urllib
- urlstd.parse.urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult
  urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). It parses a string representation of a URL using the basic URL parser and returns a urllib.parse.ParseResult.
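As a rough illustration of the percent-decode operation behind string_percent_decode(), here is a minimal standard-library sketch of the algorithm as the URL Standard describes it. The helper name percent_decode_sketch is hypothetical and this is not urlstd's own code; it only shows the rule that a byte is decoded when a % is followed by two hexadecimal digits and is copied through unchanged otherwise.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def percent_decode_sketch(s: str) -> bytes:
    """Hypothetical helper: percent-decode the UTF-8 encoding of a string."""
    data = s.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        # Decode only when "%" is followed by two hexadecimal digits.
        if (
            byte == 0x25
            and i + 2 < len(data)
            and data[i + 1] in HEX_DIGITS
            and data[i + 2] in HEX_DIGITS
        ):
            out.append(int(data[i + 1 : i + 3], 16))
            i += 3
        else:
            # Everything else, including a stray "%", is kept as-is.
            out.append(byte)
            i += 1
    return bytes(out)


print(percent_decode_sketch("%F0%9F%8C%88").decode("utf-8"))
print(percent_decode_sketch("100%"))
```

The result is bytes rather than str because the decoded bytes are not guaranteed to be valid in any particular encoding; the caller decides how to decode them.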
Basic Usage
To parse a string into a URL:
from urlstd.parse import URL
URL('http://user:pass@foo:21/bar;par?b#c')
# → <URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
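For readers coming from the standard library, the following sketch splits the same string with urllib.parse.urlsplit so the attribute names can be compared with the URL object shown above. It uses only the standard library and is a side-by-side illustration, not a statement about how urlstd works internally.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://user:pass@foo:21/bar;par?b#c")

# urllib naming and values, next to the URL attributes shown above.
print(parts.scheme)    # 'http'     (URL.protocol is 'http:')
print(parts.username)  # 'user'     (URL.username is 'user')
print(parts.password)  # 'pass'     (URL.password is 'pass')
print(parts.hostname)  # 'foo'      (URL.hostname is 'foo')
print(parts.port)      # 21         (URL.port is the string '21')
print(parts.path)      # '/bar;par' (URL.pathname is '/bar;par')
print(parts.query)     # 'b'        (URL.search is '?b')
print(parts.fragment)  # 'c'        (URL.hash is '#c')
```

The main differences to keep in mind are that the URL properties keep their delimiters (':' after the scheme, '?' before the query, '#' before the fragment) and that port is a string rather than an int.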
To parse a string into a URL using a base URL:
url = URL('?ﬃ&🌈', base='http://example.org')
url # → <URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
url.search # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url) # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
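The order after sort() may look surprising, because U+1F308 (the rainbow) is a larger code point than U+FB03 (the ligature). The URL Standard specifies that URLSearchParams sorts names by UTF-16 code units, and the rainbow's leading surrogate (0xD83C) is smaller than 0xFB03. The sketch below reproduces that ordering with the standard library only; utf16_code_units is a hypothetical helper, not part of urlstd.

```python
def utf16_code_units(s: str) -> list[int]:
    """Hypothetical helper: return the UTF-16 code units of a string."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i : i + 2], "big") for i in range(0, len(data), 2)]


names = ["\ufb03", "\U0001F308"]  # U+FB03 (ligature) and U+1F308 (rainbow)

# Python compares strings by code point, so the ligature sorts first.
by_code_point = sorted(names)

# Comparing UTF-16 code units puts the rainbow first, as in the example above.
by_code_unit = sorted(names, key=utf16_code_units)

print([hex(u) for u in utf16_code_units("\U0001F308")])  # ['0xd83c', '0xdf08']
print(by_code_point == ["\ufb03", "\U0001F308"])         # True
print(by_code_unit == ["\U0001F308", "\ufb03"])          # True
```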
To validate a URL string:
from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('https://user:password@example.org/') # → True
URLValidator.is_valid('https://user:password@example.org/') # → False
validity = ValidityState()
URLValidator.is_valid('https://user:password@example.org/', validity=validity)
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"
URL.can_parse('file:///C|/demo') # → True
URLValidator.is_valid('file:///C|/demo') # → False
validity = ValidityState()
URLValidator.is_valid('file:///C|/demo', validity=validity) # → False
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-URL-unit: code point is found that is not a URL unit: U+007C (|) in 'file:///C|/demo' at position 9"
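URLValidator also lists is_valid_url_scheme() for checking a scheme on its own. As a minimal sketch of the rule the URL Standard gives for a valid URL-scheme string (one ASCII letter followed by ASCII letters, digits, "+", "-", or "."), the following uses only the standard library. The function name is hypothetical and the code is an illustration of the rule, not urlstd's implementation.

```python
import re

# One ASCII alpha, then any number of ASCII alphanumerics, "+", "-", or ".".
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def looks_like_valid_url_scheme(value: str) -> bool:
    """Hypothetical helper mirroring the 'valid URL-scheme string' rule."""
    return _SCHEME_RE.fullmatch(value) is not None


print(looks_like_valid_url_scheme("https"))     # True
print(looks_like_valid_url_scheme("web+demo"))  # True
print(looks_like_valid_url_scheme("1http"))     # False
```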
To parse a string into a urllib.parse.ParseResult using a base URL:
import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query) # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251') # → 'a&#255;b'
html.unescape('a&#255;b') # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252') # → 'aÿb'
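The three query strings differ because of how the URL Standard percent-encodes a code point after encoding it: if the chosen encoding cannot represent the code point, a decimal numeric character reference is encoded instead. The sketch below imitates that fallback for a single character with the standard library. The helper name is hypothetical, and urllib.parse.quote with safe="" escapes more bytes than the Standard's query percent-encode set, so treat it as an approximation for this one character only.

```python
from urllib.parse import quote


def encode_query_char_sketch(ch: str, encoding: str) -> str:
    """Hypothetical helper: percent-encode one character after encoding it."""
    try:
        raw = ch.encode(encoding)
    except UnicodeEncodeError:
        # Fallback: the character becomes a decimal numeric character
        # reference, which is then percent-encoded like any other bytes.
        raw = f"&#{ord(ch)};".encode("ascii")
    return quote(raw, safe="")


print(encode_query_char_sketch("\u00ff", "utf-8"))         # %C3%BF
print(encode_query_char_sketch("\u00ff", "windows-1251"))  # %26%23255%3B
print(encode_query_char_sketch("\u00ff", "windows-1252"))  # %FF
```

This is why the windows-1251 result needs html.unescape() after unquote() to recover the original character.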
Logging
urlstd uses the standard library logging module to report validation errors.
Change the log level of the urlstd logger if needed:
import logging
logging.getLogger('urlstd').setLevel(logging.ERROR)
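To see what raising the level does, the sketch below attaches a small in-memory handler to the 'urlstd' logger and emits two simulated records. The ListHandler class and the messages are hypothetical, and the level at which urlstd itself logs validation errors is not stated here; the example only demonstrates the standard logging filtering behaviour.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in a list for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("urlstd")
logger.setLevel(logging.ERROR)

handler = ListHandler()
logger.addHandler(handler)

# Simulated records; these are not messages produced by urlstd.
logger.info("simulated informational record")  # dropped: below ERROR
logger.error("simulated error")                # kept

print([record.getMessage() for record in handler.records])

logger.removeHandler(handler)
```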
Dependencies
- icupy >= 0.11.0 (pre-built packages are available)
  - icupy requirements:
    - ICU4C (ICU - International Components for Unicode) - latest version recommended
    - C++17 compatible compiler (see supported compilers)
    - CMake >= 3.7
Installation
- Configuring environment variables for icupy (ICU):
  - Windows:
    - Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if the ICU is located in C:\icu4c:
        set ICU_ROOT=C:\icu4c
      or in PowerShell:
        $env:ICU_ROOT = "C:\icu4c"
    - To verify settings using icuinfo (64 bit):
        %ICU_ROOT%\bin64\icuinfo
      or in PowerShell:
        & $env:ICU_ROOT\bin64\icuinfo
  - Linux/POSIX:
    - If the ICU is located in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if the ICU is located in /usr/local:
        export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
        export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    - To verify settings using pkg-config:
        $ pkg-config --cflags --libs icu-uc
        -I/usr/local/include -L/usr/local/lib -licuuc -licudata
- Installing from PyPI:
    pip install urlstd
Running Tests
Install dependencies:
pipx install tox
# or
pip install --user tox
To run tests and generate a report:
git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt
See result: tests/wpt/report.html
License