Python implementation of the WHATWG URL Standard
urlstd
urlstd is a Python implementation of the WHATWG URL Living Standard.
This library provides the URL class, the URLSearchParams class, and low-level APIs that comply with the URL specification.
Supported APIs
- class urlstd.parse.URL(url: str, base: Optional[str | URL] = None)
  - canParse: classmethod can_parse(url: str, base: Optional[str | URL] = None) -> bool
  - stringifier: __str__() -> str
  - href: readonly property href: str
  - origin: readonly property origin: str
  - protocol: property protocol: str
  - username: property username: str
  - password: property password: str
  - host: property host: str
  - hostname: property hostname: str
  - port: property port: str
  - pathname: property pathname: str
  - search: property search: str
  - searchParams: readonly property search_params: URLSearchParams
  - hash: property hash: str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URL, exclude_fragments: bool = False) -> bool
- class urlstd.parse.URLSearchParams(init: Optional[str | Sequence[Sequence[str | int | float]] | dict[str, str | int | float] | URLRecord | URLSearchParams] = None)
  - size: __len__() -> int
  - append: append(name: str, value: str | int | float) -> None
  - delete: delete(name: str, value: Optional[str | int | float] = None) -> None
  - get: get(name: str) -> str | None
  - getAll: get_all(name: str) -> tuple[str, ...]
  - has: has(name: str, value: Optional[str | int | float] = None) -> bool
  - set: set(name: str, value: str | int | float) -> None
  - sort: sort() -> None
  - iterable<USVString, USVString>: __iter__() -> Iterator[tuple[str, str]]
  - stringifier: __str__() -> str
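Per the URL Standard, a URLSearchParams object wraps an ordered list of name-value pairs in which a name may appear more than once. The following minimal sketch shows those list semantics (append, delete and has with an optional value, get, get_all, set) using only the standard library. PairList is a hypothetical class, not urlstd's implementation: it stores values as strings only and does not write changes back to an associated URL the way URLSearchParams does.

```python
from typing import Iterator, Optional


class PairList:
    """Hypothetical stand-in for URLSearchParams: an ordered list of (name, value) pairs."""

    def __init__(self) -> None:
        self._pairs: list[tuple[str, str]] = []

    def append(self, name: str, value: str) -> None:
        # Always adds a new pair at the end; duplicate names are allowed.
        self._pairs.append((name, value))

    def delete(self, name: str, value: Optional[str] = None) -> None:
        # Removes every pair with this name (and, if given, with this value).
        self._pairs = [
            (n, v) for n, v in self._pairs
            if not (n == name and (value is None or v == value))
        ]

    def get(self, name: str) -> Optional[str]:
        # Value of the first pair with this name, or None.
        return next((v for n, v in self._pairs if n == name), None)

    def get_all(self, name: str) -> tuple[str, ...]:
        return tuple(v for n, v in self._pairs if n == name)

    def has(self, name: str, value: Optional[str] = None) -> bool:
        return any(n == name and (value is None or v == value) for n, v in self._pairs)

    def set(self, name: str, value: str) -> None:
        # Overwrites the first pair with this name and drops the others;
        # appends a new pair if the name is absent.
        result: list[tuple[str, str]] = []
        found = False
        for n, v in self._pairs:
            if n != name:
                result.append((n, v))
            elif not found:
                result.append((n, value))
                found = True
        if not found:
            result.append((name, value))
        self._pairs = result

    def __len__(self) -> int:
        return len(self._pairs)

    def __iter__(self) -> Iterator[tuple[str, str]]:
        return iter(self._pairs)
```

Keeping a plain list of tuples, rather than a dict, is what allows repeated names and preserves insertion order.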
Low-level APIs
- urlstd.parse.parse_url(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> URLRecord
- class urlstd.parse.BasicURLParser
  - classmethod parse(urlstring: str, base: Optional[URLRecord] = None, encoding: str = "utf-8", url: Optional[URLRecord] = None, state_override: Optional[URLParserState] = None) -> URLRecord
- class urlstd.parse.URLRecord
  - scheme: property scheme: str = ""
  - username: property username: str = ""
  - password: property password: str = ""
  - host: property host: Optional[str | int | tuple[int, ...]] = None
  - port: property port: Optional[int] = None
  - path: property path: list[str] | str = []
  - query: property query: Optional[str] = None
  - fragment: property fragment: Optional[str] = None
  - origin: readonly property origin: Origin | None
  - is special: is_special() -> bool
  - is not special: is_not_special() -> bool
  - includes credentials: includes_credentials() -> bool
  - has an opaque path: has_opaque_path() -> bool
  - cannot have a username/password/port: cannot_have_username_password_port() -> bool
  - URL serializer: serialize_url(exclude_fragment: bool = False) -> str
  - host serializer: serialize_host() -> str
  - URL path serializer: serialize_path() -> str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URLRecord, exclude_fragments: bool = False) -> bool
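The URLRecord predicates and serializers listed above correspond to definitions in the URL Standard. As a rough illustration of what they compute, the following stdlib-only sketch reimplements several of them on a hypothetical dataclass named RecordSketch (not urlstd's URLRecord). To stay short, it assumes host is already a serialized string or None, rather than the str/int/tuple forms URLRecord uses.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

# Special schemes as defined by the URL Standard.
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


@dataclass
class RecordSketch:
    scheme: str = ""
    username: str = ""
    password: str = ""
    host: Optional[str] = None  # assumption: an already-serialized host string
    port: Optional[int] = None
    path: Union[list[str], str] = field(default_factory=list)
    query: Optional[str] = None
    fragment: Optional[str] = None

    def is_special(self) -> bool:
        return self.scheme in SPECIAL_SCHEMES

    def includes_credentials(self) -> bool:
        return self.username != "" or self.password != ""

    def has_opaque_path(self) -> bool:
        # An opaque path is stored as one string instead of a list of segments.
        return isinstance(self.path, str)

    def cannot_have_username_password_port(self) -> bool:
        return self.host is None or self.host == "" or self.scheme == "file"

    def serialize_path(self) -> str:
        if self.has_opaque_path():
            return self.path
        return "".join("/" + segment for segment in self.path)

    def serialize_url(self, exclude_fragment: bool = False) -> str:
        output = self.scheme + ":"
        if self.host is not None:
            output += "//"
            if self.includes_credentials():
                output += self.username
                if self.password:
                    output += ":" + self.password
                output += "@"
            output += self.host
            if self.port is not None:
                output += ":" + str(self.port)
        elif not self.has_opaque_path() and len(self.path) > 1 and self.path[0] == "":
            # Prevents a path such as //x from being reparsed as a host.
            output += "/."
        output += self.serialize_path()
        if self.query is not None:
            output += "?" + self.query
        if not exclude_fragment and self.fragment is not None:
            output += "#" + self.fragment
        return output
```

With scheme "http", credentials user/pass, host "foo", port 21, path ["bar;par"], query "b" and fragment "c", serialize_url() gives the same href as the URL shown in Basic Usage.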
Hosts (domains and IP addresses)
- class urlstd.parse.IDNA
  - domain to ASCII: classmethod domain_to_ascii(domain: str, be_strict: bool = False) -> str
  - domain to Unicode: classmethod domain_to_unicode(domain: str, be_strict: bool = False) -> str
- class urlstd.parse.Host
  - host parser: classmethod parse(host: str, is_not_special: bool = False) -> str | int | tuple[int, ...]
  - host serializer: classmethod serialize(host: str | int | Sequence[int]) -> str
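Host.parse() and URLRecord.host represent a host as a str (domain, opaque host, or empty host), an int (IPv4 address), or a tuple of integers (IPv6 address). The following stdlib-only sketch of the URL Standard's host serializer shows how each form becomes a string, including IPv6 compression of the first longest run of zero pieces. serialize_host_sketch and its helpers are hypothetical names, not urlstd's code, and the sketch assumes an IPv6 tuple holds exactly eight 16-bit pieces.

```python
from typing import Optional, Sequence, Union


def serialize_ipv4(address: int) -> str:
    # Four decimal octets, most significant first.
    octets = []
    for _ in range(4):
        octets.insert(0, str(address % 256))
        address //= 256
    return ".".join(octets)


def first_longest_zero_run(pieces: Sequence[int]) -> Optional[int]:
    # Start index of the first longest run of 0 pieces; None unless the run is longer than 1.
    best_start: Optional[int] = None
    best_len = 1
    i = 0
    while i < len(pieces):
        if pieces[i] != 0:
            i += 1
            continue
        j = i
        while j < len(pieces) and pieces[j] == 0:
            j += 1
        if j - i > best_len:
            best_start, best_len = i, j - i
        i = j
    return best_start


def serialize_ipv6(pieces: Sequence[int]) -> str:
    compress = first_longest_zero_run(pieces)
    output = ""
    ignore0 = False
    for index, piece in enumerate(pieces):
        if ignore0 and piece == 0:
            continue  # still inside the compressed run
        ignore0 = False
        if index == compress:
            output += "::" if index == 0 else ":"
            ignore0 = True
            continue
        output += format(piece, "x")  # lowercase hex without leading zeros
        if index != len(pieces) - 1:
            output += ":"
    return output


def serialize_host_sketch(host: Union[str, int, Sequence[int]]) -> str:
    if isinstance(host, int):
        return serialize_ipv4(host)  # IPv4 address stored as a 32-bit integer
    if isinstance(host, str):
        return host  # domain, opaque host, or empty host
    return "[" + serialize_ipv6(host) + "]"  # IPv6 address wrapped in brackets
```

For example, (0x2001, 0xDB8, 0, 0, 0, 0, 0, 1) serializes to [2001:db8::1], while a single zero piece is never compressed.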
- urlstd.parse.string_percent_decode(s: str) -> bytes
- urlstd.parse.string_percent_encode(s: str, safe: str, encoding: str = "utf-8", space_as_plus: bool = False) -> str
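string_percent_decode() returns bytes because, in the URL Standard, string percent-decode first UTF-8 encodes the input and then percent-decodes the resulting bytes; a % that is not followed by two ASCII hex digits is copied unchanged. A stdlib-only sketch of that algorithm (string_percent_decode_sketch is a hypothetical name, not urlstd's code):

```python
import string

# ASCII hex digits as byte values.
HEX_DIGITS = frozenset(string.hexdigits.encode("ascii"))


def string_percent_decode_sketch(s: str) -> bytes:
    data = s.encode("utf-8")
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if (
            byte == 0x25  # "%"
            and i + 2 < len(data)
            and data[i + 1] in HEX_DIGITS
            and data[i + 2] in HEX_DIGITS
        ):
            # Replace "%XX" with the byte it encodes.
            output.append(int(data[i + 1 : i + 3], 16))
            i += 3
        else:
            # Ordinary byte, or a "%" without two hex digits: keep as-is.
            output.append(byte)
            i += 1
    return bytes(output)
```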
application/x-www-form-urlencoded parser
- urlstd.parse.parse_qsl(query: bytes) -> list[tuple[str, str]]
application/x-www-form-urlencoded serializer
- urlstd.parse.urlencode(query: Sequence[tuple[str, str]], encoding: str = "utf-8") -> str
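The parser and serializer above implement the URL Standard's application/x-www-form-urlencoded format. The stdlib-only sketch below shows the core rules: the parser splits on &, skips empty pieces, splits each piece at the first =, maps + to a space, percent-decodes, and UTF-8 decodes; the serializer percent-encodes every byte except ASCII alphanumerics and *, -, ., _, writing spaces as +. form_parse and form_serialize are hypothetical names, and the sketch assumes UTF-8 output instead of the encoding parameter that urlstd's urlencode() accepts.

```python
import string

# Bytes left unescaped by the application/x-www-form-urlencoded percent-encode set.
FORM_SAFE = frozenset((string.ascii_letters + string.digits + "*-._").encode("ascii"))
HEX_DIGITS = frozenset(string.hexdigits.encode("ascii"))


def _percent_decode(data: bytes) -> bytes:
    output = bytearray()
    i = 0
    while i < len(data):
        if (data[i] == 0x25 and i + 2 < len(data)
                and data[i + 1] in HEX_DIGITS and data[i + 2] in HEX_DIGITS):
            output.append(int(data[i + 1 : i + 3], 16))
            i += 3
        else:
            output.append(data[i])
            i += 1
    return bytes(output)


def form_parse(query: bytes) -> list[tuple[str, str]]:
    pairs = []
    for chunk in query.split(b"&"):
        if not chunk:
            continue  # e.g. the empty piece in "a=1&&b=2"
        name, _, value = chunk.partition(b"=")  # value is b"" when there is no "="
        name = name.replace(b"+", b" ")
        value = value.replace(b"+", b" ")
        pairs.append((
            _percent_decode(name).decode("utf-8", errors="replace"),
            _percent_decode(value).decode("utf-8", errors="replace"),
        ))
    return pairs


def _form_encode(s: str) -> str:
    out = []
    for byte in s.encode("utf-8"):
        if byte == 0x20:
            out.append("+")
        elif byte in FORM_SAFE:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


def form_serialize(pairs: list[tuple[str, str]]) -> str:
    return "&".join(f"{_form_encode(n)}={_form_encode(v)}" for n, v in pairs)
```

Note that this percent-encode set differs from the one used by urllib.parse.urlencode(), which escapes * and leaves ~ unescaped; here * is kept and ~ becomes %7E.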
Validation
- class urlstd.parse.HostValidator
  - valid host string: classmethod is_valid(host: str) -> bool
  - valid domain string: classmethod is_valid_domain(domain: str) -> bool
  - valid IPv4-address string: classmethod is_valid_ipv4_address(address: str) -> bool
  - valid IPv6-address string: classmethod is_valid_ipv6_address(address: str) -> bool
- class urlstd.parse.URLValidator
  - valid URL string: classmethod is_valid(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> bool
  - valid URL-scheme string: classmethod is_valid_url_scheme(value: str) -> bool
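The validator classes check the URL Standard's "valid ... string" definitions, which are stricter than what the parser accepts. As a stdlib-only illustration of two of the simpler definitions (the function names are hypothetical, not urlstd's code): a valid URL-scheme string is one ASCII alpha followed by ASCII alphanumerics, +, -, or ., and a valid IPv4-address string is four shortest-possible decimal numbers from 0 to 255 separated by dots.

```python
import re

_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")
# "Shortest possible" decimal: 0, or a number without leading zeros (ASCII digits only).
_OCTET = re.compile(r"0|[1-9][0-9]{0,2}")


def is_valid_url_scheme_sketch(value: str) -> bool:
    return _SCHEME.fullmatch(value) is not None


def is_valid_ipv4_address_sketch(address: str) -> bool:
    parts = address.split(".")
    return len(parts) == 4 and all(
        _OCTET.fullmatch(part) is not None and int(part) <= 255 for part in parts
    )
```

For comparison, the URL Standard's host parser accepts forms that fail this check, such as 0x7f.1 (parsed as 127.0.0.1).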
Compatibility with standard library urllib
- urlstd.parse.urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult
  urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). It parses a string representation of a URL using the basic URL parser and returns a urllib.parse.ParseResult.
Basic Usage
To parse a string into a URL:
from urlstd.parse import URL
URL('http://user:pass@foo:21/bar;par?b#c')
# → <URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
To parse a string into a URL using a base URL:
url = URL('?ﬃ&🌈', base='http://example.org')
url # → <URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
url.search # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url) # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
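sort() places 🌈 before U+FB03 (ﬃ) because the URL Standard sorts names stably by UTF-16 code units, not by code point: U+1F308 is stored as the surrogate pair D83C DF08, and 0xD83C is less than 0xFB03. The stdlib-only sketch below reproduces that ordering; sort_pairs_like_urlsearchparams is a hypothetical helper, not part of urlstd.

```python
def utf16_key(name: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return name.encode("utf-16-be")


def sort_pairs_like_urlsearchparams(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # sorted() is stable, so pairs with equal names keep their relative order.
    return sorted(pairs, key=lambda pair: utf16_key(pair[0]))
```

Python's default string ordering compares code points and would keep U+FB03 first, which is why the key function is needed.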
To validate a URL string:
from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('https://user:password@example.org/') # → True
URLValidator.is_valid('https://user:password@example.org/') # → False
validity = ValidityState()
URLValidator.is_valid('https://user:password@example.org/', validity=validity) # → False
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"
URL.can_parse('file:///C|/demo') # → True
URLValidator.is_valid('file:///C|/demo') # → False
validity = ValidityState()
URLValidator.is_valid('file:///C|/demo', validity=validity) # → False
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-URL-unit: code point is found that is not a URL unit: U+007C (|) in 'file:///C|/demo' at position 9"
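The invalid-URL-unit error above is reported because | is not a URL code point. A URL unit is a URL code point or %; the URL code points are ASCII alphanumerics, a fixed set of ASCII punctuation, and code points from U+00A0 to U+10FFFD excluding surrogates and noncharacters. The stdlib-only sketch below (first_invalid_url_unit is a hypothetical helper, not urlstd's code) finds the first code point outside that set. It does not check that each % is followed by two hex digits, which the parser also reports as invalid-URL-unit.

```python
import string

# ASCII URL code points: alphanumerics plus this punctuation.
_ASCII_URL_CODE_POINTS = frozenset(string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~")


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus every code point ending in FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_url_code_point(ch: str) -> bool:
    cp = ord(ch)
    if ch in _ASCII_URL_CODE_POINTS:
        return True
    if 0xA0 <= cp <= 0x10FFFD:
        return not (0xD800 <= cp <= 0xDFFF) and not is_noncharacter(cp)
    return False


def first_invalid_url_unit(s: str) -> int:
    # Index of the first code point that is neither a URL code point nor "%", or -1.
    for index, ch in enumerate(s):
        if ch != "%" and not is_url_code_point(ch):
            return index
    return -1
```

For 'file:///C|/demo' it returns 9, the same position reported in the validation message above.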
To parse a string into a urllib.parse.ParseResult using a base URL:
import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query) # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251') # → 'a&#255;b'
html.unescape('a&#255;b') # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252') # → 'aÿb'
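The three query strings differ because the query is encoded with the given encoding before it is percent-encoded. ÿ (U+00FF) becomes the bytes C3 BF in UTF-8 and FF in windows-1252, but windows-1251 cannot encode it, so the URL Standard's "percent-encode after encoding" algorithm writes an HTML numeric character reference, &#255;, already percent-encoded as %26%23255%3B. The stdlib-only sketch below mirrors that algorithm; percent_encode_after_encoding and in_special_query_percent_encode_set are hypothetical names, and it assumes Python's codecs agree with the WHATWG encodings for these characters and that the encoding is stateless, since it encodes one code point at a time.

```python
from typing import Callable


def in_special_query_percent_encode_set(cp: int) -> bool:
    # C0 control percent-encode set (C0 controls and everything above U+007E)
    # plus space, ", #, <, > and, for special schemes such as http, '.
    return cp <= 0x1F or cp > 0x7E or chr(cp) in " \"#<>'"


def percent_encode_after_encoding(
    text: str,
    encoding: str,
    in_set: Callable[[int], bool],
    space_as_plus: bool = False,
) -> str:
    output = []
    for ch in text:
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            # Unencodable code point: emit "&#<decimal>;" in percent-encoded form.
            output.append(f"%26%23{ord(ch)}%3B")
            continue
        for byte in encoded:
            if space_as_plus and byte == 0x20:
                output.append("+")
            elif in_set(byte):
                output.append(f"%{byte:02X}")
            else:
                output.append(chr(byte))
    return "".join(output)
```

Applied to 'aÿb', this yields the same query strings as the three urlparse() calls above.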
Logging
urlstd uses the standard library logging module to report validation errors.
Change the log level of the urlstd logger if needed:
import logging
logging.getLogger('urlstd').setLevel(logging.ERROR)
Dependencies
- icupy >= 0.11.0 (pre-built packages are available)
  - icupy requirements:
    - ICU4C (ICU - International Components for Unicode) - latest version recommended
    - C++17 compatible compiler (see supported compilers)
    - CMake >= 3.7
Installation
- Configuring environment variables for icupy (ICU):
  - Windows:
    - Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if ICU is located in C:\icu4c:
      set ICU_ROOT=C:\icu4c
      or in PowerShell:
      $env:ICU_ROOT = "C:\icu4c"
    - To verify the settings using icuinfo (64-bit):
      %ICU_ROOT%\bin64\icuinfo
      or in PowerShell:
      & $env:ICU_ROOT\bin64\icuinfo
  - Linux/POSIX:
    - If ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if ICU is located in /usr/local:
      export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
      export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    - To verify the settings using pkg-config:
      $ pkg-config --cflags --libs icu-uc
      -I/usr/local/include -L/usr/local/lib -licuuc -licudata
- Installing from PyPI:
  pip install urlstd
Running Tests
Install dependencies:
pipx install tox
# or
pip install --user tox
To run tests and generate a report:
git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt
See the results in tests/wpt/report.html.
License
File details
Details for the file urlstd-2023.7.26.1.tar.gz.
File metadata
- Download URL: urlstd-2023.7.26.1.tar.gz
- Upload date:
- Size: 105.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8064d7a2034d3836cec844533b108af14429244d6119cfa6f268ef2bfc711358 |
| MD5 | ec8481fce2fccf93bde0fb22d50b0d6d |
| BLAKE2b-256 | d534ccc954ae7638071e312069250e4de8fc6925ce36f7f7c97e946ad21d7a61 |
File details
Details for the file urlstd-2023.7.26.1-py3-none-any.whl.
File metadata
- Download URL: urlstd-2023.7.26.1-py3-none-any.whl
- Upload date:
- Size: 37.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0174403e956b3937038440e0da01742982b6e9711a2191b4ee79f84ae607b6f |
| MD5 | 82b727c17a3169eef8398573e132ea3a |
| BLAKE2b-256 | e2d216bd0ec40523996e527a3c948da40faee91682005ec932cfb7f955aa315c |