Python implementation of the WHATWG URL Standard
urlstd
urlstd is a Python implementation of the WHATWG URL Standard.
This library provides the URL class, the URLSearchParams class, and low-level APIs that comply with the URL specification.
Note: The latest release of urlstd is implemented based on the URL specification commit f787850.
Supported APIs
- class urlstd.parse.URL(url: str, base: Optional[str] = None)
  - href: readonly property href: str
  - origin: readonly property origin: str
  - protocol: property protocol: str
  - username: property username: str
  - password: property password: str
  - host: property host: str
  - hostname: property hostname: str
  - port: property port: str
  - pathname: property pathname: str
  - search: property search: str
  - searchParams: readonly property search_params: URLSearchParams
  - hash: property hash: str
- class urlstd.parse.URLSearchParams(init: Optional[Union[str, Sequence[Sequence[Union[str, int, float]]], Dict[str, Union[str, int, float]], URLRecord, URLSearchParams]] = None)
Low-level APIs
- urlstd.parse.parse_url(urlstring: str, base: str = None, encoding: str = "utf-8") -> URLRecord
- class urlstd.parse.BasicURLParser
  - classmethod parse(urlstring: str, base: Optional[URLRecord] = None, encoding: str = "utf-8", url: Optional[URLRecord] = None, state_override: Optional[URLParserState] = None) -> URLRecord
- class urlstd.parse.URLRecord
  - scheme: property scheme: str = ""
  - username: property username: str = ""
  - password: property password: str = ""
  - host: property host: Optional[Union[str, int, Tuple[int, ...]]] = None
  - port: property port: Optional[int] = None
  - path: property path: Union[List[str], str] = []
  - query: property query: Optional[str] = None
  - fragment: property fragment: Optional[str] = None
  - origin: readonly property origin: Optional[Origin]
  - is special: is_special() -> bool
  - is not special: is_not_special() -> bool
  - includes credentials: includes_credentials() -> bool
  - has an opaque path: has_opaque_path() -> bool
  - cannot have a username/password/port: cannot_have_username_password_port() -> bool
  - URL serializer: serialize_url(exclude_fragment: bool = False) -> str
  - host serializer: serialize_host() -> str
  - URL path serializer: serialize_path() -> str
- urlstd.parse.IDNA.domain_to_ascii(domain: str, be_strict: bool = False) -> str
- urlstd.parse.Host.parse(host: str, is_not_special: bool = False) -> Union[str, int, Tuple[int, ...]]
- urlstd.parse.Host.serialize(host: Union[str, int, Sequence[int]]) -> str
- urlstd.parse.string_percent_decode(s: str) -> bytes
- urlstd.parse.string_percent_encode(s: str, safe: str, encoding: str = "utf-8", space_as_plus: bool = False) -> str
- application/x-www-form-urlencoded parser: urlstd.parse.parse_qsl(query: bytes) -> List[Tuple[str, str]]
- application/x-www-form-urlencoded serializer: urlstd.parse.urlencode(query: Sequence[Tuple[str, str]], encoding: str = "utf-8") -> str
Compatibility with standard library urllib
- urlstd.parse.urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult
  urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). It parses a string representation of a URL using the basic URL parser and returns a urllib.parse.ParseResult.
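Because urlstd.parse.urlparse() is documented to return a urllib.parse.ParseResult, the result can be handled with the standard library's own accessors. The sketch below does not call urlstd; it builds a ParseResult by hand, using the field values shown for the UTF-8 case under Basic Usage, and demonstrates a few standard ParseResult operations.

```python
from urllib.parse import ParseResult

# Hand-built stand-in for a value returned by urlstd.parse.urlparse();
# the field values are copied from the UTF-8 example under Basic Usage.
pr = ParseResult(
    scheme='http',
    netloc='example.org',
    path='/foo/',
    params='',
    query='a%C3%BFb',
    fragment='',
)

# Reassemble the URL string from its components.
url = pr.geturl()

# Convenience attributes derived from netloc.
host = pr.hostname
port = pr.port  # None, because netloc carries no explicit port

# ParseResult is a named tuple, so _replace() returns a modified copy.
with_fragment = pr._replace(fragment='top').geturl()

print(url)
print(host, port)
print(with_fragment)
```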
Basic Usage
To parse a string into a URL using a base URL:
from urlstd.parse import URL
url = URL('?ﬃ&🌈', 'http://example.org')
url # → URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')
url.search # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url) # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
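The order after sort() may look surprising: the rainbow emoji (U+1F308) ends up before the parameter that percent-encodes to %EF%AC%83, which is U+FB03, even though U+FB03 is the lower code point. The URL Standard specifies that URLSearchParams sorting compares names by UTF-16 code units, and the emoji's leading surrogate (0xD83C) is smaller than 0xFB03. The following is a minimal standard-library sketch of that ordering rule, not urlstd's actual implementation; the helper name sort_by_code_units is hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def sort_by_code_units(pairs: List[Pair]) -> List[Pair]:
    """Stable sort of name-value pairs by the UTF-16 code units of each name.

    Comparing big-endian UTF-16 bytes lexicographically is equivalent to
    comparing the sequences of 16-bit code units.
    """
    return sorted(
        pairs,
        key=lambda pair: pair[0].encode('utf-16-be', 'surrogatepass'),
    )


# U+FB03 (latin small ligature ffi) and U+1F308 (rainbow).
pairs = [('\ufb03', ''), ('\U0001f308', '')]

by_code_units = sort_by_code_units(pairs)
# Python's default string comparison uses code points instead.
by_code_points = sorted(pairs, key=lambda pair: pair[0])

print(by_code_units)
print(by_code_points)
```

The two results differ only for names that contain characters outside the Basic Multilingual Plane; for other names both orderings agree.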
To parse a string into a urllib.parse.ParseResult using a base URL:
import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', 'http://example.org/foo/', encoding='utf-8')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query) # → 'aÿb'
pr = urlparse('?aÿb', 'http://example.org/foo/', encoding='windows-1251')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251') # → 'a&#255;b'
html.unescape('a&#255;b') # → 'aÿb'
pr = urlparse('?aÿb', 'http://example.org/foo/', encoding='windows-1252')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252') # → 'aÿb'
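The three query strings above differ because ÿ (U+00FF) is two bytes in UTF-8, a single byte (0xFF) in windows-1252, and not representable at all in windows-1251, where it is replaced by a numeric character reference before percent-encoding. The sketch below reproduces those three results with the standard library only. It is an illustration under simplifying assumptions, not urlstd's code: the helper percent_encode_query is hypothetical, and quote(..., safe='') escapes more characters than the URL Standard's query percent-encode set would.

```python
from urllib.parse import quote


def percent_encode_query(text: str, encoding: str) -> str:
    """Encode text, then percent-encode the resulting bytes.

    Characters that the encoding cannot represent become HTML numeric
    character references (for example '&#255;') before percent-encoding.
    """
    data = text.encode(encoding, errors='xmlcharrefreplace')
    return quote(data, safe='')


utf8 = percent_encode_query('aÿb', 'utf-8')
cp1251 = percent_encode_query('aÿb', 'windows-1251')
cp1252 = percent_encode_query('aÿb', 'windows-1252')

print(utf8)    # a%C3%BFb
print(cp1251)  # a%26%23255%3Bb
print(cp1252)  # a%FFb
```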
Logging
urlstd uses the standard library logging module to report validation errors.
Change the log level of the urlstd logger if needed:
import logging
logging.getLogger('urlstd').setLevel(logging.ERROR)
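Beyond changing the level, the same logger can be given its own handler so that the library's messages are kept separate from the application's output. The sketch below uses only the standard library and does not import urlstd. The child logger name 'urlstd.parse' and the two messages are assumptions made for illustration; they stand in for whatever the library actually emits.

```python
import io
import logging

# Configure the 'urlstd' logger: own handler, ERROR threshold, no propagation
# to the root logger.
logger = logging.getLogger('urlstd')
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(name)s %(levelname)s: %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.ERROR)
logger.propagate = False

# Simulated records (assumption: the library logs under a child of 'urlstd').
child = logging.getLogger('urlstd.parse')
child.info('hypothetical validation message')  # below ERROR, so dropped
child.error('hypothetical error message')      # reaches the handler

captured = stream.getvalue()
print(captured, end='')
```

Child loggers inherit the effective level of 'urlstd' unless they set their own, so a single setLevel() call on the parent is enough in this setup.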
Dependencies
- icupy >= 0.11.0 (pre-built packages are available)
  - icupy requirements:
    - ICU4C (ICU - International Components for Unicode) - latest version recommended
    - C++17 compatible compiler (see supported compilers)
    - CMake >= 3.7
Installation
- Configuring environment variables for icupy (ICU):
  - Windows:
    - Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if the ICU is located in C:\icu4c:
      set ICU_ROOT=C:\icu4c
      or in PowerShell:
      $env:ICU_ROOT = "C:\icu4c"
    - To verify settings using icuinfo (64 bit):
      %ICU_ROOT%\bin64\icuinfo
      or in PowerShell:
      & $env:ICU_ROOT\bin64\icuinfo
  - Linux/POSIX:
    - If the ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if the ICU is located in /usr/local:
      export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
      export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    - To verify settings using pkg-config:
      $ pkg-config --cflags --libs icu-uc
      -I/usr/local/include -L/usr/local/lib -licuuc -licudata
- Installing from PyPI:
  pip install urlstd
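On Linux/POSIX, the pkg-config check from the configuration step can also be scripted, for example in a setup or CI step that runs before pip. The following is a minimal sketch; the function name icu_visible_to_pkg_config is hypothetical, and the only external assumption is pkg-config's standard --exists option, which reports through its exit status whether a module can be found.

```python
import shutil
import subprocess
from typing import Optional


def icu_visible_to_pkg_config(module: str = 'icu-uc') -> Optional[bool]:
    """Report whether pkg-config can find the given module.

    Returns None when pkg-config itself is not installed, True when the
    module is found, and False otherwise.
    """
    pkg_config = shutil.which('pkg-config')
    if pkg_config is None:
        return None
    # --exists prints nothing and signals the result through the exit code.
    result = subprocess.run([pkg_config, '--exists', module], check=False)
    return result.returncode == 0


status = icu_visible_to_pkg_config()
if status is None:
    print('pkg-config is not available on this system')
elif status:
    print('ICU (icu-uc) is visible to pkg-config')
else:
    print('ICU (icu-uc) was not found; check PKG_CONFIG_PATH')
```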
Running Tests
Install dependencies:
pip install tox
To run tests and generate a report:
git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt
See result: tests/wpt/report.html
License