Python implementation of the WHATWG URL Living Standard
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
whatwg-url
Python implementation of the WHATWG URL Living Standard.
The latest revision of the standard that this package implements is dated August 7th, 2018 (commit 49060c7).
Getting Started
Install the whatwg-url package using pip.
python -m pip install whatwg-url
And use the module like so:
import whatwg_url
url = whatwg_url.parse_url("https://www.google.com")
print(url)
# Url(scheme='https', hostname='www.google.com', port=None, path='', query='', fragment='')
Features
Compatibility with urllib.parse.urlparse()
import whatwg_url
parseresult = whatwg_url.urlparse("https://seth:larson@www.google.com:1234/maps?query=string#fragment")
print(parseresult.scheme) # 'https'
print(parseresult.netloc) # 'www.google.com:1234'
print(parseresult.userinfo) # 'seth:larson'
print(parseresult.path) # '/maps'
print(parseresult.params) # ''
print(parseresult.query) # 'query=string'
print(parseresult.fragment) # 'fragment'
print(parseresult.username) # 'seth'
print(parseresult.password) # 'larson'
print(parseresult.hostname) # 'www.google.com'
print(parseresult.port) # 1234
print(parseresult.geturl()) # 'https://seth:larson@www.google.com:1234/maps?query=string#fragment'
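For comparison, here is the same URL parsed with the standard library's urllib.parse.urlparse(). This sketch uses only the standard library, not whatwg_url. The standard library keeps the userinfo inside netloc and has no userinfo attribute, so code that reads netloc may see a different value than the one shown above.

```python
from urllib.parse import urlparse

url = "https://seth:larson@www.google.com:1234/maps?query=string#fragment"
std = urlparse(url)

print(std.scheme)    # https
print(std.netloc)    # seth:larson@www.google.com:1234 (userinfo is included)
print(std.username)  # seth
print(std.password)  # larson
print(std.hostname)  # www.google.com
print(std.port)      # 1234
print(std.geturl() == url)  # True
```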
URL Normalization
The WHATWG URL specification describes how to normalize URL inputs into usable URLs. Normalization covers percent-encoding, default ports, path segments, IPv4 and IPv6 addresses, IDNA (2008 and 2003), and repeated slashes after the scheme.
import whatwg_url
print(whatwg_url.normalize_url("https://////www.google.com")) # https://www.google.com
print(whatwg_url.normalize_url("https://www.google.com/dir1/../dir2")) # https://www.google.com/dir2
print(whatwg_url.normalize_url("https://你好你好")) # https://xn--6qqa088eba/
print(whatwg_url.normalize_url("https://0Xc0.0250.01")) # https://192.168.0.1/
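The last example works because the standard's IPv4 parser accepts hexadecimal (0x prefix) and octal (leading 0) components, and lets the final component fill all remaining bytes. The following is a minimal standard-library sketch of that rule, not the package's own code. The names parse_ipv4 and parse_ipv4_number are hypothetical, and every failure is simplified to a ValueError.

```python
import ipaddress


def parse_ipv4_number(part):
    """Parse one dotted component as hex (0x prefix), octal (leading 0), or decimal."""
    if part == "":
        raise ValueError("empty component")
    radix = 10
    if part[:2].lower() == "0x":
        part, radix = part[2:], 16
    elif len(part) >= 2 and part[0] == "0":
        part, radix = part[1:], 8
    if part == "":
        return 0  # a bare "0x" counts as zero
    allowed = "0123456789abcdef"[:radix]
    if any(ch not in allowed for ch in part.lower()):
        raise ValueError("not a number in radix %d: %r" % (radix, part))
    return int(part, radix)


def parse_ipv4(host):
    """Return the dotted-decimal form of an IPv4 host written in mixed radixes."""
    parts = host.split(".")
    if len(parts) > 1 and parts[-1] == "":
        parts.pop()  # tolerate a single trailing dot
    if len(parts) > 4:
        raise ValueError("too many components")
    numbers = [parse_ipv4_number(p) for p in parts]
    if any(n > 255 for n in numbers[:-1]):
        raise ValueError("component out of range")
    # The last component may span every byte not claimed by the earlier ones.
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        raise ValueError("last component out of range")
    value = numbers[-1]
    for index, n in enumerate(numbers[:-1]):
        value += n * 256 ** (3 - index)
    return str(ipaddress.IPv4Address(value))


print(parse_ipv4("0Xc0.0250.01"))  # 192.168.0.1
```

Here 0Xc0 is 192 in hexadecimal, 0250 is 168 in octal, and the final 01 fills the last two bytes as 0.1.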
URL Validation
import whatwg_url
print(whatwg_url.is_valid_url("https://www.google.com")) # True
print(whatwg_url.is_valid_url("https://www .google.com")) # False
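The second URL is rejected because a space is not allowed in a host. As a rough illustration, the check below tests a domain-style host string against a subset of the forbidden host code points listed in the URL Standard. It is a sketch with a hypothetical helper name, find_forbidden_host_chars, and it is far narrower than what is_valid_url() checks.

```python
# A subset of the URL Standard's forbidden host code points.
# Assumption: the input is a domain-style host; bracketed IPv6 literals
# would need separate handling because they legitimately contain ":" and "[]".
FORBIDDEN_HOST_CHARS = frozenset("\x00\t\n\r #/:?@[\\]")


def find_forbidden_host_chars(host):
    """Return the forbidden characters found in host, in order of appearance."""
    return [ch for ch in host if ch in FORBIDDEN_HOST_CHARS]


print(find_forbidden_host_chars("www.google.com"))   # []
print(find_forbidden_host_chars("www .google.com"))  # [' ']
```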
Relative URLs
HTTP redirects often contain relative URLs (via the Location header) that need to be applied to the current URL location.
Passing the base parameter lets you give a relative URL as input. The relative URL is resolved against the base, and the result is returned as a new URL object.
import whatwg_url
url = whatwg_url.parse_url("../dev?a=1#f", base="https://www.google.com/maps")
print(url.href) # https://www.google.com/dev?a=1#f
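The redirect case described above can be sketched with the standard library's urllib.parse.urljoin() as a stand-in for this package's parser. The helper name resolve_redirect is hypothetical, and urljoin follows RFC 3986 rather than the WHATWG algorithm, so results may differ on unusual inputs.

```python
from urllib.parse import urljoin


def resolve_redirect(current_url, location):
    """Resolve a Location header value against the URL that returned it."""
    return urljoin(current_url, location)


# A relative Location is applied to the current URL.
print(resolve_redirect("https://www.google.com/maps", "../dev?a=1#f"))
# https://www.google.com/dev?a=1#f

# An absolute Location replaces the current URL entirely.
print(resolve_redirect("https://www.google.com/maps", "https://example.com/"))
# https://example.com/
```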
URL Property Mutators
Modifying a property on a URL object uses the parser and "state overrides" to properly mutate the URL object.
import whatwg_url
url = whatwg_url.parse_url("http://www.google.com:443")
print(url.scheme) # 'http'
print(url.port) # 443
url.scheme = 'https'
print(url.scheme) # 'https'
print(url.port) # None
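The port becomes None in the example above because 443 is the default port for https, and a default port is not stored. The toy class below sketches only that one rule with Python properties. MiniUrl and DEFAULT_PORTS are hypothetical names for illustration and do not reflect how this package implements its setters.

```python
# Default ports for the special schemes that have one in the URL Standard.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


class MiniUrl:
    """Toy URL record that drops the port when it equals the scheme default."""

    def __init__(self, scheme, hostname, port=None):
        self.hostname = hostname
        self._scheme = scheme.lower()
        self.port = port  # goes through the port setter below

    @property
    def scheme(self):
        return self._scheme

    @scheme.setter
    def scheme(self, value):
        self._scheme = value.lower()
        # Re-apply the port so a newly matching default is cleared.
        self.port = self._port

    @property
    def port(self):
        return self._port

    @port.setter
    def port(self, value):
        is_default = value is not None and value == DEFAULT_PORTS.get(self._scheme)
        self._port = None if is_default else value


url = MiniUrl("http", "www.google.com", 443)
print(url.port)   # 443
url.scheme = "https"
print(url.port)   # None
```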
"Splatable"
The module is a single file which allows for easy vendoring into projects.
License
Changelog
2018.8.26
Added
- Added UrlParser and Url
- Added UrlParser.parse_host()
- Added UrlParser.parse_ipv4_host()
- Added Url.origin
- Added Url.authority
- Added urlparse and urljoin to be compatible with urllib.parse.urlparse and urllib.parse.urljoin
- Added support for Python 2.7, 3.4, and 3.5
Source Distribution
File details
Details for the file whatwg-url-2018.8.26.tar.gz.
File metadata
- Download URL: whatwg-url-2018.8.26.tar.gz
- Upload date:
- Size: 30.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4d59cc99bf6ab5967f140316dd9bb4daf6cdb18581895ef423dd54f7b41f43b |
| MD5 | 4850c9eed025f946bbfd19c3f618ea2f |
| BLAKE2b-256 | 3634c001514dbe3cc0bf6022dde46a56dc21c3e3f8208036baf8ab995a3df7a3 |