URL normalization for Python
url-normalize
URI Normalization function:
- Take care of IDN domains.
- Always provide the URI scheme in lowercase characters.
- Always provide the host, if any, in lowercase characters.
- Only perform percent-encoding where it is essential.
- Always use uppercase A-through-F characters when percent-encoding.
- Prevent dot-segments appearing in non-relative URI paths.
- For schemes that define a default authority, use an empty authority if the default is desired.
- For schemes that define an empty path to be equivalent to a path of "/", use "/".
- For schemes that define a port, use an empty port if the default is desired.
- All portions of the URI must be UTF-8-encoded, NFC-normalized Unicode strings.
Inspired by Sam Ruby's urlnorm.py: http://intertwingly.net/blog/2004/08/04/Urlnorm
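Several of the rules listed above can be illustrated with the standard library alone. The following is a minimal sketch, not the url-normalize implementation: the names sketch_normalize, remove_dot_segments, upper_escapes and the DEFAULT_PORTS table are hypothetical, it assumes an absolute URL with an authority, and it ignores userinfo, IPv6 literals and the "only where essential" percent-encoding rule.

```python
import re
import unicodedata
from urllib.parse import urlsplit, urlunsplit

# Illustrative subset of default ports (an assumption, not the library's table).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def remove_dot_segments(path):
    """Resolve "." and ".." segments in a path that starts with "/"."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == "..":
            # Never pop the leading empty segment that stands for the root.
            if len(out) > 1:
                out.pop()
        elif seg != ".":
            out.append(seg)
    # A trailing "." or ".." leaves a directory, so keep the final slash.
    if segments[-1] in (".", ".."):
        out.append("")
    return "/".join(out)


def upper_escapes(text):
    """Uppercase the hex digits of every percent-encoded triplet."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def sketch_normalize(url):
    """Apply a subset of the normalization rules to an absolute URL."""
    # NFC-normalize so that equivalent Unicode strings compare equal.
    parts = urlsplit(unicodedata.normalize("NFC", url))
    scheme = parts.scheme.lower()

    # Lowercase the host and convert IDN labels to their ASCII form.
    host = (parts.hostname or "").lower()
    if host:
        host = host.encode("idna").decode("ascii")

    # Keep the port only when it differs from the scheme's default.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"

    # An empty path is treated as "/"; dot-segments are resolved.
    path = upper_escapes(remove_dot_segments(parts.path or "/"))
    query = upper_escapes(parts.query)
    return urlunsplit((scheme, netloc, path, query, parts.fragment))
```

Under these assumptions, sketch_normalize('HTTP://WWW.Example.COM:80/a/./b/../c%2fd') returns 'http://www.example.com/a/c%2Fd'.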
Example:
$ pip install url-normalize
Collecting url-normalize
...
Successfully installed future-0.16.0 url-normalize-1.3.3
$ python
Python 3.6.1 (default, Jul 8 2017, 05:00:20)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from url_normalize import url_normalize
>>> print(url_normalize('www.foo.com:80/foo'))
https://www.foo.com/foo
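The example passes a URL without a scheme, so a default scheme has to be supplied before the rest of the URL can be interpreted. A minimal sketch of one way that step might look follows; the helper name with_default_scheme is hypothetical and this is not the library's code. It assumes that any input containing "://" already carries a scheme, so scheme-only forms such as mailto: addresses and local file paths are out of scope.

```python
def with_default_scheme(url, default_scheme="https"):
    """Prefix `url` with `default_scheme` when it has no scheme of its own."""
    if not url:
        # Leave the empty string untouched.
        return url
    if url.startswith("//"):
        # Scheme-relative form such as //domain.tld
        return f"{default_scheme}:{url}"
    if "://" in url:
        # Assumed to carry a scheme already.
        return url
    return f"{default_scheme}://{url}"
```

With this sketch, with_default_scheme('//domain.tld') returns 'https://domain.tld', and with_default_scheme('www.foo.com', default_scheme='http') returns 'http://www.foo.com'.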
History:
- 1.4.1: Added an optional default_scheme parameter to url_normalize.
- 1.4.0: Code refactoring and cleanup.
- 1.3.3: Support for empty strings and double-slash URLs (//domain.tld).
- 1.3.2: The same code supports both Python 3 and Python 2.
- 1.3.1: Python 3 compatibility.
- 1.2.1: PEP 8, setup.py.
- 1.1.2: Support for shebang (#!) URLs.
- 1.1.1: Use the 'http' scheme by default when appropriate.
- 1.1.0: Added handling of IDN domains.
- 1.0.0: PEP 8 code style.
- 0.1.0: Forked from Sam Ruby's urlnorm.py.
License: "Python" (PSF) License
Download files
Source Distribution
url-normalize-1.4.1.tar.gz (5.0 kB)
Built Distribution
url_normalize-1.4.1-py2.py3-none-any.whl
Hashes for url_normalize-1.4.1-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 51e0f14050c79e732d175c33d12167f5e642cc23e0cb23275236af843faf884f |
MD5 | e61d139aa55a3a850579df348803fdea |
BLAKE2b-256 | e21247dc7437c13ddc648b796deec34cca14841dc193131f7be215baea3e9b2f |
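A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only hashlib; the function name sha256_of and the local file path in the usage note are assumptions for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the wheel was saved in the current directory, compare sha256_of('url_normalize-1.4.1-py2.py3-none-any.whl') with the SHA256 value in the table; the two strings should be identical.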