URL normalization for Python

url-normalize

URI Normalization function:

  • Take care of IDN domains.
  • Always provide the URI scheme in lowercase characters.
  • Always provide the host, if any, in lowercase characters.
  • Only perform percent-encoding where it is essential.
  • Always use uppercase A-through-F characters when percent-encoding.
  • Prevent dot-segments appearing in non-relative URI paths.
  • For schemes that define a default authority, use an empty authority if the default is desired.
  • For schemes that define an empty path to be equivalent to a path of "/", use "/".
  • For schemes that define a port, use an empty port if the default is desired.
  • All portions of the URI must be Unicode strings normalized to NFC and encoded as UTF-8.
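
A few of the rules above can be illustrated with the standard library alone. The following is a minimal sketch, not the url-normalize implementation: the names sketch_normalize, remove_dot_segments and DEFAULT_PORTS are made up for this example, the port table is an assumption, and only lowercasing, IDN encoding, default-port removal, dot-segment removal, the empty-path rule, uppercase percent-escapes and NFC normalization are covered. Userinfo and IPv6 literals are not handled.

```python
import re
import unicodedata
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports, for this sketch only.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def remove_dot_segments(path):
    """Resolve "." and ".." segments in an absolute path."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            # Never pop the leading empty segment that marks the root.
            if len(output) > 1:
                output.pop()
            continue
        output.append(segment)
    # A trailing "." or ".." names a directory, so keep the final slash.
    if segments[-1] in (".", ".."):
        output.append("")
    return "/".join(output)


def sketch_normalize(url):
    """Apply a subset of the rules listed above (hypothetical helper)."""
    # Normalize the Unicode input to NFC first.
    parts = urlsplit(unicodedata.normalize("NFC", url))
    scheme = parts.scheme.lower()

    # urlsplit already lowercases hostname; encode IDN labels to ASCII.
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")

    # Keep the port only when it differs from the scheme's default.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = "{}:{}".format(host, parts.port)

    # An empty path after an authority becomes "/".
    path = parts.path
    if host and not path:
        path = "/"
    if path.startswith("/"):
        path = remove_dot_segments(path)

    # Use uppercase hex digits in existing percent-escapes.
    path = re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), path)

    # Query and fragment are passed through unchanged in this sketch.
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(sketch_normalize("HTTP://WWW.Example.COM:80/a/b/../c/./d%2f?q=1"))
# prints: http://www.example.com/a/c/d%2F?q=1
```

The sketch leaves out the "only percent-encode where essential" rule, which needs per-component knowledge of reserved characters.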

Inspired by Sam Ruby's urlnorm.py: http://intertwingly.net/blog/2004/08/04/Urlnorm

Example:

$ pip install url-normalize
Collecting url-normalize
...
Successfully installed future-0.16.0 url-normalize-1.3.3
$ python
Python 3.6.1 (default, Jul  8 2017, 05:00:20)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from url_normalize import url_normalize
>>> print(url_normalize('www.foo.com:80/foo'))
https://www.foo.com/foo

History:

  • 1.4.1: Added an optional default_scheme parameter to url_normalize
  • 1.4.0: A bit of code refactoring and cleanup
  • 1.3.3: Support empty string and double slash urls (//domain.tld)
  • 1.3.2: The same code supports both Python 3 and Python 2
  • 1.3.1: Python 3 compatibility
  • 1.2.1: PEP8, setup.py
  • 1.1.2: support for shebang (#!) urls
  • 1.1.1: using the 'http' scheme by default when appropriate
  • 1.1.0: added handling of IDN domains
  • 1.0.0: code pep8
  • 0.1.0: forked from Sam Ruby's urlnorm.py
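
Several history entries (1.1.1, 1.3.3, 1.4.1) concern URLs that arrive without a scheme. One way such defaulting could work is sketched below. This is an illustration only, not the library's code: with_default_scheme is a hypothetical helper, and the "https" default is an assumption taken from the example output earlier in this description.

```python
def with_default_scheme(url, default_scheme="https"):
    """Prepend a scheme when the input has none (hypothetical helper)."""
    # Leave empty input untouched.
    if not url:
        return url
    # Protocol-relative form: //domain.tld
    if url.startswith("//"):
        return default_scheme + ":" + url
    # No authority marker at all: treat the input as host[:port]/path.
    if "://" not in url:
        return default_scheme + "://" + url
    return url


print(with_default_scheme("//domain.tld"))
# prints: https://domain.tld
print(with_default_scheme("www.foo.com/foo", default_scheme="http"))
# prints: http://www.foo.com/foo
```

The check is deliberately naive: schemes without "//", such as mailto:, would be misread as host names, so a real implementation needs a stricter test.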

License: "Python" (PSF) License


Files for url-normalize, version 1.4.1:

  • url_normalize-1.4.1-py2.py3-none-any.whl (12.3 kB): Wheel, Python version py2.py3
  • url-normalize-1.4.1.tar.gz (5.0 kB): Source