Skip to main content

URL normalization for Python

Project description

url-normalize

Build Status Coverage Status

URI Normalization function:

  • Take care of IDN domains.
  • Always provide the URI scheme in lowercase characters.
  • Always provide the host, if any, in lowercase characters.
  • Only perform percent-encoding where it is essential.
  • Always use uppercase A-through-F characters when percent-encoding.
  • Prevent dot-segments appearing in non-relative URI paths.
  • For schemes that define a default authority, use an empty authority if the default is desired.
  • For schemes that define an empty path to be equivalent to a path of "/", use "/".
  • For schemes that define a port, use an empty port if the default is desired
  • All portions of the URI must be utf-8 encoded NFC from Unicode strings

Inspired by Sam Ruby's urlnorm.py: http://intertwingly.net/blog/2004/08/04/Urlnorm

Example:

$ pip install url-normalize
Collecting url-normalize
...
Successfully installed future-0.16.0 url-normalize-1.3.3
$ python
Python 3.6.1 (default, Jul  8 2017, 05:00:20)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
> from url_normalize import url_normalize
> print(url_normalize('www.foo.com:80/foo'))
> https://www.foo.com/foo

History:

  • 1.4.0: A bit of code refactoring and cleanup
  • 1.3.2: Support empty string and double slash urls (//domain.tld)
  • 1.3.1: Same code support both Python 3 and Python 2.
  • 1.3: Python 3 compatibility
  • 1.2: PEP8, setup.py
  • 1.1.2: support for shebang (#!) urls
  • 1.1.1: using 'http' schema by default when appropriate
  • 1.1: added handling of IDN domains
  • 1.0: code pep8-zation
  • 0.1: forked from Sam Ruby's urlnorm.py

License: "Python" (PSF) License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

url-normalize-1.4.0.tar.gz (5.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

url_normalize-1.4.0-py2.py3-none-any.whl (12.4 kB view details)

Uploaded Python 2Python 3

File details

Details for the file url-normalize-1.4.0.tar.gz.

File metadata

  • Download URL: url-normalize-1.4.0.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.6 CPython/3.7.1 Darwin/18.2.0

File hashes

Hashes for url-normalize-1.4.0.tar.gz
Algorithm Hash digest
SHA256 0ed30fb5fb5bdf16f9321b43fac13ab507e75a1caed2c0e21de8c27714ecdca3
MD5 133c70a01b24ad3e81f23da788b7b81d
BLAKE2b-256 15a0f45fea8189984f3e3aec83473e8f3516b729efb68b9e546c206f929bca1e

See more details on using hashes here.

File details

Details for the file url_normalize-1.4.0-py2.py3-none-any.whl.

File metadata

  • Download URL: url_normalize-1.4.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 12.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.6 CPython/3.7.1 Darwin/18.2.0

File hashes

Hashes for url_normalize-1.4.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 61a28f7aa7ed7825764c57b64d977eba04a9da7ec5e3564dea6671acd61e4ac3
MD5 b354844dbe39deff85bab71fb37e103a
BLAKE2b-256 8226389f2d3424f52fb3e858235f7a35d8ebcc9ebefab449078f9ab4e996c795

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page