URL normalization for Python

url-normalize

A Python library for standardizing and normalizing URLs with support for internationalized domain names (IDN).

Introduction

url-normalize provides a robust URI normalization function that:

  • Takes care of IDN domains.
  • Always provides the URI scheme in lowercase characters.
  • Always provides the host, if any, in lowercase characters.
  • Only performs percent-encoding where it is essential.
  • Always uses uppercase A-through-F characters when percent-encoding.
  • Prevents dot-segments appearing in non-relative URI paths.
  • For schemes that define a default authority, uses an empty authority if the default is desired.
  • For schemes that define an empty path to be equivalent to a path of "/", uses "/".
  • For schemes that define a port, uses an empty port if the default is desired.
  • Ensures all portions of the URI are UTF-8-encoded NFC from Unicode strings.

Inspired by Sam Ruby's urlnorm.py
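
To make a few of the rules above concrete, here is a minimal standard-library sketch. It is not url-normalize's implementation: `sketch_normalize`, `remove_dot_segments`, and `DEFAULT_PORTS` are hypothetical names introduced for illustration. The sketch assumes an absolute URL with an authority, ignores IPv6 literals, and relies on Python's built-in `idna` codec, which may differ from the IDN handling the library actually uses. It only uppercases existing percent-escapes and does not decode unnecessary ones.

```python
import re
import unicodedata
from urllib.parse import urlsplit, urlunsplit

# Illustrative subset of default ports (an assumption, not the library's table).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments in an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty segment
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # a trailing dot-segment leaves a trailing slash
    return "/".join(out)


def upper_escapes(text: str) -> str:
    """Uppercase the hex digits of existing percent-escapes."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def sketch_normalize(url: str) -> str:
    parts = urlsplit(unicodedata.normalize("NFC", url))
    scheme = parts.scheme.lower()

    # .hostname is already lowercased; the idna codec maps IDN labels to ASCII.
    host = (parts.hostname or "").encode("idna").decode("ascii")

    # Keep userinfo untouched; drop the port when it is the scheme default.
    userinfo = parts.netloc.rpartition("@")[0]
    netloc = f"{userinfo}@{host}" if userinfo else host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{netloc}:{parts.port}"

    # An empty path is treated as "/".
    path = upper_escapes(remove_dot_segments(parts.path)) or "/"
    query = upper_escapes(parts.query)
    return urlunsplit((scheme, netloc, path, query, parts.fragment))


print(sketch_normalize("HTTP://WWW.Example.COM:80/a/b/../c/./d?x=%c3%a9"))
print(sketch_normalize("http://münchen.de:8080"))
```

For real use, prefer the library's `url_normalize` function, which covers many more cases than this sketch.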

Features

  • IDN Support: Full internationalized domain name handling
  • Configurable Defaults:
    • Customizable default scheme (https by default)
    • Configurable default domain for absolute paths
  • Query Parameter Control:
    • Parameter filtering with allowlists
    • Support for domain-specific parameter rules
  • Versatile URL Handling:
    • Empty string URLs
    • Double slash URLs (//domain.tld)
    • Shebang (#!) URLs
  • Developer Friendly:
    • Cross-version Python compatibility (3.8+)
    • 100% test coverage
    • Modern type hints and string handling
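
As a rough illustration of allowlist-based parameter filtering with domain-specific rules, the sketch below keeps only the query parameters named in an allowlist. It uses the standard library only; `filter_query` is a hypothetical helper, not part of url-normalize, and the library's own matching rules and built-in defaults may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def filter_query(url, allowlist):
    """Drop query parameters that are not named in the allowlist.

    allowlist is either a list of parameter names applied to every host,
    or a dict mapping a host name to the list of names allowed for it.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if isinstance(allowlist, dict):
        allowed = set(allowlist.get(host, []))  # unknown hosts keep nothing
    else:
        allowed = set(allowlist)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    ]
    # urlencode re-encodes the surviving pairs; the original order is kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "https://example.com/?page=1&id=123&ref=test"
print(filter_query(url, {"example.com": ["page", "id"]}))
print(filter_query(url, ["page"]))
```

Accepting either a list or a dict in one argument mirrors the two allowlist shapes described above: a list applies everywhere, while a dict scopes the rules per domain.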

Installation

pip install url-normalize

Usage

Python API

from url_normalize import url_normalize

# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com:80/foo

# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo

# With query parameter filtering enabled
print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True))
# Output: https://www.google.com/search?q=test

# With custom parameter allowlist as a dict
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist={"example.com": ["page", "id"]}
))
# Output: https://example.com/?page=1&id=123

# With custom parameter allowlist as a list
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist=["page", "id"]
))
# Output: https://example.com/?page=1&id=123

# With default domain for absolute paths
print(url_normalize("/images/logo.png", default_domain="example.com"))
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
print(url_normalize("/images/logo.png", default_scheme="http", default_domain="example.com"))
# Output: http://example.com/images/logo.png
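
The examples above suggest how `default_scheme` and `default_domain` fill in missing parts of an input. The following sketch shows one way such defaults could be applied before normalization. `provide_defaults` is a hypothetical helper written for illustration, and its handling of empty strings and of absolute paths without a default domain is an assumption rather than documented library behavior. It also treats any input without `://` as a host, so it would mishandle schemes such as `mailto:`.

```python
def provide_defaults(url, default_scheme="https", default_domain=None):
    """Fill in a missing scheme or domain (illustrative sketch only)."""
    if not url:
        return url  # assumption: empty input passes through unchanged
    if url.startswith("//"):
        # Double slash URL: only the scheme is missing.
        return f"{default_scheme}:{url}"
    if url.startswith("/"):
        # Absolute path: needs both a scheme and a domain.
        if default_domain is None:
            return url  # assumption: nothing to attach it to
        return f"{default_scheme}://{default_domain}{url}"
    if "://" not in url:
        # Bare host such as "www.foo.com/foo".
        return f"{default_scheme}://{url}"
    return url


print(provide_defaults("www.foo.com/foo"))
print(provide_defaults("//domain.tld/x"))
print(provide_defaults("/images/logo.png", default_scheme="http", default_domain="example.com"))
```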

Command-line Usage

You can also use url-normalize from the command line:

$ url-normalize "www.foo.com:80/foo"
# Output: https://www.foo.com:80/foo

# With custom default scheme
$ url-normalize -s http "www.foo.com/foo"
# Output: http://www.foo.com/foo

# With query parameter filtering
$ url-normalize -f "www.google.com/search?q=test&utm_source=test"
# Output: https://www.google.com/search?q=test

# With custom allowlist
$ url-normalize -f -p page,id "example.com?page=1&id=123&ref=test"
# Output: https://example.com/?page=1&id=123

# With default domain for absolute paths
$ url-normalize -d example.com "/images/logo.png"
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
$ url-normalize -d example.com -s http "/images/logo.png"
# Output: http://example.com/images/logo.png

# Via uv tool/uvx
$ uvx url-normalize www.foo.com:80/foo
# Output: https://www.foo.com:80/foo

Documentation

For a complete history of changes, see CHANGELOG.md.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License
