URL normalization for Python
url-normalize
A Python library for standardizing and normalizing URLs with support for internationalized domain names (IDN).
Introduction
url-normalize provides a robust URI normalization function that:
- Takes care of IDN domains.
- Always provides the URI scheme in lowercase characters.
- Always provides the host, if any, in lowercase characters.
- Only performs percent-encoding where it is essential.
- Always uses uppercase A-through-F characters when percent-encoding.
- Prevents dot-segments appearing in non-relative URI paths.
- For schemes that define a default authority, uses an empty authority if the default is desired.
- For schemes that define an empty path to be equivalent to a path of "/", uses "/".
- For schemes that define a port, uses an empty port if the default is desired.
- Ensures all portions of the URI are UTF-8 encoded, NFC-normalized Unicode strings.
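Several of the rules above can be illustrated with the standard library alone. The following is a minimal sketch, not url-normalize's actual implementation: `sketch_normalize`, `remove_dot_segments`, `upper_escapes`, and `DEFAULT_PORTS` are hypothetical names introduced here, IPv6 literals are not handled, and IDN conversion relies on Python's built-in `idna` codec, which may differ from what the library does.

```python
import re
import unicodedata
from urllib.parse import urlsplit, urlunsplit

# Illustrative subset of scheme default ports (an assumption of this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments in an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty segment (the root)
                out.pop()
        elif seg != ".":
            out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # a trailing dot-segment leaves a trailing slash
    return "/".join(out)


def upper_escapes(text: str) -> str:
    """Uppercase the hex digits of existing percent-escapes."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def sketch_normalize(url: str) -> str:
    """Apply a handful of the listed rules (hypothetical helper)."""
    parts = urlsplit(unicodedata.normalize("NFC", url))
    scheme = parts.scheme.lower()
    # .hostname is already lowercased; the idna codec maps IDN labels to ASCII.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    # Keep any userinfo untouched and rebuild the authority around the host.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{host}"
    # Drop the port only when it equals the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"
    path = parts.path
    if path.startswith("/"):
        path = remove_dot_segments(path)
    elif host and not path:
        path = "/"  # an empty path is equivalent to "/"
    return urlunsplit(
        (scheme, netloc, upper_escapes(path), upper_escapes(parts.query), parts.fragment)
    )


print(sketch_normalize("HTTP://WWW.Example.COM:80/a/./b/../c%2fd?x=%e2"))
print(sketch_normalize("https://bücher.example"))
```

Note that the port is removed only when it matches the default for the URL's own scheme, so `:80` disappears under `http` but would be kept under `https`.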
Inspired by Sam Ruby's urlnorm.py
Features
- IDN Support: Full internationalized domain name handling
- Configurable Defaults:
  - Customizable default scheme (https by default)
  - Configurable default domain for absolute paths
- Query Parameter Control:
  - Parameter filtering with allowlists
  - Support for domain-specific parameter rules
- Versatile URL Handling:
  - Empty string URLs
  - Double slash URLs (//domain.tld)
  - Shebang (#!) URLs
- Developer Friendly:
  - Cross-version Python compatibility (3.8+)
  - 100% test coverage
  - Modern type hints and string handling
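To make the idea of allowlist-based parameter filtering concrete, here is a rough standard-library sketch. It is not the library's code: `filter_query` is a hypothetical helper, it assumes an allowlist is either a flat list of parameter names or a dict keyed by host name, and it does not model any built-in default rules the library may ship.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def filter_query(url, allowlist):
    """Keep only the query parameters named in the allowlist (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if isinstance(allowlist, dict):
        # Domain-specific rules: look up the names allowed for this host.
        allowed = set(allowlist.get(host, ()))
    else:
        # A flat list applies to every host.
        allowed = set(allowlist)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "https://example.com/?page=1&id=123&ref=test"
print(filter_query(url, ["page", "id"]))
print(filter_query(url, {"example.com": ["page", "id"]}))
```

An allowlist (rather than a blocklist of known tracking parameters) means unknown parameters are dropped by default, which keeps normalized URLs stable when new tracking parameters appear.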
Installation
pip install url-normalize
Usage
Python API
from url_normalize import url_normalize
# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com:80/foo (port 80 is not the https default, so it is kept)
# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo
# With query parameter filtering enabled
print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True))
# Output: https://www.google.com/search?q=test
# With custom parameter allowlist as a dict
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist={"example.com": ["page", "id"]},
))
# Output: https://example.com/?page=1&id=123
# With custom parameter allowlist as a list
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist=["page", "id"],
))
# Output: https://example.com/?page=1&id=123
# With default domain for absolute paths
print(url_normalize("/images/logo.png", default_domain="example.com"))
# Output: https://example.com/images/logo.png
# With default domain and custom scheme
print(url_normalize("/images/logo.png", default_scheme="http", default_domain="example.com"))
# Output: http://example.com/images/logo.png
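The default-scheme and default-domain behaviour shown in the examples above can be pictured as a small pre-processing step. The sketch below is an assumption about how such a step could look, not the library's implementation: `with_defaults` is a hypothetical name, and it only recognises a scheme when the input contains `://`.

```python
def with_defaults(url, default_scheme="https", default_domain=None):
    """Supply a missing scheme and domain before normalizing (hypothetical helper)."""
    if not url:
        return url  # empty strings pass through unchanged
    if url.startswith("/") and not url.startswith("//"):
        if default_domain is None:
            return url  # an absolute path with no domain to attach is left alone
        url = f"//{default_domain}{url}"
    if url.startswith("//"):
        return f"{default_scheme}:{url}"  # scheme-relative ("double slash") URL
    if "://" not in url:
        return f"{default_scheme}://{url}"  # bare host such as www.foo.com/foo
    return url


print(with_defaults("/images/logo.png", default_domain="example.com"))
print(with_defaults("//domain.tld"))
print(with_defaults("www.foo.com/foo", default_scheme="http"))
```

Checking for `://` rather than a bare colon avoids mistaking `www.foo.com:80/foo` for a URL whose scheme is `www.foo.com`, at the cost of not recognising schemes such as `mailto:`.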
Command-line Usage
You can also use url-normalize from the command line:
$ url-normalize "www.foo.com:80/foo"
# Output: https://www.foo.com:80/foo
# With custom default scheme
$ url-normalize -s http "www.foo.com/foo"
# Output: http://www.foo.com/foo
# With query parameter filtering
$ url-normalize -f "www.google.com/search?q=test&utm_source=test"
# Output: https://www.google.com/search?q=test
# With custom allowlist
$ url-normalize -f -p page,id "example.com?page=1&id=123&ref=test"
# Output: https://example.com/?page=1&id=123
# With default domain for absolute paths
$ url-normalize -d example.com "/images/logo.png"
# Output: https://example.com/images/logo.png
# With default domain and custom scheme
$ url-normalize -d example.com -s http "/images/logo.png"
# Output: http://example.com/images/logo.png
# Via uv tool/uvx
$ uvx url-normalize www.foo.com:80/foo
# Output: https://www.foo.com:80/foo
Documentation
For a complete history of changes, see CHANGELOG.md.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License