Comprehensive license normalization with a three-level hierarchy.

license-normaliser is a license normalization library that maps common license representations (SPDX tokens, URLs, prose descriptions) to a canonical three-level hierarchy.

Features

  • Three-level hierarchy - LicenseFamily → LicenseName → LicenseVersion

  • Wide format support - SPDX tokens, URLs, prose descriptions

  • Creative Commons support - Full CC family with versions and IGO variants

  • Publisher-specific licenses - Elsevier, Wiley, Springer, ACS, and more

  • Caching - LRU caching of results, so repeated inputs are not re-resolved

  • CLI - Command-line interface for quick normalization
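
The caching feature can be pictured with the standard library's `functools.lru_cache`. The sketch below is an assumption about the general approach, not license-normaliser's code: `cached_normalise` and its toy lookup table are hypothetical. Memoising is safe here because normalisation is a pure function of the input string.

```python
from functools import lru_cache

# Hypothetical toy lookup table standing in for the real registry.
_TOY_TABLE = {"mit": "mit", "cc by 4.0": "cc-by-4.0"}


@lru_cache(maxsize=1024)
def cached_normalise(raw: str) -> str:
    """Hypothetical normaliser: clean the input, then look it up.

    The result depends only on `raw`, so it can be memoised; repeated
    inputs are served from the LRU cache instead of being recomputed.
    """
    key = " ".join(raw.strip().lower().split())  # lowercase, collapse spaces
    return _TOY_TABLE.get(key, key)


for raw in ["MIT", "CC BY 4.0", "MIT", "MIT"]:
    cached_normalise(raw)

info = cached_normalise.cache_info()
print(info.hits, info.misses)  # 2 2  (the repeated "MIT" calls are hits)
```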

Hierarchy

The library uses a three-level hierarchy:

  1. LicenseFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …

  2. LicenseName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"

  3. LicenseVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
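
As a rough illustration of how the three levels relate, here is a minimal sketch using frozen dataclasses. The class names `Family`, `Name` and `Version` are hypothetical simplifications of LicenseFamily, LicenseName and LicenseVersion; only the `license` and `family` attributes mirror the quick-start example, and the real classes likely carry more data (URLs, for instance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    key: str  # broad bucket, e.g. "cc"

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Name:
    key: str  # version-free, e.g. "cc-by-nc-nd"
    family: Family

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Version:
    key: str  # fully resolved, e.g. "cc-by-nc-nd-4.0"
    license: Name

    def __str__(self) -> str:
        return self.key


cc = Family("cc")
v = Version("cc-by-nc-nd-4.0", Name("cc-by-nc-nd", cc))

# Walk up the hierarchy: version -> name -> family.
print(v, v.license, v.license.family)  # cc-by-nc-nd-4.0 cc-by-nc-nd cc
```

Freezing the dataclasses makes every level hashable, so names and families can serve as dictionary keys when grouping many versions.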

Installation

With uv:

uv pip install license-normaliser

Or with pip:

pip install license-normaliser

Quick start

from license_normaliser import normalise_license

v = normalise_license("CC BY-NC-ND 4.0")
str(v)                 # "cc-by-nc-nd-4.0"  ← LicenseVersion
str(v.license)         # "cc-by-nc-nd"      ← LicenseName
str(v.license.family)  # "cc"               ← LicenseFamily

Resolution pipeline (first match wins)

  1. Direct registry lookup (cleaned lowercase key)

  2. Alias table (prose variants, SPDX tokens, mixed-case short-forms)

  3. Exact URL map (http/https, trailing-slash normalised, fragment-aware)

  4. Structural CC URL regex (any creativecommons.org URL not in the map)

  5. Prose keyword scan (full sentences from license documents)

  6. Fallback (the key is the cleaned input string; every other field is unknown/None)
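
The first-match-wins pipeline can be sketched as a chain of lookups that each return early. Everything here is hypothetical: the tiny tables, the `clean`, `url_key` and `resolve` helpers, and the CC regex are illustrative stand-ins for the library's much larger registry, and this sketch simply discards URL fragments rather than modelling the library's fragment-aware handling.

```python
import re

# Hypothetical toy tables; the real library ships far larger ones.
REGISTRY = {"mit", "apache-2.0", "cc-by-4.0", "cc-by-nc-nd-4.0"}
ALIASES = {
    "mit license": "mit",
    "cc by 4.0": "cc-by-4.0",
    "cc by-nc-nd 4.0": "cc-by-nc-nd-4.0",
}
URL_MAP = {"creativecommons.org/licenses/by/4.0": "cc-by-4.0"}
CC_URL_RE = re.compile(r"creativecommons\.org/licenses/([a-z-]+)/(\d+(?:\.\d+)?)")
PROSE_KEYWORDS = [
    ("permission is hereby granted, free of charge", "mit"),
    ("apache license, version 2.0", "apache-2.0"),
]


def clean(raw: str) -> str:
    """Lowercase, trim and collapse internal whitespace."""
    return " ".join(raw.strip().lower().split())


def url_key(text: str) -> str:
    """Drop an http/https scheme, any fragment and a trailing slash."""
    text = re.sub(r"^https?://", "", text)
    text = text.split("#", 1)[0]  # simplification: fragments are discarded
    return text.rstrip("/")


def resolve(raw: str) -> str:
    key = clean(raw)
    if key in REGISTRY:                       # 1. direct registry lookup
        return key
    if key in ALIASES:                        # 2. alias table
        return ALIASES[key]
    if url_key(key) in URL_MAP:               # 3. exact URL map
        return URL_MAP[url_key(key)]
    match = CC_URL_RE.search(key)             # 4. structural CC URL regex
    if match:
        return f"cc-{match.group(1)}-{match.group(2)}"
    for phrase, target in PROSE_KEYWORDS:     # 5. prose keyword scan
        if phrase in key:
            return target
    return key                                # 6. fallback: cleaned string


for sample in [
    "CC BY-NC-ND 4.0",
    "https://creativecommons.org/licenses/by/4.0/",
    "http://creativecommons.org/licenses/by-nc-sa/3.0/",
    "Permission is hereby granted, free of charge, to any person...",
    "Some Custom Licence",
]:
    print(resolve(sample))
# cc-by-nc-nd-4.0
# cc-by-4.0
# cc-by-nc-sa-3.0
# mit
# some custom licence
```

Ordering the cheap exact lookups before the regex and substring scans keeps common inputs fast and stops a broad prose keyword from overriding a precise match.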

CLI usage

Normalize a single license:

license-normaliser normalise "MIT"
# Output: mit

license-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# License: cc-by
# Family: cc

Batch-normalize multiple licenses:

license-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
# Output:
# MIT: mit
# Apache-2.0: apache-2.0
# CC BY 4.0: cc-by-4.0
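
A CLI with `normalise` and `batch` subcommands maps naturally onto `argparse` subparsers. This is a minimal sketch under stated assumptions, not the package's actual entry point: `toy_normalise` is a hypothetical stand-in that only lowercases and hyphenates, `--full` is omitted, and `build_parser`, `run` and `main` are illustrative names.

```python
import argparse
from typing import List, Optional


def toy_normalise(raw: str) -> str:
    """Hypothetical stand-in for the real normaliser: lowercase and hyphenate."""
    return "-".join(raw.strip().lower().split())


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="license-normaliser")
    sub = parser.add_subparsers(dest="command", required=True)
    single = sub.add_parser("normalise", help="normalise one license")
    single.add_argument("value")
    batch = sub.add_parser("batch", help="normalise several licenses")
    batch.add_argument("values", nargs="+")
    return parser


def run(argv: Optional[List[str]] = None) -> List[str]:
    """Parse argv (sys.argv[1:] when None) and return the lines to print."""
    args = build_parser().parse_args(argv)
    if args.command == "normalise":
        return [toy_normalise(args.value)]
    # batch: echo each input next to its normalised key
    return [f"{value}: {toy_normalise(value)}" for value in args.values]


def main(argv: Optional[List[str]] = None) -> int:
    for line in run(argv):
        print(line)
    return 0
```

Registering `main` as a console-script entry point would expose it as a command; `run(["batch", "MIT", "CC BY 4.0"])` returns `["MIT: mit", "CC BY 4.0: cc-by-4.0"]`. Separating `run` from `main` keeps the output testable without capturing stdout.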

Testing

All tests run inside Docker to prevent accidental side effects:

make test

To test a specific Python version:

make test-env ENV=py312

License

MIT

Support

For issues, use the project's issue tracker on GitHub.

Author

Artur Barseghyan <artur.barseghyan@gmail.com>

Download files

Source Distribution

license_normaliser-0.1.tar.gz (20.7 kB)

Built Distribution

license_normaliser-0.1-py3-none-any.whl (20.1 kB)

File details

Details for the file license_normaliser-0.1.tar.gz.

File metadata

  • Download URL: license_normaliser-0.1.tar.gz
  • Size: 20.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for license_normaliser-0.1.tar.gz
Algorithm Hash digest
SHA256 884dcc73a947db50d22136a770e08213f7d0c5d9d14dd1d126946db87b99ae33
MD5 bf6ce074e6fdf9216ad60d7e877638a1
BLAKE2b-256 51f0f11ebf802ef4d029f2191d082dbb5c4e2379b820898db9c8b75d45794baf

File details

Details for the file license_normaliser-0.1-py3-none-any.whl.

File hashes

Hashes for license_normaliser-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 41721b75d0e8542eb3584807f897bc10475f0c8bd596ec908f83472acc9319b1
MD5 8312c817c7c6c20573568cd01b38255a
BLAKE2b-256 263317b591cd4a01e84424ce07aadab6e83165222f5c4989ee3b5699cc8b6159
