Comprehensive license normalisation with a three-level hierarchy.

license-normaliser is a license normalisation library that maps license representations, such as SPDX tokens, URLs and prose descriptions, to a canonical three-level hierarchy.

Features

  • Three-level hierarchy - LicenseFamily → LicenseName → LicenseVersion.

  • Wide format support - SPDX tokens, URLs, prose descriptions.

  • Creative Commons support - Full CC family with versions and IGO variants.

  • Publisher-specific licenses - Springer, Nature, Elsevier, Wiley, ACS, and more.

  • File-driven data - Add aliases, URLs, and patterns by editing JSON files; new synonyms require no Python code changes.

  • Pluggable data sources - Add a new DataSource class to ingest licenses from an external license registry.

  • Strict mode - Raise LicenseNotFoundError instead of silently returning "unknown".

  • Caching - LRU caching of resolved results, so repeated lookups are cheap.

  • CLI - Command-line interface with --strict support.
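The file-driven data idea can be illustrated with a minimal sketch. The JSON layout below (a flat object mapping an alias to a license key) and the helper names load_aliases and lookup are hypothetical; the library's actual data files may be structured differently.

```python
from __future__ import annotations

import json

# Hypothetical alias file contents: alias -> canonical LicenseVersion key.
ALIASES_JSON = """
{
  "cc by 4.0": "cc-by-4.0",
  "creative commons attribution 4.0": "cc-by-4.0",
  "mit license": "mit"
}
"""


def load_aliases(text: str) -> dict[str, str]:
    """Parse alias JSON and index it by a lower-cased, whitespace-collapsed key."""
    raw = json.loads(text)
    return {" ".join(alias.lower().split()): key for alias, key in raw.items()}


def lookup(aliases: dict[str, str], value: str) -> str | None:
    """Return the canonical key for value, or None if no alias matches."""
    return aliases.get(" ".join(value.lower().split()))


aliases = load_aliases(ALIASES_JSON)
print(lookup(aliases, "CC  BY 4.0"))   # cc-by-4.0
print(lookup(aliases, "MIT License"))  # mit
```

In this sketch, supporting a new synonym means adding one entry to the JSON; the lookup code stays unchanged.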

Hierarchy

The library uses a three-level hierarchy:

  1. LicenseFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …

  2. LicenseName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"

  3. LicenseVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
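To make the three levels concrete, here is a minimal, self-contained model of the hierarchy. The Family, Name and Version dataclasses are hypothetical stand-ins written for illustration, not the library's LicenseFamily, LicenseName and LicenseVersion classes; they only mirror the str() and .license.family navigation shown in the Quick start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    key: str  # broad bucket, e.g. "cc"

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Name:
    key: str  # version-free, e.g. "cc-by-nc-nd"
    family: Family

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Version:
    key: str  # fully resolved, e.g. "cc-by-nc-nd-4.0"
    license: Name

    def __str__(self) -> str:
        return self.key


cc = Family("cc")
by_nc_nd = Name("cc-by-nc-nd", cc)
v = Version("cc-by-nc-nd-4.0", by_nc_nd)

print(str(v))                 # cc-by-nc-nd-4.0
print(str(v.license))         # cc-by-nc-nd
print(str(v.license.family))  # cc
```

Because many versions share one name and many names share one family, grouping results by name or family is just a matter of following the references upward.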

Installation

With uv:

uv pip install license-normaliser

Or with pip:

pip install license-normaliser

Quick start

from license_normaliser import normalise_license

v = normalise_license("CC BY-NC-ND 4.0")
str(v)                  # "cc-by-nc-nd-4.0"   ← LicenseVersion
str(v.license)          # "cc-by-nc-nd"       ← LicenseName
str(v.license.family)   # "cc"                ← LicenseFamily

Strict mode

By default, unresolvable inputs return an "unknown" result. Pass strict=True to raise LicenseNotFoundError instead:

from license_normaliser import normalise_license
from license_normaliser.exceptions import LicenseNotFoundError

# Silent fallback (default)
v = normalise_license("some-unknown-string")
v.family.key  # "unknown"

# Strict: raises on unresolvable input
try:
    v = normalise_license("some-unknown-string", strict=True)
except LicenseNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
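The default-versus-strict behaviour follows a common pattern: clean the input, look it up, and either return a sentinel or raise an exception that carries both the raw and the cleaned string. The sketch below is hypothetical: the cleaning rules, the KNOWN table and the NotFound class are assumptions that only mirror the raw and cleaned attributes documented above.

```python
from __future__ import annotations

import re

# Hypothetical lookup table keyed by cleaned input.
KNOWN = {"mit": "mit", "cc-by-4.0": "cc-by-4.0", "apache-2.0": "apache-2.0"}


class NotFound(LookupError):
    """Toy equivalent of LicenseNotFoundError: keeps raw and cleaned input."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"unresolvable license: {raw!r} (cleaned: {cleaned!r})")
        self.raw = raw
        self.cleaned = cleaned


def clean(raw: str) -> str:
    """Lower-case, trim, and turn runs of spaces/underscores into single hyphens."""
    return re.sub(r"[\s_]+", "-", raw.strip().lower())


def resolve(raw: str, strict: bool = False) -> str:
    cleaned = clean(raw)
    if cleaned in KNOWN:
        return KNOWN[cleaned]
    if strict:
        raise NotFound(raw, cleaned)
    return "unknown"  # silent fallback, as in the default mode


print(resolve("CC BY 4.0"))            # cc-by-4.0
print(resolve("some unknown string"))  # unknown
try:
    resolve("some unknown string", strict=True)
except NotFound as exc:
    print(exc.raw, "->", exc.cleaned)  # some unknown string -> some-unknown-string
```

Keeping the cleaned form on the exception makes it easy to see whether a failure came from the cleaning step or from a genuinely missing alias.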

Batch normalisation

from license_normaliser import normalise_licenses

results = normalise_licenses(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)

# Strict batch: raises on the first unresolvable input
results = normalise_licenses(["MIT", "Apache-2.0"], strict=True)
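If you need to know every input that fails, rather than stopping at the first one, one option is to resolve items individually and collect the failures. This is a hypothetical sketch with its own toy resolver (resolve_strict and resolve_batch are illustrative names); with the library itself you might call normalise_license(..., strict=True) per item and catch LicenseNotFoundError, which is a suggested combination of the documented API rather than a built-in feature.

```python
from __future__ import annotations

# Hypothetical resolver: returns a key or raises KeyError for unknown input.
KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0", "cc by 4.0": "cc-by-4.0"}


def resolve_strict(raw: str) -> str:
    return KNOWN[raw.strip().lower()]  # KeyError when unresolvable


def resolve_batch(values: list[str]) -> tuple[list[str], list[str]]:
    """Resolve every value; return (resolved keys, inputs that failed)."""
    resolved: list[str] = []
    failed: list[str] = []
    for value in values:
        try:
            resolved.append(resolve_strict(value))
        except KeyError:
            failed.append(value)
    return resolved, failed


ok, bad = resolve_batch(["MIT", "Apache-2.0", "no-such-license", "CC BY 4.0", "???"])
print(ok)   # ['mit', 'apache-2.0', 'cc-by-4.0']
print(bad)  # ['no-such-license', '???']
```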

Custom plugins

The LicenseNormaliser class lets you inject custom plugin classes for specialised use cases:

from license_normaliser import LicenseNormaliser
from license_normaliser.parsers.alias import AliasParser
from license_normaliser.parsers.spdx import SPDXParser

# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenseNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
    cache=True,
    cache_maxsize=8192,
)

# MIT resolves via the SPDX parser
assert str(ln.normalise_license("MIT")) == "mit"

# "CC BY-NC-ND 4.0" resolves via the Alias parser
assert str(ln.normalise_license("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"
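Restricting the plugin lists narrows which strategies are tried. A common way to structure such plugins is an ordered chain in which each parser returns a key or None, and the first hit wins. The Parser protocol and the two toy parsers below are hypothetical and are not the library's SPDXParser or AliasParser; they only illustrate the first-match-wins idea.

```python
from __future__ import annotations

from typing import Optional, Protocol


class Parser(Protocol):
    def parse(self, cleaned: str) -> Optional[str]: ...


class ToySPDXParser:
    """Matches a few SPDX identifiers (input is already lower-cased)."""

    IDS = {"mit": "mit", "apache-2.0": "apache-2.0"}

    def parse(self, cleaned: str) -> Optional[str]:
        return self.IDS.get(cleaned)


class ToyAliasParser:
    """Matches free-text aliases (input is already lower-cased)."""

    ALIASES = {"cc by-nc-nd 4.0": "cc-by-nc-nd-4.0"}

    def parse(self, cleaned: str) -> Optional[str]:
        return self.ALIASES.get(cleaned)


def run_chain(parsers: list[Parser], raw: str) -> Optional[str]:
    cleaned = raw.strip().lower()
    for parser in parsers:
        key = parser.parse(cleaned)
        if key is not None:
            return key  # first parser that recognises the input wins
    return None


chain = [ToySPDXParser(), ToyAliasParser()]
print(run_chain(chain, "MIT"))                          # mit
print(run_chain(chain, "CC BY-NC-ND 4.0"))              # cc-by-nc-nd-4.0
print(run_chain([ToySPDXParser()], "CC BY-NC-ND 4.0"))  # None
```

With this design, dropping a parser from the list simply removes one way of recognising input, which matches the "no CC, no publisher URLs" restriction in spirit.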

LicenseNormaliser caches results by wrapping its resolution method with functools.lru_cache; cache_maxsize sets the maximum number of cached entries. To disable caching, for example while debugging, pass cache=False:

from license_normaliser import LicenseNormaliser

ln = LicenseNormaliser(cache=False)
result = ln.normalise_license("MIT")
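Wrapping a method with functools.lru_cache per instance (rather than decorating it at class level) keeps each instance's cache separate and lets a constructor flag switch caching off. The ToyNormaliser class below is a hypothetical sketch of that pattern, reusing the cache and cache_maxsize parameter names from the examples above; it is not the library's implementation.

```python
import functools


class ToyNormaliser:
    def __init__(self, cache: bool = True, cache_maxsize: int = 8192) -> None:
        self.calls = 0  # counts how often the uncached resolver actually runs
        if cache:
            # Per-instance cache: wrap the bound method, not the class function.
            self._resolve = functools.lru_cache(maxsize=cache_maxsize)(
                self._resolve_uncached
            )
        else:
            self._resolve = self._resolve_uncached

    def _resolve_uncached(self, raw: str) -> str:
        self.calls += 1
        return raw.strip().lower()  # stand-in for the real resolution work

    def normalise_license(self, raw: str) -> str:
        return self._resolve(raw)


cached = ToyNormaliser()
cached.normalise_license("MIT")
cached.normalise_license("MIT")
print(cached.calls)  # 1: the second call is served from the cache

uncached = ToyNormaliser(cache=False)
uncached.normalise_license("MIT")
uncached.normalise_license("MIT")
print(uncached.calls)  # 2
```

When caching is on, cached._resolve.cache_info() reports hits and misses, which is handy for checking whether a workload actually benefits from the cache.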

Update data sources (CLI)

license-normaliser update-data --force
# Downloads fresh SPDX and OpenDefinition JSON files into src/license_normaliser/data/
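The SPDX license list is published as JSON with a top-level "licenses" array whose entries carry fields such as licenseId, name and seeAlso (a list of reference URLs). A data source could turn that into lookup tables roughly as follows. The build_tables function and its choice of lower-cased keys are assumptions for illustration, not how the library's DataSource classes are written, and SAMPLE is a hand-written excerpt rather than downloaded data.

```python
from __future__ import annotations

import json

# Hand-written excerpt in the shape of SPDX's licenses.json (not real data).
SAMPLE = """
{
  "licenses": [
    {"licenseId": "MIT", "name": "MIT License",
     "seeAlso": ["https://opensource.org/license/mit/"]},
    {"licenseId": "Apache-2.0", "name": "Apache License 2.0",
     "seeAlso": ["https://www.apache.org/licenses/LICENSE-2.0"]}
  ]
}
"""


def build_tables(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (alias -> key, url -> key) tables built from SPDX-style JSON."""
    aliases: dict[str, str] = {}
    urls: dict[str, str] = {}
    for entry in json.loads(text)["licenses"]:
        key = entry["licenseId"].lower()
        aliases[entry["licenseId"].lower()] = key
        aliases[entry["name"].lower()] = key
        for url in entry.get("seeAlso", []):
            # Strip trailing slashes so ".../mit" and ".../mit/" match.
            # Lower-casing is a simplification: URL paths can be case-sensitive.
            urls[url.rstrip("/").lower()] = key
    return aliases, urls


aliases, urls = build_tables(SAMPLE)
print(aliases["apache license 2.0"])                        # apache-2.0
print(urls["https://www.apache.org/licenses/license-2.0"])  # apache-2.0
```

Real SPDX data also contains deprecated identifiers (flagged by isDeprecatedLicenseId), which an ingester would need a policy for.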

Integration tests (public API only)

All integration tests live in src/license_normaliser/tests/test_integration.py and only import the public API.

CLI usage

Normalise a single license:

license-normaliser normalise "MIT"
# Output: mit

license-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# License: cc-by
# Family: cc

license-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error

Batch normalise:

license-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
license-normaliser batch --strict MIT "Apache-2.0"
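A non-zero exit code under --strict is what lets a shell script or CI job stop on unresolvable input. Below is a hypothetical argparse sketch of a normalise subcommand with a --strict flag that returns exit code 1 on unknown input; it is not the package's actual CLI code, and the KNOWN table and the toy-normaliser program name are assumptions.

```python
from __future__ import annotations

import argparse
import sys

KNOWN = {"mit": "mit", "cc by 4.0": "cc-by-4.0"}  # hypothetical lookup table


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="toy-normaliser")
    sub = parser.add_subparsers(dest="command", required=True)
    norm = sub.add_parser("normalise", help="normalise a single license")
    norm.add_argument("value")
    norm.add_argument("--strict", action="store_true", help="fail on unknown input")
    args = parser.parse_args(argv)

    key = KNOWN.get(args.value.strip().lower())
    if key is None:
        if args.strict:
            print(f"error: cannot resolve {args.value!r}", file=sys.stderr)
            return 1  # non-zero exit code signals failure to the shell
        key = "unknown"
    print(key)
    return 0
```

To use it as a script, pass main's return value to sys.exit, for example via a console-script entry point.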

Exceptions

from license_normaliser.exceptions import (
    LicenseNormaliserError,   # base class
    LicenseNotFoundError,     # raised by strict mode
)

Testing

All tests run inside Docker:

make test

To test a specific Python version:

make test-env ENV=py312

License

MIT

Author

Artur Barseghyan <artur.barseghyan@gmail.com>

Download files

Source Distribution

license_normaliser-0.3.1.tar.gz (170.4 kB)

Uploaded Source

Built Distribution

license_normaliser-0.3.1-py3-none-any.whl (182.4 kB)

Uploaded Python 3

File details

Details for the file license_normaliser-0.3.1.tar.gz.

File metadata

  • Download URL: license_normaliser-0.3.1.tar.gz
  • Upload date:
  • Size: 170.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for license_normaliser-0.3.1.tar.gz
Algorithm Hash digest
SHA256 98493c6a6d2197a91160a890797028f3bf7aa5b45d799458f7af2faacb9c962b
MD5 ac898ac9a8a4809257243c85398e1edd
BLAKE2b-256 84cc7cf57bb9451cde8619da8d29d3ff84e96c8b830ab33b634020a8f841f814

File details

Details for the file license_normaliser-0.3.1-py3-none-any.whl.

File metadata

File hashes

Hashes for license_normaliser-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 15c67a662243861c179fae98208956e1ac609d8e5950016c37b373990512b9dc
MD5 37682fd1b5ff1387d244894c32dfcd4c
BLAKE2b-256 6313fc55cf09467970508c7460a7421dbfb5d156945be3d2be296e09b757d14d
