
Comprehensive license normalisation with a three-level hierarchy.

licence-normaliser is a license normalisation library that maps license representations (SPDX tokens, URLs, prose descriptions) to a canonical three-level hierarchy.

Features

  • Three-level hierarchy - LicenseFamily → LicenseName → LicenseVersion.

  • Wide format support - SPDX tokens, URLs, prose descriptions.

  • Creative Commons support - Full CC family with versions and IGO variants.

  • Publisher-specific licenses - Springer, Nature, Elsevier, Wiley, ACS, and more.

  • File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.

  • Pluggable parsers - Drop in a new parser class to ingest any external license registry. Parsers implement plugin interfaces (RegistryPlugin, URLPlugin, etc.).

  • Strict mode - Raise LicenseNotFoundError instead of silently returning "unknown".

  • Caching - LRU-cached resolution, configurable via cache and cache_maxsize.

  • CLI - Command-line interface with --strict and --explain support.

Hierarchy

The library uses a three-level hierarchy:

  1. LicenseFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …

  2. LicenseName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"

  3. LicenseVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
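
The way the three levels nest can be pictured with a small stand-alone sketch. The dataclasses below (Family, Name, Version) are hypothetical stand-ins, not the library's actual classes; only the attribute chain version → license → family mirrors what the Quick start shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """Broad bucket such as "cc" (hypothetical stand-in for LicenseFamily)."""

    key: str

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Name:
    """Version-free license such as "cc-by-nc-nd" (stand-in for LicenseName)."""

    key: str
    family: Family

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Version:
    """Fully resolved license such as "cc-by-nc-nd-4.0" (stand-in for LicenseVersion)."""

    key: str
    license: Name

    def __str__(self) -> str:
        return self.key


cc = Family("cc")
cc_by_nc_nd = Name("cc-by-nc-nd", cc)
v = Version("cc-by-nc-nd-4.0", cc_by_nc_nd)

# Walk from the most specific level up to the broadest one.
print(v, v.license, v.license.family)
```

Each level holds a reference to its parent, so a resolved version always carries its name and family with it.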

Installation

With uv:

uv pip install licence-normaliser

Or with pip:

pip install licence-normaliser

Quick start

from licence_normaliser import normalise_license

v = normalise_license("CC BY-NC-ND 4.0")
str(v)                  # "cc-by-nc-nd-4.0"   ← LicenseVersion
str(v.license)          # "cc-by-nc-nd"       ← LicenseName
str(v.license.family)   # "cc"                ← LicenseFamily

Strict mode

By default, unresolvable inputs return an "unknown" result. Pass strict=True to raise LicenseNotFoundError instead:

from licence_normaliser import normalise_license
from licence_normaliser.exceptions import LicenseNotFoundError

# Silent fallback (default)
v = normalise_license("some-unknown-string")
v.family.key  # "unknown"

# Strict: raises on unresolvable input
try:
    v = normalise_license("some-unknown-string", strict=True)
except LicenseNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
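
To make the difference between the two modes concrete, here is a self-contained toy model. LookupFailed, KNOWN, and resolve are hypothetical names invented for this sketch, and the cleaning step (lowercasing and collapsing whitespace) is an assumption; the library's real cleaning rules may differ.

```python
class LookupFailed(Exception):
    """Hypothetical stand-in for LicenseNotFoundError, carrying raw and cleaned."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"license not found: {raw!r} (cleaned: {cleaned!r})")
        self.raw = raw
        self.cleaned = cleaned


# Toy lookup table, used only to make the sketch runnable.
KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}


def resolve(raw: str, strict: bool = False) -> str:
    # Assumed cleaning: lowercase and collapse runs of whitespace.
    cleaned = " ".join(raw.lower().split())
    if cleaned in KNOWN:
        return KNOWN[cleaned]
    if strict:
        # Strict mode surfaces the failure instead of hiding it.
        raise LookupFailed(raw, cleaned)
    return "unknown"


print(resolve("  MIT "))          # resolved
print(resolve("Some  Thing"))     # silent fallback
try:
    resolve("Some  Thing", strict=True)
except LookupFailed as exc:
    print(exc.raw, "|", exc.cleaned)
```

Strict mode tends to suit ingestion pipelines where an unrecognised license should stop processing, while the default suits bulk jobs that review unknowns afterwards.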

Trace / Explain

Set ENABLE_LICENCE_NORMALISER_TRACE=1 or pass trace=True to get resolution traces showing how the license was matched:

from licence_normaliser import normalise_license

# Via function
v = normalise_license("cc by-nc-nd 3.0 igo", trace=True)
print(v.explain())

# Via class
from licence_normaliser import LicenseNormaliser
ln = LicenseNormaliser(trace=True)
v = ln.normalise_license("MIT")
print(v.explain())

The output shows the resolution pipeline (alias → registry → url → prose → fallback) and which source file and line matched:

Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
  [✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)

Result:
  version_key: 'cc-by-nc-nd-3.0-igo'
  name_key: 'cc-by-nc-nd'
  family_key: 'cc'

The trace can also be accessed via v._trace for programmatic use.
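
The staged lookup described above can be sketched as an ordered list of stages that are tried until one matches, with a trace entry recorded for each attempt. Everything here (run_pipeline, the two toy tables, and the wording of the "no match" and fallback entries) is an assumption made for illustration; it is not the library's implementation, and only two of the stages are modelled.

```python
from typing import Callable, Optional

# A stage takes the cleaned input and returns a key, or None if it has no match.
Stage = Callable[[str], Optional[str]]

# Toy data, used only to make the sketch runnable.
ALIASES = {"cc by-nc-nd 3.0 igo": "cc-by-nc-nd-3.0-igo"}
REGISTRY = {"mit": "mit"}


def run_pipeline(
    cleaned: str, stages: list[tuple[str, Stage]]
) -> tuple[str, list[str]]:
    """Try each stage in order; stop at the first hit and return (key, trace)."""
    trace: list[str] = []
    for name, stage in stages:
        hit = stage(cleaned)
        if hit is not None:
            trace.append(f"[✓] {name}: {cleaned!r} → {hit!r}")
            return hit, trace
        trace.append(f"[ ] {name}: no match")
    # No stage matched, so fall back to the "unknown" key.
    trace.append("[ ] fallback: 'unknown'")
    return "unknown", trace


stages = [("alias", ALIASES.get), ("registry", REGISTRY.get)]
key, trace = run_pipeline("mit", stages)
print(key)
print("\n".join(trace))
```

Because stages run in a fixed order, an alias match takes precedence over a registry match for the same input.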

Batch normalisation

from licence_normaliser import normalise_licenses

results = normalise_licenses(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)

# Strict batch - raises on first unresolvable
results = normalise_licenses(["MIT", "Apache-2.0"], strict=True)
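
Because a strict batch stops at the first unresolvable input, a non-strict pass followed by a split into resolved and unresolved values can be handy when every failure should be reported at once. The helper below is a hypothetical sketch: partition_unknown and toy_normalise are not part of the library, and the normaliser is injected as a plain callable so the example runs on its own.

```python
from typing import Callable, Iterable


def partition_unknown(
    raws: Iterable[str],
    normalise: Callable[[str], str],
    unknown_key: str = "unknown",
) -> tuple[dict[str, str], list[str]]:
    """Split inputs into resolved {raw: key} pairs and a list of unresolved raws."""
    resolved: dict[str, str] = {}
    unresolved: list[str] = []
    for raw in raws:
        key = normalise(raw)
        if key == unknown_key:
            unresolved.append(raw)
        else:
            resolved[raw] = key
    return resolved, unresolved


# Toy normaliser standing in for the real one.
def toy_normalise(raw: str) -> str:
    return {"MIT": "mit", "Apache-2.0": "apache-2.0"}.get(raw, "unknown")


resolved, unresolved = partition_unknown(
    ["MIT", "???", "Apache-2.0"], toy_normalise
)
print(resolved)
print(unresolved)
```

With the real library, the injected callable would need to return whatever key marks an unresolved result; the source only confirms that the family key is "unknown" in that case, so the comparison may need adapting.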

Custom plugins

The LicenseNormaliser class lets you inject custom plugin classes for specialised use cases:

from licence_normaliser import LicenseNormaliser
from licence_normaliser.parsers.alias import AliasParser
from licence_normaliser.parsers.spdx import SPDXParser

# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenseNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
    cache=True,
    cache_maxsize=8192,
)

# MIT resolves via SPDX parser
assert str(ln.normalise_license("MIT")) == "mit"

# CC BY-NC-ND resolves via the alias parser
assert str(ln.normalise_license("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"
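
The idea of passing plugin classes into the constructor can be illustrated with a stand-alone toy. The AliasSource protocol, its load_aliases method, and TinyNormaliser are all hypothetical; the real plugin interfaces (RegistryPlugin, URLPlugin, and so on) are not shown in this document and may look quite different.

```python
from typing import Protocol


class AliasSource(Protocol):
    """Hypothetical plugin interface: anything that supplies alias → key pairs."""

    def load_aliases(self) -> dict[str, str]: ...


class BuiltinAliases:
    def load_aliases(self) -> dict[str, str]:
        return {"cc by 4.0": "cc-by-4.0"}


class InHouseAliases:
    """Example of a custom plugin adding an organisation-specific synonym."""

    def load_aliases(self) -> dict[str, str]:
        return {"acme open licence": "acme-open-1.0"}


class TinyNormaliser:
    def __init__(self, alias: list[type[AliasSource]]) -> None:
        # Plugin classes are instantiated here; later plugins override earlier ones.
        self._aliases: dict[str, str] = {}
        for plugin_cls in alias:
            self._aliases.update(plugin_cls().load_aliases())

    def normalise(self, raw: str) -> str:
        return self._aliases.get(raw.strip().lower(), "unknown")


full = TinyNormaliser(alias=[BuiltinAliases, InHouseAliases])
minimal = TinyNormaliser(alias=[BuiltinAliases])
print(full.normalise(" ACME Open Licence "))
print(minimal.normalise(" ACME Open Licence "))
```

Accepting classes rather than instances keeps construction in one place, so the normaliser decides when and how often each plugin loads its data.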

LicenseNormaliser caches results by wrapping its resolution method with lru_cache. Pass cache=False to disable caching, for example when debugging:

from licence_normaliser import LicenseNormaliser

ln = LicenseNormaliser(cache=False)
result = ln.normalise_license("MIT")
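
One way to wire up this kind of optional, per-instance caching with functools.lru_cache is sketched below. CachedResolver and its calls counter are hypothetical and exist only to make the cache's effect observable; the parameter names cache and cache_maxsize are borrowed from the constructor call shown earlier.

```python
from functools import lru_cache


class CachedResolver:
    def __init__(self, cache: bool = True, cache_maxsize: int = 8192) -> None:
        self.calls = 0  # counts how often the uncached work actually runs
        if cache:
            # Wrap the bound method so each instance owns a separate cache.
            self.resolve = lru_cache(maxsize=cache_maxsize)(self._resolve)
        else:
            self.resolve = self._resolve

    def _resolve(self, raw: str) -> str:
        self.calls += 1
        return raw.strip().lower()


cached = CachedResolver()
cached.resolve("MIT")
cached.resolve("MIT")  # served from the cache, _resolve is not called again

uncached = CachedResolver(cache=False)
uncached.resolve("MIT")
uncached.resolve("MIT")  # recomputed every time

print(cached.calls, uncached.calls)
```

Wrapping the bound method in __init__, rather than decorating the method on the class, avoids sharing one cache across all instances and lets each instance choose its own maxsize.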

Update data (CLI)

licence-normaliser update-data --force
# Fetches fresh SPDX, OpenDefinition, OSI, Creative Commons, and ScanCode JSON data

Integration tests (public API only)

All integration tests live in src/licence_normaliser/tests/test_integration.py and only import the public API.

CLI usage

Normalise a single license:

licence-normaliser normalise "MIT"
# Output: mit

licence-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# License: cc-by
# Family: cc

licence-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error

Batch normalise:

licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
licence-normaliser batch --strict MIT "Apache-2.0"

Exceptions

from licence_normaliser.exceptions import (
    LicenseNormaliserError,   # base class
    LicenseNotFoundError,     # raised by strict mode
)

Testing

All tests run inside Docker:

make test

To test a specific Python version:

make test-env ENV=py312

License

MIT

Author

Artur Barseghyan <artur.barseghyan@gmail.com>
