Robust licence normalisation with a three-level hierarchy for common licences.
licence-normaliser maps common licence representations (SPDX tokens, URLs, prose descriptions) to a canonical three-level hierarchy.
Features
Three-level hierarchy - LicenceFamily → LicenceName → LicenceVersion.
Wide format support - SPDX tokens, URLs, and prose descriptions for supported licences.
Creative Commons support - Full CC family with versions and IGO variants.
Publisher-specific licences - Springer, Nature, Elsevier, Wiley, ACS, and more.
File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.
Pluggable parsers - Drop in a new parser class to ingest any external licence registry. Parsers implement plugin interfaces (RegistryPlugin, URLPlugin, etc.).
Strict mode - Raise LicenceNotFoundError instead of silently returning "unknown".
Caching - Resolution results are memoised with an LRU cache (toggle with cache, size with cache_maxsize).
CLI - Command-line interface with --strict and --trace support.
Hierarchy
The library uses a three-level hierarchy:
LicenceFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …
LicenceName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"
LicenceVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
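The three levels can be pictured as nested records, each pointing at its parent. The following is a minimal sketch of that shape, not the library's actual classes: the names Family, Name, and Version are hypothetical stand-ins, and only the key strings and the .licence / .family attribute chain are taken from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """Hypothetical stand-in for the broad bucket, e.g. "cc"."""
    key: str

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Name:
    """Hypothetical stand-in for the version-free licence, e.g. "cc-by-nc-nd"."""
    key: str
    family: Family

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Version:
    """Hypothetical stand-in for the fully resolved licence, e.g. "cc-by-nc-nd-4.0"."""
    key: str
    licence: Name

    def __str__(self) -> str:
        return self.key


cc = Family("cc")
cc_by_nc_nd = Name("cc-by-nc-nd", family=cc)
v3 = Version("cc-by-nc-nd-3.0", licence=cc_by_nc_nd)
v4 = Version("cc-by-nc-nd-4.0", licence=cc_by_nc_nd)

# Walking up from a version reaches the name, then the family.
print(v4, v4.licence, v4.licence.family)
```

Two versions of the same licence share one name record, so grouping by str(v.licence) or str(v.licence.family) collapses version differences when that is what an analysis needs.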
Installation
With uv:
uv pip install licence-normaliser
Or with pip:
pip install licence-normaliser
Quick start
from licence_normaliser import normalise_licence
v = normalise_licence("CC BY-NC-ND 4.0")
assert str(v) == "cc-by-nc-nd-4.0" # ← LicenceVersion
assert str(v.licence) == "cc-by-nc-nd" # ← LicenceName
assert str(v.licence.family) == "cc" # ← LicenceFamily
Strict mode
By default, unresolvable inputs return an "unknown" result. Pass strict=True to raise LicenceNotFoundError instead:
from licence_normaliser import normalise_licence
from licence_normaliser.exceptions import LicenceNotFoundError
# Silent fallback (default)
v = normalise_licence("some-unknown-string")
assert v.family.key == "unknown"
# Strict: raises on unresolvable input
try:
    v = normalise_licence("some-unknown-string", strict=True)
except LicenceNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
Trace / Explain
Set the ENABLE_LICENCE_NORMALISER_TRACE=1 environment variable or pass trace=True to get a resolution trace showing how the licence was matched:
from licence_normaliser import normalise_licence
# Via function
v = normalise_licence("cc by-nc-nd 3.0 igo", trace=True)
print(v.explain())
# Via class
from licence_normaliser import LicenceNormaliser
ln = LicenceNormaliser(trace=True)
v = ln.normalise_licence("MIT")
print(v.explain())
Output shows the resolution pipeline (alias → registry → url → prose → fallback) and which source file + line matched:
Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
[✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)
Result:
version_key: 'cc-by-nc-nd-3.0-igo'
name_key: 'cc-by-nc-nd'
family_key: 'cc'
The trace can also be accessed via v._trace for programmatic use.
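To make the stage order concrete, here is a rough, self-contained sketch of a staged lookup that records which stage matched. It is an illustration under assumptions, not the library's implementation: the tables ALIASES, REGISTRY, URLS, and PROSE, the Trace class, and the resolve function are all hypothetical, and the whitespace/lower-case cleaning step is a guess at what "cleaned" means.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical lookup tables; the real library loads its data from JSON files.
ALIASES = {"cc by-nc-nd 3.0 igo": "cc-by-nc-nd-3.0-igo"}
REGISTRY = {"mit": "mit"}
URLS = {"https://creativecommons.org/licenses/by/4.0/": "cc-by-4.0"}
PROSE = [(re.compile(r"creative commons attribution 4\.0"), "cc-by-4.0")]


def match_prose(text: str) -> Optional[str]:
    """Return the key of the first prose pattern found in the text, if any."""
    for pattern, key in PROSE:
        if pattern.search(text):
            return key
    return None


@dataclass
class Trace:
    raw: str
    cleaned: str
    steps: List[Tuple[str, bool]] = field(default_factory=list)

    def explain(self) -> str:
        lines = [f"Input: {self.raw!r} → {self.cleaned!r}"]
        for stage, matched in self.steps:
            lines.append(f"[{'✓' if matched else '✗'}] {stage}")
        return "\n".join(lines)


def resolve(raw: str) -> Tuple[str, Trace]:
    """Try each stage in order and stop at the first one that yields a key."""
    cleaned = " ".join(raw.lower().split())  # assumed cleaning step
    trace = Trace(raw, cleaned)
    stages = [
        ("alias", ALIASES.get),
        ("registry", REGISTRY.get),
        ("url", URLS.get),
        ("prose", match_prose),
    ]
    for stage, lookup in stages:
        key = lookup(cleaned)
        trace.steps.append((stage, key is not None))
        if key is not None:
            return key, trace
    trace.steps.append(("fallback", True))
    return "unknown", trace


key, trace = resolve("MIT")
print(key)
print(trace.explain())
```

Ordering the cheap exact-match stages before the pattern-based prose stage keeps common inputs fast and makes the trace easy to read: every stage before the match appears as a miss.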
Batch normalisation
from licence_normaliser import normalise_licences
results = normalise_licences(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)
# Strict batch - raises on first unresolvable
results = normalise_licences(["MIT", "Apache-2.0"], strict=True)
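Because a strict batch stops at the first unresolvable input, it can be useful to resolve items one at a time and collect the failures instead. The sketch below shows that pattern with stand-ins so it runs on its own: KNOWN, NotFound, normalise_one, and normalise_collecting are hypothetical names, with NotFound mimicking the raw and cleaned attributes that this README describes for LicenceNotFoundError.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup table standing in for the library's data files.
KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0", "cc by 4.0": "cc-by-4.0"}


class NotFound(LookupError):
    """Stand-in for a strict-mode error carrying the raw and cleaned input."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"unresolvable licence: {raw!r}")
        self.raw = raw
        self.cleaned = cleaned


def normalise_one(raw: str, strict: bool = False) -> str:
    cleaned = " ".join(raw.lower().split())
    key = KNOWN.get(cleaned)
    if key is None:
        if strict:
            raise NotFound(raw, cleaned)
        return "unknown"  # silent fallback
    return key


def normalise_collecting(raws: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Resolve each input strictly, collecting failures instead of stopping."""
    resolved: Dict[str, str] = {}
    failed: List[str] = []
    for raw in raws:
        try:
            resolved[raw] = normalise_one(raw, strict=True)
        except NotFound as exc:
            failed.append(exc.raw)
    return resolved, failed


resolved, failed = normalise_collecting(["MIT", "totally-unknown", "CC BY 4.0"])
print(resolved)
print(failed)
```

With the real package, the same loop would presumably call normalise_licence(raw, strict=True) and catch LicenceNotFoundError.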
Custom plugins
The LicenceNormaliser class lets you inject custom plugin classes for specialised use cases:
from licence_normaliser import LicenceNormaliser
from licence_normaliser.parsers.alias import AliasParser
from licence_normaliser.parsers.spdx import SPDXParser
# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenceNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
    cache=True,
    cache_maxsize=8192,
)
# MIT resolves via SPDX parser
assert str(ln.normalise_licence("MIT")) == "mit"
# CC BY resolves via Alias
assert str(ln.normalise_licence("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"
For caching, LicenceNormaliser wraps the resolution method with lru_cache. Disable it by passing cache=False for debugging:
from licence_normaliser import LicenceNormaliser
ln = LicenceNormaliser(cache=False)
result = ln.normalise_licence("MIT")
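One common way to get a per-instance, switchable cache is to wrap the bound method with functools.lru_cache inside __init__. The sketch below illustrates that technique only; the Resolver class and its method names are hypothetical, the resolution logic is a placeholder, and the library's internals may differ.

```python
from functools import lru_cache


class Resolver:
    """Hypothetical resolver with an optional per-instance LRU cache."""

    def __init__(self, cache: bool = True, cache_maxsize: int = 8192) -> None:
        self.calls = 0  # counts uncached resolutions, for demonstration only
        if cache:
            # Wrapping the bound method gives each instance its own cache.
            self._resolve = lru_cache(maxsize=cache_maxsize)(self._resolve_uncached)
        else:
            self._resolve = self._resolve_uncached

    def _resolve_uncached(self, raw: str) -> str:
        self.calls += 1
        return "-".join(raw.lower().split())  # placeholder for real resolution

    def normalise(self, raw: str) -> str:
        return self._resolve(raw)


cached = Resolver()
cached.normalise("CC BY 4.0")
cached.normalise("CC BY 4.0")  # served from the cache
print(cached.calls)

uncached = Resolver(cache=False)
uncached.normalise("CC BY 4.0")
uncached.normalise("CC BY 4.0")  # resolved again
print(uncached.calls)
```

Turning the cache off makes every call run the full resolution, which is what you want when stepping through a lookup in a debugger or after editing the data files in a long-running process.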
Update data (CLI)
licence-normaliser update-data --force
# Fetches fresh SPDX, OpenDefinition, OSI, CreativeCommons, and ScanCode JSONs
Integration tests (public API only)
All integration tests live in src/licence_normaliser/tests/test_integration.py and only import the public API.
CLI usage
Normalise a single licence:
licence-normaliser normalise "MIT"
# Output: mit
licence-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# Licence: cc-by
# Family: cc
licence-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error
Batch normalise:
licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
licence-normaliser batch --strict MIT "Apache-2.0"
Exceptions
from licence_normaliser.exceptions import (
    DataSourceError,            # data source loading errors
    LicenceNormaliserError,     # base class
    LicenceNotFoundError,       # raised by strict mode
    LicenceNormalisationError,  # kept for backwards compatibility
)
from licence_normaliser import (
    LicenceTrace,       # resolution trace object
    LicenceTraceStage,  # resolution stage enum
)
Testing
All tests run inside Docker:
make test
To test a specific Python version:
make test-env ENV=py312
Licence
MIT