Robust licence normalisation with a three-level hierarchy for common licences.

licence-normaliser maps common licence representations (SPDX tokens, URLs, prose descriptions) to a canonical three-level hierarchy.

Features

  • Three-level hierarchy - LicenceFamily → LicenceName → LicenceVersion.

  • Wide format support - SPDX tokens, URLs, and prose descriptions for supported licences.

  • Creative Commons support - Full CC family with versions and IGO variants.

  • Publisher-specific licences - Springer, Nature, Elsevier, Wiley, ACS, and more.

  • File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.

  • Pluggable parsers - Drop in a new parser class to ingest any external licence registry. Parsers implement plugin interfaces (RegistryPlugin, URLPlugin, etc.).

  • Strict mode - Raise LicenceNotFoundError instead of silently returning "unknown".

  • Caching - LRU caching of resolution results, which can be switched off with cache=False.

  • CLI - Command-line interface with --strict and --trace support.
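
To make the file-driven idea concrete, here is a minimal stand-alone sketch of an alias table loaded from JSON. It does not use the library: the flat alias-to-key layout and the names ALIASES_JSON, load_aliases and lookup are assumptions for illustration, and the real data files may be structured differently.

```python
from __future__ import annotations

import json

# Hypothetical alias data; the real aliases.json layout may differ.
ALIASES_JSON = """
{
  "cc by-nc-nd 3.0 igo": "cc-by-nc-nd-3.0-igo",
  "cc by 4.0": "cc-by-4.0"
}
"""


def load_aliases(text: str) -> dict[str, str]:
    """Parse alias JSON and lower-case the keys for case-insensitive lookup."""
    return {alias.lower(): key for alias, key in json.loads(text).items()}


def lookup(aliases: dict[str, str], raw: str) -> str | None:
    """Return the canonical key for raw, or None if no alias matches."""
    # Lower-case and collapse runs of whitespace before the lookup.
    cleaned = " ".join(raw.lower().split())
    return aliases.get(cleaned)


aliases = load_aliases(ALIASES_JSON)
print(lookup(aliases, "CC  BY 4.0"))
print(lookup(aliases, "not-a-licence"))
```

In a layout like this, adding a synonym means adding one line to the JSON data, with no change to the lookup code.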

Hierarchy

The library uses a three-level hierarchy:

  1. LicenceFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …

  2. LicenceName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"

  3. LicenceVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"

LicenceVersion also has optional jurisdiction (e.g., "uk", "au") and scope (e.g., "igo") fields for CC licences.
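
The three levels can be pictured as nested records. The sketch below is only an illustration built with dataclasses. Family, Name and Version are hypothetical stand-ins, not the library's own LicenceFamily, LicenceName and LicenceVersion classes, whose fields may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Family:
    """Broad bucket, e.g. "cc" or "osi"."""
    key: str

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Name:
    """Version-free licence name that points at its family."""
    key: str
    family: Family

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Version:
    """Fully resolved licence with optional jurisdiction and scope."""
    key: str
    licence: Name
    jurisdiction: Optional[str] = None
    scope: Optional[str] = None

    def __str__(self) -> str:
        return self.key


cc = Family("cc")
cc_by_nc_nd = Name("cc-by-nc-nd", cc)
v = Version("cc-by-nc-nd-3.0-igo", cc_by_nc_nd, scope="igo")

# Walk up the hierarchy from the most specific level.
print(v, v.licence, v.licence.family)
```

Each level holds a reference to its parent, so any resolved version can be rolled up to its name or family without a second lookup.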

Installation

With uv:

uv pip install licence-normaliser

Or with pip:

pip install licence-normaliser

Quick start

from licence_normaliser import normalise_licence

v = normalise_licence("CC BY-NC-ND 4.0")
assert str(v) == "cc-by-nc-nd-4.0"      # ← LicenceVersion
assert str(v.licence) == "cc-by-nc-nd"  # ← LicenceName
assert str(v.licence.family) == "cc"    # ← LicenceFamily

# With jurisdiction and scope
v = normalise_licence("http://creativecommons.org/licenses/by-nc/2.0/uk")
assert v.jurisdiction == "uk"
assert v.scope is None

v = normalise_licence("http://creativecommons.org/licenses/by-nc/3.0/igo")
assert v.jurisdiction is None
assert v.scope == "igo"

Strict mode

By default, unresolvable inputs return an "unknown" result. Pass strict=True to raise LicenceNotFoundError instead:

from licence_normaliser import normalise_licence
from licence_normaliser.exceptions import LicenceNotFoundError

# Silent fallback (default)
v = normalise_licence("some-unknown-string")
assert v.family.key == "unknown"

# Strict: raises on unresolvable input
try:
    v = normalise_licence("some-unknown-string", strict=True)
except LicenceNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
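
One way to use a strict mode like this is to separate resolvable inputs from unresolvable ones in a single pass. The sketch below shows that pattern with a toy resolver so that it runs without the library. NotFoundError, KNOWN, resolve and split_resolved are hypothetical names, and the toy cleaning step (strip and lower-case) is an assumption, not the library's behaviour.

```python
class NotFoundError(LookupError):
    """Stand-in for a 'licence not found' error; keeps both input forms."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"no licence matches {raw!r} (cleaned: {cleaned!r})")
        self.raw = raw
        self.cleaned = cleaned


# Toy lookup table standing in for the real licence data.
KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}


def resolve(raw: str, strict: bool = False) -> str:
    """Return a canonical key; fall back to "unknown" unless strict."""
    cleaned = raw.strip().lower()
    if cleaned in KNOWN:
        return KNOWN[cleaned]
    if strict:
        raise NotFoundError(raw, cleaned)
    return "unknown"


def split_resolved(inputs):
    """Resolve each input strictly and collect the failures separately."""
    resolved, failed = {}, []
    for raw in inputs:
        try:
            resolved[raw] = resolve(raw, strict=True)
        except NotFoundError as exc:
            failed.append(exc.raw)
    return resolved, failed


resolved, failed = split_resolved([" MIT ", "totally-unknown"])
print(resolved, failed)
```

Catching the error per item keeps one bad input from aborting the rest of the run, while the failed list records exactly which raw strings need attention.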

Trace / Explain

Set ENABLE_LICENCE_NORMALISER_TRACE=1 or pass trace=True to get resolution traces showing how the licence was matched:

from licence_normaliser import LicenceNormaliser, normalise_licence

# Via function
v = normalise_licence("cc by-nc-nd 3.0 igo", trace=True)
print(v.explain())

# Via class
ln = LicenceNormaliser(trace=True)
v = ln.normalise_licence("MIT")
print(v.explain())

The output shows the resolution pipeline (alias → registry → url → prose → fallback) and the source file and line that matched:

Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
  [✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)

Result:
  version_key: 'cc-by-nc-nd-3.0-igo'
  name_key: 'cc-by-nc-nd'
  family_key: 'cc'

The trace can also be accessed via v._trace for programmatic use.
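
The staged resolution with a trace can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: the stage functions, their toy data and resolve_with_trace are hypothetical, the prose stage is omitted for brevity, and the trace here is a plain list rather than a LicenceTrace object.

```python
from __future__ import annotations

from typing import Callable, Optional

Stage = Callable[[str], Optional[str]]


def alias_stage(text: str) -> Optional[str]:
    """Exact match against a toy alias table."""
    return {"cc by 4.0": "cc-by-4.0"}.get(text)


def registry_stage(text: str) -> Optional[str]:
    """Exact match against a toy registry of identifiers."""
    return {"mit": "mit"}.get(text)


def url_stage(text: str) -> Optional[str]:
    """Match a single known URL fragment."""
    fragment = "creativecommons.org/licenses/by/4.0"
    return "cc-by-4.0" if fragment in text else None


STAGES: list[tuple[str, Stage]] = [
    ("alias", alias_stage),
    ("registry", registry_stage),
    ("url", url_stage),
]


def resolve_with_trace(raw: str) -> tuple[str, list[tuple[str, bool]]]:
    """Try each stage in order and record whether it matched."""
    cleaned = " ".join(raw.lower().split())
    trace: list[tuple[str, bool]] = []
    for name, stage in STAGES:
        result = stage(cleaned)
        trace.append((name, result is not None))
        if result is not None:
            return result, trace
    # No stage matched, so the fallback produces the "unknown" result.
    trace.append(("fallback", True))
    return "unknown", trace


print(resolve_with_trace("MIT"))
print(resolve_with_trace("https://creativecommons.org/licenses/by/4.0/"))
```

Because the stages run in a fixed order and stop at the first match, the trace doubles as a record of which cheaper lookups were tried before the one that succeeded.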

Batch normalisation

from licence_normaliser import normalise_licences

results = normalise_licences(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)

# Strict batch - raises on the first unresolvable input
results = normalise_licences(["MIT", "Apache-2.0"], strict=True)

Custom plugins

The LicenceNormaliser class lets you inject custom plugin classes for specialised use cases:

from licence_normaliser import LicenceNormaliser
from licence_normaliser.parsers.alias import AliasParser
from licence_normaliser.parsers.spdx import SPDXParser

# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenceNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
    cache=True,
    cache_maxsize=8192,
)

# MIT resolves via SPDX parser
assert str(ln.normalise_licence("MIT")) == "mit"

# CC BY-NC-ND resolves via the alias parser
assert str(ln.normalise_licence("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"

LicenceNormaliser wraps its resolution method with lru_cache. Pass cache=False to disable the cache, for example when debugging:

from licence_normaliser import LicenceNormaliser

ln = LicenceNormaliser(cache=False)
result = ln.normalise_licence("MIT")
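
The general pattern of optionally wrapping a method with functools.lru_cache looks roughly like this. It is a stand-alone sketch rather than the library's code: Resolver, its calls counter and the toy resolution step are hypothetical, and only the cache and cache_maxsize parameter names are borrowed from the example above.

```python
from functools import lru_cache


class Resolver:
    """Toy resolver whose lookup can be cached per instance."""

    def __init__(self, cache: bool = True, cache_maxsize: int = 8192) -> None:
        self.calls = 0  # counts how often the real work runs
        resolve = self._resolve_uncached
        if cache:
            # Wrap the bound method so each instance gets its own cache.
            resolve = lru_cache(maxsize=cache_maxsize)(resolve)
        self.resolve = resolve

    def _resolve_uncached(self, raw: str) -> str:
        self.calls += 1
        return raw.strip().lower()


cached = Resolver()
cached.resolve("MIT")
cached.resolve("MIT")  # served from the cache, no second call

uncached = Resolver(cache=False)
uncached.resolve("MIT")
uncached.resolve("MIT")  # recomputed every time

print(cached.calls, uncached.calls)
```

With the cache off, every call reaches the underlying method, which makes it easier to step through resolution in a debugger or to observe changes to the data between calls.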

Update data (CLI)

licence-normaliser update-data --force
# Fetches fresh SPDX, OpenDefinition, OSI, CreativeCommons, and ScanCode JSONs

Integration tests (public API only)

All integration tests live in src/licence_normaliser/tests/test_integration.py and only import the public API.

CLI usage

Normalise a single licence:

licence-normaliser normalise "MIT"
# Output: mit

licence-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# Licence: cc-by
# Family: cc

licence-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error

Batch normalise:

licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
licence-normaliser batch --strict MIT "Apache-2.0"

Exceptions

from licence_normaliser.exceptions import (
    DataSourceError,           # data source loading errors
    LicenceNormaliserError,    # base class
    LicenceNotFoundError,      # raised by strict mode
    LicenceNormalisationError, # kept for backwards compatibility
)

from licence_normaliser import (
    LicenceTrace,        # resolution trace object
    LicenceTraceStage,   # resolution stage enum
)

Testing

All tests run inside Docker:

make test

To test a specific Python version:

make test-env ENV=py312

Licence

MIT

Author

Artur Barseghyan <artur.barseghyan@gmail.com>
