Comprehensive license normalisation with a three-level hierarchy.

license-normaliser is a comprehensive license normalisation library that maps any license representation (SPDX tokens, URLs, prose descriptions) to a canonical three-level hierarchy.

Features

  • Three-level hierarchy - LicenseFamily → LicenseName → LicenseVersion.

  • Wide format support - SPDX tokens, URLs, prose descriptions.

  • Creative Commons support - Full CC family with versions and IGO variants.

  • Publisher-specific licenses - Springer, Nature, Elsevier, Wiley, ACS, and more.

  • File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.

  • Pluggable data sources - Drop in a new DataSource class to ingest any external license registry automatically.

  • Strict mode - Raise LicenseNotFoundError instead of silently returning "unknown".

  • Caching - LRU caching for performance.

  • CLI - Command-line interface with --strict support.
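
To make the "file-driven data" idea concrete, here is a minimal sketch of an alias table loaded from JSON. The JSON layout, the `ALIASES_JSON` content, and the `load_aliases` helper are all hypothetical and exist only for illustration; the library's real data files may be organised differently.

```python
import json

# Hypothetical alias data; the library's real JSON schema may differ.
ALIASES_JSON = """
{
  "CC BY 4.0": "cc-by-4.0",
  "Creative Commons Attribution 4.0": "cc-by-4.0",
  "MIT License": "mit"
}
"""


def load_aliases(text: str) -> dict[str, str]:
    """Parse alias JSON and lower-case the keys for case-insensitive lookup."""
    return {alias.lower(): key for alias, key in json.loads(text).items()}


aliases = load_aliases(ALIASES_JSON)

# A new synonym is a new JSON entry; the lookup code stays unchanged.
print(aliases.get("mit license", "unknown"))
print(aliases.get("not a license", "unknown"))
```

The point of the design is that adding a synonym means editing data, not code.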

Hierarchy

The library uses a three-level hierarchy:

  1. LicenseFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …

  2. LicenseName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"

  3. LicenseVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
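
The three levels can be pictured as nested records, each pointing at its parent. The sketch below is illustrative only: the class names mirror the hierarchy described above, but the fields and the `__str__` behaviour are assumptions, not the library's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseFamily:
    key: str  # broad bucket, e.g. "cc"

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class LicenseName:
    key: str  # version-free, e.g. "cc-by-nc-nd"
    family: LicenseFamily

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class LicenseVersion:
    key: str  # fully resolved, e.g. "cc-by-nc-nd-4.0"
    license: LicenseName

    def __str__(self) -> str:
        return self.key


cc = LicenseFamily("cc")
cc_by_nc_nd = LicenseName("cc-by-nc-nd", family=cc)
v = LicenseVersion("cc-by-nc-nd-4.0", license=cc_by_nc_nd)

# Walk from the most specific level up to the family.
print(v, v.license, v.license.family)
```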

Installation

With uv:

uv pip install license-normaliser

Or with pip:

pip install license-normaliser

Quick start

from license_normaliser import normalise_license

v = normalise_license("CC BY-NC-ND 4.0")
str(v)                  # "cc-by-nc-nd-4.0"   ← LicenseVersion
str(v.license)          # "cc-by-nc-nd"       ← LicenseName
str(v.license.family)   # "cc"                ← LicenseFamily

Strict mode

By default, unresolvable inputs return an "unknown" result. Pass strict=True to raise LicenseNotFoundError instead:

from license_normaliser import normalise_license
from license_normaliser.exceptions import LicenseNotFoundError

# Silent fallback (default)
v = normalise_license("some-unknown-string")
v.family.key  # "unknown"

# Strict: raises on unresolvable input
try:
    v = normalise_license("some-unknown-string", strict=True)
except LicenseNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
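
The difference between the two modes can be sketched without the library. In the toy version below, `ToyLicenseNotFoundError`, `KNOWN`, and `normalise` are hypothetical stand-ins, and the whitespace-and-case cleaning step is an assumption about what "cleaned" might mean, not a description of the library's internals.

```python
class ToyLicenseNotFoundError(LookupError):
    """Stand-in exception that carries both the raw and the cleaned input."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"License not found: {raw!r} (cleaned: {cleaned!r})")
        self.raw = raw
        self.cleaned = cleaned


KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}  # toy lookup table


def normalise(raw: str, strict: bool = False) -> str:
    # Assumed cleaning step: collapse whitespace and lower-case.
    cleaned = " ".join(raw.split()).lower()
    try:
        return KNOWN[cleaned]
    except KeyError:
        if strict:
            raise ToyLicenseNotFoundError(raw, cleaned) from None
        return "unknown"  # silent fallback


print(normalise("  MIT "))
print(normalise("some-unknown-string"))
```

The silent fallback suits bulk ingestion where gaps are acceptable; strict mode suits pipelines where an unrecognised license should stop processing.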

Batch normalisation

from license_normaliser import normalise_licenses

results = normalise_licenses(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)

# Strict batch - raises on the first unresolvable input
results = normalise_licenses(["MIT", "Apache-2.0"], strict=True)
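
Because a strict batch stops at the first failure, it can be useful to collect every unresolvable input instead. The helper below is a sketch of that pattern: `partition_licenses` and `toy_normalise` are hypothetical names, and the toy normaliser exists only so the example runs on its own. With the real library you would presumably pass something like `lambda s: normalise_license(s, strict=True)` together with `LicenseNotFoundError`.

```python
from typing import Callable, Iterable


def partition_licenses(
    raw_values: Iterable[str],
    normalise: Callable[[str], object],
    not_found: type[Exception] = LookupError,
) -> tuple[dict[str, object], list[str]]:
    """Normalise each value; return (resolved, failed) instead of stopping early."""
    resolved: dict[str, object] = {}
    failed: list[str] = []
    for raw in raw_values:
        try:
            resolved[raw] = normalise(raw)
        except not_found:
            failed.append(raw)
    return resolved, failed


def toy_normalise(raw: str) -> str:
    """Toy normaliser; raises KeyError (a LookupError) for unknown input."""
    table = {"MIT": "mit", "Apache-2.0": "apache-2.0"}
    return table[raw]


resolved, failed = partition_licenses(["MIT", "???", "Apache-2.0"], toy_normalise)
print(resolved)
print(failed)
```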

Custom plugins

The LicenseNormaliser class lets you inject custom plugin classes for specialised use cases:

from license_normaliser import LicenseNormaliser
from license_normaliser.parsers.spdx import SPDXParser
from license_normaliser.parsers.alias import AliasParser

# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenseNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
)

# MIT resolves via SPDX parser
assert str(ln.normalise_license("MIT")) == "mit"

# CC BY resolves via Alias
assert str(ln.normalise_license("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"
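
One way to picture a plugin-based normaliser is as an ordered chain of parsers in which the first parser to return a result wins. The sketch below is a generic illustration of that pattern; the `Parser` protocol, its `parse` method, and the `Toy*` classes are hypothetical and are not the library's plugin interface.

```python
from typing import Optional, Protocol, Sequence


class Parser(Protocol):
    def parse(self, cleaned: str) -> Optional[str]: ...


class ToySPDXParser:
    IDS = {"mit": "mit", "apache-2.0": "apache-2.0"}

    def parse(self, cleaned: str) -> Optional[str]:
        return self.IDS.get(cleaned)


class ToyAliasParser:
    ALIASES = {"cc by 4.0": "cc-by-4.0", "the mit license": "mit"}

    def parse(self, cleaned: str) -> Optional[str]:
        return self.ALIASES.get(cleaned)


def resolve(raw: str, parsers: Sequence[Parser]) -> str:
    """Try each parser in order; fall back to "unknown" if none matches."""
    cleaned = " ".join(raw.split()).lower()
    for parser in parsers:
        key = parser.parse(cleaned)
        if key is not None:
            return key
    return "unknown"


chain = [ToySPDXParser(), ToyAliasParser()]
print(resolve("MIT", chain))
print(resolve("CC BY 4.0", chain))
print(resolve("something else", chain))
```

Restricting the chain, as in the SPDX + Alias configuration above, simply means fewer parsers get a chance to match.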

To build a normaliser with every default plugin set passed explicitly, import the factory functions from license_normaliser.defaults:

from license_normaliser import LicenseNormaliser
from license_normaliser.defaults import (
    get_default_registry,
    get_default_url,
    get_default_alias,
    get_default_family,
    get_default_name,
    get_default_prose,
)

ln = LicenseNormaliser(
    registry=get_default_registry(),
    url=get_default_url(),
    alias=get_default_alias(),
    family=get_default_family(),
    name=get_default_name(),
    prose=get_default_prose(),
    cache=True,
    cache_maxsize=8192,
)

LicenseNormaliser wraps its resolution method in lru_cache. Pass cache=False to disable caching, for example while debugging:

from license_normaliser import LicenseNormaliser

ln = LicenseNormaliser(cache=False)
result = ln.normalise_license("MIT")
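
The effect of an LRU cache on repeated lookups can be seen with the standard library alone. In this sketch, `resolve_uncached` and the `calls` counter are hypothetical stand-ins for the real resolution method; only `functools.lru_cache` is the actual mechanism named above.

```python
from functools import lru_cache

calls = 0


def resolve_uncached(raw: str) -> str:
    """Toy resolver that counts how often real work is done."""
    global calls
    calls += 1
    return raw.strip().lower()


# Wrap the resolver once, with a bounded cache size.
resolve = lru_cache(maxsize=8192)(resolve_uncached)

resolve("MIT")
resolve("MIT")  # served from the cache; resolve_uncached is not called again

print(calls)
print(resolve.cache_info())
```

Turning the cache off is mainly useful when data files change between calls or when you want every lookup to hit the parsers while debugging.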

Update data sources (CLI)

license-normaliser update-data --force
# Fetches fresh SPDX + OpenDefinition JSONs into src/license_normaliser/data/

Integration tests (public API only)

All integration tests live in src/license_normaliser/tests/test_integration.py and only import the public API.

CLI usage

Normalise a single license:

license-normaliser normalise "MIT"
# Output: mit

license-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# License: cc-by
# Family: cc

license-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error

Batch normalise:

license-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
license-normaliser batch --strict MIT "Apache-2.0"

Exceptions

from license_normaliser.exceptions import (
    LicenseNormaliserError,   # base class
    LicenseNotFoundError,     # raised by strict mode
)

Testing

All tests run inside Docker:

make test

To test a specific Python version:

make test-env ENV=py312

License

MIT

Author

Artur Barseghyan <artur.barseghyan@gmail.com>
