Comprehensive license normalisation with a three-level hierarchy.

license-normaliser is a license normalisation library that maps license representations (SPDX tokens, URLs, prose descriptions) to a canonical three-level hierarchy.

Features

  • Three-level hierarchy - LicenseFamily → LicenseName → LicenseVersion.

  • Wide format support - SPDX tokens, URLs, prose descriptions.

  • Creative Commons support - Full CC family with versions and IGO variants.

  • Publisher-specific licenses - Springer, Nature, Elsevier, Wiley, ACS, and more.

  • File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.

  • Pluggable data sources - Drop in a new DataSource class to ingest any external license registry automatically.

  • Strict mode - Raise LicenseNotFoundError instead of silently returning "unknown".

  • Caching - LRU caching for performance.

  • CLI - Command-line interface with --strict support.
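To make the file-driven data and caching features above more concrete, here is a minimal sketch of the general pattern: an alias table loaded from JSON, with lookups behind an LRU cache. It is not the library's implementation. The names `ALIASES_JSON`, `clean`, and `lookup`, and the shape of the alias table, are assumptions made for illustration.

```python
import json
from functools import lru_cache

# Hypothetical alias table. In a file-driven design this would live in a
# JSON file on disk; it is inlined here so the sketch is self-contained.
ALIASES_JSON = """
{
  "mit license": "mit",
  "apache 2.0": "apache-2.0",
  "cc by 4.0": "cc-by-4.0"
}
"""
ALIASES = json.loads(ALIASES_JSON)


def clean(raw: str) -> str:
    """Lower-case and collapse whitespace so formatting does not affect lookups."""
    return " ".join(raw.lower().split())


@lru_cache(maxsize=1024)
def lookup(raw: str) -> str:
    """Return the canonical key for ``raw``, or "unknown" if no alias matches."""
    return ALIASES.get(clean(raw), "unknown")


print(lookup("  MIT   License "))  # mit
print(lookup("CC BY 4.0"))         # cc-by-4.0
print(lookup("no such license"))   # unknown
```

In this sketch, adding a synonym means adding one JSON entry and no code. The cache is keyed on the raw input string, so differently formatted spellings of the same license are cached separately.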

Hierarchy

The library uses a three-level hierarchy:

  1. LicenseFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …

  2. LicenseName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"

  3. LicenseVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
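The three levels above can be pictured as three small linked records. The sketch below is only an illustration of that shape. The classes `Family`, `Name`, and `Version` are hypothetical stand-ins, not the library's own `LicenseFamily`, `LicenseName`, and `LicenseVersion` classes, whose real attributes may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """Broad bucket, e.g. "cc" or "osi"."""
    key: str


@dataclass(frozen=True)
class Name:
    """Version-free license name, e.g. "cc-by"."""
    key: str
    family: Family


@dataclass(frozen=True)
class Version:
    """Fully resolved license, e.g. "cc-by-4.0"."""
    key: str
    license: Name


cc = Family("cc")
cc_by = Name("cc-by", cc)
cc_by_3 = Version("cc-by-3.0", cc_by)
cc_by_4 = Version("cc-by-4.0", cc_by)

# Two versions share one name, and that name belongs to one family.
print(cc_by_3.license is cc_by_4.license)  # True
print(cc_by_4.key, "->", cc_by_4.license.key, "->", cc_by_4.license.family.key)
# cc-by-4.0 -> cc-by -> cc
```

Each level points only at its parent, so the family can always be reached from a resolved version.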

Installation

With uv:

uv pip install license-normaliser

Or with pip:

pip install license-normaliser

Quick start

from license_normaliser import normalise_license

v = normalise_license("CC BY-NC-ND 4.0")
str(v)                  # "cc-by-nc-nd-4.0"   ← LicenseVersion
str(v.license)          # "cc-by-nc-nd"       ← LicenseName
str(v.license.family)   # "cc"                ← LicenseFamily

Strict mode

By default, unresolvable inputs return an "unknown" result. Pass strict=True to raise LicenseNotFoundError instead:

from license_normaliser import normalise_license
from license_normaliser.exceptions import LicenseNotFoundError

# Silent fallback (default)
v = normalise_license("some-unknown-string")
v.family.key  # "unknown"

# Strict: raises on unresolvable input
try:
    v = normalise_license("some-unknown-string", strict=True)
except LicenseNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
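To illustrate the difference between the two modes, here is a minimal, self-contained sketch of the pattern. It does not use the library. `NotFoundError`, `KNOWN`, and `resolve` are hypothetical names, and the cleaning step (lower-casing and collapsing whitespace) is an assumption about what a "cleaned form" might look like.

```python
class NotFoundError(LookupError):
    """Hypothetical error carrying both the original and the cleaned input."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"no license matches {raw!r}")
        self.raw = raw
        self.cleaned = cleaned


# Tiny stand-in lookup table for the sketch.
KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}


def resolve(raw: str, strict: bool = False) -> str:
    """Return a canonical key; on a miss, return "unknown" or raise if strict."""
    cleaned = " ".join(raw.lower().split())
    key = KNOWN.get(cleaned)
    if key is not None:
        return key
    if strict:
        raise NotFoundError(raw, cleaned)
    return "unknown"


print(resolve("  MIT "))               # mit
print(resolve("Some-Unknown-String"))  # unknown
try:
    resolve("Some-Unknown-String", strict=True)
except NotFoundError as exc:
    print(exc.raw, "->", exc.cleaned)  # Some-Unknown-String -> some-unknown-string
```

The silent default suits bulk ingestion, where a few misses are acceptable. Strict mode suits validation, where a miss should stop processing.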

Batch normalisation

from license_normaliser import normalise_licenses

results = normalise_licenses(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)

# Strict batch - raises on first unresolvable
results = normalise_licenses(["MIT", "Apache-2.0"], strict=True)
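Because a strict batch stops at the first unresolvable input, it can be useful to process items one at a time and collect the failures instead. The helper below is a sketch of that idea. `partition` and `fake_normalise` are hypothetical names, and the stand-in normaliser exists only so the example runs without the library installed.

```python
from __future__ import annotations

from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def partition(
    values: Iterable[str],
    normalise: Callable[[str], T],
    error: type[Exception],
) -> tuple[list[T], list[str]]:
    """Normalise each value; return (resolved results, inputs that raised ``error``)."""
    resolved: list[T] = []
    failed: list[str] = []
    for value in values:
        try:
            resolved.append(normalise(value))
        except error:
            failed.append(value)
    return resolved, failed


def fake_normalise(value: str) -> str:
    """Stand-in for a strict normaliser: raises LookupError on unknown input."""
    table = {"MIT": "mit", "Apache-2.0": "apache-2.0"}
    if value not in table:
        raise LookupError(value)
    return table[value]


resolved, failed = partition(["MIT", "???", "Apache-2.0"], fake_normalise, LookupError)
print(resolved)  # ['mit', 'apache-2.0']
print(failed)    # ['???']
```

With the library itself, the same helper could presumably be called with `lambda s: normalise_license(s, strict=True)` as the normaliser and `LicenseNotFoundError` as the error type.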

Update data sources (CLI)

license-normaliser update-data --force
# Fetches fresh SPDX + OpenDefinition JSONs into src/license_normaliser/data/
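As a rough picture of what a fetched registry file can be turned into, the sketch below builds a lookup index from a small sample shaped like the SPDX license list JSON (`licenses`, `licenseId`, `name`, `seeAlso`). It is an assumption-laden illustration. `SAMPLE` and `build_index` are hypothetical, and the library's actual `DataSource` interface and on-disk layout are not shown here.

```python
from __future__ import annotations

import json

# Small inline sample in the shape of the SPDX license list JSON.
SAMPLE = """
{
  "licenseListVersion": "sample",
  "licenses": [
    {"licenseId": "MIT", "name": "MIT License",
     "seeAlso": ["https://opensource.org/license/mit/"]},
    {"licenseId": "Apache-2.0", "name": "Apache License 2.0",
     "seeAlso": ["https://www.apache.org/licenses/LICENSE-2.0"]}
  ]
}
"""


def build_index(payload: str) -> dict[str, str]:
    """Map lower-cased ids, names, and URLs to a lower-cased canonical key."""
    index: dict[str, str] = {}
    for entry in json.loads(payload)["licenses"]:
        key = entry["licenseId"].lower()
        index[key] = key
        index[entry["name"].lower()] = key
        for url in entry.get("seeAlso", []):
            index[url.rstrip("/").lower()] = key
    return index


index = build_index(SAMPLE)
print(index["mit license"])                                  # mit
print(index["https://www.apache.org/licenses/license-2.0"])  # apache-2.0
```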

Integration tests (public API only)

All integration tests live in src/license_normaliser/tests/test_integration.py and only import the public API.

CLI usage

Normalise a single license:

license-normaliser normalise "MIT"
# Output: mit

license-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# License: cc-by
# Family: cc

license-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error

Batch normalise:

license-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
license-normaliser batch --strict MIT "Apache-2.0"

Exceptions

from license_normaliser.exceptions import (
    LicenseNormaliserError,   # base class
    LicenseNotFoundError,     # raised by strict mode
)

Testing

All tests run inside Docker:

make test

To test a specific Python version:

make test-env ENV=py312

License

MIT

Author

Artur Barseghyan <artur.barseghyan@gmail.com>
