Get the SPDX License ID from license text

LicenseID - A portable SPDX License ID matcher

A portable license ID matcher with a command-line interface and a Python API.

Used as the license detection engine for the Pitloom software bill of materials generator.

Features

  • Hybrid matching strategy:
    • Tier 0 (Shortcut): Immediate identification for exact license names and IDs.
    • Tier 1 (Recall): Rapid candidate retrieval using SQLite FTS5 (trigram) with query truncation for performance.
    • Tier 2 (Precision): Adaptive ranking using RapidFuzz with boosting for canonical matches.
    • Tier 3 (Validation): Optional final validation via tools-java if available.
  • Unix philosophy: Parseable, line-delimited CLI output.
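
The tiered strategy above can be illustrated with a small, self-contained sketch. This is not licenseid's implementation: a plain Python trigram set stands in for the SQLite FTS5 index, difflib.SequenceMatcher stands in for RapidFuzz, the shortcut tier only checks IDs, and the LICENSES table, trigrams(), and match() are hypothetical names invented for the example.

```python
import difflib

# Hypothetical miniature "database": license ID -> reference text.
LICENSES = {
    "MIT": "permission is hereby granted free of charge to any person "
           "obtaining a copy of this software",
    "Apache-2.0": "licensed under the apache license version 2.0 you may not "
                  "use this file except in compliance with the license",
    "BSD-3-Clause": "redistribution and use in source and binary forms with "
                    "or without modification are permitted",
}


def trigrams(text: str) -> set:
    """Return the set of 3-character substrings of whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + 3] for i in range(len(text) - 2)}


def match(text: str, max_candidates: int = 2) -> list:
    """Return (license_id, score) pairs, best first."""
    normalized = " ".join(text.lower().split())

    # Tier 0 (shortcut): the input is exactly a known license ID.
    for license_id in LICENSES:
        if normalized == license_id.lower():
            return [(license_id, 1.0)]

    # Tier 1 (recall): keep the candidates sharing the most trigrams.
    query = trigrams(text)
    overlap = {lid: len(query & trigrams(ref)) for lid, ref in LICENSES.items()}
    candidates = sorted(
        (lid for lid, count in overlap.items() if count > 0),
        key=lambda lid: overlap[lid],
        reverse=True,
    )[:max_candidates]

    # Tier 2 (precision): rank the shortlist with a slower fuzzy ratio.
    scored = [
        (lid, difflib.SequenceMatcher(None, normalized, LICENSES[lid]).ratio())
        for lid in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(match("mit"))
print(match("Permission is hereby granted, free of charge, to any person obtaining a copy"))
```

The point of the split is that the cheap recall step narrows the field so the expensive comparison only runs on a handful of candidates.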

Installation

Install with pipx:

pipx install licenseid

Or using uv:

uv tool install licenseid

Usage

1. Update the license database

Before matching, you need to build the local license index:

licenseid update

Advanced update options:

  • --version <version>: Download a specific SPDX License List version (e.g., 3.28.0).
  • --force: Force update even if the local database is already at the target version.
  • --no-cache: Bypass the local cache for downloads.

2. Match a license

Identify license text from a file:

licenseid match LICENSE.txt

Or match from a string:

licenseid match --text "Apache License\nVersion 2.0"

The --text argument supports standard escape sequences (e.g., \n, \t, \") which are automatically unescaped before matching.
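
As a rough illustration of what such unescaping involves, here is a minimal sketch. The unescape() helper and its escape table are assumptions made for the example (only \n, \t, and \" are named above), and it is not the code licenseid itself runs.

```python
import re

# Assumed escape table; only \n, \t and \" are documented above.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "'": "'", "\\": "\\"}


def unescape(text: str) -> str:
    """Replace backslash escapes with the characters they stand for.

    Unknown escapes (for example \\q) are left untouched.
    """
    return re.sub(
        r"\\(.)",
        lambda m: _ESCAPES.get(m.group(1), m.group(0)),
        text,
        flags=re.DOTALL,
    )


# The shell passes the two characters "\" and "n"; unescaping yields a newline.
print(unescape(r"Apache License\nVersion 2.0"))
```

A single left-to-right pass matters here: an escaped backslash followed by "n" stays a backslash and an "n" instead of turning into a newline.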

Common options:

  • --db <path>: Use a custom database path (global option). Supports SQLite URIs for in-memory databases (e.g., file:test?mode=memory&cache=shared).
  • --bold: Print only the top license ID (no other info).
  • --diff: Show a word-by-word diff between the input and the best-matching candidate.
  • --json: Output results in JSON format.

The matcher ranks candidates by a composite score (Similarity + Coverage + Popularity) so that the "tightest" match is preferred, for example choosing a license over a superset that merely contains it.
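
To make the idea of a composite score concrete, here is a hedged sketch. The weights, the Candidate class, the license IDs, and the reading of coverage as "how much of the candidate text the input accounts for" are all assumptions for illustration; the actual formula used by licenseid is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    license_id: str
    similarity: float   # fuzzy similarity between input and candidate, 0..1
    coverage: float     # assumed: share of the candidate text covered, 0..1
    popularity: float   # assumed: normalized popularity, 0..1


def composite_score(c: Candidate, w_cov: float = 0.05, w_pop: float = 0.01) -> float:
    """Similarity dominates; coverage and popularity act as tie-breakers.

    The weights are hypothetical values chosen for this example only.
    """
    return c.similarity + w_cov * c.coverage + w_pop * c.popularity


def rank(candidates: list) -> list:
    """Return candidates sorted from best to worst composite score."""
    return sorted(candidates, key=composite_score, reverse=True)


# Two made-up candidates with equal similarity: the base license and a
# superset that contains extra text the input does not cover.
ranked = rank([
    Candidate("Example-1.0-with-addendum", similarity=0.98, coverage=0.60, popularity=0.2),
    Candidate("Example-1.0", similarity=0.98, coverage=1.00, popularity=0.5),
])
print([c.license_id for c in ranked])
```

With equal similarity, the coverage term is what separates the base license from its superset.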

3. Cache management

licenseid maintains a local cache of remote data to save bandwidth.

  • licenses.json: Cached for 45 days.
  • popularity.csv: Cached for 75 days.
  • SPDX data tarballs are versioned and never expire.
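
The expiry rules above amount to a per-file time-to-live check. A minimal sketch, assuming the fetch time is tracked as a Unix timestamp (for instance the file's modification time); is_expired() and the tarball file name used below are hypothetical:

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400

# Lifetimes from the list above; files not listed (versioned tarballs) never expire.
CACHE_TTL_DAYS = {"licenses.json": 45, "popularity.csv": 75}


def is_expired(filename: str, fetched_at: float, now: Optional[float] = None) -> bool:
    """Return True if a cached file is older than its time-to-live."""
    ttl_days = CACHE_TTL_DAYS.get(filename)
    if ttl_days is None:
        return False  # no TTL configured: treat as permanent
    now = time.time() if now is None else now
    return (now - fetched_at) > ttl_days * SECONDS_PER_DAY


day_46 = 46 * SECONDS_PER_DAY
print(is_expired("licenses.json", fetched_at=0, now=day_46))
print(is_expired("popularity.csv", fetched_at=0, now=day_46))
```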

To clear the cache manually:

licenseid --clear-cache

4. Output formats

Default (Unix-friendly):

LICENSE_ID=Apache-2.0 SIMILARITY=0.9850 COVERAGE=1.0000
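
Because each result is a single line of KEY=VALUE fields, it is easy to consume from a script. A small sketch, where parse_match_line() is a hypothetical helper and not part of licenseid:

```python
def parse_match_line(line: str) -> dict:
    """Split a 'KEY=VALUE KEY=VALUE ...' line into a dictionary of strings."""
    return dict(field.split("=", 1) for field in line.split())


record = parse_match_line("LICENSE_ID=Apache-2.0 SIMILARITY=0.9850 COVERAGE=1.0000")
print(record["LICENSE_ID"], float(record["SIMILARITY"]))  # prints: Apache-2.0 0.985
```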

ID only:

licenseid match LICENSE.txt --bold

Example output:

Apache-2.0

JSON:

licenseid match LICENSE.txt --json

Example output:

[
  {
    "license_id": "Apache-2.0",
    "score": 0.985,
    "similarity": 0.985,
    "coverage": 1.0,
    "is_spdx": true,
    "is_osi_approved": true
  }
]
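
A script consuming the JSON output might pick the best result and apply its own acceptance policy. The sketch below assumes the field names shown in the example above; top_license_id() and the 0.9 threshold are hypothetical choices, not licenseid defaults.

```python
import json
from typing import Optional


def top_license_id(json_text: str, min_score: float = 0.9,
                   require_osi: bool = False) -> Optional[str]:
    """Return the best-scoring license ID, or None if nothing qualifies."""
    results = json.loads(json_text)
    for result in sorted(results, key=lambda r: r["score"], reverse=True):
        if result["score"] < min_score:
            break  # remaining results score even lower
        if require_osi and not result.get("is_osi_approved", False):
            continue
        return result["license_id"]
    return None


sample = """[
  {"license_id": "Apache-2.0", "score": 0.985, "similarity": 0.985,
   "coverage": 1.0, "is_spdx": true, "is_osi_approved": true}
]"""
print(top_license_id(sample, require_osi=True))  # prints: Apache-2.0
```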

Diff (visual comparison):

licenseid match LICENSE.txt --diff

Example output:

LICENSE_ID=Apache-2.0 SIMILARITY=0.9980 COVERAGE=0.9975

WORD DIFF:
--- DATABASE
+++ INPUT
@@ -1601,8 +1601,4 @@
 language
 governing
 permissions
-and
-limitations
-under
-the
-license
+se
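
A word-level diff of this shape can be produced with Python's standard difflib by splitting both texts into words and diffing the word lists. This is a sketch of the general technique, not necessarily how licenseid generates its output; word_diff() is a hypothetical helper.

```python
import difflib


def word_diff(database_text: str, input_text: str) -> list:
    """Return unified-diff lines comparing two texts word by word."""
    return list(difflib.unified_diff(
        database_text.split(),
        input_text.split(),
        fromfile="DATABASE",
        tofile="INPUT",
        lineterm="",   # words carry no trailing newline
    ))


lines = word_diff(
    "language governing permissions and limitations under the license",
    "language governing permissions se",
)
print("\n".join(lines))
```

Lines starting with "-" are words present only in the database text, and lines starting with "+" are words present only in the input.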

Python API

You can use licenseid directly in your Python projects:

import json
from licenseid.matcher import AggregatedLicenseMatcher

# Initialize the matcher (uses default database path if not provided)
matcher = AggregatedLicenseMatcher()

# Match license text
results = matcher.match("MIT License")

# Results are returned as a list of dictionaries (JSON-serializable)
print(json.dumps(results, indent=2))

Example JSON output:

[
  {
    "license_id": "MIT",
    "score": 1.01,
    "similarity": 1.0,
    "coverage": 0.0
  }
]

Development

Running tests

Regular test suite:

pytest

Run benchmarks and accuracy tests (expensive):

pytest --run-benchmark

Configuration

  • SPDX_TOOLS_JAR: Path to the tools-java jar for Tier 3 validation.
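
An optional dependency configured through an environment variable is typically resolved along these lines. The find_tools_jar() helper is hypothetical and only illustrates the "use it if available" behavior described for Tier 3.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_tools_jar(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the jar path from SPDX_TOOLS_JAR, or None if unset or missing."""
    env = os.environ if env is None else env
    value = env.get("SPDX_TOOLS_JAR")
    if not value:
        return None
    jar = Path(value).expanduser()
    return jar if jar.is_file() else None


jar = find_tools_jar()
print("Tier 3 validation available" if jar else "Tier 3 validation skipped")
```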

License

Apache-2.0
