
Get the SPDX License ID from license text


LicenseID


A portable license ID matcher. Get the SPDX License ID from license text.

licenseid takes license text as input and identifies the closest-matching SPDX License ID using a hybrid search strategy (trigram candidate retrieval followed by token-ratio ranking).

Features

  • Hybrid strategy:
    • Tier 0 (Shortcut): Immediate identification for exact license names and IDs.
    • Tier 1 (Recall): Rapid candidate retrieval using SQLite FTS5 (trigram) with query truncation for performance.
    • Tier 2 (Precision): Adaptive ranking using RapidFuzz with boosting for canonical matches.
    • Tier 3 (Validation): Optional final validation via tools-java if available.
  • Unix philosophy: Parseable, line-delimited CLI output.
  • Performance: Sub-second matching for most licenses, with optimizations for handling large input files.
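
The tiers above can be sketched in a few lines of Python. This is a minimal, self-contained illustration, not licenseid's code: the three-entry LICENSES corpus, the NAMES table and the identify() function are hypothetical, a pure-Python trigram index stands in for SQLite FTS5, difflib.SequenceMatcher stands in for RapidFuzz, and both canonical-match boosting and Tier 3 are omitted.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical mini-corpus standing in for the SPDX license database.
LICENSES = {
    "MIT": "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "ISC": "Permission to use, copy, modify, and/or distribute this software for any purpose",
    "BSD-2-Clause": "Redistribution and use in source and binary forms, with or without "
                    "modification, are permitted",
}


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumeric words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def trigrams(text: str) -> set:
    return {text[i:i + 3] for i in range(len(text) - 2)}


# Tier 0 lookup table (hypothetical): normalized IDs and names map straight to an ID.
NAMES = {normalize(n): lid for lid in LICENSES for n in (lid, f"{lid} License")}

# Tier 1 index: trigram -> IDs of the licenses whose text contains it.
INDEX = defaultdict(set)
for lid, body in LICENSES.items():
    for tri in trigrams(normalize(body)):
        INDEX[tri].add(lid)


def identify(text: str, max_query_chars: int = 2000, top_k: int = 5):
    query = normalize(text)
    # Tier 0: exact name or ID shortcut.
    if query in NAMES:
        return [(NAMES[query], 1.0)]
    # Tier 1: recall candidates sharing trigrams with a truncated query.
    hits = defaultdict(int)
    for tri in trigrams(query[:max_query_chars]):
        for lid in INDEX.get(tri, ()):
            hits[lid] += 1
    candidates = sorted(hits, key=hits.get, reverse=True)[:top_k]
    # Tier 2: rank only the recalled candidates by a token-level similarity ratio.
    q_tokens = query.split()
    scored = [
        (lid, SequenceMatcher(None, q_tokens, normalize(LICENSES[lid]).split()).ratio())
        for lid in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Here identify("MIT License") takes the Tier 0 shortcut, while longer text falls through to trigram recall and token-ratio ranking. Truncating the query before Tier 1 bounds the recall cost on large files, and the comparatively expensive similarity is computed only for the few recalled candidates.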

Installation

Install with pipx:

pipx install licenseid

Or using uv:

uv tool install licenseid

Usage

1. Update the license database

Before matching, you need to build the local license index:

licenseid update

Advanced update options:

  • --version <version>: Download a specific SPDX License List version (e.g., 3.28.0).
  • --force: Force update even if the local database is already at the target version.
  • --no-cache: Bypass the local cache for downloads.

2. Match a license

Identify license text from a file:

licenseid match LICENSE.txt

Or match from a string:

licenseid match --text "Apache License\nVersion 2.0"

The --text argument supports standard escape sequences (e.g., \n, \t, \") which are automatically unescaped before matching.
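
How licenseid performs this unescaping is not documented here. One common stdlib approach is sketched below; the helper name unescape_cli_text is hypothetical. Characters outside Latin-1 are first turned into \uXXXX escapes so that the unicode_escape codec does not garble them.

```python
def unescape_cli_text(raw: str) -> str:
    """Turn literal sequences such as \\n, \\t and \\" into the real characters."""
    # Non-Latin-1 characters become \uXXXX escapes, which unicode_escape then decodes back.
    return raw.encode("latin-1", "backslashreplace").decode("unicode_escape")


print(repr(unescape_cli_text(r"Apache License\nVersion 2.0")))
```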

Common options:

  • --db <path>: Use a custom database path (global option). Supports SQLite URIs for in-memory databases (e.g., file:test?mode=memory&cache=shared).
  • --bold: Print only the top license ID (no other info).
  • --diff: Show a word-by-word diff between the input and the best-matching candidate.
  • --json: Output results in JSON format.
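
The in-memory URI form follows standard SQLite semantics: every connection opened with the same shared-cache name sees one database, which exists only while at least one connection to it is open. The stdlib sketch below shows that behavior; the licenses table is hypothetical and is not licenseid's schema.

```python
import sqlite3

URI = "file:test?mode=memory&cache=shared"

# The first connection creates the shared in-memory database.
keeper = sqlite3.connect(URI, uri=True)
keeper.execute("CREATE TABLE licenses (id TEXT PRIMARY KEY)")  # hypothetical table
keeper.execute("INSERT INTO licenses VALUES ('MIT')")
keeper.commit()

# A second connection to the same URI sees the same data.
other = sqlite3.connect(URI, uri=True)
rows = other.execute("SELECT id FROM licenses").fetchall()
print(rows)

# Once the last connection closes, the in-memory database is discarded.
other.close()
keeper.close()
```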

Candidates are ranked by a composite score (similarity + coverage + popularity) so that the tightest match is preferred, for example, a license rather than a superset of it that contains additional text.
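
The exact formula and weights licenseid uses are not given here, so the following is only a hypothetical sketch of the idea: similarity is a token-sequence ratio, coverage is the share of the candidate template's words that also occur in the input, and popularity adds a small bonus. A superset template carries extra words the input lacks, which lowers both its similarity and its coverage.

```python
import re
from difflib import SequenceMatcher


def tokens(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def composite_score(input_text, template, popularity=0.0,
                    w_sim=1.0, w_cov=1.0, w_pop=0.01):
    """Hypothetical composite score: similarity + coverage + small popularity bonus."""
    inp, tpl = tokens(input_text), tokens(template)
    similarity = SequenceMatcher(None, inp, tpl).ratio()
    # Coverage: share of the template's words that also appear in the input.
    present = set(inp)
    coverage = sum(t in present for t in tpl) / len(tpl) if tpl else 0.0
    return w_sim * similarity + w_cov * coverage + w_pop * popularity


base = "Redistribution is permitted provided that the copyright notice is retained."
superset = base + " Neither the name of the author nor contributors may endorse products."

tight = composite_score(base, base, popularity=0.5)
loose = composite_score(base, superset, popularity=0.9)  # more popular, still loses
print(tight > loose)  # True
```

Keeping the popularity weight small lets it break near-ties without overriding a clearly better textual fit.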

3. Cache management

licenseid maintains a local cache of remote data to save bandwidth.

  • licenses.json: Cached for 45 days.
  • popularity.csv: Cached for 75 days.
  • SPDX data tarballs are versioned and never expire.
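
A freshness check consistent with these rules can be sketched as below. It assumes a cached file's age is measured from its modification time, which is a guess about licenseid's behavior; the function name is_fresh and the table CACHE_TTL_DAYS are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

# TTLs (days) from the list above; versioned tarballs are absent, so they never expire.
CACHE_TTL_DAYS = {"licenses.json": 45, "popularity.csv": 75}
SECONDS_PER_DAY = 86_400


def is_fresh(path: Path, now: Optional[float] = None) -> bool:
    """True if the cached file exists and is younger than its TTL (judged by mtime)."""
    if not path.is_file():
        return False
    ttl_days = CACHE_TTL_DAYS.get(path.name)
    if ttl_days is None:  # e.g. a versioned SPDX data tarball
        return True
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < ttl_days * SECONDS_PER_DAY


# Demo: files last written 46 days ago; licenses.json is stale, popularity.csv is not.
with tempfile.TemporaryDirectory() as cache_dir:
    forty_six_days_ago = time.time() - 46 * SECONDS_PER_DAY
    freshness = {}
    for name in ("licenses.json", "popularity.csv"):
        f = Path(cache_dir) / name
        f.write_text("")
        os.utime(f, (forty_six_days_ago, forty_six_days_ago))
        freshness[name] = is_fresh(f)
    print(freshness)  # {'licenses.json': False, 'popularity.csv': True}
```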

To clear the cache manually:

licenseid --clear-cache

4. Output formats

Default (Unix-friendly):

LICENSE_ID=Apache-2.0 SIMILARITY=0.9850 COVERAGE=1.0000
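
Because each field is a space-separated KEY=VALUE pair, the default output is easy to consume from scripts. A minimal Python parser, assuming values never contain spaces (as in the example above), might look like this; parse_kv_line is a hypothetical helper, not part of licenseid.

```python
def parse_kv_line(line: str) -> dict:
    """Split a 'KEY=VALUE KEY=VALUE' line into a dict, converting numeric values."""
    record = {}
    for field in line.split():
        key, _, value = field.partition("=")
        try:
            record[key] = float(value)
        except ValueError:
            record[key] = value  # non-numeric, e.g. a license ID
    return record


print(parse_kv_line("LICENSE_ID=Apache-2.0 SIMILARITY=0.9850 COVERAGE=1.0000"))
# {'LICENSE_ID': 'Apache-2.0', 'SIMILARITY': 0.985, 'COVERAGE': 1.0}
```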

ID only:

licenseid match LICENSE.txt --bold

Example output:

Apache-2.0

JSON:

licenseid match LICENSE.txt --json

Example output:

[
  {
    "license_id": "Apache-2.0",
    "score": 0.985,
    "similarity": 0.985,
    "coverage": 1.0,
    "is_spdx": true,
    "is_osi_approved": true
  }
]
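
The JSON output can be post-processed with the standard library. The sketch below picks the highest-scoring entry and rejects it below a threshold; best_match and its 0.9 default are illustrative choices, not licenseid defaults.

```python
import json
from typing import Optional

# In practice json_text could come from, e.g.:
#   subprocess.run(["licenseid", "match", "LICENSE.txt", "--json"],
#                  capture_output=True, text=True, check=True).stdout
EXAMPLE = """[
  {"license_id": "Apache-2.0", "score": 0.985, "similarity": 0.985,
   "coverage": 1.0, "is_spdx": true, "is_osi_approved": true}
]"""


def best_match(json_text: str, min_score: float = 0.9) -> Optional[str]:
    """Return the highest-scoring license_id, or None if nothing clears min_score."""
    results = json.loads(json_text)
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top["license_id"] if top["score"] >= min_score else None


print(best_match(EXAMPLE))  # Apache-2.0
```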

Diff (visual comparison):

licenseid match LICENSE.txt --diff

Example output:

LICENSE_ID=Apache-2.0 SIMILARITY=0.9980 COVERAGE=0.9975

WORD DIFF:
--- DATABASE
+++ INPUT
@@ -1601,8 +1601,4 @@
 language
 governing
 permissions
-and
-limitations
-under
-the
-license
+se
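
A word-level diff in this style can be reproduced with difflib by diffing lists of words instead of lines, one word per diff line. This is a sketch of the general technique only; the lowercasing and punctuation-stripping tokenizer is an assumption, not necessarily what licenseid does.

```python
import difflib
import re


def _words(text: str) -> list[str]:
    # Lowercased word tokens; punctuation is dropped (an assumption).
    return re.findall(r"\w+", text.lower())


def word_diff(database_text: str, input_text: str, context: int = 3) -> list[str]:
    """Unified diff with one word per line, labelled DATABASE/INPUT like --diff."""
    return list(difflib.unified_diff(
        _words(database_text), _words(input_text),
        fromfile="DATABASE", tofile="INPUT", lineterm="", n=context,
    ))


template = "the specific language governing permissions and limitations under the License."
user_text = "the specific language governing permissions"
print("\n".join(word_diff(template, user_text)))
```

The hunk header then counts words, not lines, which is why offsets such as -1601 in the example above can be large.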

Python API

You can use licenseid directly in your Python projects:

import json
from licenseid.matcher import AggregatedLicenseMatcher

# Initialize the matcher (uses default database path if not provided)
matcher = AggregatedLicenseMatcher()

# Match license text
results = matcher.match("MIT License")

# Results are returned as a list of dictionaries (JSON-serializable)
print(json.dumps(results, indent=2))

Example JSON output:

[
  {
    "license_id": "MIT",
    "score": 1.01,
    "similarity": 1.0,
    "coverage": 0.0
  }
]

Development

Running Tests

Regular test suite:

pytest

Run benchmarks and accuracy tests (expensive):

pytest --run-benchmark

Configuration

  • SPDX_TOOLS_JAR: Path to the tools-java jar for Tier 3 validation.
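
How licenseid decides that tools-java is "available" is not specified here. A plausible check, sketched under the assumption that it needs both a readable jar at SPDX_TOOLS_JAR and a java executable on PATH, could be:

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def tools_java_jar() -> Optional[Path]:
    """Return the jar path if Tier 3 validation looks possible, else None (hypothetical)."""
    jar = os.environ.get("SPDX_TOOLS_JAR")
    if not jar or not Path(jar).is_file():
        return None
    if shutil.which("java") is None:  # a JVM is needed to run the jar
        return None
    return Path(jar)
```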

License

Apache-2.0



Download files

Download the file for your platform.

Source Distribution

licenseid-0.2.1.tar.gz (31.1 kB)

Uploaded Source

Built Distribution


licenseid-0.2.1-py3-none-any.whl (20.9 kB)

Uploaded Python 3

File details

Details for the file licenseid-0.2.1.tar.gz.

File metadata

  • Download URL: licenseid-0.2.1.tar.gz
  • Size: 31.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for licenseid-0.2.1.tar.gz
Algorithm    Hash digest
SHA256       1b826b970de23877d9d22f2d5cf07d9f2f24b24d05b09e8591d1226662045e27
MD5          66e6cc5288910835c1f7aca01a5a7fbe
BLAKE2b-256  1715d2311b4fa5872b54ee754a3b3d9911ccf7f51a75d5a69d324adb1b30da4d


Provenance

The following attestation bundles were made for licenseid-0.2.1.tar.gz:

Publisher: pypi-publish.yml on bact/licenseid

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file licenseid-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: licenseid-0.2.1-py3-none-any.whl
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for licenseid-0.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       44134a8bb2df980942a5aee36f9c4111c23c737f8b262901c9d60d3145fd0b87
MD5          df2bece25a649e8b49bde03fb1a28caf
BLAKE2b-256  c9edce6edff25fb93b6908467c17c1600f3b4dce56ebb5dc7f30fe52190cbf7f


Provenance

The following attestation bundles were made for licenseid-0.2.1-py3-none-any.whl:

Publisher: pypi-publish.yml on bact/licenseid

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
