Get the SPDX License ID from license text

LicenseID

A portable license ID matcher. Get the SPDX License ID from license text.

licenseid takes license text as input and identifies the closest-matching SPDX License ID using a hybrid search strategy (trigram retrieval followed by token-ratio ranking).

Features

  • Hybrid strategy:
    • Tier 0 (Shortcut): Immediate identification for exact license names and IDs.
    • Tier 1 (Recall): Rapid candidate retrieval using SQLite FTS5 (trigram) with query truncation for performance.
    • Tier 2 (Precision): Adaptive ranking using RapidFuzz with boosting for canonical matches.
    • Tier 3 (Validation): Optional final validation via tools-java if available.
  • Unix philosophy: Parseable, line-delimited CLI output.
  • Performance: Sub-second matching for most licenses; optimized for large file handling.
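
The recall-then-rank idea behind Tiers 1 and 2 can be illustrated with a minimal, standard-library-only sketch. It is not licenseid's implementation: trigram set overlap stands in for the SQLite FTS5 trigram index, `difflib.SequenceMatcher` stands in for RapidFuzz, and the function names and the three-entry `CORPUS` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def trigrams(text):
    """Return the set of character trigrams of the normalized text."""
    norm = " ".join(words(text))
    return {norm[i:i + 3] for i in range(len(norm) - 2)}


def recall(query, corpus, limit=2):
    """Tier 1 stand-in: keep the candidates with the highest trigram overlap."""
    q = trigrams(query)

    def overlap(license_id):
        t = trigrams(corpus[license_id])
        return len(q & t) / max(len(q | t), 1)  # Jaccard similarity

    return sorted(corpus, key=overlap, reverse=True)[:limit]


def rank(query, candidates, corpus):
    """Tier 2 stand-in: order candidates by word-level sequence similarity."""
    q_words = words(query)
    scored = [
        (cid, SequenceMatcher(None, q_words, words(corpus[cid]), autojunk=False).ratio())
        for cid in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, heavily shortened corpus; real license texts are far longer.
CORPUS = {
    "MIT": "Permission is hereby granted, free of charge, to any person "
           "obtaining a copy of this software",
    "ISC": "Permission to use, copy, modify, and/or distribute this software "
           "for any purpose with or without fee is hereby granted",
    "Zlib": "This software is provided 'as-is', without any express or implied warranty",
}

query = "permission is hereby granted free of charge to any person obtaining a copy"
best_id, best_score = rank(query, recall(query, CORPUS), CORPUS)[0]
print(best_id, round(best_score, 4))
```

The cheap set comparison narrows the field before the more expensive sequence comparison runs, which is the reason for splitting recall and precision into separate tiers.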

Installation

Install with pipx:

pipx install licenseid

Or using uv:

uv tool install licenseid

Usage

1. Update the license database

Before matching, you need to build the local license index:

licenseid update

Advanced update options:

  • --version <version>: Download a specific SPDX License List version (e.g., 3.28.0).
  • --force: Force update even if the local database is already at the target version.
  • --no-cache: Bypass the local cache for downloads.

2. Match a license

Identify license text from a file:

licenseid match LICENSE.txt

Or match from a string:

licenseid match --text "Apache License\nVersion 2.0"

The --text argument supports standard escape sequences (e.g., \n, \t, \") which are automatically unescaped before matching.
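
As an illustration of what such unescaping involves, here is a minimal sketch. The set of recognized escapes and the function name `unescape` are assumptions for this example; the tool's actual rules may differ.

```python
import re

# Assumed escape table; the text above names \n, \t and \" as examples.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "'": "'", "\\": "\\"}


def unescape(text):
    """Replace known backslash escapes; leave unknown ones untouched."""
    def replace(match):
        return _ESCAPES.get(match.group(1), match.group(0))

    # Scanning left to right means an escaped backslash is consumed as a pair,
    # so r"\\n" becomes a literal backslash followed by "n", not a newline.
    return re.sub(r"\\(.)", replace, text, flags=re.DOTALL)


raw = r"Apache License\nVersion 2.0"
unescaped = unescape(raw)
print(unescaped)
```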

Common options:

  • --diff: Show a word-by-word diff between the input and the best-matching candidate.
  • --java: Enable Tier 3 Java validation (requires SPDX_TOOLS_JAR and jpype1).
  • --pop: Enable popularity weighting as a tie-breaker.
  • --json: Output results in JSON format.
  • --db <path>: Use a custom database path (global option). Supports SQLite URIs for in-memory databases (e.g., file:test?mode=memory&cache=shared).
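
The in-memory URI form is standard SQLite behavior and can be tried directly with Python's `sqlite3` module. The sketch below only demonstrates how a shared-cache in-memory database behaves; the `licenses` table is a hypothetical schema, not licenseid's.

```python
import sqlite3

URI = "file:test?mode=memory&cache=shared"

# uri=True tells sqlite3 to interpret the string as a URI, not a file name.
writer = sqlite3.connect(URI, uri=True)
writer.execute(
    "CREATE TABLE IF NOT EXISTS licenses (license_id TEXT PRIMARY KEY, name TEXT)"
)
writer.execute(
    "INSERT OR REPLACE INTO licenses VALUES ('Apache-2.0', 'Apache License 2.0')"
)
writer.commit()

# A second connection to the same URI sees the same in-memory database.
reader = sqlite3.connect(URI, uri=True)
rows = reader.execute("SELECT license_id FROM licenses").fetchall()
print(rows)
```

The database exists only while at least one connection to it is open, which makes this form convenient for tests.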

The system ranks candidates by a composite score (Similarity + Coverage + Popularity) so that the "tightest" match is preferred, for example when the input would otherwise match both a license and a longer license that contains it.
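
One way to picture a composite score is as a weighted sum. The weights, the `Candidate` type, and the numbers below are all made up for illustration; the source does not state how licenseid actually combines the three components.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    license_id: str
    similarity: float  # how closely the texts agree, 0..1
    coverage: float    # how much of the candidate text the input accounts for, 0..1
    popularity: float  # normalized usage frequency, 0..1


# Hypothetical weights: popularity is kept small so it only breaks near-ties.
W_SIMILARITY, W_COVERAGE, W_POPULARITY = 0.60, 0.35, 0.05


def composite(c):
    """Weighted sum of the three components."""
    return (W_SIMILARITY * c.similarity
            + W_COVERAGE * c.coverage
            + W_POPULARITY * c.popularity)


# Illustrative scores for an input that is a BSD-2-Clause text: the
# three-clause variant contains an extra clause the input does not cover.
candidates = [
    Candidate("BSD-3-Clause", similarity=0.93, coverage=0.80, popularity=0.8),
    Candidate("BSD-2-Clause", similarity=0.99, coverage=1.00, popularity=0.6),
]

best = max(candidates, key=composite)
print(best.license_id, round(composite(best), 3))
```

In this sketch the coverage term is what penalizes the superset license, even though it is the more popular of the two.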

3. Cache management

licenseid maintains a local cache of remote data to save bandwidth.

  • licenses.json: Cached for 45 days.
  • popularity.csv: Cached for 75 days.
  • SPDX data tarballs are versioned and never expire.

To clear the cache manually:

licenseid --clear-cache
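
The expiry rules listed above amount to an age check against a per-file lifetime. A minimal sketch, assuming the file's modification time marks when it was cached; the function name `is_fresh` and the tarball file name are hypothetical:

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

# Lifetimes from the list above; files not listed here never expire.
TTL_DAYS = {"licenses.json": 45, "popularity.csv": 75}
SECONDS_PER_DAY = 86400


def is_fresh(path: Path, now: Optional[float] = None) -> bool:
    """Return True if the cached file exists and is within its lifetime."""
    if not path.exists():
        return False
    ttl = TTL_DAYS.get(path.name)
    if ttl is None:
        return True  # versioned data such as SPDX tarballs never expires
    now = time.time() if now is None else now
    return now - path.stat().st_mtime < ttl * SECONDS_PER_DAY


# Simulate three cache entries that were all written 50 days ago.
results = {}
with tempfile.TemporaryDirectory() as tmp:
    fifty_days_ago = time.time() - 50 * SECONDS_PER_DAY
    for name in ("licenses.json", "popularity.csv", "spdx-data.tar.gz"):
        path = Path(tmp) / name
        path.write_text("placeholder")
        os.utime(path, (fifty_days_ago, fifty_days_ago))
        results[name] = is_fresh(path)

print(results)
```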

4. Output formats

Default (Unix-friendly):

LICENSE_ID=Apache-2.0 SIMILARITY=0.9850 COVERAGE=1.0000
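
A line in this format is easy to consume from a script. The sketch below assumes the fields are whitespace-separated `KEY=VALUE` pairs with no spaces inside values, as in the sample line; `parse_result` is a hypothetical helper, not part of licenseid.

```python
def parse_result(line):
    """Parse one KEY=VALUE result line into a dictionary with typed values."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    return {
        "license_id": fields["LICENSE_ID"],
        "similarity": float(fields["SIMILARITY"]),
        "coverage": float(fields["COVERAGE"]),
    }


result = parse_result("LICENSE_ID=Apache-2.0 SIMILARITY=0.9850 COVERAGE=1.0000")
print(result)
```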

JSON:

licenseid match LICENSE.txt --json

Diff (visual comparison):

licenseid match LICENSE.txt --diff

Example output:

LICENSE_ID=Apache-2.0 SIMILARITY=0.9980 COVERAGE=0.9975

WORD DIFF:
--- DATABASE
+++ INPUT
@@ -1601,8 +1601,4 @@
 language
 governing
 permissions
-and
-limitations
-under
-the
-license
+se
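
A word-level diff in this style can be produced by treating each word as a "line" in a unified diff. This sketch uses Python's `difflib` and is only an approximation of the format shown above; it is not taken from licenseid's source, and the sample texts are shortened.

```python
import difflib


def word_diff(database_text, input_text, context=3):
    """Return unified-diff lines comparing the two texts word by word."""
    return list(difflib.unified_diff(
        database_text.split(),
        input_text.split(),
        fromfile="DATABASE",
        tofile="INPUT",
        n=context,      # number of unchanged words shown around each change
        lineterm="",    # words carry no trailing newline
    ))


database = ("for the specific language governing permissions "
            "and limitations under the license")
truncated = "for the specific language governing permissions"

lines = word_diff(database, truncated)
print("\n".join(lines))
```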

Development

Running Tests

Regular test suite:

pytest

Run benchmarks and accuracy tests (expensive):

pytest --run-benchmark

Configuration

  • SPDX_TOOLS_JAR: Path to the tools-java jar for Tier 3 validation.

License

Apache-2.0
