Open Source License Identification Library

OSS License & Copyright Detector (osslili)

Requires Python 3.8+.

A high-performance tool for identifying licenses and copyright information in local source code. It reports where each license was detected, supports all 700+ SPDX license identifiers, and produces the evidence needed for compliance documentation in the SEMCL.ONE ecosystem.

Features

  • Three-Tier License Detection: Dice-Sørensen similarity, TLSH fuzzy hashing, and regex pattern matching
  • Evidence-Based Output: Exact file paths, confidence scores, and detection methods
  • 700+ SPDX Licenses: Comprehensive support for all SPDX license identifiers
  • SEMCL.ONE Integration: Works seamlessly with purl2notices, ospac, and other ecosystem tools

How It Works

Three-Tier License Detection System

The tool combines three detection tiers:

  1. Tier 1: Dice-Sørensen Similarity with TLSH Confirmation

    • Compares license text using Dice-Sørensen coefficient (97% threshold)
    • Confirms matches using TLSH fuzzy hashing to prevent false positives
    • Achieves 97-100% accuracy on standard SPDX licenses
  2. Tier 2: TLSH Fuzzy Hash Matching

    • Uses Trend Micro Locality Sensitive Hashing for variant detection
    • Catches license variants such as MIT-0 and distinguishes close relatives such as BSD-2-Clause and BSD-3-Clause
    • Pre-computed hashes for all 700+ SPDX licenses
  3. Tier 3: Pattern Recognition

    • Regex-based detection for license references and identifiers
    • Extracts from comments, headers, and documentation
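The similarity check in Tier 1 can be illustrated with a minimal sketch. It assumes the coefficient is computed over word bigrams of normalized text; osslili's actual tokenization and normalization are not described here, so treat `bigrams` and `dice_sorensen` as hypothetical helpers. The TLSH confirmation step is left out because it needs a third-party library.

```python
import re
from collections import Counter

THRESHOLD = 0.97  # the Tier 1 threshold quoted above


def bigrams(text: str) -> Counter:
    """Multiset of adjacent word pairs after lowercasing and dropping punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(words, words[1:]))


def dice_sorensen(a: str, b: str) -> float:
    """Dice-Sørensen coefficient 2*|A ∩ B| / (|A| + |B|) over word bigrams."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0  # nothing to compare
    overlap = sum((x & y).values())  # multiset intersection
    return 2 * overlap / total


reference = "Permission is hereby granted, free of charge, to any person obtaining a copy"
candidate = "permission is hereby granted free of charge to any person obtaining a copy"
print(dice_sorensen(reference, candidate) >= THRESHOLD)
```

Because the texts are normalized first, differences in case and punctuation do not lower the score; only changed, added, or removed words do.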

Additional Detection Methods

  • Package Metadata Scanning: Detects licenses from package.json, composer.json, pyproject.toml, etc.
  • Copyright Extraction: Advanced pattern matching with validation and deduplication
  • SPDX Identifier Detection: Finds SPDX-License-Identifier tags in source files
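As a rough illustration of the copyright extraction and SPDX identifier detection described above, the sketch below uses two regular expressions. The patterns and the helper names `find_spdx_expressions` and `parse_copyright` are assumptions made for this example, not osslili's own rules, which also perform validation and deduplication.

```python
import re
from typing import Dict, List, Optional

# Captures the expression after the tag, stopping at comment closers or line end.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([A-Za-z0-9.+\-() ]+)")

# Matches "Copyright [(c)|©] YYYY[-YYYY] Holder" on a single line.
COPYRIGHT = re.compile(
    r"Copyright\s+(?:\(c\)\s*|©\s*)?(\d{4})(?:\s*-\s*(\d{4}))?\s+(.+)",
    re.IGNORECASE,
)


def find_spdx_expressions(source: str) -> List[str]:
    """Return every SPDX license expression tagged in the given source text."""
    return [m.group(1).strip() for m in SPDX_TAG.finditer(source)]


def parse_copyright(line: str) -> Optional[Dict[str, object]]:
    """Split a copyright statement into holder and years, or return None."""
    m = COPYRIGHT.search(line)
    if m is None:
        return None
    start, end, holder = m.groups()
    years = [int(start)] + ([int(end)] if end else [])
    return {"holder": holder.strip(), "years": years, "statement": m.group(0).strip()}


print(find_spdx_expressions("# SPDX-License-Identifier: Apache-2.0\nimport os\n"))
print(parse_copyright("# Copyright 2023-2024 Example Corp"))
```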

Installation

pip install osslili

For development:

git clone https://github.com/SemClone/osslili.git
cd osslili
pip install -e .

Quick Start

# Scan current directory for licenses
osslili .

# Generate SBOM with license evidence
osslili ./my-project -f cyclonedx-json -o sbom.json

Usage

CLI Usage

# Scan a directory and see evidence (default format)
osslili /path/to/project

# Generate different output formats
osslili ./my-project -f kissbom -o kissbom.json
osslili ./my-project -f cyclonedx-json -o sbom.json
osslili ./my-project -f cyclonedx-xml -o sbom.xml

# Scan with parallel processing (4 threads)
osslili ./my-project --threads 4

# Scan with limited depth (only 2 levels deep)
osslili ./my-project --max-depth 2

# Extract and scan archives
osslili package.tar.gz --max-extraction-depth 2

# Use caching for faster repeated scans
osslili ./my-project --cache-dir ~/.cache/osslili

# Check version
osslili --version

# Save results to file
osslili ./my-project -o license-evidence.json

# With custom configuration and verbose output
osslili ./src --config config.yaml --verbose

# Debug mode for detailed logging
osslili ./project --debug
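
When the CLI is driven from a script, for example in a CI job, the commands above can be assembled programmatically. The following is a sketch only: `build_command` and `run_scan` are hypothetical helpers, they use just the flags shown in this section, and the assumption that a failed scan returns a non-zero exit code is not confirmed here.

```python
import subprocess
from typing import List, Optional


def build_command(
    path: str,
    fmt: Optional[str] = None,
    output: Optional[str] = None,
    threads: Optional[int] = None,
    max_depth: Optional[int] = None,
) -> List[str]:
    """Assemble an osslili command line from the options documented above."""
    cmd = ["osslili", path]
    if fmt:
        cmd += ["-f", fmt]
    if output:
        cmd += ["-o", output]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if max_depth is not None:
        cmd += ["--max-depth", str(max_depth)]
    return cmd


def run_scan(path: str, **options) -> subprocess.CompletedProcess:
    """Run the scan; raises CalledProcessError on a non-zero exit (assumed failure)."""
    return subprocess.run(build_command(path, **options), check=True)


# Only builds and prints the command; call run_scan() to execute it.
print(build_command("./my-project", fmt="cyclonedx-json", output="sbom.json"))
```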

Example Output

{
  "scan_results": [{
    "path": "./project",
    "license_evidence": [
      {
        "file": "/path/to/project/LICENSE",
        "detected_license": "Apache-2.0",
        "confidence": 0.988,
        "detection_method": "dice-sorensen",
        "category": "declared",
        "match_type": "text_similarity",
        "description": "Text matches Apache-2.0 license (98.8% similarity)"
      },
      {
        "file": "/path/to/project/package.json",
        "detected_license": "Apache-2.0",
        "confidence": 1.0,
        "detection_method": "tag",
        "category": "declared",
        "match_type": "spdx_identifier",
        "description": "SPDX-License-Identifier: Apache-2.0 found"
      }
    ],
    "copyright_evidence": [
      {
        "file": "/path/to/project/src/main.py",
        "holder": "Example Corp",
        "years": [2023, 2024],
        "statement": "Copyright 2023-2024 Example Corp"
      }
    ]
  }],
  "summary": {
    "total_files_scanned": 42,
    "declared_licenses": {"Apache-2.0": 2},
    "detected_licenses": {},
    "referenced_licenses": {},
    "copyright_holders": ["Example Corp"]
  }
}
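
A report like the one above can be post-processed with the standard library. The sketch below groups evidence files by detected license and applies an optional confidence cutoff; `licenses_by_file` is a hypothetical helper, and the field names are taken from the example output rather than from a published schema.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Trimmed copy of the example report shown above.
SAMPLE = """
{
  "scan_results": [{
    "path": "./project",
    "license_evidence": [
      {"file": "/path/to/project/LICENSE", "detected_license": "Apache-2.0",
       "confidence": 0.988, "detection_method": "dice-sorensen"},
      {"file": "/path/to/project/package.json", "detected_license": "Apache-2.0",
       "confidence": 1.0, "detection_method": "tag"}
    ]
  }]
}
"""


def licenses_by_file(report: dict, min_confidence: float = 0.0) -> Dict[str, List[str]]:
    """Map each detected license to the sorted files that provide evidence for it."""
    found = defaultdict(set)
    for scan in report.get("scan_results", []):
        for evidence in scan.get("license_evidence", []):
            if evidence["confidence"] >= min_confidence:
                found[evidence["detected_license"]].add(evidence["file"])
    return {lic: sorted(files) for lic, files in found.items()}


report = json.loads(SAMPLE)
print(licenses_by_file(report, min_confidence=0.99))
```

To process a saved report, replace `json.loads(SAMPLE)` with `json.load` on the file written by `-o`.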

Library Usage

from osslili import LicenseCopyrightDetector

# Initialize detector
detector = LicenseCopyrightDetector()

# Process a local directory
result = detector.process_local_path("/path/to/source")

# Process a single file  
result = detector.process_local_path("/path/to/LICENSE")

# Generate different output formats
evidence = detector.generate_evidence([result])
kissbom = detector.generate_kissbom([result])
cyclonedx = detector.generate_cyclonedx([result], format_type="json")
cyclonedx_xml = detector.generate_cyclonedx([result], format_type="xml")

# Access results directly
# Loop variables are named lic/notice to avoid shadowing the built-ins license and copyright
for lic in result.licenses:
    print(f"License: {lic.spdx_id} ({lic.confidence:.0%} confidence)")
    print(f"  Category: {lic.category}")  # declared, detected, or referenced
for notice in result.copyrights:
    print(f"Copyright: © {notice.holder}")
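
Building on the attributes used in the loop above (`spdx_id`, `confidence`, `category`), a small helper can group results for reporting. This is a sketch: `group_by_category` is hypothetical, and the `SimpleNamespace` objects stand in for real detector results so the example runs without scanning anything.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def group_by_category(licenses: Iterable, min_confidence: float = 0.0) -> Dict[str, List[str]]:
    """Group SPDX IDs by category, keeping only matches at or above the cutoff."""
    grouped = defaultdict(set)
    for lic in licenses:
        if lic.confidence >= min_confidence:
            grouped[lic.category].add(lic.spdx_id)
    return {category: sorted(ids) for category, ids in grouped.items()}


# Stand-ins for result.licenses; real objects come from process_local_path().
demo = [
    SimpleNamespace(spdx_id="Apache-2.0", confidence=0.988, category="declared"),
    SimpleNamespace(spdx_id="MIT", confidence=0.6, category="referenced"),
]
print(group_by_category(demo, min_confidence=0.9))  # {'declared': ['Apache-2.0']}
```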

Output Format

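The tool outputs JSON evidence. Each license evidence entry records:

  • File: Path of the file where the license was found
  • Detected license: The SPDX identifier of the license
  • Confidence: Detection confidence, from 0.0 to 1.0
  • Detection method: The technique that produced the match (for example dice-sorensen or tag)
  • Category: Whether the license is declared, detected, or referenced
  • Match type: How the license was detected (license_text, spdx_identifier, license_reference, text_similarity)
  • Description: Human-readable description of what was found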

Integration with SEMCL.ONE

OSS License & Copyright Detector is a core component of the SEMCL.ONE ecosystem:

  • Works with purl2notices for comprehensive legal notice generation
  • Integrates with ospac for policy-based compliance evaluation
  • Supports ossnotices for simplified attribution documentation
  • Complements upmex for package metadata extraction workflows

Configuration

Create a config.yaml file:

similarity_threshold: 0.97
max_recursion_depth: 10
max_extraction_depth: 10
thread_count: 4
cache_dir: "~/.cache/osslili"
custom_aliases:
  "Apache 2": "Apache-2.0"
  "MIT License": "MIT"
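
To show what the custom_aliases mapping is for, the sketch below resolves informal license names to SPDX identifiers. It is an illustration under assumptions: the case-insensitive, whitespace-trimmed lookup and the helper name `normalize_license_name` are not taken from osslili, and the dictionary is written inline instead of being loaded from config.yaml.

```python
from typing import Dict

# Same entries as custom_aliases in the config.yaml above.
CUSTOM_ALIASES = {
    "Apache 2": "Apache-2.0",
    "MIT License": "MIT",
}


def normalize_license_name(name: str, aliases: Dict[str, str]) -> str:
    """Return the SPDX ID for a known alias, or the trimmed name unchanged."""
    lookup = {alias.casefold(): spdx_id for alias, spdx_id in aliases.items()}
    cleaned = name.strip()
    return lookup.get(cleaned.casefold(), cleaned)


print(normalize_license_name("apache 2", CUSTOM_ALIASES))
print(normalize_license_name("GPL-3.0-only", CUSTOM_ALIASES))
```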

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details on:

  • Code of conduct
  • Development setup
  • Submitting pull requests
  • Reporting issues

License

Apache License 2.0 - see LICENSE file for details.

Authors

See AUTHORS.md for a list of contributors.


Part of the SEMCL.ONE ecosystem for comprehensive OSS compliance and code analysis.
