Open Source License Identification Library

OSLiLi - Open Source License Identification Library

Open Source License Identification Library (OSLiLi) is experimental code that uses scikit-learn to implement a Multinomial Naive Bayes classifier, trained on SPDX data, to identify open source licenses. It should be considered a proof of concept for identifying open source licenses using machine learning.

This is an experimental project; please don't use it in production. For a more robust implementation, see the Askalono project: https://github.com/jpeddicord/askalono
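To make the idea concrete, here is a minimal sketch of Multinomial Naive Bayes text classification. It deliberately uses only the Python standard library, so it is not the scikit-learn pipeline OSLiLi uses, and the two training snippets are abbreviated license phrases chosen for illustration, not the SPDX dataset. The class and function names (TinyMultinomialNB, tokenize, TRAIN) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyMultinomialNB:
    """Toy Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        """Return a {label: probability} dict for one document."""
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log(
                    (counts[token] + 1) / (total_words + len(self.vocab))
                )
            log_scores[label] = score
        # Normalize in log space to avoid numeric underflow.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}

    def predict(self, text):
        """Return the most probable label and its probability."""
        proba = self.predict_proba(text)
        best = max(proba, key=proba.get)
        return best, proba[best]


# Illustrative training data only: short phrases, not full license texts.
TRAIN = [
    ("Permission is hereby granted, free of charge, to any person "
     "obtaining a copy of this software", "MIT"),
    ("Licensed under the Apache License, Version 2.0; you may not use "
     "this file except in compliance with the License", "Apache-2.0"),
]

model = TinyMultinomialNB().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
label, probability = model.predict(
    "Permission is hereby granted to any person obtaining a copy"
)
print(f"License: {label} ({probability:.2f} probability)")
```

With only two tiny training documents the probabilities are not meaningful; a real model needs full license texts and many more classes.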

Usage

On the command line

To use OSLiLi from the terminal, install the oslili-cli package and run it on a license file:

$ pip3 install oslili-cli
$ oslili-cli LICENSE
License: MIT (0.89 probability)
Copyright: ('2021', '(c)  Andrew Barrier')
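If you want to consume the command-line output from a script, a small parser like the sketch below may help. It assumes the "License:" line always has the shape shown in the sample output above; that format is not documented as stable, and parse_license_line is a hypothetical helper, not part of oslili-cli.

```python
import re

# Pattern based solely on the sample output line shown above (an assumption).
LICENSE_LINE = re.compile(
    r"^License: (?P<spdx>\S+) \((?P<proba>[0-9.]+) probability\)$"
)


def parse_license_line(line):
    """Return (spdx_code, probability), or None if the line does not match."""
    match = LICENSE_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("spdx"), float(match.group("proba"))


print(parse_license_line("License: MIT (0.89 probability)"))
```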

As a library

To use the library, import LicenseAndCopyrightIdentifier and call its identify_license or identify_copyright method.

import argparse
from oslili import LicenseAndCopyrightIdentifier


def main():
    msg = 'Identify open source license and copyright statements'
    parser = argparse.ArgumentParser(description=msg)
    parser.add_argument('file_path', help='Path to the file to analyze')
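    args = parser.parse_args()
    file_path = args.file_path

    # Read the whole file to analyze as text
    with open(file_path, 'r') as f:
        text = f.read()

    identifier = LicenseAndCopyrightIdentifier()
    # identify_license returns the SPDX identifier and its probability
    license_spdx_code, license_proba = identifier.identify_license(text)
    print(f'License: {license_spdx_code} ({license_proba:.2f} probability)')
    # identify_copyright returns a year range and a copyright statement
    year_range, statement = identifier.identify_copyright(text)
    # Print the statement only when it is present and has no missing parts
    if statement and None not in statement:
        print(f'Copyright: {statement}')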


if __name__ == '__main__':
    main()
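Because the result is a probability rather than a certainty, you may want to discard low-confidence matches. The sketch below assumes only what the example above shows: identify_license(text) returns an (SPDX code, probability) pair. The helper name, the 0.8 threshold, and the FakeIdentifier stand-in are illustrative choices, not part of the library.

```python
def identify_license_or_none(identifier, text, min_probability=0.8):
    """Return the SPDX code only if its probability reaches the threshold.

    `identifier` is any object with an identify_license(text) method that
    returns (spdx_code, probability). The 0.8 default is arbitrary.
    """
    spdx_code, probability = identifier.identify_license(text)
    if probability < min_probability:
        return None
    return spdx_code


class FakeIdentifier:
    """Stand-in for LicenseAndCopyrightIdentifier, for demonstration only."""

    def __init__(self, result):
        self._result = result

    def identify_license(self, text):
        # Always returns the canned (spdx_code, probability) pair.
        return self._result


print(identify_license_or_none(FakeIdentifier(("MIT", 0.89)), "some text"))
print(identify_license_or_none(FakeIdentifier(("MIT", 0.42)), "some text"))
```

In real use you would pass a LicenseAndCopyrightIdentifier instance instead of the stand-in.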

Notice

This tool does not provide legal advice; I'm not a lawyer.

The code is an experimental implementation that matches your input against a database of similar license texts and tells you whether it is a close match. Do not rely on the accuracy of this tool's output.

Remember: the tool can't tell you whether a license works for your project or use case. Please seek independent legal advice for any licensing questions.

Where do the licenses come from?

The SPDX license dataset is sourced directly from SPDX: https://github.com/spdx/license-list-data.

The datasets for ML training were generated by scanning different sources, and were inspired by two academic publications.

Contributing

Contributions are very welcome! See CONTRIBUTING for more info.

License

This library is licensed under the Apache 2.0 License.
