A Python-based license classification and copyright statement detection tool built on Google License Classifier

Golicense-Classifier

A Python-based module for finding valid copyright statements and license expressions in a file.

Note: This module is based on Google LicenseClassifier.

Installation

Currently, this package supports only Linux. Support for Windows and macOS is in progress.

To install from PyPI, run

pip install golicense-classifier

Usage

To get started, import the LicenseClassifier class from the module:

from LicenseClassifier.classifier import LicenseClassifier

Note: Copyright statement detection is still a work in progress. Expect some issues, mostly with binary files.

The class provides the following methods for scanning.

  1. scan_directory

    This method recursively walks through a directory and finds license expressions and copyright statements. It returns a dictionary with the keys header and files.

    Usage


    classifier = LicenseClassifier()
    res = classifier.scan_directory('PATH_TO_DIR')
    

    Optional Parameters


    • max_size

      Maximum file size in MB. The default is 10 MB. Set max_size < 0 to ignore the size constraint.

    • use_buffer

      (Experimental) Set to True to use buffered file scanning. max_size is then used as the buffer size.

  2. scan_file

    This method finds license expressions and copyright statements in a single file.

    Usage


    classifier = LicenseClassifier()
    res = classifier.scan_file('PATH_TO_FILE')
    

    Optional Parameters


    • max_size

      Maximum file size in MB. The default is 10 MB. Set max_size < 0 to ignore the size constraint.

    • use_buffer

      (Experimental) Set to True to use buffered file scanning. max_size is then used as the buffer size.
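
As a rough illustration of the two methods and their optional parameters, the sketch below scans a directory and summarizes the returned dictionary. It assumes that max_size and use_buffer are accepted as keyword arguments, and it relies only on the two top-level keys named above (header and files). Their inner structure is not described here, so the hypothetical helpers summarize_scan and scan_and_summarize treat it as opaque.

```python
import json
from typing import Any, Dict


def summarize_scan(result: Dict[str, Any]) -> str:
    """Return a short, printable summary of a scan_directory result.

    Only the top-level keys 'header' and 'files' are assumed; whatever
    sits under them is treated as opaque data.
    """
    header = result.get("header")
    files = result.get("files") or []
    lines = [
        f"header: {json.dumps(header, default=str)}",
        f"entries under 'files': {len(files)}",
    ]
    return "\n".join(lines)


def scan_and_summarize(path: str, max_size_mb: int = 10) -> str:
    """Scan a directory and summarize the result (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without the package.
    from LicenseClassifier.classifier import LicenseClassifier

    classifier = LicenseClassifier()
    # Assumption: the optional parameters are passed as keyword arguments.
    result = classifier.scan_directory(
        path, max_size=max_size_mb, use_buffer=False
    )
    return summarize_scan(result)
```

Calling print(scan_and_summarize('PATH_TO_DIR', max_size_mb=-1)) would then scan without a size limit, since a negative max_size disables the size constraint.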

Setting Custom Scanning Threshold

You can set a custom scanning threshold that best suits your needs by passing the threshold parameter when creating the object:

classifier = LicenseClassifier(threshold = 0.9)
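
If the threshold comes from user input or a config file, it may help to check it before constructing the classifier. The sketch below assumes the threshold is a value between 0 and 1, which is inferred only from the 0.9 example and not confirmed by the package. The helpers validate_threshold and build_classifier are hypothetical names introduced for illustration.

```python
def validate_threshold(threshold: float) -> float:
    """Return threshold unchanged if it lies in [0, 1], else raise ValueError.

    The [0, 1] range is an assumption inferred from the 0.9 example.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return threshold


def build_classifier(threshold: float = 0.9):
    """Create a LicenseClassifier with a checked threshold."""
    # Imported lazily so this sketch can be loaded without the package.
    from LicenseClassifier.classifier import LicenseClassifier

    return LicenseClassifier(threshold=validate_threshold(threshold))
```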
