A Python-based license classification and copyright statement detection tool built on Google License Classifier
GoLicense-Classifier
A Python package to find license expressions and copyright statements in a codebase.
Based on Google LicenseClassifier V2, GoLicense-Classifier (or glc for short) focuses on performance without compromising accuracy.
Installation
Note: Currently, this package supports only Linux. Support for Windows and macOS is in progress.
Installing GoLicense-Classifier is as simple as
pip install golicense-classifier
Alternatively, you can build the package from source:
git clone https://github.com/AvishrantsSh/GoLicense-Classifier.git
cd GoLicense-Classifier
make dev
make package
Usage
To get started, import the LicenseClassifier class from the module:
from LicenseClassifier.classifier import LicenseClassifier
Note: Copyright statement detection is still in beta. Expect some issues, mostly with binary files.
The class comes bundled with some handy methods, each suited to a different task.
- scan_directory
  This method recursively walks through a directory and finds license expressions and copyright statements. It returns a dictionary with the keys header and files.

  Usage
  classifier = LicenseClassifier()
  res = classifier.scan_directory('PATH_TO_DIR')

  Optional Parameters
  - max_size: Maximum size of file in MB. Default is 10MB. Set max_size < 0 to ignore size constraints.
  - use_buffer (Experimental): Set to True to use buffered file scanning. max_size will be used as the buffer size.
  - use_scancode_mapping: Set to True if you want to use Scancode license key mappings. Default is True.
- scan_file
  This method finds license expressions and copyright statements in a single file.

  Usage
  classifier = LicenseClassifier()
  res = classifier.scan_file('PATH_TO_FILE')

  Optional Parameters
  - max_size: Maximum size of file in MB. Default is 10MB. Set max_size < 0 to ignore size constraints.
  - use_buffer (Experimental): Set to True to use buffered file scanning. max_size will be used as the buffer size.
  - use_scancode_mapping: Set to True if you want to use Scancode license key mappings. Default is True.
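As an illustration of how scan_directory might be used in a script, here is a minimal sketch that runs a scan and saves the result as JSON. The helper name scan_and_save is hypothetical and not part of the package. The sketch assumes that the optional parameters listed above can be passed as keyword arguments and that the returned dictionary can be serialized to JSON; only the top-level keys header and files are taken from the description above.

```python
import json
from pathlib import Path


def scan_and_save(classifier, target_dir, report_path, **scan_options):
    """Scan target_dir with the given classifier and write the result as JSON.

    classifier is expected to expose scan_directory(path, **options), as the
    LicenseClassifier class described above does. scan_options are passed
    through unchanged (for example max_size or use_scancode_mapping).
    """
    result = classifier.scan_directory(str(target_dir), **scan_options)

    # Only the top-level keys "header" and "files" are documented, so fail
    # early if the result does not contain them.
    missing = {"header", "files"} - set(result)
    if missing:
        raise ValueError(f"unexpected scan result, missing keys: {sorted(missing)}")

    # default=str is a defensive assumption in case the result holds values
    # that the json module cannot serialize natively.
    Path(report_path).write_text(json.dumps(result, indent=2, default=str))
    return result
```

With the package installed, a call such as scan_and_save(LicenseClassifier(), 'PATH_TO_DIR', 'report.json', max_size=5) would scan the directory with a 5 MB per-file limit and write the report next to the script. Passing the classifier in as an argument keeps the helper independent of how the object was configured.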
Further Customization
You can set a custom scanning threshold that best suits your needs. Pass the threshold parameter when creating the object:
classifier = LicenseClassifier(threshold=0.9)
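To give a feel for what changing the threshold does, the sketch below assumes it acts as a minimum confidence score between 0 and 1, as in Google's licenseclassifier. The function filter_by_threshold and the (license key, confidence) pairs are hypothetical illustrations, not the package's internals or its output format, and whether the real comparison is inclusive is also an assumption.

```python
def filter_by_threshold(matches, threshold):
    """Keep only (license_key, confidence) pairs at or above threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [(key, score) for key, score in matches if score >= threshold]


# Hypothetical candidate matches for a single file.
candidates = [("MIT", 0.98), ("BSD-3-Clause", 0.91), ("Apache-2.0", 0.62)]

# A strict threshold keeps only near-exact matches.
strict = filter_by_threshold(candidates, threshold=0.95)

# A lenient threshold also keeps weaker, possibly spurious matches.
lenient = filter_by_threshold(candidates, threshold=0.6)
```

Under these assumptions, a higher threshold trades recall for precision: fewer licenses are reported, but each reported match is closer to the reference text.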
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
To get started, read the Contributing Guide.
References
- Google LicenseClassifier V2: https://github.com/google/licenseclassifier/
Download files
File details
Details for the file golicense_classifier-0.0.16.tar.gz.
File metadata
- Download URL: golicense_classifier-0.0.16.tar.gz
- Upload date:
- Size: 2.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5f3f51291ae9e5c9974a1f44b13de20272619dd3b8a46b05fc170b73ec5d423 |
| MD5 | 961408d1ac0e5588c65096db3224b58f |
| BLAKE2b-256 | 0dc0d427d2a59462ec837bdf6c4ad3a7606f066c34eda9b5a334e58fa2dfdf44 |
File details
Details for the file golicense_classifier-0.0.16-py3-none-any.whl.
File metadata
- Download URL: golicense_classifier-0.0.16-py3-none-any.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52afd6b96e95b83e186ce15d38ffff79d38b32a5ddc690eab01a134b06553791 |
| MD5 | 95cf8c6fdfa17ac7263cd826f46c39bf |
| BLAKE2b-256 | db069aef911a93798466467285f848d4f5a90fb53178b752ad660957207002ac |