Reduces false positives in FOSSology's copyright detection by predicting whether a given copyright finding is a false positive and by removing extraneous text from copyright notices.
Safaa
Safaa is a Python package that detects false positives among copyright notices. It can also declutter copyright notices by removing unnecessary extra text.
Features
- Load pre-trained models or train your own.
- Integration with scikit-learn for training and prediction.
- Integration with spaCy for named entity recognition and decluttering tasks.
- Preprocessing tools to ensure data consistency and quality.
- Support for loading models from a local directory or from the default model directory.
Installation
Install Safaa with pip:
pip install safaa
Usage
Initialization
from safaa.Safaa import SafaaAgent
agent = SafaaAgent()
Preprocessing Data
data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)
Predicting False Positives
predictions = agent.predict(data)
Decluttering Copyright Notices
decluttered_data = agent.declutter(data, predictions)
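The prediction and decluttering calls above can be chained in one helper. The following is a minimal sketch, not part of Safaa: `clean_notices` is a hypothetical name, and the code assumes that `predict` returns one prediction per input and that `declutter` returns one cleaned string per input. This README does not confirm either return shape, so the helper raises an error if the lengths disagree.

```python
def clean_notices(agent, notices):
    """Run prediction and decluttering and return one row per input notice.

    `agent` is any object exposing predict(data) and
    declutter(data, predictions), for example a SafaaAgent.
    """
    notices = list(notices)
    predictions = agent.predict(notices)
    decluttered = agent.declutter(notices, predictions)

    # Assumption (not confirmed by the README): both results align
    # one-to-one with the input. Fail loudly if they do not.
    if not (len(predictions) == len(decluttered) == len(notices)):
        raise ValueError("predict/declutter results do not align with the input")

    # Each row is (original notice, prediction, decluttered notice).
    return list(zip(notices, predictions, decluttered))
```

Calling `clean_notices(agent, data)` with the `agent` and `data` from the earlier steps would then give one tuple per notice, which makes it easier to inspect what was flagged and how each notice was rewritten.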
Training Models
To train the false positive detector:
training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
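The training call above takes the texts and their labels as two parallel lists. A minimal sketch for building those lists from labeled pairs follows. `split_examples` is a hypothetical helper, not a Safaa function, and it makes no assumption about label values, because this README does not state which labels the detector expects.

```python
def split_examples(pairs):
    """Split (text, label) pairs into two parallel lists.

    Raises ValueError when a text is empty or no pairs are given, so
    that malformed training data is caught before training starts.
    """
    texts, labels = [], []
    for index, (text, label) in enumerate(pairs):
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"example {index} has no text")
        texts.append(text)
        labels.append(label)
    if not texts:
        raise ValueError("no training examples given")
    return texts, labels


# Intended use with the agent from the earlier steps:
# training_data, labels = split_examples(labeled_pairs)
# agent.train_false_positive_detector_model(training_data, labels)
```

Keeping each text next to its label until the last moment avoids the two lists drifting out of step when examples are added or removed.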
To train the named entity recognition model:
train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)
Saving Trained Models
save_path = "path/to/save"
agent.save(save_path)
Dependencies
- scikit-learn
- spaCy
- joblib
- regex
- os and shutil (part of the Python standard library, so no installation is needed)
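pip normally installs the third-party packages in this list automatically. As a quick check of an existing environment, the sketch below reports which of them cannot be found. `missing_dependencies` is a hypothetical helper; it relies only on the standard library, and it uses the import names `sklearn` and `spacy`, which differ from the distribution names scikit-learn and spaCy.

```python
from importlib.util import find_spec

# Import names of the third-party dependencies listed above.
# scikit-learn is imported as "sklearn" and spaCy as "spacy".
THIRD_PARTY_MODULES = ("sklearn", "spacy", "joblib", "regex")


def missing_dependencies(modules=THIRD_PARTY_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]
```

An empty list from `missing_dependencies()` means all four packages are importable; any names returned can be installed with pip.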
License
This project is licensed under the GNU Lesser General Public License, version 2.1 (February 1999).
Contact Information
- Name: Abdelrahman Jamal
- Email: abdelrahmanjamal5565@gmail.com
- LinkedIn: linkedin.com/in/abdelrahmanjamal
File details
Details for the file safaa-0.0.4.tar.gz.
File metadata
- Download URL: safaa-0.0.4.tar.gz
- Upload date:
- Size: 13.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f06332cb8d417cbd92e093b71481fbebc7aaa283c2079d3b49569446251030fb |
| MD5 | bc53424f1d350b8d0a54fb4f10210b55 |
| BLAKE2b-256 | b655231902d153cb59da059227c4ef1883939ddc767a5845c4bcf674f37a0985 |
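A downloaded archive can be checked against the SHA256 digest listed above. The sketch below is one way to do that with the standard library; the function names are illustrative and are not part of Safaa or pip.

```python
import hashlib

# SHA256 digest published above for safaa-0.0.4.tar.gz.
EXPECTED_SHA256 = "f06332cb8d417cbd92e093b71481fbebc7aaa283c2079d3b49569446251030fb"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("safaa-0.0.4.tar.gz")` should return `True` for an intact copy of the source distribution. Reading in chunks keeps memory use flat even for large archives.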
Provenance
The following attestation bundles were made for safaa-0.0.4.tar.gz:
Publisher: release-publish.yml on fossology/safaa
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safaa-0.0.4.tar.gz
- Subject digest: f06332cb8d417cbd92e093b71481fbebc7aaa283c2079d3b49569446251030fb
- Sigstore transparency entry: 1517462225
- Sigstore integration time:
- Permalink: fossology/safaa@54f2ff38922f5ed7fa7cd296fcc16635913cafd7
- Branch / Tag: refs/tags/0.0.4
- Owner: https://github.com/fossology
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-publish.yml@54f2ff38922f5ed7fa7cd296fcc16635913cafd7
- Trigger Event: release
File details
Details for the file safaa-0.0.4-py3-none-any.whl.
File metadata
- Download URL: safaa-0.0.4-py3-none-any.whl
- Upload date:
- Size: 13.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b3ce5fee590926a4d0fd782d96a995907c0f0734837a08f8642005013220948 |
| MD5 | 3ea5674c9c1d152671a8abae15aee8f2 |
| BLAKE2b-256 | 463176e339ca346b55e770fc73b3deab456ada70b45ec55f8b4262e948b315e0 |
Provenance
The following attestation bundles were made for safaa-0.0.4-py3-none-any.whl:
Publisher: release-publish.yml on fossology/safaa
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safaa-0.0.4-py3-none-any.whl
- Subject digest: 3b3ce5fee590926a4d0fd782d96a995907c0f0734837a08f8642005013220948
- Sigstore transparency entry: 1517462250
- Sigstore integration time:
- Permalink: fossology/safaa@54f2ff38922f5ed7fa7cd296fcc16635913cafd7
- Branch / Tag: refs/tags/0.0.4
- Owner: https://github.com/fossology
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-publish.yml@54f2ff38922f5ed7fa7cd296fcc16635913cafd7
- Trigger Event: release