Safaa was created as part of the 2023 Google Summer of Code project "Reducing Fossology's False Positive Copyrights". It predicts whether a copyright statement reported by the Fossology software is a false positive, and it can also remove extra text from a copyright notice.
Safaa
Safaa is a Python package for detecting false positives among copyright notices. It can also declutter copyright notices by removing unnecessary extra text.
Features
- Load pre-trained models or train your own.
- Integration with scikit-learn for training and prediction.
- Integration with spaCy for named entity recognition and decluttering tasks.
- Preprocessing tools to ensure data consistency and quality.
- Ability to handle local or default model directories.
Installation
To install Safaa, simply use pip:
pip install safaa
Usage
Initialization
from safaa.Safaa import SafaaAgent
agent = SafaaAgent()
Preprocessing Data
data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)
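The README does not describe what `preprocess_data` does internally. As a rough illustration of the kind of normalisation that keeps classifier input consistent, the sketch below lowercases a notice, replaces four-digit years with a placeholder, and collapses whitespace. The function name `normalize_notice` and these specific rules are assumptions made for illustration, not Safaa's actual preprocessing.

```python
import re


def normalize_notice(text: str) -> str:
    """Hypothetical normalisation step; not Safaa's own implementation."""
    text = text.lower()
    # Replace four-digit years (1900-2099) with a placeholder token.
    text = re.sub(r"\b(?:19|20)\d{2}\b", "<year>", text)
    # Collapse any run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    return text


print(normalize_notice("Copyright  (C) 2019\tExample Corp"))
```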
Predicting False Positives
predictions = agent.predict(data)
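The README does not document the return type of `predict`. Assuming it yields one label per input string, in input order, the predictions could be used to separate genuine notices from false positives as sketched below. The helper `split_by_prediction` and the label values in the example are hypothetical; check the labels your model actually returns before relying on them.

```python
from typing import List, Sequence, Tuple


def split_by_prediction(
    texts: Sequence[str],
    predictions: Sequence[object],
    false_positive_label: object,
) -> Tuple[List[str], List[str]]:
    """Return (kept, dropped) lists based on per-item predictions."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    kept: List[str] = []
    dropped: List[str] = []
    for text, label in zip(texts, predictions):
        # Items whose label equals the false-positive marker are dropped.
        (dropped if label == false_positive_label else kept).append(text)
    return kept, dropped


# Example with made-up labels (assumption: one label per input string).
texts = ["Copyright 2020 Example Corp", "copyright notice must be retained"]
labels = ["real", "false_positive"]
kept, dropped = split_by_prediction(texts, labels, "false_positive")
print(kept)
print(dropped)
```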
Decluttering Copyright Notices
decluttered_data = agent.declutter(data, predictions)
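To make "decluttering" concrete, here is a deliberately crude regular-expression sketch that trims common trailing phrases from a notice. Safaa itself relies on a spaCy named entity recognition model for this task, so the code below is a swapped-in, much simpler technique shown only to illustrate the goal; the helper name `declutter_naive` and the phrase list are assumptions.

```python
import re

# Hypothetical list of phrases that often trail the actual copyright statement.
_TRAILING_CLUTTER = re.compile(
    r"\s*\b(?:all rights reserved|licensed under|see license)\b.*$",
    re.IGNORECASE | re.DOTALL,
)


def declutter_naive(notice: str) -> str:
    """Cut a notice at the first known clutter phrase (illustrative only)."""
    return _TRAILING_CLUTTER.sub("", notice).rstrip()


print(declutter_naive("Copyright (c) 2020 Example Corp. All rights reserved."))
```

A fixed phrase list misses anything it has not seen before, which is the kind of limitation a trained entity recogniser is meant to address.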
Training Models
To train the false positive detector:
training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
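Before training, it can be useful to hold back part of the labelled data so the resulting model can be checked on examples it has not seen. The README does not describe an evaluation step, so the helper below is a hypothetical, standard-library-only sketch of a shuffled hold-out split; `holdout_split` and its parameters are not part of Safaa.

```python
import random
from typing import List, Sequence, Tuple

Split = Tuple[List[str], List[str]]


def holdout_split(
    texts: Sequence[str],
    labels: Sequence[str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[Split, Split]:
    """Return ((train_texts, train_labels), (test_texts, test_labels))."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(texts)))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([texts[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test
```

The training portion could then be passed to `train_false_positive_detector_model`, with the held-out portion compared against the output of `predict`.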
To train the named entity recognition model:
train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)
Saving Trained Models
save_path = "path/to/save"
agent.save(save_path)
Dependencies
- scikit-learn
- spaCy
- joblib
- regex
- os (Python standard library)
- shutil (Python standard library)
License
This project is licensed under the GNU LESSER GENERAL PUBLIC LICENSE, Version 2.1, February 1999.
Contact Information
- Name: Abdelrahman Jamal
- Email: abdelrahmanjamal5565@gmail.com
- LinkedIn: linkedin.com/in/abdelrahmanjamal
File details
Details for the file safaa-0.0.1.tar.gz.
File metadata
- Download URL: safaa-0.0.1.tar.gz
- Upload date:
- Size: 13.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 63d9aec56ca81d35d2f539ab9a0f026c18d0fca5586cccc962f9845d3bc3f358
MD5 | adaff1cb59ca37d996d86a6c6ef56eb7
BLAKE2b-256 | 197f579581d389da3951979afe4e0203c184970bd8f09b7b4dcac05a09c3dcea
File details
Details for the file safaa-0.0.1-py3-none-any.whl.
File metadata
- Download URL: safaa-0.0.1-py3-none-any.whl
- Upload date:
- Size: 13.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 12a3dfd881af4d34f606fe47d9b2ed3e627f84e7ecbb7907c976277ba97531ee
MD5 | 39de7c19f6ca04dca8c2dfab8e8f1c88
BLAKE2b-256 | 8e9dc74996ba4108f6f06406f9ba337ce9cb234382a9f1c8382def55f4f484af
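The published digests can be used to check that a downloaded file is intact. A minimal sketch using Python's standard `hashlib` module follows; the helper names `sha256_of_file` and `matches_published_hash` are illustrative, and the file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash` with the path of the downloaded `safaa-0.0.1-py3-none-any.whl` and the SHA256 value listed for that file should return `True` for an unmodified download.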