Created as part of the 2023 Google Summer of Code project "Reducing Fossology's False Positive Copyrights", this package predicts whether a copyright statement reported by the Fossology software is a false positive. It can also remove extra text from a copyright notice.

Safaa

Safaa is a Python package for detecting false positives in copyright notices. It can also declutter copyright notices by removing unnecessary extra text.

Features

  • Load pre-trained models or train your own.
  • Integration with scikit-learn for training and prediction.
  • Integration with spaCy for named entity recognition and decluttering tasks.
  • Preprocessing tools to ensure data consistency and quality.
  • Ability to handle local or default model directories.

Installation

To install Safaa, simply use pip:

pip install safaa
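
To confirm the installation from Python, one option is to look the distribution up through the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper, not part of Safaa, and it assumes the distribution name matches the name used with pip above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "safaa" is assumed to be the distribution name, as in `pip install safaa`.
    version = installed_version("safaa")
    print(f"safaa version: {version}" if version else "safaa is not installed")
```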

Usage

Initialization

from safaa.Safaa import SafaaAgent
agent = SafaaAgent()

Preprocessing Data

data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)
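
Before handing raw strings to the agent, it may help to tidy the input list first. The sketch below is not Safaa's own preprocessing, whose exact behavior is not described here; `clean_inputs` is a hypothetical helper that only collapses whitespace and drops empty entries.

```python
import re
from typing import Iterable, List


def clean_inputs(raw: Iterable[object]) -> List[str]:
    """Coerce entries to str, collapse runs of whitespace, and drop empty entries."""
    cleaned = []
    for item in raw:
        if item is None:
            continue
        # Replace any run of whitespace (including newlines) with one space.
        text = re.sub(r"\s+", " ", str(item)).strip()
        if text:
            cleaned.append(text)
    return cleaned
```

The resulting list can then be passed as `data` in the calls shown in this section.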

Predicting False Positives

predictions = agent.predict(data)
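
To inspect results side by side with their inputs, the predictions can be paired with the original strings. This sketch assumes that `predict` returns one entry per input, in the same order; that is an assumption, not something stated above, and the format of each prediction is left untouched. `pair_predictions` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def pair_predictions(
    data: Sequence[str], predictions: Sequence[object]
) -> List[Tuple[str, object]]:
    """Pair each input with its prediction, assuming a one-to-one ordering."""
    if len(data) != len(predictions):
        raise ValueError(
            f"expected {len(data)} predictions, got {len(predictions)}"
        )
    return list(zip(data, predictions))
```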

Decluttering Copyright Notices

decluttered_data = agent.declutter(data, predictions)

Training Models

To train the false positive detector:

training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
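
Before training, a quick sanity check on the inputs can catch common mistakes. This is a sketch under two assumptions: the training data and labels are parallel sequences, and the classifier needs at least two distinct labels to learn from. `check_training_inputs` is a hypothetical helper, not part of Safaa.

```python
from collections import Counter
from typing import Hashable, Sequence


def check_training_inputs(
    training_data: Sequence[str], labels: Sequence[Hashable]
) -> Counter:
    """Validate parallel training inputs and return the label distribution."""
    if len(training_data) != len(labels):
        raise ValueError(
            f"{len(training_data)} samples but {len(labels)} labels"
        )
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two distinct labels to train a classifier")
    return counts
```

The returned counts also show how imbalanced the labels are, which is worth knowing before interpreting the trained model's predictions.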

To train the named entity recognition model:

train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)

Saving Trained Models

save_path = "path/to/save"
agent.save(save_path)
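
Whether `save` creates a missing target directory is not stated here, so creating it beforehand is a harmless precaution. In this sketch, `save_agent` is a hypothetical wrapper around the `save` call shown above.

```python
from pathlib import Path
from typing import Any, Union


def save_agent(agent: Any, save_path: Union[str, Path]) -> Path:
    """Create the target directory if needed, then call agent.save on it."""
    target = Path(save_path)
    # parents=True creates intermediate directories; exist_ok avoids an
    # error when the directory is already present.
    target.mkdir(parents=True, exist_ok=True)
    agent.save(str(target))
    return target
```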

Dependencies

  • scikit-learn
  • spaCy
  • joblib
  • regex
  • os (Python standard library)
  • shutil (Python standard library)

License

This project is licensed under the GNU LESSER GENERAL PUBLIC LICENSE, Version 2.1, February 1999.

Download files

Source Distribution

safaa-0.0.1.tar.gz (13.6 MB)

Built Distribution

safaa-0.0.1-py3-none-any.whl (13.6 MB, Python 3)
