Safaa was created as part of the 2023 Google Summer of Code project "Reducing Fossology's False Positive Copyrights". It predicts whether a copyright statement reported by the Fossology software is a false positive, and it can also remove extra text from a copyright notice.

Safaa

Safaa is a Python package for detecting false positives among copyright notices. It can also declutter a copyright notice by removing unnecessary extra text.

Features

  • Load pre-trained models or train your own.
  • Integration with scikit-learn for training and prediction.
  • Integration with spaCy for named entity recognition and decluttering tasks.
  • Preprocessing tools to ensure data consistency and quality.
  • Support for both a local model directory and the default model directory.

Installation

To install Safaa, simply use pip:

pip install safaa
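
After installing, one way to confirm that the package is visible to the current interpreter is to query its installed metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of Safaa.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "safaa" is the distribution name used in the pip command above.
    print(installed_version("safaa"))
```

A result of `None` suggests that pip installed the package into a different environment than the one running the script.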

Usage

Initialization

from safaa.Safaa import SafaaAgent
agent = SafaaAgent()

Preprocessing Data

data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)

Predicting False Positives

predictions = agent.predict(data)
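
Predictions are easier to act on once they are paired with their inputs. The following sketch assumes that `predict` returns one label per input, in input order, and that false positives carry a distinct label value. Both points are assumptions made for illustration, so inspect the actual output of `agent.predict` before relying on them. The helper `split_by_prediction` and the label values `"t"` and `"f"` are hypothetical and not part of Safaa.

```python
def split_by_prediction(texts, predictions, false_positive_label):
    """Split texts into (kept, rejected) using one prediction per text."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    kept, rejected = [], []
    for text, label in zip(texts, predictions):
        # Anything carrying the false-positive label is set aside.
        if label == false_positive_label:
            rejected.append(text)
        else:
            kept.append(text)
    return kept, rejected


# Hand-written stand-ins for `data` and `agent.predict(data)`.
sample_texts = ["Copyright (c) 2023 Example Corp", "if the copyright holder"]
sample_predictions = ["t", "f"]  # assumed label values, not Safaa's
kept, rejected = split_by_prediction(sample_texts, sample_predictions, "f")
print(kept)
print(rejected)
```

Passing the false-positive label as an argument keeps the helper independent of whatever label encoding the model actually uses.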

Decluttering Copyright Notices

decluttered_data = agent.declutter(data, predictions)

Training Models

To train the false positive detector:

training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
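
Training data is often stored on disk rather than typed inline. As a sketch, the helper below reads a two-column CSV into the parallel `training_data` and `labels` lists used above. The column names `text` and `label`, the label values, and the helper `load_training_csv` itself are assumptions for illustration and are not taken from Safaa's documentation.

```python
import csv
import io


def load_training_csv(handle, text_column="text", label_column="label"):
    """Read parallel lists of texts and labels from an open CSV file."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row[text_column])
        labels.append(row[label_column])
    return texts, labels


# An in-memory file stands in for open("train.csv", newline="").
example = io.StringIO(
    "text,label\n"
    '"Copyright (c) 2023 Example Corp",t\n'
    '"if the copyright holder",f\n'
)
training_data, labels = load_training_csv(example)
print(len(training_data), len(labels))
```

Keeping texts and labels in the same order matters, because the two lists are matched by position when they are passed to the training call.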

To train the named entity recognition model:

train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)

Saving Trained Models

save_path = "path/to/save"
agent.save(save_path)

Dependencies

  • scikit-learn
  • spaCy
  • joblib
  • regex
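  • os (Python standard library, no installation needed)
  • shutil (Python standard library, no installation needed)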

License

This project is licensed under the GNU Lesser General Public License, Version 2.1 (February 1999).

Contact Information

Download files

Source Distribution

safaa-0.0.1.tar.gz (13.6 MB)

Uploaded Source

Built Distribution

safaa-0.0.1-py3-none-any.whl (13.6 MB)

Uploaded Python 3

File details

Details for the file safaa-0.0.1.tar.gz.

File metadata

  • Download URL: safaa-0.0.1.tar.gz
  • Size: 13.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for safaa-0.0.1.tar.gz
Algorithm Hash digest
SHA256 63d9aec56ca81d35d2f539ab9a0f026c18d0fca5586cccc962f9845d3bc3f358
MD5 adaff1cb59ca37d996d86a6c6ef56eb7
BLAKE2b-256 197f579581d389da3951979afe4e0203c184970bd8f09b7b4dcac05a09c3dcea
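
A downloaded archive can be checked against the SHA256 value listed above before it is installed. The sketch below uses only the standard library `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("safaa-0.0.1.tar.gz", "63d9aec56ca81d35d2f539ab9a0f026c18d0fca5586cccc962f9845d3bc3f358")` should return `True` for an intact copy of the source distribution saved in the current directory.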

File details

Details for the file safaa-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: safaa-0.0.1-py3-none-any.whl
  • Size: 13.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for safaa-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 12a3dfd881af4d34f606fe47d9b2ed3e627f84e7ecbb7907c976277ba97531ee
MD5 39de7c19f6ca04dca8c2dfab8e8f1c88
BLAKE2b-256 8e9dc74996ba4108f6f06406f9ba337ce9cb234382a9f1c8382def55f4f484af
