
Created as part of the 2023 Google Summer of Code project "Reducing Fossology's False Positive Copyrights", Safaa predicts whether a given copyright output from the Fossology software is a false positive. It can also remove extra text from a copyright notice.


Safaa

Safaa is a Python package for detecting false positives among copyright notices. It can also declutter copyright notices by removing unnecessary extra text.

Features

  • Load pre-trained models or train your own.
  • Integration with scikit-learn for training and prediction.
  • Integrated with spaCy for named entity recognition and decluttering tasks.
  • Preprocessing tools to ensure data consistency and quality.
  • Ability to handle local or default model directories.

Installation

To install Safaa, simply use pip:

pip install safaa
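
To confirm which release pip installed, the standard library's `importlib.metadata` can report a distribution's version. This is a minimal sketch that is not part of Safaa itself; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "safaa") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper: it only wraps importlib.metadata.version().
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None if Safaa is not installed.
    print(installed_version("safaa"))
```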

Usage

Initialization

from safaa.Safaa import SafaaAgent

agent = SafaaAgent()

Preprocessing Data

data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)

Predicting False Positives

predictions = agent.predict(data)

Decluttering Copyright Notices

decluttered_data = agent.declutter(data, predictions)
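
Prediction and decluttering can be chained into one step. The helper below, `predict_and_declutter`, is hypothetical and uses only the two calls shown above. It assumes (this is not confirmed by the documentation here) that `predict` returns one prediction per input and `declutter` returns one result per input, in input order, and it raises if the lengths disagree. Taking the agent as a parameter also lets the flow be exercised with a stand-in object when the trained models are not loaded.

```python
from typing import Any, List, Sequence, Tuple


def predict_and_declutter(agent: Any, notices: Sequence[str]) -> List[Tuple[str, Any, Any]]:
    """Run prediction and decluttering on raw notices and pair up the results.

    Hypothetical helper: it only calls agent.predict(data) and
    agent.declutter(data, predictions), and it ASSUMES both return
    one item per input, in the same order as the input.
    """
    data = list(notices)
    predictions = agent.predict(data)
    decluttered = agent.declutter(data, predictions)
    # Guard the one-item-per-input assumption instead of silently truncating in zip().
    if not (len(predictions) == len(decluttered) == len(data)):
        raise ValueError("expected one prediction and one decluttered notice per input")
    return list(zip(data, predictions, decluttered))
```

With Safaa installed, `predict_and_declutter(SafaaAgent(), data)` returns (notice, prediction, decluttered notice) triples, provided the assumption above holds.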

Training Models

To train the false positive detector:

training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
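
The false positive detector takes the texts and their labels as two parallel lists. If the labelled examples are stored in a CSV file, the standard `csv` module can split them. The helper `load_labelled_notices` and the column names `copyright` and `label` are hypothetical; the label values Safaa expects are not stated here, so use whatever values your data and model use.

```python
import csv
from typing import Iterable, List, Tuple


def load_labelled_notices(
    lines: Iterable[str],
    text_column: str = "copyright",  # hypothetical column name
    label_column: str = "label",     # hypothetical column name
) -> Tuple[List[str], List[str]]:
    """Split CSV rows into parallel (texts, labels) lists, skipping rows with no text."""
    texts: List[str] = []
    labels: List[str] = []
    for row in csv.DictReader(lines):
        text = (row.get(text_column) or "").strip()
        if not text:
            continue  # a row without copyright text cannot be used for training
        texts.append(text)
        labels.append((row.get(label_column) or "").strip())
    return texts, labels
```

Open the file with `open(path, newline="", encoding="utf-8")`, pass the handle to the helper, and forward the two lists to `agent.train_false_positive_detector_model(texts, labels)`.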

To train the named entity recognition model:

train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)
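
The NER trainer takes paths to spaCy's binary `.spacy` training files. One way to build such files from character-offset annotations is spaCy v3's `DocBin`, sketched below. The helper `write_spacy_file` and the `PERSON` label used in the example are hypothetical; the entity labels Safaa's NER model expects are not documented here, so substitute your own.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

import spacy
from spacy.tokens import DocBin

# (text, [(start_char, end_char, label), ...]); labels are up to you (hypothetical here).
Example = Tuple[str, List[Tuple[int, int, str]]]


def write_spacy_file(examples: Iterable[Example], out_path: str, lang: str = "en") -> int:
    """Serialize character-offset entity annotations to a .spacy file; return the doc count."""
    nlp = spacy.blank(lang)  # tokenizer only; no trained pipeline is needed
    doc_bin = DocBin()
    count = 0
    for text, spans in examples:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in spans:
            # "contract" shrinks the span to whole tokens; None means no token fits.
            span = doc.char_span(start, end, label=label, alignment_mode="contract")
            if span is not None:
                ents.append(span)
        doc.ents = ents  # overlapping spans would raise an error here
        doc_bin.add(doc)
        count += 1
    doc_bin.to_disk(Path(out_path))
    return count
```

Run it once for the training examples and once for the development examples, then pass the two output paths to `agent.train_ner_model`.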

Saving Trained Models

save_path = "path/to/save"
agent.save(save_path)

Dependencies

  • scikit-learn
  • spaCy
  • joblib
  • regex
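  • os (Python standard library; no separate installation needed)
  • shutil (Python standard library; no separate installation needed)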

License

This project is licensed under the GNU Lesser General Public License, version 2.1 (February 1999).

Contact Information



Download files

Download the file for your platform.

Source Distribution

safaa-0.0.2.tar.gz (13.8 MB)

Uploaded Source

Built Distribution


safaa-0.0.2-py3-none-any.whl (13.8 MB)

Uploaded Python 3

File details

Details for the file safaa-0.0.2.tar.gz.

File metadata

  • Download URL: safaa-0.0.2.tar.gz
  • Upload date:
  • Size: 13.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for safaa-0.0.2.tar.gz
Algorithm Hash digest
SHA256 8bb75a419d6060f51403643f0ac4f6ad837eb71d8cd7bf2571c06bd4fbc2239d
MD5 80cc9b7fd3afe840418d256d7dcd3701
BLAKE2b-256 8bd3b75ef7d70c865b65ee3d6cbfa8fc9bff944a260dba990a04a9142ea0a47d
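
To check a downloaded archive against the SHA256 digest listed above, the standard library's `hashlib` is enough. This is a minimal sketch; the helper names `sha256_of` and `matches` are hypothetical. The file is read in chunks so the 13.8 MB archive never has to be held in memory at once.

```python
import hashlib

# SHA256 digest published for safaa-0.0.2.tar.gz.
EXPECTED_SDIST_SHA256 = "8bb75a419d6060f51403643f0ac4f6ad837eb71d8cd7bf2571c06bd4fbc2239d"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an untampered download, `matches("safaa-0.0.2.tar.gz", EXPECTED_SDIST_SHA256)` should be true; the wheel can be checked the same way against its own digest.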


Provenance

The following attestation bundles were made for safaa-0.0.2.tar.gz:

Publisher: release-publish.yml on fossology/safaa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file safaa-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: safaa-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 13.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for safaa-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 9246927a11927fe73f49a915bde8f840cbfaa2fd063de07ed07d664ae0b18e8b
MD5 5f4565fb8505ca6ecf1d511b2341dc1d
BLAKE2b-256 3b3415b5a1f8893e8dd878abc0be5ceac8f2a0aba3d6fd3950ea1040138444c9


Provenance

The following attestation bundles were made for safaa-0.0.2-py3-none-any.whl:

Publisher: release-publish.yml on fossology/safaa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
