Created as part of the 2023 Google Summer of Code project "Reducing Fossology's False Positive Copyrights", this package predicts whether a copyright statement reported by the Fossology software is a false positive. It can also remove extra text from a copyright notice.

Project description

Safaa

Safaa is a Python package for detecting false positives among copyright notices. It can also declutter copyright notices by removing unnecessary extra text.

Features

  • Load pre-trained models or train your own.
  • Integration with scikit-learn for training and prediction.
  • Integration with spaCy for named entity recognition and decluttering tasks.
  • Preprocessing tools to ensure data consistency and quality.
  • Support for loading models from a local directory or from the default model directory.

Installation

To install Safaa, simply use pip:

pip install safaa

Usage

Initialization

from safaa.Safaa import SafaaAgent

agent = SafaaAgent()

Preprocessing Data

data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)

Predicting False Positives

predictions = agent.predict(data)

Decluttering Copyright Notices

decluttered_data = agent.declutter(data, predictions)
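
The prediction and decluttering steps above can be chained into one small helper. The following is only a sketch: it assumes that `predict` and `declutter` each return one item per input notice, in input order, which this page does not state. `NoticeResult`, `review_notices`, and `StubAgent` are hypothetical names introduced for illustration, and the stub's labels ("keep" and "drop") are placeholders, not Safaa's real output format.

```python
from typing import Any, List, NamedTuple, Sequence


class NoticeResult(NamedTuple):
    original: str
    prediction: Any   # whatever label agent.predict returns (format assumed unknown)
    decluttered: Any  # whatever agent.declutter returns for this notice


def review_notices(agent: Any, notices: Sequence[str]) -> List[NoticeResult]:
    """Run predict and declutter, then pair each notice with its outputs."""
    notices = list(notices)
    predictions = list(agent.predict(notices))
    decluttered = list(agent.declutter(notices, predictions))
    # Assumption: both calls return one item per input notice, in order.
    if not (len(notices) == len(predictions) == len(decluttered)):
        raise ValueError("agent outputs are not aligned with the input notices")
    return [NoticeResult(*row) for row in zip(notices, predictions, decluttered)]


class StubAgent:
    """Hypothetical stand-in so the sketch runs without Safaa installed.

    Its rules are NOT Safaa's behaviour; they only mimic the method names.
    """

    def predict(self, data):
        return ["keep" if "(c)" in text.lower() else "drop" for text in data]

    def declutter(self, data, predictions):
        return [
            text.split(" All rights")[0] if label == "keep" else ""
            for text, label in zip(data, predictions)
        ]


results = review_notices(
    StubAgent(),
    [
        "Copyright (c) 2023 Example Corp. All rights reserved.",
        "copyright notice must be retained",
    ],
)
```

With Safaa installed, `StubAgent()` would be replaced by the `SafaaAgent()` instance created during initialization, provided the alignment assumption holds.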

Training Models

To train the false positive detector:

training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
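
The training call above takes two parallel lists. One way to build them is from a labeled CSV file, as in the sketch below. The column names `text` and `label`, the helper name `load_labeled_notices`, and the label values in the sample are all assumptions made for illustration; this page does not document the label format Safaa expects.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_labeled_notices(handle: TextIO) -> Tuple[List[str], List[str]]:
    """Read (text, label) rows from CSV into two parallel lists.

    Rows with an empty text or an empty label are skipped so that the
    two lists always have the same length.
    """
    texts: List[str] = []
    labels: List[str] = []
    for row in csv.DictReader(handle):
        text = (row.get("text") or "").strip()
        label = (row.get("label") or "").strip()
        if not text or not label:
            continue
        texts.append(text)
        labels.append(label)
    return texts, labels


# In-memory sample; the label values are placeholders, not Safaa's format.
sample = io.StringIO(
    "text,label\n"
    '"Copyright (c) 2023 Example Corp",real\n'
    '"the copyright holder shall not be liable",false_positive\n'
    ",real\n"
)
training_data, labels = load_labeled_notices(sample)
```

The resulting `training_data` and `labels` lists could then be passed to `agent.train_false_positive_detector_model(training_data, labels)`, once the labels are converted to whatever values the model expects.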

To train the named entity recognition model:

train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)

Saving Trained Models

save_path = "path/to/save"
agent.save(save_path)

Dependencies

  • scikit-learn
  • spaCy
  • joblib
  • regex
  • os and shutil (both part of the Python standard library, so no separate installation is needed)

License

This project is licensed under the GNU LESSER GENERAL PUBLIC LICENSE, Version 2.1, February 1999.

Download files

Source Distribution

safaa-0.0.3.tar.gz (13.8 MB)

Uploaded Source

Built Distribution

safaa-0.0.3-py3-none-any.whl (13.8 MB)

Uploaded Python 3

File details

Details for the file safaa-0.0.3.tar.gz.

File metadata

  • Download URL: safaa-0.0.3.tar.gz
  • Size: 13.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for safaa-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       e4d6cf37541441042d8fe6169c16db2d507155fe3d7c84e4e0e20c44e37eeee9
MD5          b0959bf204b3d97bed439498948c8e76
BLAKE2b-256  32c010966ec49db6f0b89439f069f3b47c5e37f9c89aab770cf7a6edfec8d42f
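
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is something you supply yourself.

```python
import hashlib
import hmac

# SHA256 digest published above for safaa-0.0.3.tar.gz.
EXPECTED_SHA256 = "e4d6cf37541441042d8fe6169c16db2d507155fe3d7c84e4e0e20c44e37eeee9"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, `matches_published_hash("safaa-0.0.3.tar.gz", EXPECTED_SHA256)` returns `True` only if the local file's digest equals the published one.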

Provenance

The following attestation bundles were made for safaa-0.0.3.tar.gz:

Publisher: release-publish.yml on fossology/safaa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file safaa-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: safaa-0.0.3-py3-none-any.whl
  • Size: 13.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for safaa-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       ef296850adee9a9ae7aee08d79f35f7293de675a2a7e2cc6ef2c08c9d79f4ff1
MD5          e63017cf8bdfb9b04d8a124edd8088b6
BLAKE2b-256  e23150afbea58f17dd9c5b0fd19b38ef534e55fe71ec480fdefe310d21413f75

Provenance

The following attestation bundles were made for safaa-0.0.3-py3-none-any.whl:

Publisher: release-publish.yml on fossology/safaa
