Reduces false-positive copyright findings from Fossology by predicting whether a given copyright output is a false positive and by removing extraneous text from copyright notices.

Safaa

Safaa is a Python package for detecting false positives in copyright notices. It can also declutter copyright notices by removing unnecessary extra text.

Features

  • Load pre-trained models or train your own.
  • Integration with scikit-learn for training and prediction.
  • Integration with spaCy for named entity recognition and decluttering tasks.
  • Preprocessing tools to ensure data consistency and quality.
  • Support for loading models from a local directory or from the default model directory.

Installation

To install Safaa, simply use pip:

pip install safaa

Usage

Initialization

from safaa.Safaa import SafaaAgent
agent = SafaaAgent()

Preprocessing Data

data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)

Predicting False Positives

predictions = agent.predict(data)
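
A common next step is to separate the notices flagged as false positives from the rest. The helper below is a minimal sketch and is not part of Safaa: it assumes that predict returns one label per input item, in input order, and it takes the false-positive label as a parameter because the label values are not specified here. The name split_by_prediction is hypothetical.

```python
def split_by_prediction(notices, predictions, false_positive_label):
    """Split notices into (kept, dropped) lists by predicted label.

    Assumption: predictions[i] is the label predicted for notices[i].
    """
    if len(notices) != len(predictions):
        raise ValueError("notices and predictions must have the same length")

    kept, dropped = [], []
    for notice, label in zip(notices, predictions):
        # Items carrying the false-positive label are set aside, not deleted.
        if label == false_positive_label:
            dropped.append(notice)
        else:
            kept.append(notice)
    return kept, dropped
```

With a real agent this could be called as split_by_prediction(data, agent.predict(data), fp_label), where fp_label stands in for whatever value marks a false positive in your installed version.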

Decluttering Copyright Notices

decluttered_data = agent.declutter(data, predictions)
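
The prediction and decluttering steps can be chained into one call. The following is a sketch under stated assumptions, not Safaa's own API: run_pipeline is a hypothetical helper, and it assumes that both predict and declutter return sequences aligned one-to-one with the input list.

```python
def run_pipeline(agent, notices):
    """Run predict, then declutter, and return one record per input notice.

    `agent` is any object exposing predict(data) and
    declutter(data, predictions), such as a SafaaAgent.
    Assumption: both calls return sequences aligned with `notices`.
    """
    notices = list(notices)
    predictions = list(agent.predict(notices))
    decluttered = list(agent.declutter(notices, predictions))

    # Fail loudly if the alignment assumption does not hold.
    if not (len(notices) == len(predictions) == len(decluttered)):
        raise ValueError("agent outputs are not aligned with the input")

    return [
        {"original": n, "prediction": p, "decluttered": d}
        for n, p, d in zip(notices, predictions, decluttered)
    ]
```

Keeping the original text next to the prediction and the decluttered text makes it easier to review what changed for each notice.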

Training Models

To train the false positive detector:

training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
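
Training data usually lives in a file rather than in literals. The sketch below builds the two parallel lists from a CSV file using only the standard library. The column names text and label, the function name load_training_csv, and the file layout are assumptions for illustration; the label values that the trainer expects are not specified here.

```python
import csv


def load_training_csv(path, text_column="text", label_column="label"):
    """Read a CSV file into parallel (training_data, labels) lists.

    Rows with an empty text or label cell are skipped so the two
    lists always stay the same length.
    """
    training_data, labels = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            text = (row.get(text_column) or "").strip()
            label = (row.get(label_column) or "").strip()
            if text and label:
                training_data.append(text)
                labels.append(label)
    return training_data, labels
```

The returned lists can then be passed to agent.train_false_positive_detector_model(training_data, labels).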

To train the named entity recognition model:

train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)

Saving Trained Models

save_path = "path/to/save"
agent.save(save_path)

Dependencies

  • scikit-learn
  • spaCy
  • joblib
  • regex
  • os (Python standard library)
  • shutil (Python standard library)

License

This project is licensed under the GNU Lesser General Public License, version 2.1 (February 1999).