

unwanted_content_detector

A library to detect undesired, unbranded, or harmful content

Usage

Install from PyPI:

pip install unwanted-content-detector

Minimal

from unwanted_content_detector import Detector
detector = Detector(models=['hatefult_content_generic_distil_bert_finetuned'])
if detector.is_unwanted('content generated by llm'):
    print("Won't continue")

With Spark

Wrap detector.is_unwanted in a UDF (the 'content' column name is illustrative):

from pyspark.sql import functions as F, types as T

is_unwanted_udf = F.udf(detector.is_unwanted, T.BooleanType())
spark_df = spark_df.withColumn('is_rejected', is_unwanted_udf(F.col('content')))

In the terminal

./cli.py inference infer 'text to be validated'

Training

Fine-tuning

In Python:

from unwanted_content_detector import Detector
model = Detector({'data_source': df}).train()

Or from the terminal:

./cli.py train

Target Architecture / Features

  • multiple swappable models
  • multiple evaluation datasets
  • ability to configure a custom personal dataset for fine-tuning
  • a single performance evaluation criterion
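One way swappable models might be wired up is a small name-to-function registry, as the Detector's models=[...] parameter in the Usage section suggests. Everything below is an illustrative sketch, not the library's actual internals; the registry, decorator, and keyword_baseline model are all hypothetical:

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping model names to inference predicates.
MODEL_REGISTRY: Dict[str, Callable[[str], bool]] = {}

def register_model(name: str):
    """Register a model's inference function under a string name."""
    def wrapper(fn: Callable[[str], bool]):
        MODEL_REGISTRY[name] = fn
        return fn
    return wrapper

@register_model("keyword_baseline")
def keyword_baseline(text: str) -> bool:
    # Toy stand-in for a fine-tuned classifier.
    return any(word in text.lower() for word in {"hate", "attack"})

class Detector:
    """Sketch: run every configured model, flag text if any model fires."""
    def __init__(self, models: List[str]):
        self.models = [MODEL_REGISTRY[name] for name in models]

    def is_unwanted(self, text: str) -> bool:
        return any(model(text) for model in self.models)

detector = Detector(models=["keyword_baseline"])
print(detector.is_unwanted("I hate this"))  # True
```

Swapping models then only means changing the names passed to the constructor, which keeps a single evaluation harness reusable across all of them.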

Use cases it could be applied to

  • detecting the generation of harmful content by LLMs
  • preventing harmful prompts from being injected into LLMs
  • validating that generated content follows brand guidelines
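For the first two use cases, the detector typically sits as a gate on both sides of the LLM call. A minimal sketch, in which both the detector and the LLM client are stubbed placeholders (not this library's API):

```python
def is_unwanted(text: str) -> bool:
    # Placeholder for Detector.is_unwanted; toy keyword check only.
    return "harmful" in text.lower()

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM client.
    return f"echo: {prompt}"

def guarded_generate(prompt: str) -> str:
    # Gate the input (prompt-injection screening)...
    if is_unwanted(prompt):
        return "[rejected: unwanted prompt]"
    response = call_llm(prompt)
    # ...and the output (harmful-generation screening).
    if is_unwanted(response):
        return "[rejected: unwanted response]"
    return response

print(guarded_generate("hello"))          # echo: hello
print(guarded_generate("harmful stuff"))  # [rejected: unwanted prompt]
```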

Liability

This tool aims to help you detect harmful content, but it is not meant to be the final decision maker on its own.

