
BadD(etector)

Description (How it can help)

BadD(etector) was created for detecting bad things in user-generated content in Russian. The library currently supports obscene word detection, advertising detection, and toxicity detection. All the magic is done by neural networks.

Requirements

  1. Python 3.7+
  2. PyTorch 1.8.1
  3. Gensim 3.8.1
  4. NLTK 3.2.5
  5. pymorphy2 0.9.1
  6. emoji 1.2.0

How to install

locally (dev mode)

python3 -m pip install -e <path-to-lib>

from github

pip install git+https://github.com/wksmirnowa/badd.git@master

from pip

pip install badd

Usage

Download the vocabulary, embedding, and model files for each detector:

Obscene words detection

Import the ObsceneDetector class

import torch
from badd import ObsceneDetector

Set paths to the files and the device

# path to vocab
vocab_path = "obscene_vocab.json"
# path to embeddings
fasttext_path = "obscene_embeddings.pickle"
# path to model 
model_path = "obscene_model_cpu.pth"
# set device
device = torch.device('cpu')

Use ObsceneDetector

obscene_detector = ObsceneDetector(vocab_path, fasttext_path, model_path, device)

Predict every word in text

obscene_detector.predict_text(text)

Predict probability for every word in text

obscene_detector.predict_probability(text)

Check whether any obscene word is in text

obscene_detector.obscene_in_text(text)

Attributes

  • obscene_detector.obscene_words is the list of obscene words found. Available after one of the methods (predict_probability, predict_text, obscene_in_text) has been run.
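Putting the pieces together, a word-level detector like this can drive a simple censoring step. The sketch below is self-contained and hedged: is_obscene is a stand-in predicate for a trained model's per-word prediction, not part of the badd API.

```python
# Hypothetical censoring helper: masks words that a word-level detector flags.
# `is_obscene` is a stub standing in for a model's per-word prediction.
def censor(text, is_obscene):
    out = []
    for word in text.split():
        # Replace a flagged word with asterisks of the same length.
        out.append("*" * len(word) if is_obscene(word) else word)
    return " ".join(out)

# Toy predicate in place of a real model.
banned = {"badword"}
print(censor("this badword stays hidden", lambda w: w in banned))
# -> this ******* stays hidden
```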

Ad detection

Import the AdDetector class

import torch
from badd import AdDetector

Set paths to the files and the device

# path to vocab
vocab_path = "ad_vocab.json"
# path to embeddings
fasttext_path = "ad_embeddings.pickle"
# path to model 
model_path = "ad_model_cpu.pth"
# set device
device = torch.device('cpu')

Use AdDetector

ad_detector = AdDetector(vocab_path, fasttext_path, model_path, device)

Predict text

ad_detector.predict_text(text)

Predict probability for text

ad_detector.predict_probability(text)

Check whether a text is an ad

ad_detector.is_ad(text)
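If you work with the probability output rather than the boolean, a common pattern is to apply your own decision threshold. This is a minimal sketch, assuming predict_probability yields a float in [0, 1]; the label function and threshold are illustrative, not part of badd.

```python
# Hypothetical thresholding: turn a text-level probability into a label.
# A real call would be `prob = ad_detector.predict_probability(text)`.
def label(prob, threshold=0.5):
    # Texts at or above the threshold are treated as ads.
    return "ad" if prob >= threshold else "not ad"

print(label(0.91))  # -> ad
print(label(0.12))  # -> not ad
```

Raising the threshold trades recall for precision, which is often desirable when false "ad" flags are costly.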

Toxic texts detection

Import the ToxicDetector class

import torch
from badd import ToxicDetector

Set paths to the files and the device

# path to vocab
vocab_path = "toxic_vocab.json"
# path to embeddings
fasttext_path = "toxic_embeddings.pickle"
# path to model 
model_path = "toxic_model_cpu.pth"
# set device
device = torch.device('cpu')

Use ToxicDetector

toxic_detector = ToxicDetector(vocab_path, fasttext_path, model_path, device)

Predict text

toxic_detector.predict_text(text)

Predict probability for text

toxic_detector.predict_probability(text)

Check whether a text is toxic

toxic_detector.is_toxic(text)
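In practice the three detectors are often run together over the same text. The sketch below shows one way to combine them in a single moderation pass; the StubDetector class and its check method are hypothetical stand-ins for badd's detector objects and their boolean methods (obscene_in_text, is_ad, is_toxic).

```python
# Hypothetical moderation pass combining several detectors.
# StubDetector stands in for ObsceneDetector / AdDetector / ToxicDetector.
class StubDetector:
    def __init__(self, flagged):
        self.flagged = flagged

    def check(self, text):
        # Flag the text if it contains any known phrase.
        return any(phrase in text for phrase in self.flagged)

detectors = {
    "obscene": StubDetector({"badword"}),
    "ad": StubDetector({"buy now"}),
    "toxic": StubDetector({"idiot"}),
}

def moderate(text):
    # Return the names of all detectors that flag the text.
    return [name for name, d in detectors.items() if d.check(text)]

print(moderate("buy now, idiot"))  # -> ['ad', 'toxic']
```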
