BadD(etector)

Description (How it can help)

BadD(etector) was created to detect bad things in user-generated content in Russian. The library currently supports obscene word detection, advertising detection, and toxicity detection. All the magic is done by neural networks.

Requirements

  1. Python 3.7+
  2. PyTorch 1.8.1
  3. Gensim 3.8.1
  4. NLTK 3.2.5
  5. pymorphy2 0.9.1
  6. emoji 1.2.0
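
If the dependencies need to be installed manually (for example, before a local dev-mode install), the pinned versions above can be installed with pip. This is only a convenience sketch; installing the package itself should normally pull them in:

pip install torch==1.8.1 gensim==3.8.1 nltk==3.2.5 pymorphy2==0.9.1 emoji==1.2.0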

How to install

locally (dev mode)

python3 -m pip install -e <path-to-lib>

from github

pip install git+https://github.com/wksmirnowa/badd.git@master

from pip

pip install badd

Usage

Download files and models for:

Obscene words detection

Import the ObsceneDetector class

import torch
from badd import ObsceneDetector

Set paths to files and the device

# path to vocab
vocab_path = "obscene_vocab.json"
# path to embeddings
fasttext_path = "obscene_embeddings.pickle"
# path to model 
model_path = "obscene_model_cpu.pth"
# set device
device = torch.device('cpu')

Use ObsceneDetector

obscene_detector = ObsceneDetector(vocab_path, fasttext_path, model_path, device)

Predict every word in text

obscene_detector.predict_text(text)

Predict probability for every word in text

obscene_detector.predict_probability(text)

Check whether any obscene word is in text

obscene_detector.obscene_in_text(text)

Attributes

  • obscene_detector.obscene_words: a list of the obscene words that were found. Available after one of the methods (predict_probability, predict_text, obscene_in_text) has been run.
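
Putting the steps above together, a minimal end-to-end sketch (the sample text is illustrative, and obscene_in_text is assumed to return a boolean):

import torch
from badd import ObsceneDetector

# paths to the downloaded vocab, embeddings and model files
vocab_path = "obscene_vocab.json"
fasttext_path = "obscene_embeddings.pickle"
model_path = "obscene_model_cpu.pth"
device = torch.device('cpu')

obscene_detector = ObsceneDetector(vocab_path, fasttext_path, model_path, device)

text = "пример пользовательского комментария"  # any user-generated string

# per-word predictions and probabilities
obscene_detector.predict_text(text)
obscene_detector.predict_probability(text)

# text-level check; obscene_words is filled after any of the calls above
if obscene_detector.obscene_in_text(text):
    print("Found obscene words:", obscene_detector.obscene_words)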

Ad detection

Import the AdDetector class

import torch
from badd import AdDetector

Set paths to files and the device

# path to vocab
vocab_path = "ad_vocab.json"
# path to embeddings
fasttext_path = "ad_embeddings.pickle"
# path to model 
model_path = "ad_model_cpu.pth"
# set device
device = torch.device('cpu')

Use AdDetector

ad_detector = AdDetector(vocab_path, fasttext_path, model_path, device)

Predict text

ad_detector.predict_text(text)

Predict probability for text

ad_detector.predict_probability(text)

Check whether a text is an ad

ad_detector.is_ad(text)
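
For example, assuming ad_detector was created as above, a short sketch of an ad check (the sample text is illustrative, and is_ad is assumed to return a boolean):

text = "Продам телефон, пишите в личные сообщения"  # illustrative ad-like text

probability = ad_detector.predict_probability(text)
if ad_detector.is_ad(text):
    print("Looks like an ad, probability:", probability)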

Toxic texts detection

Import the ToxicDetector class

import torch
from badd import ToxicDetector

Set paths to files and the device

# path to vocab
vocab_path = "toxic_vocab.json"
# path to embeddings
fasttext_path = "toxic_embeddings.pickle"
# path to model 
model_path = "toxic_model_cpu.pth"
# set device
device = torch.device('cpu')

Use ToxicDetector

toxic_detector = ToxicDetector(vocab_path, fasttext_path, model_path, device)

Predict text

toxic_detector.predict_text(text)

Predict probability for text

toxic_detector.predict_probability(text)

Check whether a text is toxic

toxic_detector.is_toxic(text)
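
Because all three detectors expose a text-level check, they can be combined into one moderation helper. This is only a sketch: it assumes the obscene_detector, ad_detector and toxic_detector instances were created as shown above and that the boolean methods return True/False.

def moderate(text):
    """Run all three badd detectors on one text and collect the flags."""
    return {
        "obscene": obscene_detector.obscene_in_text(text),
        "ad": ad_detector.is_ad(text),
        "toxic": toxic_detector.is_toxic(text),
    }

print(moderate("пример пользовательского комментария"))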

