BadD(etector)
Description (How it can help)
BadD(etector) was created to detect bad things in user-generated content in Russian. The library currently supports obscene word detection, advertising detection, and toxicity detection. All the magic is done by neural networks.
Requirements
- Python 3.7+
- PyTorch 1.8.1
- Gensim 3.8.1
- NLTK 3.2.5
- pymorphy2 0.9.1
- emoji 1.2.0
How to install
locally (dev mode)
python3 -m pip install -e <path-to-lib>
from github
pip install git+https://github.com/wksmirnowa/badd.git@master
from pip
pip install badd
Usage
Download files and models for:
Obscene words detection
Import the ObsceneDetector class
import torch
from badd import ObsceneDetector
Set paths to the files and the device
# path to vocab
vocab_path = "obscene_vocab.json"
# path to embeddings
fasttext_path = "obscene_embeddings.pickle"
# path to model
model_path = "obscene_model_cpu.pth"
# set device
device = torch.device('cpu')
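If a GPU is available, you may prefer to select it automatically instead of hard-coding `'cpu'`. A minimal sketch using standard PyTorch (not specific to badd):

```python
import torch

# Pick CUDA when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

The model files above are exported for CPU (`obscene_model_cpu.pth`), so the CPU device is a safe default.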
Use ObsceneDetector
obscene_detector = ObsceneDetector(vocab_path, fasttext_path, model_path, device)
Predict a label for every word in a text
obscene_detector.predict_text(text)
Predict the probability for every word in a text
obscene_detector.predict_probability(text)
Check whether any obscene word is in text
obscene_detector.obscene_in_text(text)
Attributes
obscene_detector.obscene_words
list of found obscene words. Available after one of the methods (predict_text, predict_probability, obscene_in_text) has been run.
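Putting the calls above together, a filtering helper might look like the sketch below. To keep the example runnable without the downloaded model files, a tiny stub with the same `obscene_in_text` method and `obscene_words` attribute stands in for `ObsceneDetector`; in real use you would pass the detector constructed above, and the stub and its word list are purely illustrative.

```python
def filter_obscene(detector, texts):
    """Return (clean, flagged): texts with no obscene words, and a map
    from each flagged text to the obscene words found in it."""
    clean, flagged = [], {}
    for text in texts:
        if detector.obscene_in_text(text):
            # obscene_words is populated after a prediction method runs
            flagged[text] = list(detector.obscene_words)
        else:
            clean.append(text)
    return clean, flagged


# Hypothetical stand-in for ObsceneDetector, for illustration only.
class StubDetector:
    def __init__(self, bad_words):
        self.bad_words = set(bad_words)
        self.obscene_words = []

    def obscene_in_text(self, text):
        self.obscene_words = [w for w in text.split() if w in self.bad_words]
        return bool(self.obscene_words)


detector = StubDetector({"badword"})
clean, flagged = filter_obscene(detector, ["hello world", "badword here"])
# clean → ["hello world"]; flagged → {"badword here": ["badword"]}
```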
Ad detection
Import the AdDetector class
import torch
from badd import AdDetector
Set paths to the files and the device
# path to vocab
vocab_path = "ad_vocab.json"
# path to embeddings
fasttext_path = "ad_embeddings.pickle"
# path to model
model_path = "ad_model_cpu.pth"
# set device
device = torch.device('cpu')
Use AdDetector
ad_detector = AdDetector(vocab_path, fasttext_path, model_path, device)
Predict a label for a text
ad_detector.predict_text(text)
Predict the probability for a text
ad_detector.predict_probability(text)
Check whether a text is an ad
ad_detector.is_ad(text)
Toxic texts detection
Import the ToxicDetector class
import torch
from badd import ToxicDetector
Set paths to the files and the device
# path to vocab
vocab_path = "toxic_vocab.json"
# path to embeddings
fasttext_path = "toxic_embeddings.pickle"
# path to model
model_path = "toxic_model_cpu.pth"
# set device
device = torch.device('cpu')
Use ToxicDetector
toxic_detector = ToxicDetector(vocab_path, fasttext_path, model_path, device)
Predict a label for a text
toxic_detector.predict_text(text)
Predict the probability for a text
toxic_detector.predict_probability(text)
Check whether a text is toxic
toxic_detector.is_toxic(text)
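The three detectors share the same construction signature, so they can be combined into a single moderation pass. A sketch, assuming the boolean methods documented above (`obscene_in_text`, `is_ad`, `is_toxic`); the stub classes exist only so the example runs without the downloaded models, and are not part of badd:

```python
def moderate(text, obscene_detector, ad_detector, toxic_detector):
    """Collect every label that fires for a text."""
    labels = []
    if obscene_detector.obscene_in_text(text):
        labels.append("obscene")
    if ad_detector.is_ad(text):
        labels.append("ad")
    if toxic_detector.is_toxic(text):
        labels.append("toxic")
    return labels


# Hypothetical stand-ins so the sketch runs without the model files.
class StubObscene:
    def obscene_in_text(self, text):
        return "badword" in text

class StubAd:
    def is_ad(self, text):
        return "buy now" in text

class StubToxic:
    def is_toxic(self, text):
        return "hate" in text


labels = moderate("buy now, I hate this", StubObscene(), StubAd(), StubToxic())
# labels → ["ad", "toxic"]
```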
Project details
Download files
Source Distribution
File details
Details for the file badd-1.0.32.tar.gz.
File metadata
- Download URL: badd-1.0.32.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 96692f007c6cea26e92833f067176eafe1537b25a7ff53ed75d079806c16750f
MD5 | 6b0f977a34dde9a84defa056c9a79a06
BLAKE2b-256 | bba35ad6e86dfed3b314a3232948ece201ba512168336444c37a428a5004cce5