A fast, robust Python library to check for profanity or offensive language in strings.
How It Works
profanity-check uses a linear SVM model trained on 200k human-labeled samples of clean and profane text strings. The model is simple but surprisingly effective, which makes profanity-check both robust and extremely performant.
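To illustrate why a linear model is cheap at prediction time, here is a minimal sketch of a linear decision function over bag-of-words counts. The word weights, the bias, and the `tokenize`, `decision_score`, and `is_offensive` helpers are hypothetical and chosen only for illustration; they are not the weights or code shipped with profanity-check.

```python
import re
from collections import Counter

# Hypothetical per-word weights; a trained linear SVM would learn these from labeled data.
WEIGHTS = {"hell": 1.5, "scum": 2.0, "thanks": -1.0, "nice": -0.5}
BIAS = -1.0  # hypothetical intercept


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def decision_score(text):
    """Return w . x + b, where x is the bag-of-words count vector of the text."""
    counts = Counter(tokenize(text))
    return sum(WEIGHTS.get(word, 0.0) * n for word, n in counts.items()) + BIAS


def is_offensive(text):
    """Classify by the sign of the linear decision score."""
    return decision_score(text) > 0


print(is_offensive("go to hell, you scum"))     # True  (1.5 + 2.0 - 1.0 = 2.5)
print(is_offensive("thanks, have a nice day"))  # False (-1.0 - 0.5 - 1.0 = -2.5)
```

Prediction here is one dictionary lookup per distinct word plus a sum, which is why linear models of this kind tend to be fast.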
Why Use profanity-check?
Many profanity detection libraries use a hard-coded list of bad words to detect and filter profanity. For example, profanity uses this wordlist, and even better-profanity still uses a wordlist. This approach has glaring weaknesses: a fixed list only catches the exact words it contains, so while these libraries may be fast, they are not accurate.
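Two common failure modes of wordlist matching can be shown with a small sketch. It uses a hypothetical one-word list and two naive matching strategies written for this example (not the actual code of any library named here): substring matching flags innocent words, while exact-token matching misses simple obfuscations.

```python
import re

WORDLIST = {"hell"}  # hypothetical one-entry wordlist for illustration


def substring_match(text):
    """Flag text if any listed word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in WORDLIST)


def token_match(text):
    """Flag text only if a whole token exactly equals a listed word."""
    tokens = re.findall(r"\w+", text.lower())
    return any(token in WORDLIST for token in tokens)


print(substring_match("hello there"))  # True: false positive, "hell" is inside "hello"
print(token_match("hello there"))      # False
print(token_match("go to h3ll"))       # False: false negative, obfuscated spelling slips through
```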
Other libraries like profanity-filter use more sophisticated methods that are much more accurate, but at the cost of performance. A benchmark (performed December 2018 on a new 2018 MacBook Pro) using a Kaggle dataset of Wikipedia comments yielded roughly the following results:
|Package||1 Prediction (ms)||10 Predictions (ms)||100 Predictions (ms)|
profanity-check is anywhere from 300 to 4000 times faster than profanity-filter in this benchmark!
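A benchmark of this shape could be reproduced along the following lines. This is a sketch under assumptions, not the script used for the benchmark above: `time_batches` is a hypothetical helper, and `dummy_predict` is a stand-in that should be replaced with each library's prediction function and real comment data.

```python
import time


def time_batches(predict_fn, texts, batch_sizes=(1, 10, 100), repeats=5):
    """Return {batch_size: best wall-clock time in ms} for predict_fn on each batch size."""
    results = {}
    for size in batch_sizes:
        batch = texts[:size]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            predict_fn(batch)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            best = min(best, elapsed_ms)
        results[size] = best
    return results


# Stand-in predictor so the sketch runs on its own; swap in a real library call.
def dummy_predict(batch):
    return [0 for _ in batch]


sample_texts = ["example comment"] * 100
timings = time_batches(dummy_predict, sample_texts)
for size, ms in timings.items():
    print(f"{size} prediction(s): {ms:.4f} ms")
```

Taking the best of several repeats reduces noise from other processes. For a library that only accepts one string per call, `predict_fn` can wrap that call in a loop over the batch.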
$ pip install profanity-check
from profanity_check import predict, predict_prob

predict(["predict() takes an array and returns a 1 for each string if it's offensive, else 0."])
# [0]

predict(['fuck you'])
# [1]

predict_prob(['predict_prob() takes an array and returns the probability each string is offensive'])
# [0.08686173]

predict_prob(['go to hell, you scum'])
# [0.7618861]
Note that both predict() and predict_prob() return numpy arrays.
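Because `predict_prob()` returns one probability per input string, a caller can apply its own threshold instead of relying on the 0/1 output of `predict()`. The sketch below shows one way to do that. `flag_texts` and the `0.5` default threshold are assumptions made for this example, and `stub_predict_prob` stands in for `predict_prob` so the snippet runs without the library installed; the stub reuses the two probabilities from the usage example.

```python
def flag_texts(texts, prob_fn, threshold=0.5):
    """Return the texts whose offensive-probability is at or above threshold."""
    probs = prob_fn(texts)
    return [text for text, p in zip(texts, probs) if p >= threshold]


# Stub standing in for profanity_check.predict_prob, using values from the usage example.
def stub_predict_prob(texts):
    known = {
        "predict_prob() takes an array and returns the probability each string is offensive": 0.08686173,
        "go to hell, you scum": 0.7618861,
    }
    return [known[t] for t in texts]


comments = [
    "predict_prob() takes an array and returns the probability each string is offensive",
    "go to hell, you scum",
]
flagged = flag_texts(comments, stub_predict_prob)
print(flagged)  # ['go to hell, you scum']
```

With the library installed, passing `predict_prob` as `prob_fn` should work the same way. A lower threshold flags more text at the cost of more false positives.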
More on How It Works
Special thanks to the authors of the datasets used in this project.
profanity-check was trained on a combined dataset from 2 sources:
- t-davidson/hate-speech-and-offensive-language, used in their paper Automated Hate Speech Detection and the Problem of Offensive Language
- the Toxic Comment Classification Challenge on Kaggle.