profanity-check

A fast, robust Python library to check for profanity or offensive language in strings.

How It Works

profanity-check uses a linear SVM model trained on 200k human-labeled samples of clean and profane text strings. Its model is simple but surprisingly effective, meaning profanity-check is both robust and extremely performant.

Why Use profanity-check?

Many profanity detection libraries use a hard-coded list of bad words to detect and filter profanity. For example, profanity relies on a fixed wordlist, and even better-profanity still uses one. This approach has glaring issues: these libraries may be fast, but they are not accurate.
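To make the limitation concrete, here is a minimal sketch of a wordlist-based checker. It is an illustration only: the `WORDLIST` contents and the `wordlist_predict` helper are hypothetical and are not taken from profanity, better-profanity, or any other library. It assumes that one typical failure mode is text that does not match a list entry exactly.

```python
import re
from typing import List

# Hypothetical, deliberately tiny wordlist (not the list used by any real library).
WORDLIST = {"hell", "scum"}


def wordlist_predict(texts: List[str]) -> List[int]:
    """Return 1 for each string containing a listed word as an exact token, else 0."""
    results = []
    for text in texts:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        results.append(1 if any(token in WORDLIST for token in tokens) else 0)
    return results


print(wordlist_predict(["go to hell, you scum"]))  # [1]
print(wordlist_predict(["go to h3ll"]))            # [0] (the variant spelling is not in the list)
```

An exact-match list only catches what it already contains, so every new spelling or variant needs a manual update.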

Other libraries like profanity-filter use more sophisticated methods that are much more accurate but at the cost of performance. A benchmark (performed December 2018 on a new 2018 MacBook Pro) using a Kaggle dataset of Wikipedia comments yielded roughly the following results:

Package            1 Prediction (ms)   10 Predictions (ms)   100 Predictions (ms)
profanity-check    0.2                 0.5                   3.5
profanity-filter   60                  1200                  13000
profanity          0.3                 1.2                   24

profanity-check is roughly 300 to 3,700 times faster than profanity-filter in this benchmark, depending on the batch size!
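The benchmark script itself is not shown here, so the following is only a sketch of how per-batch timings like these could be measured. The `time_batches` helper and the `dummy_predict` stand-in are hypothetical names. To time a real library, pass its prediction function (for example `predict` from profanity_check) in place of the stand-in.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_batches(
    predict_fn: Callable[[List[str]], object],
    texts: Sequence[str],
    batch_sizes: Sequence[int] = (1, 10, 100),
    repeats: int = 5,
) -> Dict[int, float]:
    """Return the best-of-`repeats` wall-clock time in milliseconds per batch size."""
    results: Dict[int, float] = {}
    for size in batch_sizes:
        # Cycle through the sample texts to build a batch of the requested size.
        batch = [texts[i % len(texts)] for i in range(size)]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            predict_fn(batch)
            best = min(best, time.perf_counter() - start)
        results[size] = best * 1000.0
    return results


def dummy_predict(batch: List[str]) -> List[int]:
    """Stand-in predictor so the sketch runs without any library installed."""
    return [0 for _ in batch]


timings = time_batches(dummy_predict, ["some sample comment", "another comment"])
for size, ms in timings.items():
    print(f"{size:>4} predictions: {ms:.4f} ms")
```

Taking the best of several repeats reduces noise from other processes. Absolute numbers will differ by machine and library version.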

Installation

$ pip install profanity-check

Usage

from profanity_check import predict, predict_prob

predict(["predict() takes an array and returns a 1 for each string if it's offensive, else 0."])
# [0]

predict(['fuck you'])
# [1]

predict_prob(['predict_prob() takes an array and returns the probability each string is offensive'])
# [0.08686173]

predict_prob(['go to hell, you scum'])
# [0.7618861]

Note that both predict() and predict_prob() return NumPy arrays.
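If the default 0/1 output of predict() is too strict or too lenient for a given use case, one option is to threshold the probabilities yourself. The sketch below assumes a hypothetical helper, `flag_offensive`, that accepts any function with the same shape as predict_prob() (a list of strings in, one probability per string out). The stand-in function simply replays the two probabilities shown above so the example runs without the library installed.

```python
from typing import Callable, Iterable, List, Sequence


def flag_offensive(
    texts: Iterable[str],
    prob_fn: Callable[[List[str]], Sequence[float]],
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each text whose offensive probability meets the threshold."""
    probabilities = prob_fn(list(texts))
    # float() also accepts NumPy scalars, so NumPy arrays work as input here.
    return [float(p) >= threshold for p in probabilities]


def replayed_predict_prob(batch: List[str]) -> List[float]:
    """Stand-in that replays the example probabilities from the usage section."""
    return [0.08686173, 0.7618861][: len(batch)]


flags = flag_offensive(
    ["a harmless sentence", "go to hell, you scum"],
    replayed_predict_prob,
    threshold=0.7,
)
print(flags)  # [False, True]
```

With the library installed, passing `predict_prob` from profanity_check as `prob_fn` should work the same way, since it takes a list of strings and returns one probability per string.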

More on How It Works

Special thanks to the authors of the datasets used in this project. profanity-check was trained on a combined dataset from 2 sources:

profanity-check relies heavily on the excellent scikit-learn library. It's mostly powered by scikit-learn classes CountVectorizer, LinearSVC, and CalibratedClassifierCV.
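As a rough illustration of how these three scikit-learn classes can fit together, here is a minimal sketch trained on a toy dataset. It is not the project's training code: the eight sample strings, the labels, and the settings (`cv=2`, `random_state=0`) are assumptions chosen only to make the example run. LinearSVC has no predict_proba() of its own, and wrapping it in CalibratedClassifierCV is one way to obtain probability estimates.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data for illustration only; the real model was trained on about 200k samples.
texts = [
    "have a nice day",
    "thanks for your help",
    "see you tomorrow",
    "what a lovely photo",
    "go to hell, you scum",
    "you are a stupid idiot",
    "shut up, you moron",
    "you worthless piece of trash",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = clean, 1 = offensive

# Bag-of-words counts feed a linear SVM; the calibration wrapper adds predict_proba().
model = make_pipeline(
    CountVectorizer(),
    CalibratedClassifierCV(LinearSVC(random_state=0), cv=2),
)
model.fit(texts, labels)

samples = ["have a lovely day", "you stupid scum"]
predicted = model.predict(samples)               # array of 0/1 labels
probs = model.predict_proba(samples)[:, 1]       # probability of the offensive class
print(predicted, probs)
```

With so little training data the outputs are not meaningful. The point is only the shape of the pipeline: vectorize, classify, calibrate.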

