A Python library for detecting and filtering profanity

profanity-filter: A Python library for detecting and filtering profanity

PyPI: https://pypi.python.org/pypi/profanity-filter

Installation

The profanity-filter library is universal: it can detect and filter profanity in any language. To do this, it needs profane word dictionaries and language tools with their models installed. profanity-filter ships with English and Russian profane word dictionaries.

For a minimal English setup, install profanity-filter, which is bundled with spaCy, and download a spaCy model for tokenization and lemmatization:

$ pip install profanity-filter
$ python -m spacy download en

For more information about spaCy models, see https://spacy.io/usage/models/.

Usage

from profanity_filter import ProfanityFilter

pf = ProfanityFilter()

pf.censor("That's bullshit!")
# "That's ********!"

pf.censor_char = '@'
pf.censor("That's bullshit!")
# "That's @@@@@@@@!"

pf.censor_char = '*'
pf.custom_profane_word_dictionaries = {'en': {'love', 'dog'}}
pf.censor("I love dogs and penguins!")
# "I **** **** and penguins!"

pf.restore_profane_word_dictionaries()
pf.is_clean("That's awesome!")
# True

pf.is_clean("That's bullshit!")
# False

pf.is_profane("That's bullshit!")
# True

pf.extra_profane_word_dictionaries = {'en': {'chocolate', 'orange'}}
pf.censor("Fuck orange chocolates")
# "**** ****** **********"
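
To make the behaviour above more concrete, here is a minimal sketch of whole-word censoring in plain Python. It is not the library's implementation: the function name censor_words is hypothetical, and the sketch matches tokens exactly, so unlike profanity-filter it does not lemmatize ("dogs" would not match "dog").

```python
import re


def censor_words(text, profane_words, censor_char="*"):
    """Mask every token of `text` that appears in `profane_words`.

    Hypothetical helper for illustration only; matching is exact and
    case-insensitive, with no lemmatization.
    """
    lowered = {word.lower() for word in profane_words}

    def mask(match):
        token = match.group(0)
        if token.lower() in lowered:
            # Keep the token length so the surrounding text stays aligned.
            return censor_char * len(token)
        return token

    # \w+ picks out word-like tokens and leaves punctuation untouched.
    return re.sub(r"\w+", mask, text)


print(censor_words("That's bullshit!", {"bullshit"}))
print(censor_words("That's bullshit!", {"bullshit"}, censor_char="@"))
```

Because only the matched tokens are rewritten, punctuation and spacing survive unchanged, which is the same property the examples above rely on.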

Deep analysis

Deep analysis detects profane words that are inflected forms of words in the profane word dictionary.

To enable deep analysis, install the additional libraries and a dictionary for your language.

First, install the hunspell and hunspell-devel packages with your system package manager.

For Amazon Linux AMI run:

$ yum install hunspell

Then run (for English):

$ pip install -U -r requirements-deep-analysis.txt
$ cd profanity_filter/data
$ wget https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff
$ wget https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic
$ mv en_US.aff en.aff
$ mv en_US.dic en.dic

Then use profanity filter as usual:

from profanity_filter import ProfanityFilter

pf = ProfanityFilter()

pf.censor("fuckfuck")
# "********"

pf.censor_whole_words = False
pf.censor("oofucksoo")
# "oo*****oo"
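
The effect of turning off whole-word censoring can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: censor_substrings is a hypothetical name, and the inflected forms are passed in explicitly, whereas the library derives them from the installed dictionaries.

```python
import re


def censor_substrings(text, profane_forms, censor_char="*"):
    """Mask every occurrence of a profane form, even inside a longer word.

    Hypothetical helper; `profane_forms` must already contain the
    inflected forms that should be caught.
    """
    if not profane_forms:
        return text
    # Try longer forms first so "fucks" wins over "fuck" at the same position.
    ordered = sorted(profane_forms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in ordered),
                         re.IGNORECASE)
    return pattern.sub(lambda m: censor_char * len(m.group(0)), text)


forms = {"fuck", "fucks"}
print(censor_substrings("oofucksoo", forms))
print(censor_substrings("fuckfuck", forms))
```

Sorting by length before building the alternation matters because a regular-expression alternation takes the first branch that matches, not the longest one.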

Multilingual support

This library comes with multilingual support, which is enabled automatically once the polyglot package and its language-detection requirements are installed. See https://polyglot.readthedocs.io/en/latest/Installation.html for instructions.

For Amazon Linux AMI run:

$ yum install libicu-devel

Then run:

$ pip install -U -r requirements-multilingual.txt
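
Since several features depend on optional packages, it can help to check which of them are importable in the current environment. The sketch below uses only the standard library. The import names spacy, hunspell and polyglot are assumptions about how the respective packages are imported, and the feature labels are chosen here for illustration.

```python
from importlib.util import find_spec

# Assumed mapping from feature to the top-level module it needs.
OPTIONAL_FEATURES = {
    "tokenization": "spacy",
    "deep analysis": "hunspell",
    "multilingual": "polyglot",
}


def available_features(features=OPTIONAL_FEATURES):
    """Return {feature: True/False} depending on whether its module is importable."""
    return {
        feature: find_spec(module) is not None
        for feature, module in features.items()
    }


report = available_features()
for feature, ok in report.items():
    print(f"{feature}: {'available' if ok else 'missing'}")
```

This only tells you that a package can be imported; it does not check that language models or dictionaries have been downloaded.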

Add language

Let's use Russian as an example to show how to add support for a language.

Russian language support

First, we need to provide the file profanity_filter/data/ru_badwords.txt, which contains a newline-separated list of profane words. For Russian this file is already present, so we skip generating it.
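
As an illustration of the file format, a newline-separated word list can be read into a set like this. The function name load_profane_words and the demo file name xx_badwords.txt are hypothetical, and lowercasing the entries is an assumption of this sketch rather than documented library behaviour.

```python
import tempfile
from pathlib import Path


def load_profane_words(path):
    """Read a newline-separated word list into a set, skipping blank lines."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()  # lowercasing is this sketch's choice
        if word:
            words.add(word)
    return words


# Demo with a throwaway file so the example does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "xx_badwords.txt"
    demo_path.write_text("Foo\n\nbar\n  baz  \n", encoding="utf-8")
    demo_words = load_profane_words(demo_path)

print(sorted(demo_words))
```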

Next, we need to download an appropriate spaCy model. Unfortunately, a spaCy model for Russian is not yet available, so we will use the English model for tokenization, and hunspell and pymorphy2 for lemmatization.

Next, we download dictionaries for deep analysis:

$ cd profanity_filter/data
$ wget https://cgit.freedesktop.org/libreoffice/dictionaries/plain/ru_RU/ru_RU.aff
$ wget https://cgit.freedesktop.org/libreoffice/dictionaries/plain/ru_RU/ru_RU.dic
$ mv ru_RU.aff ru.aff
$ mv ru_RU.dic ru.dic

Pymorphy2

For better results with Russian and Ukrainian, we suggest installing pymorphy2. To install pymorphy2 with the Russian dictionary, run:

$ pip install -U -r requirements-pymorphy2-ru.txt

Usage

Let's create a ProfanityFilter that filters Russian and English profanity.

from profanity_filter import ProfanityFilter

pf = ProfanityFilter(languages=['ru', 'en'])

pf.censor("Да бля, это просто shit какой-то!")
# "Да ***, это просто **** какой-то!"

Note that the order of languages in the languages argument matters. If a language tool (profane word list, spaCy model, Hunspell dictionary, or pymorphy2 dictionary) is not found for a language detected in part of the text, the profanity-filter library automatically falls back to the first suitable language in languages.

As a consequence, even if you want to filter only Russian profanity, you still need to specify another language in the languages argument to fall back on when loading a spaCy model for tokenization because, as noted above, there is no spaCy model for Russian.
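
The fallback rule described above can be sketched as a small lookup. This is a reading of the prose, not the library's code: pick_language, the tool names, and the available mapping are all hypothetical.

```python
def pick_language(tool, detected, languages, available):
    """Choose the language whose `tool` should be used for `detected` text.

    `languages` is the ordered list passed to the filter; `available`
    maps a tool name to the set of languages that have it installed.
    Returns None when no configured language has the tool.
    """
    installed = available.get(tool, set())
    if detected in languages and detected in installed:
        return detected
    # Fall back to the first configured language that has the tool.
    for language in languages:
        if language in installed:
            return language
    return None


available = {
    "spacy_model": {"en"},           # no Russian model, as noted above
    "profane_words": {"ru", "en"},
}
print(pick_language("spacy_model", "ru", ["ru", "en"], available))
print(pick_language("profane_words", "ru", ["ru", "en"], available))
```

With languages=['ru', 'en'], Russian text is tokenized with the English model but still checked against the Russian word list, which is why a second language has to be listed.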

Console Executable

$ profanity_filter -h
usage: profanity_filter [-h] [-t TEXT | -f PATH] [-l LANGUAGES] [-o OUTPUT_FILE] [--show]

Profanity filter console utility

optional arguments:
  -h, --help            show this help message and exit
  -t TEXT, --text TEXT  Test the given text for profanity
  -f PATH, --file PATH  Test the given file for profanity
  -l LANGUAGES, --languages LANGUAGES
                        Test for profanity using specified languages (comma
                        separated)
  -o OUTPUT_FILE, --output OUTPUT_FILE
                        Write the censored output to a file
  --show                Print the censored text
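
For readers who want to see how such an interface fits together, the help text above can be reproduced with argparse roughly as follows. This is a reconstruction from the help output only, not the package's actual source; build_parser is a hypothetical name, and splitting the languages value on commas is inferred from the "comma separated" note.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="profanity_filter",
        description="Profanity filter console utility",
    )
    # -t and -f are shown as alternatives in the usage line.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-t", "--text",
                        help="Test the given text for profanity")
    source.add_argument("-f", "--file", metavar="PATH",
                        help="Test the given file for profanity")
    parser.add_argument("-l", "--languages",
                        help="Test for profanity using specified languages "
                             "(comma separated)")
    parser.add_argument("-o", "--output", metavar="OUTPUT_FILE",
                        help="Write the censored output to a file")
    parser.add_argument("--show", action="store_true",
                        help="Print the censored text")
    return parser


args = build_parser().parse_args(["-t", "some text", "-l", "en,ru", "--show"])
languages = args.languages.split(",")
print(args.text, languages, args.show)
```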

Credits

English profane word dictionary: https://github.com/areebbeigh/profanityfilter/ (author Areeb Beigh).

Russian profane word dictionary: https://github.com/PixxxeL/djantimat (author Ivan Sergeev).

Release history

1.3.3: Apr 30, 2020
1.3.2: Apr 11, 2020
1.3.1: Dec 30, 2019
1.3.0: Dec 30, 2019
1.1.14: Apr 27, 2019
1.1.13: Apr 27, 2019
1.1.12: Mar 31, 2019
1.1.11: Mar 30, 2019
1.1.10: Mar 30, 2019
1.1.9: Mar 30, 2019
1.1.8: Mar 30, 2019
1.1.7: Mar 30, 2019
1.1.6: Mar 29, 2019
1.1.5: Mar 29, 2019
1.1.4: Mar 29, 2019
1.1.3: Mar 29, 2019
1.1.2: Mar 28, 2019
1.1.1: Mar 28, 2019
1.1.0: Mar 28, 2019
1.0.16: Mar 27, 2019
1.0.15: Mar 27, 2019
1.0.14: Mar 27, 2019
1.0.13: Mar 27, 2019
1.0.12: Mar 24, 2019
1.0.10: Mar 22, 2019
1.0.9: Mar 22, 2019
1.0.8: Mar 22, 2019
1.0.7: Mar 22, 2019
1.0.6: Mar 20, 2019
1.0.5: Mar 20, 2019
1.0.4: Nov 25, 2018
1.0.3: Oct 22, 2018
1.0.2: Oct 22, 2018 (this version)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

profanity-filter-1.0.2.tar.gz (28.7 kB)

Uploaded Oct 22, 2018 Source

Built Distribution

profanity_filter-1.0.2-py3-none-any.whl (130.3 kB)

Uploaded Oct 22, 2018 Python 3

File details

Details for the file profanity-filter-1.0.2.tar.gz.

File metadata

Download URL: profanity-filter-1.0.2.tar.gz
Upload date: Oct 22, 2018
Size: 28.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/0.11.5 CPython/3.6.5 Linux/4.18.11-1-default

File hashes

Hashes for profanity-filter-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`755a1f199ac6135616105abf7e8844a5b37d7fec539f2f49e17334a2b1a35788`
MD5	`6e03fed1594b63c900497ba0d3aea418`
BLAKE2b-256	`f3fcd289f7b9c3f124e8dcad1281b2428fa4bace91c88da1193adad1a9c6d7a1`


File details

Details for the file profanity_filter-1.0.2-py3-none-any.whl.

File metadata

Download URL: profanity_filter-1.0.2-py3-none-any.whl
Upload date: Oct 22, 2018
Size: 130.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/0.11.5 CPython/3.6.5 Linux/4.18.11-1-default

File hashes

Hashes for profanity_filter-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ab39729c87537bf870dba297adf8ec7a85827e5ecbb5e28bfdf2f1425e07769d`
MD5	`303d9805ffef37a27980a180dd09a8fb`
BLAKE2b-256	`e284f0c536433245b4fee0f42f213c5bcc437fbb6ffb784dff0007414998349d`
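
The published digests can be used to check a downloaded file before installing it. The sketch below computes a SHA256 digest with the standard library and compares it to a published value. The function names are hypothetical, and the demo runs on a throwaway file rather than a real distribution; for an actual download, pass the file's path together with the SHA256 value listed above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Demo on a temporary file; the "published" value is computed from the
# same bytes here purely to show a passing and a failing comparison.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.bin"
    payload = b"not a real distribution file"
    demo_file.write_bytes(payload)
    demo_ok = matches_published_digest(demo_file,
                                       hashlib.sha256(payload).hexdigest())
    demo_bad = matches_published_digest(demo_file, "0" * 64)

print(demo_ok, demo_bad)
```

Reading in chunks keeps memory use constant, which matters little for files of this size but makes the helper safe for larger downloads.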

