Skip to main content

Blazingly fast cleaning swear words (and their leetspeak) in strings

Project description

better_profanity

Blazingly fast cleaning swear words (and their leetspeak) in strings

release Build Status python license

Inspired from package profanity of Ben Friedland, this library is significantly faster than the original one, by using string comparison instead of regex.

It supports modified spellings (such as p0rn, h4NDjob, handj0b and b*tCh).

Requirements

This package works with Python 3.4+ and PyPy3.

Installation

$ pip install better_profanity

Unicode characters

Only Unicode characters from categories Ll, Lu, Mc and Mn are added. More on Unicode categories can be found here.

Not all languages are supported yet, such as Chinese.

Usage

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words()

    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

All modified spellings of words in profanity_wordlist.txt will be generated. For example, the word handjob would be loaded into:

'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',
'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b'

The full mapping of the library can be found in profanity.py.

1. Censor swear words from a text

By default, profanity replaces each swear words with 4 asterisks ****.

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

2. Censor doesn't care about word dividers

The function .censor() also hide words separated not just by an empty space but also other dividers, such as _, , and .. Except for @, $, *, ", '.

from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"

    censored_text = profanity.censor(text)
    print(censored_text)
    # "...****...hello_cat_****,,,,123"

3. Censor swear words with custom character

4 instances of the character in second parameter in .censor() will be used to replace the swear words.

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.

4. Check if the string contains any swear words

Function .contains_profanity() return True if any words in the given string has a word existing in the wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."

    profanity.contains_profanity(dirty_text)
    # True

5. Censor swear words with a custom wordlist

5.1. Wordlist as a List

Function load_censor_words takes a List of strings as censored words. The provided list will replace the default wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)

    print(profanity.contains_profanity("Have a merry day! :)"))
    # Have a **** day! :)

5.2. Wordlist as a file

Function `load_censor_words_from_file takes a filename, which is a text file and each word is separated by lines.

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt')

6. Whitelist

Function load_censor_words and load_censor_words_from_file takes a keyword argument whitelist_words to ignore words in a wordlist.

It is best used when there are only a few words that you would like to ignore in the wordlist.

# Use the default wordlist
profanity.load_censor_words(whitelist_words=['happy', 'merry'])

# or with your custom words as a List
custom_badwords = ['happy', 'jolly', 'merry']
profanity.load_censor_words(custom_badwords, whitelist_words=['merry'])

# or with your custom words as a text file
profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt', whitelist_words=['merry'])

7. Add more censor words

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.add_censor_words(custom_badwords)

    print(profanity.contains_profanity("Happy you, fuck!"))
    # **** you, ****!

Limitations

  1. As the library compares each word by characters, the censor could easily be bypassed by adding any character(s) to the word:
profanity.censor('I just have sexx')
# returns 'I just have sexx'

profanity.censor('jerkk off')
# returns 'jerkk off'
  1. Any word in wordlist that have non-space separators cannot be recognised, such as s & m, and therefore, it won't be filtered out. This problem was raised in #5.

Testing

$ python tests.py

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Special thanks to

Acknowledgments

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

better_profanity-0.7.0.tar.gz (30.0 kB view details)

Uploaded Source

Built Distribution

better_profanity-0.7.0-py3-none-any.whl (46.1 kB view details)

Uploaded Python 3

File details

Details for the file better_profanity-0.7.0.tar.gz.

File metadata

  • Download URL: better_profanity-0.7.0.tar.gz
  • Upload date:
  • Size: 30.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.8.0 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for better_profanity-0.7.0.tar.gz
Algorithm Hash digest
SHA256 8a6fdc8606d7471e7b5f6801917eca98ec211098262e82f62da4f5de3a73145b
MD5 a9d31bffe54927365a90dd422c1665b1
BLAKE2b-256 b54a52966c1c883819f0b48540312a4a7f63b5c49f4e5b1a10838ae7f06ebb3c

See more details on using hashes here.

File details

Details for the file better_profanity-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: better_profanity-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 46.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.8.0 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for better_profanity-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bd4c529ea6aa2db1aaa50524be1ed14d0fe5c664f1fd88c8bc388c7e9f9f00e8
MD5 4a2f5b70e40e12dffe06433c90e35f71
BLAKE2b-256 f3dd0b074d89e903cc771721cde2c4bf3d8c9d114b5bd791af5c62bcf5fb9459

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page