A Python library to clean English swear words in strings
better_profanity
A Python library to clean swear words in strings.
Inspired by Ben Friedland's profanity package, this library is much faster than the original because it uses string comparison instead of regular expressions.
Requirements
Because it makes use of Python static typing, this package only works with Python 3.6+.
Unicode characters
A huge thanks to @Derfirm for adding support for Unicode characters.
For release 0.3-beta.0, only Unicode characters from categories Ll (lowercase letter), Lu (uppercase letter), Mc (spacing mark) and Mn (nonspacing mark) are added. More on Unicode categories can be found here.
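To see which characters fall into these four categories, the standard library's unicodedata module can be queried directly. The following is a minimal sketch, not the library's own code; the names ALLOWED_CATEGORIES and is_word_character are hypothetical and exist only for this illustration.

```python
import unicodedata

# The four categories named in the release note above.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Mc", "Mn"}


def is_word_character(ch: str) -> bool:
    """Return True if ch belongs to one of the four listed categories.

    Hypothetical helper for illustration; it is not part of better_profanity.
    """
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


if __name__ == "__main__":
    # Lowercase letter, uppercase Cyrillic letter, combining acute accent,
    # Devanagari vowel sign, digit, underscore.
    for ch in ["a", "Я", "\u0301", "\u093e", "1", "_"]:
        print(repr(ch), unicodedata.category(ch), is_word_character(ch))
```

The combining acute accent (U+0301) is the stress mark used in the Russian example further down, which is why the Mn category matters for that text.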
Usage
By default, on the first .censor() call, profanity initializes a set of words from profanity_wordlist.txt, which is then used to compare against the input texts. This set of words is stored in memory (~5MB+).
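The load-on-first-use behaviour described above can be pictured with a small sketch. This is an assumption-laden illustration of the general pattern, not the library's implementation: the class LazyWordSet, its loader argument and its load_count attribute are all hypothetical, and the words come from an in-memory list rather than from profanity_wordlist.txt.

```python
from typing import Callable, Iterable, Optional, Set


class LazyWordSet:
    """Hypothetical helper that builds its word set on the first lookup only."""

    def __init__(self, loader: Callable[[], Iterable[str]]) -> None:
        self._loader = loader
        self._words: Optional[Set[str]] = None
        self.load_count = 0  # how many times the loader has run

    def __contains__(self, word: str) -> bool:
        if self._words is None:
            # First lookup: build the set once and keep it in memory.
            self._words = {w.strip().lower() for w in self._loader() if w.strip()}
            self.load_count += 1
        return word.lower() in self._words


if __name__ == "__main__":
    words = LazyWordSet(lambda: ["darn", "heck"])
    print(words.load_count)  # 0, nothing loaded yet
    print("Heck" in words)   # True, and the set is built here
    print("hello" in words)  # False
    print(words.load_count)  # 1, the loader ran only once
```

Keeping the words in a set makes each lookup a plain membership test, at the cost of holding the whole list in memory.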
1. Censor swear words from a text
By default, profanity replaces each swear word with 4 asterisks (****).
from better_profanity import profanity
if __name__ == "__main__":
    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.
2. Censor doesn't care about word dividers
The function .censor()
also hide words separated not just by an empty space
but also other dividers, such as _
, ,
and .
. Except for @, $, ^, *, &, \, \
.
from better_profanity import profanity
if __name__ == "__main__":
    text = "...shit...hello_cat_fuck,,,,123"
    censored_text = profanity.censor(text)
    print(censored_text)
    # ...****...hello_cat_****,,,,123
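The divider handling can be approximated with a character-by-character scan that uses no regular expressions. The sketch below is a simplified stand-in, not the library's algorithm: censor_sketch is a hypothetical function, it treats every non-alphanumeric character as a divider (an assumption that ignores the exceptions listed above), and it does not handle leetspeak variants or combining Unicode marks.

```python
from typing import List, Set


def censor_sketch(text: str, bad_words: Set[str], censor_char: str = "*") -> str:
    """Replace each word found in bad_words with four censor characters.

    Hypothetical illustration: a "word" is a run of alphanumeric characters,
    and any other character is copied through unchanged as a divider.
    """
    pieces: List[str] = []
    current = ""

    def flush() -> None:
        # Emit the word collected so far, censored if it is in the set.
        nonlocal current
        if current:
            pieces.append(censor_char * 4 if current.lower() in bad_words else current)
            current = ""

    for ch in text:
        if ch.isalnum():
            current += ch
        else:
            flush()
            pieces.append(ch)
    flush()
    return "".join(pieces)


if __name__ == "__main__":
    print(censor_sketch("...darn...hello_cat_heck,,,,123", {"darn", "heck"}))
    # ...****...hello_cat_****,,,,123
```

Because dividers are copied through as-is, the surrounding punctuation survives and only the matched words change.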
3. Censor swear words with custom character
The second parameter of .censor() sets the replacement character: each swear word is replaced with 4 instances of it.
from better_profanity import profanity
if __name__ == "__main__":
    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.
4. Check if the string contains any swear words
from better_profanity import profanity
if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."
    profanity.contains_profanity(dirty_text)
    # True
5. Censor swear words with a custom wordlist
The provided list of words will replace the default wordlist.
from better_profanity import profanity
if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)
    # The default wordlist has been replaced, so this text is left unchanged.
    print(profanity.censor("Fuck you!"))
    # Fuck you!
    # 'merry' is in the custom wordlist, so it is censored.
    print(profanity.censor("Have a merry day! :)"))
    # Have a **** day! :)
6. Censor Unicode characters
from better_profanity import profanity
if __name__ == "__main__":
    bad_text = "Эффекти́вного противоя́дия от я́да фу́гу не существу́ет до сих пор"
    profanity.load_censor_words(["противоя́дия"])
    censored_text = profanity.censor(bad_text)
    print(censored_text)
    # Эффекти́вного **** от я́да фу́гу не существу́ет до сих пор