A fast and flexible utility for censoring and filtering text

Fast Censor

fast_censor

  • A fast and flexible package for filtering profanity or other strings out of text, roughly 100 times faster than alternatives
  • The fastest string utility for profanity detection and censoring
  • Detects words written with repeated characters or character substitutions
  • Zero dependencies; works with Python 3.6 through 3.11

Installation

From source

cd fast-censor  # enter the project directory
python setup.py install
# or install locally with pip in editable mode
pip install -e .

From GitHub

pip install git+https://github.com/mbuchove/fast_censor.git

Usage

import fast_censor  # needed for the fast_censor.* references below
from fast_censor import FastCensor

# to load default (encoded) profanity word list
censor = FastCensor()

# load a word list from an alternate path; this example uses a plain-text (unencoded) word list
censor_clean = fast_censor.FastCensor(
    wordlist=fast_censor.WordListHandler.get_default_wordlist_path("clean_wordlist_decoded.txt"), 
    wordlist_encoded=False,
)
censor_clean.add_words(['bat', 'rick'])

# censor texts or simply get the indices of matches
matches = censor_clean.check_text("this bat is for riii1ick")
# >>> [(5, 9), (17, 25)]
censored_text = censor_clean.censor("fuuudge you")
# >>> "******* you"

Character substitutions

FastCensor's profanity matcher can match words in which specified characters are substituted for others, as is common in leetspeak (1337 speak). A default set of commonly used substitutions is provided.

To set your own, pass a mapping like the following into FastCensor:

substitutions = {'a': '@4'}

  • All matching is case-insensitive
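
To make the substitution idea concrete, here is a minimal standalone sketch. It does not use fast_censor and is not how the library is implemented; it only assumes the mapping shape shown above (a letter mapped to a string of allowed replacements). The helper name build_pattern and the extra entries in the table are hypothetical.

```python
import re

# Hypothetical substitution table in the same shape as the example above:
# each key is a letter, each value is a string of characters that may replace it.
substitutions = {"a": "@4", "i": "1!", "e": "3"}


def build_pattern(word, substitutions):
    """Build a case-insensitive regex accepting any listed substitute for each letter."""
    parts = []
    for ch in word:
        allowed = ch + substitutions.get(ch, "")
        # One character class per letter, e.g. "a" becomes [a@4]
        parts.append("[" + re.escape(allowed) + "]")
    return re.compile("".join(parts), re.IGNORECASE)


pattern = build_pattern("bat", substitutions)
print(bool(pattern.fullmatch("b@t")))  # substitute for "a" is accepted
print(bool(pattern.fullmatch("B4T")))  # matching ignores case
print(bool(pattern.fullmatch("bot")))  # "o" is not a listed substitute
```

Mapping each letter to a string of alternatives keeps the table compact, and one character class per letter is enough to express it.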

Character repetition

By default, a word still matches when any of its characters is repeated any number of times. Valid substitutes for a character count as repeats of that character.

For example, "baaa@@aatt" will match "bat".

You can turn this off by passing allow_repititions=False to censor_text or check_text.
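
The repetition rule can be illustrated with a regular expression: each letter becomes a character class of itself and its substitutes, followed by a "one or more" quantifier. This is an illustrative sketch only, not the library's implementation; build_pattern and its repeats_ok flag are hypothetical names and are unrelated to the library's own parameter.

```python
import re


def build_pattern(word, substitutions, repeats_ok=True):
    """Regex for `word` where each letter may be substituted and, optionally, repeated."""
    quantifier = "+" if repeats_ok else ""
    parts = []
    for ch in word:
        allowed = ch + substitutions.get(ch, "")
        parts.append("[" + re.escape(allowed) + "]" + quantifier)
    return re.compile("".join(parts), re.IGNORECASE)


subs = {"a": "@4"}
loose = build_pattern("bat", subs)                     # repeats allowed
strict = build_pattern("bat", subs, repeats_ok=False)  # repeats disallowed

print(bool(loose.fullmatch("baaa@@aatt")))   # True: repeated "a", "@" and "t" are absorbed
print(bool(strict.fullmatch("baaa@@aatt")))  # False: exactly one character per letter
print(bool(strict.fullmatch("b@t")))         # True: substitution alone still matches
```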

Delimiters

Use the delimiters parameter of FastCensor to set the delimiter characters, which determine the boundaries of a word. Profanity matches will not extend across any delimiting character.

For example, if '_' is a delimiter, "ba_t" would not match "bat"
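
As a rough illustration of how delimiters bound a word, the sketch below splits text into tokens at delimiter characters and compares each token against a word list. It is a simplified stand-in, not the library's matcher: find_matches is a hypothetical helper, it does exact token comparison only, and it expects a non-empty delimiter string.

```python
import re


def find_matches(text, words, delimiters=" _.,!?"):
    """Return (start, end) spans of delimiter-bounded tokens that equal a listed word."""
    wordset = {w.lower() for w in words}
    # A token is a maximal run of non-delimiter characters.
    token_re = re.compile("[^" + re.escape(delimiters) + "]+")
    return [m.span() for m in token_re.finditer(text) if m.group().lower() in wordset]


# With "_" as a delimiter, "bat_man" splits into "bat" and "man".
print(find_matches("bat_man", ["bat"], delimiters=" _"))  # [(0, 3)]
# Without it, "bat_man" is a single token and does not equal "bat".
print(find_matches("bat_man", ["bat"], delimiters=" "))   # []
# A delimiter inside the word splits it, so "ba_t" never matches "bat".
print(find_matches("ba_t", ["bat"], delimiters=" _"))     # []
```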

Editing and saving wordlist

censor.add_word('new_word')  # to add a new word
censor.write_words_file("word_lists/new_wordlist_encoded.txt", encode=True)

Encoding

By default, the word lists are base64-encoded so that vulgar or offensive words are not displayed in plain text. If you would like to save a word list in plain text, set encode=False in write_words_file.
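
For readers unfamiliar with base64, the following sketch shows the general idea using only the standard library. It assumes one independently encoded word per entry, which may differ from the file layout fast_censor actually writes; encode_words and decode_words are hypothetical helpers.

```python
import base64


def encode_words(words):
    """Base64-encode each word so the stored list is not human-readable at a glance."""
    return [base64.b64encode(w.encode("utf-8")).decode("ascii") for w in words]


def decode_words(lines):
    """Reverse encode_words, returning the original plain-text words."""
    return [base64.b64decode(line).decode("utf-8") for line in lines]


encoded = encode_words(["bat", "rick"])
print(encoded)                # ['YmF0', 'cmljaw==']
print(decode_words(encoded))  # ['bat', 'rick']
```

Base64 is an encoding, not encryption: it hides the words from casual view but anyone can decode them.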

Benchmarks

See this Gist for performance measurements comparing filtering speed against other packages.

