A fast and flexible utility for censoring and filtering text

Fast Censor

fast_censor

A fast and flexible package for filtering profanity or other strings out of text, roughly 100 times faster than alternatives (see Benchmarks)
The fastest string utility for profanity detection and censoring
Detects words even when characters are repeated or substituted
Zero dependencies; works with Python 3.6 through 3.11

Installation

From source

cd fast-censor  # enter into project directory
python setup.py install 
# or with pip locally
pip install -e .

From GitHub

pip install git+https://github.com/mbuchove/fast_censor.git

Uses

import fast_censor  # needed for the fast_censor.* references below
from fast_censor import FastCensor

# to load default (encoded) profanity word list
censor = FastCensor()

# load alternate path, example is a plain text word list without encoding
censor_clean = fast_censor.FastCensor(
    wordlist=fast_censor.WordListHandler.get_default_wordlist_path("clean_wordlist_decoded.txt"), 
    wordlist_encoded=False,
)
censor_clean.add_words(['bat', 'rick'])

# censor texts or simply get the indices of matches
matches = censor_clean.check_text("this bat is for riii1ick")
# >>> [(5, 9), (17, 25)]
censored_text = censor_clean.censor("fuuudge you")
# >>> "******* you"

Character substitutions

FastCensor's profanity matcher can match words in which specified characters are replaced by others, as is common in leetspeak ("1337 speak"). A default set of commonly used substitutions is provided.

To set your own, pass a mapping like the following to FastCensor. Here, '@' and '4' are accepted in place of 'a':

substitutions = {'a': '@4'}

All matching is case-insensitive.
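
To make the idea of a substitution map concrete, here is a minimal sketch that uses only the standard library. It is not fast_censor's implementation: `build_substitution_pattern` is a hypothetical helper, and compiling to a regular expression is just one simple way to illustrate the matching rule.

```python
import re

# Hypothetical helper, not part of fast_censor: turns a word into a regex in
# which each character may also be any of its listed substitutes.
def build_substitution_pattern(word, substitutions):
    parts = []
    for ch in word:
        # The character itself plus its substitutes, e.g. "a" -> "a@4".
        alternatives = ch + substitutions.get(ch, "")
        parts.append("[" + re.escape(alternatives) + "]")
    # IGNORECASE mirrors the case-insensitive matching described above.
    return re.compile("".join(parts), re.IGNORECASE)

substitutions = {"a": "@4"}
pattern = build_substitution_pattern("bat", substitutions)

print(bool(pattern.fullmatch("b@t")))  # True
print(bool(pattern.fullmatch("B4T")))  # True
print(bool(pattern.fullmatch("bit")))  # False
```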

Character repetition

By default, a word still matches even if any of its characters is repeated any number of times. This includes any valid substitute for that character.

For example, "baaa@@aatt" will match "bat".

You can turn this off by passing allow_repititions=False to censor_text or check_text.
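
The repetition rule can be illustrated with a short standard-library sketch. This is an assumption about how such matching could be expressed, not the package's own code: `build_repetition_pattern` and its `allow_repetitions` argument are hypothetical names chosen for this example.

```python
import re

# Hypothetical helper, not part of fast_censor: each character (or any of its
# substitutes) may repeat one or more times when allow_repetitions is True.
def build_repetition_pattern(word, substitutions, allow_repetitions=True):
    quantifier = "+" if allow_repetitions else ""
    parts = []
    for ch in word:
        alternatives = ch + substitutions.get(ch, "")
        parts.append("[" + re.escape(alternatives) + "]" + quantifier)
    return re.compile("".join(parts), re.IGNORECASE)

loose = build_repetition_pattern("bat", {"a": "@4"})
strict = build_repetition_pattern("bat", {"a": "@4"}, allow_repetitions=False)

print(bool(loose.fullmatch("baaa@@aatt")))   # True
print(bool(strict.fullmatch("baaa@@aatt")))  # False
print(bool(strict.fullmatch("b@t")))         # True
```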

Delimiters

Use the delimiters parameter of FastCensor to set the delimiter characters, which determine the boundaries of a word. Profanity matches will not extend across any delimiting character.

For example, if '_' is a delimiter, "ba_t" would not match "bat".
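
The sketch below shows why a delimiter blocks a match: once the text is cut at delimiter characters, "ba_t" becomes two separate pieces, and neither is "bat". It uses only the standard library; `split_with_spans` is a hypothetical helper and does not reflect how fast_censor scans text internally.

```python
import re

# Hypothetical helper, not part of fast_censor: splits text at delimiter
# characters and keeps each piece's (start, end) position in the original.
def split_with_spans(text, delimiters):
    if not delimiters:
        raise ValueError("at least one delimiter character is required")
    # Match maximal runs of characters that are NOT delimiters.
    pattern = re.compile("[^" + re.escape(delimiters) + "]+")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]

tokens = split_with_spans("ba_t bat", delimiters=" _")
print(tokens)  # [(0, 2, 'ba'), (3, 4, 't'), (5, 8, 'bat')]
```

Only the last piece equals "bat", so only that span would be a candidate for censoring.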

Editing and saving wordlist

censor.add_word('new_word')  # to add a new word
censor.write_words_file("word_lists/new_wordlist_encoded.txt", encode=True)

Encoding

By default, the word lists are base64-encoded so that vulgar or offensive words are not displayed in plain text. If you would like to save a word list in plain text, set encode=False in write_words_file.
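
For readers unfamiliar with base64, this sketch shows the round trip using Python's standard `base64` module. Encoding one word at a time is an assumption made for illustration; the exact layout of fast_censor's encoded files is not described here, and `encode_word` and `decode_word` are hypothetical helpers.

```python
import base64

# Hypothetical helpers, not part of fast_censor.
def encode_word(word):
    """Return the base64 text form of a single word."""
    return base64.b64encode(word.encode("utf-8")).decode("ascii")

def decode_word(encoded):
    """Reverse encode_word."""
    return base64.b64decode(encoded).decode("utf-8")

words = ["bat", "rick"]
encoded = [encode_word(w) for w in words]
print(encoded)                            # ['YmF0', 'cmljaw==']
print([decode_word(e) for e in encoded])  # ['bat', 'rick']
```

Note that base64 only obscures the words from casual view; it is not encryption.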

Benchmarks

See this Gist for performance measurements of filtering compared with other packages.

Project details

This version

0.3.2

Mar 1, 2025

Download files

Source Distribution

fast_censor-0.3.2.tar.gz (15.9 kB)

Uploaded Mar 1, 2025 Source

Built Distribution

fast_censor-0.3.2-py3-none-any.whl (16.1 kB)

Uploaded Mar 1, 2025 Python 3

File details

Details for the file fast_censor-0.3.2.tar.gz.

File metadata

Download URL: fast_censor-0.3.2.tar.gz
Upload date: Mar 1, 2025
Size: 15.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for fast_censor-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`1a330679e92f9b42c7beef570e5d452691aac008c0e196ff7a0c704c5ecccef5`
MD5	`0315a0a221be94e6771fa7afbf10e366`
BLAKE2b-256	`1fdd95bb2aabf0098a2bbbb30508ec2c5b1540091465ab33263b9558e60526d9`
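
A downloaded archive can be checked against the SHA256 digest listed above with the standard `hashlib` module. This is a minimal sketch: the local file path is an assumption, and `sha256_of_file` is a helper written for this example.

```python
import hashlib
import os

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "1a330679e92f9b42c7beef570e5d452691aac008c0e196ff7a0c704c5ecccef5"

# Assumed local path; adjust to wherever the archive was downloaded.
archive = "fast_censor-0.3.2.tar.gz"
if os.path.exists(archive):
    matches = sha256_of_file(archive) == EXPECTED_SHA256
    print("hash OK" if matches else "hash MISMATCH")
```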

File details

Details for the file fast_censor-0.3.2-py3-none-any.whl.

File metadata

Download URL: fast_censor-0.3.2-py3-none-any.whl
Upload date: Mar 1, 2025
Size: 16.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for fast_censor-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e9bd67936f95a5c9705e113ddd638d3c3329f1171f1367a188214d49e77d463f`
MD5	`3a7a68211f81ad8e1004bf0c2180731e`
BLAKE2b-256	`0155e9e6cef3289a8006e21a730c446223939bef5de3b6a6f23c70b95fca326b`
