A Python library for filtering profane words using string matching techniques.
prof4nities
A small Python library to detect and optionally censor profane or inappropriate words using language-specific wordlists, Levenshtein distance, and fuzzy string matching.
Installation
To install using pip, run:
pip install prof4nities
Quick example
from prof4nities import Censor
censor = Censor(language="en")
# assuming "badword" is in the wordlist
print(censor("badword in a sentence"))
# >>> ******* in a sentence
print(censor(["badword", "in", "a", "sentence"]))
# >>> ******* in a sentence
print(censor(["badword", "in", "a", "sentence"], stringify=False))
# >>> [Word('badword'), Word('in'), Word('a'), Word('sentence')]
The Censor class loads a language-specific Wordlist (default en) and exposes a callable interface. See prof4nities/filter.py for implementation details.
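To make the callable interface concrete, here is a minimal stand-in that mimics the call shapes shown above. It is not the prof4nities implementation: the class name `TinyCensor` is hypothetical, it takes its wordlist directly instead of loading one by language, it only does exact, case-insensitive matching, and it returns plain strings rather than `Word` objects.

```python
from typing import Iterable, List, Union


class TinyCensor:
    """Hypothetical stand-in for illustration; not the library's Censor."""

    def __init__(self, wordlist: Iterable[str]) -> None:
        # Store lowercase entries so matching is case-insensitive.
        self.wordlist = {w.lower() for w in wordlist}

    def __call__(self, text: Union[str, List[str]], stringify: bool = True):
        # Accept either a sentence or a pre-split list of tokens.
        tokens = text.split() if isinstance(text, str) else list(text)
        # Replace each listed token with asterisks of the same length.
        masked = ["*" * len(t) if t.lower() in self.wordlist else t for t in tokens]
        # Join back into a sentence unless the caller asked for the token list.
        return " ".join(masked) if stringify else masked


censor = TinyCensor(["badword"])
print(censor("badword in a sentence"))
print(censor(["badword", "in", "a", "sentence"], stringify=False))
```

Defining `__call__` is what lets an instance be used like a function, which is the same pattern the examples above rely on.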
Configuration via environment variables
prof4nities reads two runtime thresholds from environment variables (defined in prof4nities/config.py). Export them before running your code to change the matching behavior:
export LEVENSHTEIN_THRESHOLD=0.75
export FUZZY_RATIO_THRESHOLD=0.85
# then run your script or REPL
python -c "from prof4nities import Censor; print(Censor('en')('badword'))"
Defaults are:
LEVENSHTEIN_THRESHOLD=0.8
FUZZY_RATIO_THRESHOLD=0.8
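As a rough illustration of how two such thresholds could drive matching, the sketch below reads both variables with a 0.8 fallback and compares a candidate word against a wordlist entry. Everything here is an assumption made for illustration: the helper names (`read_threshold`, `levenshtein_similarity`, `looks_like`) are hypothetical, the similarity is normalized by the longer word's length, `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the library uses, and the two checks are combined with `or`. The library's actual logic in prof4nities/config.py and its matching code may differ.

```python
import os
from difflib import SequenceMatcher


def read_threshold(name: str, default: float = 0.8) -> float:
    """Read a float from the environment, falling back to the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        # Ignore unparsable values instead of failing at import time.
        return default


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # Deletion, insertion, substitution (or match when cost is 0).
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def looks_like(word: str, target: str, lev_threshold: float, fuzzy_threshold: float) -> bool:
    """Assumed rule: flag the word if either score reaches its threshold."""
    fuzzy = SequenceMatcher(None, word, target).ratio()
    return levenshtein_similarity(word, target) >= lev_threshold or fuzzy >= fuzzy_threshold


lev_t = read_threshold("LEVENSHTEIN_THRESHOLD")
fuzzy_t = read_threshold("FUZZY_RATIO_THRESHOLD")
print(looks_like("badw0rd", "badword", lev_t, fuzzy_t))
```

Under this normalization, raising a threshold toward 1.0 makes matching stricter, and lowering it catches more obfuscated spellings at the cost of more false positives.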
Persistent cache
Two kinds of persisted data are used by the library to avoid repeated downloads:
- WordNet corpora: Censor downloads the NLTK WordNet corpus on first use and stores it in the platform cache directory returned by prof4nities.config.Directories.CACHE_DIR.
- Wordlists: fetched profanity wordlists are cached under the same application cache in the wordlists/ subdirectory (e.g. <cache_dir>/wordlists/en.txt). The Wordlist manager will use a cached copy when present and otherwise fetch from the upstream source and write a best-effort cache copy.
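The cache-then-fetch behavior described for wordlists can be sketched as follows. This is an illustration of the pattern only, not the Wordlist manager's code: `load_wordlist` and its `fetch` callback are hypothetical names, and the upstream download is passed in as a function so that no URL has to be assumed.

```python
from pathlib import Path
from typing import Callable, List


def load_wordlist(cache_dir: Path, language: str, fetch: Callable[[str], str]) -> List[str]:
    """Return the wordlist for a language, preferring a cached copy."""
    cached = cache_dir / "wordlists" / f"{language}.txt"
    if cached.is_file():
        # Cache hit: no network access needed.
        text = cached.read_text(encoding="utf-8")
    else:
        # Cache miss: ask the caller-supplied fetcher for the raw text.
        text = fetch(language)
        try:
            cached.parent.mkdir(parents=True, exist_ok=True)
            cached.write_text(text, encoding="utf-8")
        except OSError:
            # Best-effort cache: a read-only or full disk must not break loading.
            pass
    # One entry per non-empty line.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Swallowing `OSError` on the write path is what makes the cache "best-effort": the caller still gets the freshly fetched words even when the copy cannot be saved.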
The exact cache location depends on the platform (see the platformdirs package). To inspect or override the cache location at runtime you can set the typical platform environment variables (for example on Linux set XDG_CACHE_HOME) or read the path via the prof4nities.config.Directories.CACHE_DIR value in code.
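For orientation, the Linux convention mentioned above resolves roughly as in this standard-library sketch. It approximates the XDG rule (use `XDG_CACHE_HOME` when set and non-empty, otherwise `~/.cache`) and is not a substitute for platformdirs, which also handles macOS and Windows. The function name `linux_cache_dir` is hypothetical, and the application directory name `prof4nities` is an assumption; the authoritative value is prof4nities.config.Directories.CACHE_DIR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def linux_cache_dir(app_name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate the XDG cache location for an application on Linux."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME", "").strip()
    # An unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    root = Path(base) if base else Path.home() / ".cache"
    return root / app_name


# The directory name "prof4nities" is assumed for illustration.
print(linux_cache_dir("prof4nities"))
```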
Development
Setting up
Install uv using pip.
pip install uv
Other installation methods are described in the uv documentation.
Structure
prof4nities/
├── censor.py        # Core Censor behavior and docstring examples
├── config.py        # Environment-driven settings
└── manager/
    └── wordlist.py  # Wordlist fetching and processing
Testing
Pytest is used for testing. To run unit tests, run:
uv run pytest tests/
To add your own test, create a file whose name starts with test_ and place it inside the tests/ directory, mirroring the layout of the existing tests.
Then, within the file, import the functionality you want to test:
import pytest
from prof4nities import Censor

# Test cases should be prefixed with `test_`
def test_censor():
    ...
    assert True is True
    ...
File details
Details for the file prof4nities-0.1.1.tar.gz.
File metadata
- Download URL: prof4nities-0.1.1.tar.gz
- Upload date:
- Size: 16.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0871416d14bfced729f65a75b952a526edf64dedf4fdf0d3a187c9b932be841 |
| MD5 | d751a82c44436a587daa9e7304ea359a |
| BLAKE2b-256 | 5c6bd056f127f8bb274b93b6b52e6d3591b57e2a645965fa04eb948a2b19e57f |
File details
Details for the file prof4nities-0.1.1-py3-none-any.whl.
File metadata
- Download URL: prof4nities-0.1.1-py3-none-any.whl
- Upload date:
- Size: 19.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ba08b7d18b542993f17654fbf54cc4088972f2790c2b6179ee69e793bf722a2 |
| MD5 | 9c1d859f9aa957b83a0913ae7d5a3807 |
| BLAKE2b-256 | eeef6ff6de60acd0418099e0ff1b0847662a0a632b92fc740e53c410e9feb597 |