A Python library for filtering profane words using string matching techniques.

prof4nities

A small Python library to detect and optionally censor profane or inappropriate words using language-specific wordlists, Levenshtein distance, and fuzzy string matching.

Installation

To install using pip, run:

pip install prof4nities

Quick example

from prof4nities import Censor

censor = Censor(language="en")

# assuming "badword" is in the wordlist
print(censor("badword in a sentence"))
# >>> ******* in a sentence

print(censor(["badword", "in", "a", "sentence"]))
# >>> ******* in a sentence

print(censor(["badword", "in", "a", "sentence"], stringify=False))
# >>> [Word('badword'), Word('in'), Word('a'), Word('sentence')]

The Censor class loads a language-specific Wordlist (en by default) and exposes a callable interface. See prof4nities/censor.py for implementation details.
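For intuition about how Levenshtein distance and fuzzy matching can flag near-misses such as "badw0rd", here is a minimal standard-library sketch. It is not the library's implementation: the helper names (levenshtein, levenshtein_similarity, fuzzy_ratio, is_match) are hypothetical, difflib.SequenceMatcher stands in for whatever fuzzy matcher prof4nities actually uses, and normalizing the distance by the longer word's length is an assumption.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def fuzzy_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib, used here as a stand-in matcher."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(word: str, banned: str,
             levenshtein_threshold: float = 0.8,
             fuzzy_threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag the word if either score reaches its threshold."""
    word, banned = word.lower(), banned.lower()
    return (levenshtein_similarity(word, banned) >= levenshtein_threshold
            or fuzzy_ratio(word, banned) >= fuzzy_threshold)


print(is_match("badw0rd", "badword"))
print(is_match("sentence", "badword"))
```

Under these assumptions, raising either threshold makes matching stricter, and lowering it catches more obfuscated spellings at the cost of more false positives.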

Configuration via environment variables

prof4nities reads two runtime thresholds from environment variables (defined in prof4nities/config.py). Export them before running your code to change its behavior:

export LEVENSHTEIN_THRESHOLD=0.75
export FUZZY_RATIO_THRESHOLD=0.85
# then run your script or REPL
python -c "from prof4nities import Censor; print(Censor(language='en')('badword'))"

Defaults are:

  • LEVENSHTEIN_THRESHOLD=0.8
  • FUZZY_RATIO_THRESHOLD=0.8
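As a rough illustration of environment-driven settings like these, the sketch below reads a threshold with a fallback default. It is not the code in prof4nities/config.py: read_threshold is a hypothetical helper, and the check that values lie between 0 and 1 is an assumption about what a sensible threshold looks like.

```python
import os
from typing import Mapping, Optional


def read_threshold(name: str, default: float = 0.8,
                   env: Optional[Mapping[str, str]] = None) -> float:
    """Read a float threshold from the environment, falling back to a default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    value = float(raw)  # raises ValueError for non-numeric input
    # Assumed validation: thresholds are similarity scores in [0, 1].
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


print(read_threshold("LEVENSHTEIN_THRESHOLD"))
print(read_threshold("FUZZY_RATIO_THRESHOLD"))
```

Passing a mapping as env instead of relying on os.environ keeps the helper easy to test.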

Persistent cache

The library persists two kinds of data to avoid repeated downloads:

  • WordNet corpora: Censor downloads the NLTK WordNet corpus on first use and stores it in the platform cache directory returned by prof4nities.config.Directories.CACHE_DIR.
  • Wordlists: fetched profanity wordlists are cached under the same application cache in the wordlists/ subdirectory (e.g. <cache_dir>/wordlists/en.txt). The Wordlist manager will use a cached copy when present and otherwise fetch from the upstream source and write a best-effort cache copy.
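The wordlist behavior described above follows a common cache-or-fetch pattern, sketched below. This is an illustration rather than the Wordlist manager's actual code: load_wordlist and its fetch callback are hypothetical names, and the one-word-per-line file format is an assumption.

```python
from pathlib import Path
from typing import Callable, List


def load_wordlist(cache_dir: Path, language: str,
                  fetch: Callable[[str], str]) -> List[str]:
    """Return the wordlist for a language, preferring a cached copy."""
    cache_file = cache_dir / "wordlists" / f"{language}.txt"
    if cache_file.is_file():
        text = cache_file.read_text(encoding="utf-8")
    else:
        text = fetch(language)  # e.g. download from the upstream source
        try:
            cache_file.parent.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(text, encoding="utf-8")
        except OSError:
            # Best effort: a read-only or full cache must not break loading.
            pass
    # Assumed format: one word per line, blank lines ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Swallowing OSError on the write is what makes the cache "best-effort": a failed write costs a repeat download later but never an error for the caller.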

The exact cache location depends on the platform (see the platformdirs package). To override it, set the usual platform environment variable (for example, XDG_CACHE_HOME on Linux). To inspect it, read prof4nities.config.Directories.CACHE_DIR in code.
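On Linux, the XDG convention resolves the cache root from XDG_CACHE_HOME and falls back to ~/.cache when the variable is unset or empty. The sketch below mirrors that lookup with the standard library only. linux_cache_dir is a hypothetical helper, and the "prof4nities" subdirectory name is an assumption, so treat Directories.CACHE_DIR as the authoritative value.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def linux_cache_dir(app_name: str,
                    env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate the XDG cache directory lookup for an application."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME", "").strip()
    # Unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    root = Path(base) if base else Path.home() / ".cache"
    return root / app_name


# "prof4nities" as the directory name is an assumption for illustration.
print(linux_cache_dir("prof4nities"))
```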

Development

Setting up

Install uv using pip.

pip install uv

Other installation methods are described in the uv documentation.

Structure

prof4nities/
├── censor.py          # Core Censor behavior and docstring examples
├── config.py          # Environment-driven settings
└── manager/
    └── wordlist.py    # Wordlist fetching and processing

Testing

Pytest is used for testing. To run unit tests, run:

uv run pytest tests/

To add your own test, create a file whose name starts with test_ and place it in the tests/ directory, following the existing layout.

Then, within the file, import the functionality you want to test:

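from prof4nities import Censor


# Test cases should be prefixed with `test_`
def test_censor():
    censor = Censor(language="en")
    # assuming "badword" is in the wordlist
    assert str(censor("badword in a sentence")) == "******* in a sentence"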
