
This is a library for effective moderation of content.


🚫 BadWords

High-performance profanity filter for Python with multilingual support and evasion detection.


Installation | Quick Start | Supported Languages | Advanced Evasion Detection


📖 Description

BadWords is a profanity filtering library for cleaning up user-generated content. Rather than relying on simple keyword matching, it combines similarity scoring, homoglyph detection, and transliteration to catch disguised profanity.

📦 Installation

Requirements

  • Recommended: Python 3.13
  • Minimum: Python 3.10

Install via GitHub

pip install git+https://github.com/FlacSy/badwords.git

Install via PyPI

pip install badwords-py

⚡ Quick Start

Basic Initialization

from badwords import ProfanityFilter

# Initialize filter
p = ProfanityFilter()

# Load specific languages (e.g., English and Russian)
p.init(languages=["en", "ru"])

# Or load all supported languages
p.init()

Checking and Filtering Text

text = "Some very b4d text here"

# 1. Simple check (Returns Boolean)
is_bad = p.filter_text(text)
print(is_bad) # True

# 2. Censoring text (Returns String)
clean_text = p.filter_text(text, replace_character="*")
print(clean_text) # "Some very *** text here"

🛠 Methods & API

filter_text(text, match_threshold=0.8, replace_character=None)

The core method of the library.

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | Required | Input text to check. |
| match_threshold | float | 0.8 | Similarity threshold (1.0 = exact match, 0.7 = aggressive). |
| replace_character | str or None | None | If provided, returns the censored string. If None, returns a bool. |

[!WARNING] Performance tip: setting match_threshold below 1.0 enables fuzzy matching, which is slower. Use 1.0 for high-traffic real-time filtering, or 0.95 for a good balance.
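To make the effect of the threshold concrete, here is a minimal sketch of threshold-based matching built on the standard library's difflib.SequenceMatcher. It is an illustration only: the source does not say which similarity metric BadWords uses internally, and the function name is_match is hypothetical.

```python
from difflib import SequenceMatcher


def is_match(candidate: str, banned: str, match_threshold: float = 0.8) -> bool:
    """Return True if candidate is similar enough to banned.

    Illustrative only; not the library's actual scoring function.
    """
    if match_threshold >= 1.0:
        # Exact comparison: cheap, no fuzzy scoring needed.
        return candidate == banned
    # ratio() is 2 * matched_chars / total_chars, in the range [0.0, 1.0].
    score = SequenceMatcher(None, candidate, banned).ratio()
    return score >= match_threshold


# "badw0rd" vs "badword" shares 6 of 7 characters: ratio = 12 / 14, about 0.857.
print(is_match("badw0rd", "badword", 0.8))   # True
print(is_match("badw0rd", "badword", 0.95))  # False
print(is_match("badw0rd", "badword", 1.0))   # False
```

Under this metric a single substituted character in a seven-letter word passes at 0.8 but not at 0.95, which illustrates why lower thresholds catch more evasions at the cost of more comparisons and more false positives.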


🧩 Advanced Evasion Detection

Standard filters are easy to bypass. BadWords is built to detect:

  • Homoglyphs: Detects look-alike characters such as hеllo (using Cyrillic 'е') and digit substitutions such as h4llo.
  • Transliteration: Automatically handles mapping between Cyrillic and Latin alphabets.
  • Normalization: Strips diacritics, special characters, and decorative Unicode symbols.
  • Similarity Analysis: Uses fuzzy matching to find words with deliberate typos.

Examples of detected evasions:

# p is the ProfanityFilter initialized in Quick Start
p.filter_text("hеllо")   # Mixed alphabets (Cyrillic + Latin) -> DETECTED
p.filter_text("h3ll0")   # Character substitution -> DETECTED
p.filter_text("h⍺llo")   # Mathematical/Greek symbols -> DETECTED
p.filter_text("привет")  # Transliterated matches -> DETECTED
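The normalization and homoglyph steps listed above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the library's implementation: the HOMOGLYPHS table is a tiny hypothetical mapping, and the function name normalize is made up for this example.

```python
import unicodedata

# Tiny illustrative mapping (hypothetical and far from complete).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u03b1": "a",  # Greek small alpha
    "\u237a": "a",  # APL functional symbol alpha
    "0": "o",
    "3": "e",
    "4": "a",
})


def normalize(text: str) -> str:
    """Fold a string toward plain lowercase Latin before word matching."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks (diacritics).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Map look-alike characters and digit substitutions to Latin letters.
    folded = stripped.translate(HOMOGLYPHS)
    # Remove separators such as dots inserted between letters.
    return "".join(ch for ch in folded if ch.isalnum() or ch.isspace())


print(normalize("h\u0435ll\u043e"))  # hello
print(normalize("h3ll0"))            # hello
print(normalize("b.a.d"))            # bad
```

Mapping digits to letters unconditionally also rewrites genuine numbers, so a production filter would likely apply such folding only when comparing against the banned-word list rather than to the text shown to users.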

🌍 Supported Languages

BadWords currently supports 26 languages out of the box:

| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| en | English | ru | Russian | ua | Ukrainian |
| de | German | fr | French | it | Italian |
| sp | Spanish | pl | Polish | cz | Czech |
| ja | Japanese | ko | Korean | th | Thai |

... & 14 more

Use p.get_all_languages() to see the full list in your code.
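One use for that list is to validate language codes before passing them to init. The sketch below assumes get_all_languages() returns an iterable of code strings such as "en" and "ru"; the source does not state its return type, so treat that as an assumption. The helper name split_supported is hypothetical.

```python
from typing import Iterable, List, Tuple


def split_supported(
    requested: Iterable[str], available: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split requested codes into (supported, unsupported), lowercased."""
    available_set = {code.lower() for code in available}
    supported: List[str] = []
    unsupported: List[str] = []
    for code in requested:
        code = code.lower()
        if code in available_set:
            supported.append(code)
        else:
            unsupported.append(code)
    return supported, unsupported


# With the library, `available` would come from p.get_all_languages()
# (assumed to yield code strings). A literal list is used here instead.
ok, unknown = split_supported(["en", "RU", "xx"], ["en", "ru", "ua"])
print(ok)       # ['en', 'ru']
print(unknown)  # ['xx']
```

The supported list can then be passed to p.init(languages=ok), and the unknown codes logged or reported to the caller.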


🚀 Full Integration Example

from badwords import ProfanityFilter

def monitor_chat():
    # Setup for a global chat
    profanity_filter = ProfanityFilter()
    profanity_filter.init(["en", "ru", "de"])
    
    # Custom project-specific banned words
    profanity_filter.add_words(["spam_link_v1", "scam_bot_99"])

    user_input = "Hey! Check out this b.a.d.w.o.r.d"
    
    # Moderate with high accuracy
    is_offensive = profanity_filter.filter_text(user_input, match_threshold=0.95)
    
    if is_offensive:
        print("Message blocked: Contains restricted language.")
    else:
        # Proceed with processing
        pass

if __name__ == "__main__":
    monitor_chat()

🤝 Contributing

Contributions are what make the open-source community an amazing place to learn, inspire, and create.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

Distributed under the MIT License. See LICENSE for more information.

Developed with ❤️ by FlacSy
