This is a library for effective moderation of content.

🚫 BadWords

High-performance profanity filter for Python, Rust, and JavaScript (WebAssembly)
with multilingual support and evasion detection.


Installation • Quick Start • Benchmarks • Supported Languages • Evasion Detection • Documentation

---

📖 Description

BadWords is a sophisticated profanity filtering library designed to clean up user-generated content. Unlike simple keyword matching, it uses similarity scoring, homoglyph detection, and transliteration to catch even the most cleverly disguised insults.

Architecture: The core is implemented in Rust for performance. Python provides a thin API layer with full type hints for IDE/linter support. The Rust library can also be used directly from Rust projects.

📦 Installation

Requirements

  • Recommended: Python 3.13
  • Minimum: Python 3.10+

Install via GitHub

pip install git+https://github.com/FlacSy/badwords.git

Install via PyPI

pip install badwords-py

⚡ Quick Start

Basic Initialization

from badwords import ProfanityFilter

# Initialize filter
p = ProfanityFilter()

# Load specific languages (e.g., English and Russian)
p.init(languages=["en", "ru"])

# Or load all supported languages
p.init()

Checking and Filtering Text

text = "Some very b4d text here"

# 1. Simple check (Returns Boolean)
is_bad = p.filter_text(text)
print(is_bad) # True

# 2. Censoring text (Returns String)
clean_text = p.filter_text(text, replace_character="*")
print(clean_text) # "Some very *** text here"

⏱ Benchmarks

| CPU | GPU | RAM | OS |
| --- | --- | --- | --- |
| Intel® Core™ i7-10700KF × 16 (x86_64) | NVIDIA GeForce RTX™ 3070 | 64 GB DDR4 3200 MHz | Ubuntu 24.04.2 LTS |

Rule-based matching (en+ru, match_threshold=1.0). Run: make bench

| Scenario | Rust (badwords-core) | Python (badwords-py) |
| --- | --- | --- |
| Clean text (no match) | ~7.6 µs (~130 K/s) | ~7.7 µs (~130 K/s) |
| Bad word (match) | ~3.1 µs (~320 K/s) | ~2.7 µs (~370 K/s) |
| Censor (replace) | ~2.8 µs (~360 K/s) | ~2.5 µs (~400 K/s) |
| 5 texts batch | ~15 µs (~330 K/s) | ~16 µs (~310 K/s) |

The Python package calls the Rust core through PyO3, so the binding overhead is minimal.
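The tables report a mean latency per call together with a throughput figure. The sketch below shows how the two relate and how such a latency could be measured with the standard library. The helper names are hypothetical, the workload is a placeholder, and this is not the project's `make bench` harness.

```python
import timeit


def throughput_per_second(latency_us: float) -> float:
    """Convert a mean per-call latency in microseconds to calls per second."""
    return 1_000_000 / latency_us


def mean_latency_us(func, *args, iterations: int = 10_000) -> float:
    """Time func(*args) and return the mean latency per call in microseconds."""
    total_seconds = timeit.timeit(lambda: func(*args), number=iterations)
    return total_seconds / iterations * 1_000_000


if __name__ == "__main__":
    # 7.6 µs per call works out to about 131,579 calls/s, i.e. "~130 K/s".
    print(round(throughput_per_second(7.6)))
    # str.lower is only a placeholder workload; swap in the call under test.
    print(f"{mean_latency_us(str.lower, 'Some TEXT'):.3f} µs per call")
```

For the batch row, the throughput counts texts rather than calls: 5 texts in ~15 µs is about 333,000 texts per second.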

vs glin-profanity

Rule-based mode, en+ru. Run: make bench-compare (requires pip install glin-profanity)

| Scenario | BadWords | glin-profanity |
| --- | --- | --- |
| Clean text | ~7 µs (~140 K/s) | ~4.4 ms (~230/s) |
| Bad word | ~1.3 µs (~770 K/s) | ~0.2 ms (~5 K/s) |
| Censor | ~1.8 µs (~560 K/s) | ~1.4 ms (~700/s) |
| 5 texts batch | ~16 µs (~310 K/s) | ~10 ms (~500/s) |

BadWords is roughly 150–800× faster in these runs (Rust core vs pure Python).

ML mode

Run: pip install glin-profanity[ml], then make bench-compare. 100 iterations per scenario.

| Scenario | BadWords ML (ONNX) | glin transformer |
| --- | --- | --- |
| Clean text (43 chars) | ~6.5 ms (~150/s) | ~27 ms (~37/s) |
| Bad word (8 chars) | ~4.6 ms (~220/s) | ~21 ms (~47/s) |
| 5 texts batch (82 chars) | ~24 ms (~210/s) | ~107 ms (~47/s) |

BadWords ML (XLM-RoBERTa) is roughly 4–5× faster than the glin transformer in these runs.


🛠 Methods & API

filter_text(text, match_threshold=1.0, replace_character=None)

The core method of the library.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | Required | Input text to check. |
| match_threshold | float | 1.0 | Similarity threshold (1.0 = exact match, 0.95 = fuzzy). |
| replace_character | str/None | None | If provided, returns the censored string. If None, returns a bool. |
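Because filter_text returns a bool in check mode and a str in censor mode, calling code may want one helper that always yields both. The following is a minimal sketch under that assumption: `moderate`, `TextFilter`, and `StubFilter` are hypothetical names, and `StubFilter` is only a stand-in so the example runs without the library installed. It is not how BadWords matches text.

```python
from typing import Optional, Protocol, Tuple, Union


class TextFilter(Protocol):
    """Any object exposing filter_text with the signature documented above."""

    def filter_text(
        self,
        text: str,
        match_threshold: float = 1.0,
        replace_character: Optional[str] = None,
    ) -> Union[bool, str]: ...


class StubFilter:
    """Hypothetical stand-in for demonstration; NOT the BadWords implementation."""

    def __init__(self, words):
        self._words = {word.lower() for word in words}

    def filter_text(self, text, match_threshold=1.0, replace_character=None):
        tokens = text.split(" ")
        hits = [token.lower() in self._words for token in tokens]
        if replace_character is None:
            return any(hits)  # check mode: bool
        # censor mode: str with each hit masked character by character
        return " ".join(
            replace_character * len(token) if hit else token
            for token, hit in zip(tokens, hits)
        )


def moderate(flt: TextFilter, text: str, mask: str = "*") -> Tuple[bool, str]:
    """Return (flagged, text_to_display) regardless of the underlying mode."""
    if not flt.filter_text(text):
        return False, text
    censored = flt.filter_text(text, replace_character=mask)
    return True, str(censored)


if __name__ == "__main__":
    stub = StubFilter(["darn"])
    print(moderate(stub, "well darn it"))  # (True, 'well **** it')
    print(moderate(stub, "all good"))      # (False, 'all good')
```

An initialized ProfanityFilter could presumably be passed to `moderate` in place of the stub, since it exposes the same filter_text signature.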

Warning (performance tip): Setting match_threshold below 1.0 enables fuzzy matching, which is slower. Use 1.0 for high-traffic real-time filtering, or 0.95 for a good balance between coverage and speed.
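To make the idea of a similarity threshold concrete, here is a small sketch that uses difflib.SequenceMatcher from the standard library as a stand-in metric. The scoring function BadWords actually uses is not described here, so treat `similarity` and `matches` as hypothetical helpers that only illustrate how a threshold gates a fuzzy match.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0.0, 1.0] (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches(candidate: str, banned: str, match_threshold: float = 1.0) -> bool:
    """True when the candidate is at least match_threshold similar to banned."""
    return similarity(candidate, banned) >= match_threshold


if __name__ == "__main__":
    # One inserted letter: 9 matching characters out of 19 total, 18/19 ≈ 0.947.
    print(round(similarity("profannity", "profanity"), 3))
    print(matches("profannity", "profanity", match_threshold=1.0))  # False
    print(matches("profannity", "profanity", match_threshold=0.9))  # True
```

Under this stand-in metric a single typo in a nine-letter word already falls just below 0.95, which shows how strongly the threshold interacts with word length.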


🧩 Advanced Evasion Detection

Standard filters are easy to bypass. BadWords is built to detect:

  • Homoglyphs: Detects hеllo (Cyrillic 'е' in place of Latin 'e') or h4llo (digits in place of letters).
  • Transliteration: Automatically handles mapping between Cyrillic and Latin alphabets.
  • Normalization: Strips diacritics, special characters, and decorative Unicode symbols.
  • Similarity Analysis: Uses fuzzy matching to find words with deliberate typos.

Examples of detected evasions:

p.filter_text("hеllо")  # Mixed alphabets (Cyrillic + Latin) -> DETECTED
p.filter_text("h3ll0")  # Character substitution -> DETECTED
p.filter_text("h⍺llo")  # Mathematical/Greek symbols -> DETECTED
p.filter_text("привет") # Transliterated matches -> DETECTED
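The evasion classes listed above all come down to mapping disguised input back to a canonical form before matching. Below is a minimal sketch of that idea using only the standard library. The lookup tables are deliberately tiny illustrative assumptions, and `normalize` is a hypothetical helper, not the normalization pipeline of the Rust core.

```python
import unicodedata

# Illustrative homoglyph table (assumption): look-alike character -> Latin letter.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u0445": "x",  # Cyrillic х
    "\u03b1": "a",  # Greek α
    "\u237a": "a",  # APL functional symbol ⍺
}

# Illustrative digit/symbol substitutions (assumption).
LEET = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, undo look-alike substitutions, drop separators."""
    # NFKD splits accented letters into base + combining mark and folds
    # compatibility forms (e.g. mathematical bold letters) to plain letters.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    mapped = "".join(LEET.get(ch, HOMOGLYPHS.get(ch, ch)) for ch in stripped)
    # Keep letters, digits and whitespace; drop dots, dashes and similar padding.
    return "".join(ch for ch in mapped if ch.isalnum() or ch.isspace())


if __name__ == "__main__":
    for sample in ("h\u0435ll\u043e", "h3ll0", "h\u237allo", "H\u00e9llo", "b.a.d"):
        print(repr(sample), "->", normalize(sample))
```

A mapping this naive also rewrites genuine digits (a year such as 2024 would be altered), so a production filter needs to apply such substitutions more selectively.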

🌍 Supported Languages

BadWords supports 25 languages out of the box:

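| Code | Language | Code | Language | Code | Language |
| --- | --- | --- | --- | --- | --- |
| en | English | ru | Russian | ua | Ukrainian |
| de | German | fr | French | it | Italian |
| sp | Spanish | pl | Polish | cz | Czech |
| ja | Japanese | ko | Korean | th | Thai |
| br | Portuguese (BR) | da | Danish | du | Dutch |
| fi | Finnish | gr | Greek | hu | Hungarian |
| in | Indonesian | lt | Lithuanian | no | Norwegian |
| po | Portuguese | ro | Romanian | sw | Swedish |
| tu | Turkish | | | | |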

Call p.get_all_languages() to retrieve the list in code. Full list with word counts: badwords.flacsy.dev
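Several codes in the table differ from ISO 639-1 (for example ua rather than uk, sp rather than es). If an application stores ISO codes, a small translation step may help. The helper below is a hypothetical sketch built from the table above; it is not part of the BadWords API.

```python
# ISO 639-1 style code -> code from the table above, for entries that differ.
ISO_TO_BADWORDS = {
    "uk": "ua",     # Ukrainian
    "es": "sp",     # Spanish
    "cs": "cz",     # Czech
    "nl": "du",     # Dutch
    "el": "gr",     # Greek
    "id": "in",     # Indonesian
    "pt": "po",     # Portuguese
    "pt-br": "br",  # Portuguese (BR)
    "sv": "sw",     # Swedish
    "tr": "tu",     # Turkish
}

# The 25 codes listed in the table above.
SUPPORTED = frozenset(
    "br cz da de du en fi fr gr hu in it ja ko lt no pl po ro ru sp sw th tu ua".split()
)


def to_badwords_code(code: str) -> str:
    """Translate an ISO-style code to a table code, or raise ValueError."""
    key = code.strip().lower().replace("_", "-")
    mapped = ISO_TO_BADWORDS.get(key, key)
    if mapped not in SUPPORTED:
        raise ValueError(f"unsupported language code: {code!r}")
    return mapped


if __name__ == "__main__":
    print([to_badwords_code(c) for c in ("uk", "ES", "pt_BR", "en")])
    # ['ua', 'sp', 'br', 'en']
```

Some table codes collide with unrelated ISO codes (sw is Swahili and br is Breton in ISO 639-1), so the helper passes them through as table codes; validate the source of the input if that ambiguity matters.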


🚀 Full Integration Example

from badwords import ProfanityFilter

def monitor_chat():
    # Setup for a global chat
    profanity_filter = ProfanityFilter()
    profanity_filter.init(["en", "ru", "de"])
    
    # Custom project-specific banned words
    profanity_filter.add_words(["spam_link_v1", "scam_bot_99"])

    user_input = "Hey! Check out this b.a.d.w.o.r.d"
    
    # Moderate with fuzzy matching (threshold below 1.0) to catch near-misses
    is_offensive = profanity_filter.filter_text(user_input, match_threshold=0.95)
    
    if is_offensive:
        print("Message blocked: Contains restricted language.")
    else:
        # Proceed with processing
        pass

if __name__ == "__main__":
    monitor_chat()

🦀 Rust API (badwords-core)

Published on crates.io:

# Cargo.toml
[dependencies]
badwords-core = "2"

// src/main.rs
use badwords_core::{ProfanityFilter, default_resource_dir};

let resource_dir = default_resource_dir();
let mut filter = ProfanityFilter::new(&resource_dir, true, true, true, true);
filter.init(None).unwrap();
filter.add_words(&["custom".to_string()]);
let (found, _) = filter.filter_text("hello", 1.0, None);

🌐 WebAssembly (JavaScript/TypeScript)

Same Rust code for browser and Node.js, compiled to WASM.

Build

# Browser
make wasm

# Node.js
make wasm-nodejs

Frontend (browser)

<script type="module">
  import init, { ProfanityFilter } from './path/to/badwords_wasm.js';
  await init();
  const filter = new ProfanityFilter();
  console.log(filter.isBad('text'));      // boolean
  console.log(filter.censor('text', '*')); // string
</script>

Backend (Node.js)

const { ProfanityFilter } = require('badwords-wasm');
const filter = new ProfanityFilter();
filter.isBad('hello');           // false
filter.censor('bad word', '*');  // "*** word"
filter.addWords(['custom']);

Optional languages (npm)

Built-in: en and ru. Additional languages via @badwords/languages:

npm install badwords-wasm @badwords/languages

import init, { ProfanityFilter } from 'badwords-wasm';
import de from '@badwords/languages/de';
import ua from '@badwords/languages/ua';

await init();
const filter = new ProfanityFilter();
filter.addWords(de);
filter.addWords(ua);

Available: br, cz, da, de, du, en, fi, fr, gr, hu, in, it, ja, ko, lt, no, pl, po, ro, ru, sp, sw, th, tu, ua. See @badwords/languages.

Examples: examples/wasm/browser/, examples/wasm/node/

🔧 Building from source

Requires: Rust, Python, maturin

python -m venv .venv && source .venv/bin/activate  # Linux/macOS
pip install maturin
make develop
# or: cd python && maturin build && pip install target/wheels/badwords_py-*.whl

🌐 WebAssembly (browser & Node.js)

Build the WASM package (requires wasm-pack):

cargo install wasm-pack
make wasm

Output: rust/badwords-wasm/pkg/ (npm package badwords-wasm)

  • Browser: Use the generated JS with a bundler or static server. See examples/wasm/browser/
  • Node.js: import init, { ProfanityFilter } from 'badwords-wasm' after npm install. See examples/wasm/node/
  • Publish to npm: make wasm or make wasm-nodejs, then make npm-publish
  • Optional languages (@badwords/languages): make lang-packages, then make npm-publish-languages

📚 Documentation

Full documentation (Python, Rust, JavaScript) with examples and API reference: badwords.flacsy.dev (EN / RU).

🤝 Contributing

Contributions are what make the open-source community an amazing place to learn, inspire, and create.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

Distributed under the MIT License. See LICENSE for more information.

Developed with ❤️ by FlacSy
