This is a library for effective moderation of content.
🚫 BadWords
High-performance profanity filter for Python, Rust, and JavaScript (WebAssembly)
with multilingual support and evasion detection.
Installation • Quick Start • Benchmarks • Supported Languages • Evasion Detection • Documentation
📖 Description
BadWords is a sophisticated profanity filtering library designed to clean up user-generated content. Unlike simple keyword matching, it uses similarity scoring, homoglyph detection, and transliteration to catch even the most cleverly disguised insults.
Architecture: The core is implemented in Rust for performance. Python provides a thin API layer with full type hints for IDE/linter support. The Rust library can also be used directly from Rust projects.
📦 Installation
Requirements
- Recommended: Python 3.13
- Minimum: Python 3.10+
Install via GitHub
pip install git+https://github.com/FlacSy/badwords.git
Install via PyPI
pip install badwords-py
⚡ Quick Start
Basic Initialization
from badwords import ProfanityFilter
# Initialize filter
p = ProfanityFilter()
# Load specific languages (e.g., English and Russian)
p.init(languages=["en", "ru"])
# Or load all supported languages
p.init()
Checking and Filtering Text
text = "Some very b4d text here"
# 1. Simple check (Returns Boolean)
is_bad = p.filter_text(text)
print(is_bad) # True
# 2. Censoring text (Returns String)
clean_text = p.filter_text(text, replace_character="*")
print(clean_text) # "Some very *** text here"
⏱ Benchmarks
| CPU | GPU | RAM | OS |
|---|---|---|---|
| x86_64 i7 Intel® Core™ i7-10700KF × 16 | NVIDIA GeForce RTX™ 3070 | 64 GB DDR4 3200MHz | Ubuntu 24.04.2 LTS |
Rule-based matching (en+ru, match_threshold=1.0). Run: make bench
| Scenario | Rust (badwords-core) | Python (badwords-py) |
|---|---|---|
| Clean text (no match) | ~7.6 µs (~130 K/s) | ~7.7 µs (~130 K/s) |
| Bad word (match) | ~3.1 µs (~320 K/s) | ~2.7 µs (~370 K/s) |
| Censor (replace) | ~2.8 µs (~360 K/s) | ~2.5 µs (~400 K/s) |
| 5 texts batch | ~15 µs (~330 K/s) | ~16 µs (~310 K/s) |
The Python package calls the Rust core through PyO3, so the binding overhead is minimal.
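The throughput figures in parentheses follow directly from the per-call latency. A minimal sketch of that arithmetic; the helper name `throughput_per_second` is made up for illustration and is not part of BadWords:

```python
def throughput_per_second(latency_us: float, items_per_call: int = 1) -> float:
    """Convert a per-call latency in microseconds to items processed per second."""
    return items_per_call * 1_000_000 / latency_us


# Clean text, Rust column: ~7.6 µs per call, i.e. roughly 130 K texts/s.
print(round(throughput_per_second(7.6)))

# Batch of 5 texts in ~15 µs, i.e. roughly 330 K texts/s.
print(round(throughput_per_second(15, items_per_call=5)))
```

For the batch row, the rate counts individual texts rather than calls, which is why the batch size appears in the numerator.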
vs glin-profanity
Rule-based mode, en+ru. Run: make bench-compare (requires pip install glin-profanity)
| Scenario | BadWords | glin-profanity |
|---|---|---|
| Clean text | ~7 µs (~140 K/s) | ~4.4 ms (~230/s) |
| Bad word | ~1.3 µs (~770 K/s) | ~0.2 ms (~5 K/s) |
| Censor | ~1.8 µs (~560 K/s) | ~1.4 ms (~700/s) |
| 5 texts batch | ~16 µs (~310 K/s) | ~10 ms (~500/s) |
In these runs BadWords is roughly 150–780× faster (Rust core vs. pure Python).
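The per-scenario speedup can be recomputed from the latencies in the table above. A small sketch; the dictionary only restates the table's approximate values, so the resulting ratios are approximate as well:

```python
# (BadWords latency, glin-profanity latency), both in microseconds, taken from the table.
LATENCIES_US = {
    "Clean text": (7.0, 4_400.0),
    "Bad word": (1.3, 200.0),
    "Censor": (1.8, 1_400.0),
    "5 texts batch": (16.0, 10_000.0),
}


def speedups(latencies):
    """Return, per scenario, how many times smaller the first latency is than the second."""
    return {name: slow / fast for name, (fast, slow) in latencies.items()}


for scenario, factor in speedups(LATENCIES_US).items():
    print(f"{scenario}: ~{factor:.0f}x")
```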
ML mode
Requires pip install glin-profanity[ml]. Run: make bench-compare. 100 iterations per scenario.
| Scenario | BadWords ML (ONNX) | glin transformer |
|---|---|---|
| Clean text (43 chars) | ~6.5 ms (~150/s) | ~27 ms (~37/s) |
| Bad word (8 chars) | ~4.6 ms (~220/s) | ~21 ms (~47/s) |
| 5 texts batch (82 chars) | ~24 ms (~210/s) | ~107 ms (~47/s) |
BadWords ML (XLM-RoBERTa) is roughly 4× faster than the glin transformer in these runs.
🛠 Methods & API
filter_text(text, match_threshold=1.0, replace_character=None)
The core method of the library.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | Required | Input text to check. |
| match_threshold | float | 1.0 | Similarity threshold (1.0 = exact match, 0.95 = fuzzy). |
| replace_character | str/None | None | If provided, returns censored string. If None, returns bool. |
[!WARNING] Performance Tip: Using match_threshold < 1.0 enables fuzzy matching, which is slower. Use 1.0 for high-traffic real-time filtering, or 0.95 for a good balance.
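To make the threshold concrete, here is a minimal sketch of how a similarity threshold behaves. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in. This README does not say which similarity metric BadWords uses internally, so treat the scores as illustrative only; the helper `is_match` is hypothetical.

```python
from difflib import SequenceMatcher


def is_match(candidate: str, banned: str, match_threshold: float = 1.0) -> bool:
    """Return True when the two words are at least `match_threshold` similar."""
    if match_threshold >= 1.0:
        # Exact comparison: no similarity score needs to be computed.
        return candidate == banned
    return SequenceMatcher(None, candidate, banned).ratio() >= match_threshold


print(is_match("badword", "badword"))        # identical words pass at threshold 1.0
print(is_match("badw0rd", "badword"))        # one substituted character fails an exact match
print(is_match("badw0rd", "badword", 0.85))  # the same pair passes a looser threshold
```

The exact-match shortcut reflects the performance tip above: at 1.0 no score has to be computed, which is cheaper than scoring every candidate.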
🧩 Advanced Evasion Detection
Standard filters are easy to bypass. BadWords is built to detect:
- Homoglyphs: Detects hеllo (using Cyrillic 'е') or h4llo (numbers).
- Transliteration: Automatically handles mapping between Cyrillic and Latin alphabets.
- Normalization: Strips diacritics, special characters, and decorative Unicode symbols.
- Similarity Analysis: Uses fuzzy matching to find words with deliberate typos.
Examples of detected evasions:
from badwords import ProfanityFilter
_filter = ProfanityFilter()
_filter.init()  # load all supported languages
_filter.filter_text("hеllо") # Mixed alphabets (Cyrillic + Latin) -> DETECTED
_filter.filter_text("h3ll0") # Character substitution -> DETECTED
_filter.filter_text("h⍺llo") # Mathematical/Greek symbols -> DETECTED
_filter.filter_text("привет") # Transliterated matches -> DETECTED
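The normalization and homoglyph steps listed above can be sketched with the standard library alone. This is an illustration of the general technique, not the library's implementation: the `HOMOGLYPHS` table and the `normalize` helper are assumptions, and the real mapping tables are not shown in this README.

```python
import unicodedata

# Illustrative mapping only; a real table would be far larger.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u03b1": "a",  # Greek α
    "0": "o",
    "3": "e",
    "4": "a",
}


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, and fold look-alike characters to Latin letters."""
    # NFKD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace known look-alikes; anything not in the table passes through unchanged.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in stripped)


print(normalize("h\u0435ll\u043e"))  # mixed Cyrillic and Latin input
print(normalize("h3ll0"))            # digit substitutions
print(normalize("h\u00e9llo"))       # accented character
```

All three inputs fold to the same string, which can then be compared against a word list. Note that mapping digits unconditionally also rewrites legitimate numbers, so a production filter would likely apply such substitutions only while matching.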
🌍 Supported Languages
BadWords supports 25 languages out of the box:
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| en | English | ru | Russian | ua | Ukrainian |
| de | German | fr | French | it | Italian |
| sp | Spanish | pl | Polish | cz | Czech |
| ja | Japanese | ko | Korean | th | Thai |
| br | Portuguese (BR) | da | Danish | du | Dutch |
| fi | Finnish | gr | Greek | hu | Hungarian |
| in | Indonesian | lt | Lithuanian | no | Norwegian |
| po | Portuguese | ro | Romanian | sw | Swedish |
| tu | Turkish | | | | |
To get this list programmatically, call p.get_all_languages(). The full list with word counts is at badwords.flacsy.dev
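Several codes in the table differ from ISO 639-1 (for example ua rather than uk for Ukrainian, and sp rather than es for Spanish), and a few collide with unrelated ISO codes (sw is Swahili in ISO 639-1). If your application stores ISO codes, a small translation layer may help. The mapping below is an assumption derived from the table above, and `to_badwords_code` is a hypothetical helper, not part of the library:

```python
# ISO 639-1 code (plus "pt-br") -> code used in the table above.
ISO_TO_BADWORDS = {
    "en": "en", "ru": "ru", "uk": "ua", "de": "de", "fr": "fr", "it": "it",
    "es": "sp", "pl": "pl", "cs": "cz", "ja": "ja", "ko": "ko", "th": "th",
    "pt-br": "br", "da": "da", "nl": "du", "fi": "fi", "el": "gr", "hu": "hu",
    "id": "in", "lt": "lt", "no": "no", "pt": "po", "ro": "ro", "sv": "sw",
    "tr": "tu",
}


def to_badwords_code(iso_code: str) -> str:
    """Translate an ISO-style language code to the code listed in the table."""
    try:
        return ISO_TO_BADWORDS[iso_code.lower()]
    except KeyError:
        raise ValueError(f"no BadWords language for ISO code {iso_code!r}") from None


print(to_badwords_code("uk"))     # Ukrainian
print(to_badwords_code("pt-BR"))  # Brazilian Portuguese
```

The resulting codes can then be passed to p.init(languages=[...]). Looking up only ISO keys, instead of passing unknown codes through, keeps an ISO sw (Swahili) from being silently treated as Swedish.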
🚀 Full Integration Example
from badwords import ProfanityFilter

def monitor_chat():
    # Setup for a global chat
    profanity_filter = ProfanityFilter()
    profanity_filter.init(["en", "ru", "de"])
    # Custom project-specific banned words
    profanity_filter.add_words(["spam_link_v1", "scam_bot_99"])
    user_input = "Hey! Check out this b.a.d.w.o.r.d"
    # Moderate with high accuracy
    is_offensive = profanity_filter.filter_text(user_input, match_threshold=0.95)
    if is_offensive:
        print("Message blocked: Contains restricted language.")
    else:
        # Proceed with processing
        pass

if __name__ == "__main__":
    monitor_chat()
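The example above only blocks a message. Because filter_text returns a bool without replace_character and a string with it, one possible pattern is to check first and censor only on a hit. The sketch below assumes nothing beyond the documented filter_text signature; `TextFilter`, `ModerationResult`, `moderate`, and `StubFilter` are hypothetical names, and the stub exists only so the snippet runs without badwords installed.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Union


class TextFilter(Protocol):
    """Anything exposing the filter_text signature documented above."""

    def filter_text(
        self,
        text: str,
        match_threshold: float = 1.0,
        replace_character: Optional[str] = None,
    ) -> Union[bool, str]: ...


@dataclass
class ModerationResult:
    blocked: bool
    display_text: str


def moderate(flt: TextFilter, message: str, threshold: float = 1.0) -> ModerationResult:
    """Check a message and, if it is flagged, produce a censored copy."""
    if not flt.filter_text(message, match_threshold=threshold):
        return ModerationResult(blocked=False, display_text=message)
    censored = flt.filter_text(message, match_threshold=threshold, replace_character="*")
    return ModerationResult(blocked=True, display_text=str(censored))


class StubFilter:
    """Hypothetical stand-in that flags a single hard-coded word."""

    def filter_text(self, text, match_threshold=1.0, replace_character=None):
        found = "spam_link_v1" in text
        if replace_character is None:
            return found
        return text.replace("spam_link_v1", replace_character * 3)


print(moderate(StubFilter(), "visit spam_link_v1 now"))
```

With the real library, an initialized ProfanityFilter would be passed in place of StubFilter, assuming it matches the documented signature.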
🦀 Rust API (badwords-core)
Published on crates.io:
[dependencies]
badwords-core = "2"
use badwords_core::{ProfanityFilter, default_resource_dir};
let resource_dir = default_resource_dir();
let mut filter = ProfanityFilter::new(&resource_dir, true, true, true, true);
filter.init(None).unwrap();
filter.add_words(&["custom".to_string()]);
let (found, _) = filter.filter_text("hello", 1.0, None);
🌐 WebAssembly (JavaScript/TypeScript)
Same Rust code for browser and Node.js, compiled to WASM.
Build
# Browser
make wasm
# Node.js
make wasm-nodejs
Frontend (browser)
<script type="module">
import init, { ProfanityFilter } from './path/to/badwords_wasm.js';
await init();
const filter = new ProfanityFilter();
console.log(filter.isBad('text')); // boolean
console.log(filter.censor('text', '*')); // string
</script>
Backend (Node.js)
const { ProfanityFilter } = require('badwords-wasm');
const filter = new ProfanityFilter();
filter.isBad('hello'); // false
filter.censor('bad word', '*'); // "*** word"
filter.addWords(['custom']);
Optional languages (npm)
Built-in: en and ru. Additional languages via @badwords/languages:
npm install badwords-wasm @badwords/languages
import init, { ProfanityFilter } from 'badwords-wasm';
import de from '@badwords/languages/de';
import ua from '@badwords/languages/ua';
await init();
const filter = new ProfanityFilter();
filter.addWords(de);
filter.addWords(ua);
Available: br, cz, da, de, du, en, fi, fr, gr, hu, in, it, ja, ko, lt, no, pl, po, ro, ru, sp, sw, th, tu, ua. See @badwords/languages.
Examples: examples/wasm/browser/, examples/wasm/node/
🔧 Building from source
Requires: Rust, Python, maturin
python -m venv .venv && source .venv/bin/activate # Linux/macOS
pip install maturin
make develop
# or: cd python && maturin build && pip install target/wheels/badwords_py-*.whl
🌐 WebAssembly (browser & Node.js)
Build the WASM package (requires wasm-pack):
cargo install wasm-pack
make wasm
Output: rust/badwords-wasm/pkg/ (npm package badwords-wasm)
- Browser: Use the generated JS with a bundler or static server. See examples/wasm/browser/
- Node.js: import init, { ProfanityFilter } from 'badwords-wasm' after npm install. See examples/wasm/node/
- Publish to npm: make wasm or make wasm-nodejs, then make npm-publish
- Optional languages (@badwords/languages): make lang-packages, then make npm-publish-languages
📚 Documentation
Full documentation (Python, Rust, JavaScript) with examples and API reference: badwords.flacsy.dev (EN / RU).
🤝 Contributing
Contributions are what make the open-source community an amazing place to learn, inspire, and create.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📄 License
Distributed under the MIT License. See LICENSE for more information.