Fast profanity filtering tool for multiple languages
🤔 why safetext?
Fast profanity detection and filtering for 13 languages.
- Multi-format Detection: Single words, phrases, and contextual profanity
- Custom Word Lists: Extend built-in lists with your own profanity words
- Whitelisting: Exclude specific words from detection
- Auto Language Detection: From text or subtitle files
- Precise Filtering: Exact position tracking and custom censoring
- Simple Integration: One-line setup with clean API
📦 installation
easily install safetext with pip:
```bash
pip install safetext
```
for development setup, see our scripts documentation.
🎯 quickstart
check and censor profanity
```python
>>> from safetext import SafeText
>>> st = SafeText(language='en')
>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]
>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
'Some text with ***.'
```
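The `start` and `end` values in each result appear to be character offsets into the input string (in the example above, `text[15:31]` is exactly `'<profanity-word>'`), and `index` appears to be the 1-based word position. Assuming that holds, you can apply your own masking from the results. The sketch below uses a hypothetical helper, `mask_matches`, that is not part of safetext; it only consumes result dicts shaped like the ones shown and does not reproduce `censor_profanity`'s exact output (it masks with one character per original character instead of a fixed `***`).

```python
from __future__ import annotations

from typing import TypedDict


class Match(TypedDict):
    # Shape of one result dict, as shown in the quickstart output.
    word: str
    index: int
    start: int
    end: int


def mask_matches(text: str, matches: list[Match], mask_char: str = "*") -> str:
    """Hypothetical helper: replace each matched span with mask characters of equal length.

    Assumes `start`/`end` are character offsets with `end` exclusive,
    which is consistent with the quickstart example.
    """
    pieces: list[str] = []
    cursor = 0
    # Walk matches left to right so offsets into the original text stay valid.
    for m in sorted(matches, key=lambda item: item["start"]):
        start = max(m["start"], cursor)  # clip overlaps with an earlier match
        if start >= m["end"]:
            continue  # span already fully masked by an earlier match
        pieces.append(text[cursor:start])
        pieces.append(mask_char * (m["end"] - start))
        cursor = m["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Some text with <profanity-word>."
results: list[Match] = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]

# Slicing with start/end recovers the reported word, so the offsets are character-based.
first = results[0]
print(text[first["start"]:first["end"]])
print(mask_matches(text, results))
```

Keeping the mask length equal to the original span preserves every later offset, which matters if you post-process the censored text with the same result list.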
extending profanity lists with custom words
Add your own profanity words by providing a custom words directory:
```python
# Directory structure:
# custom_profanity_words/
# ├── en.txt  # English custom words
# ├── tr.txt  # Turkish custom words
# └── es.txt  # Spanish custom words
>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')
>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]
```
Custom word files should contain one word/phrase per line:
```text
# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
```
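If you generate the custom list from code (for example, exported from an internal moderation list), a minimal standard-library sketch could write the file in the one-entry-per-line layout described above. `write_custom_words` is a hypothetical helper, not a safetext function. Whether safetext itself trims whitespace or skips blank lines is not stated here, so the sketch normalizes entries before writing.

```python
from __future__ import annotations

from pathlib import Path


def write_custom_words(directory: str | Path, language: str, words: list[str]) -> Path:
    """Hypothetical helper: write <directory>/<language>.txt, one word/phrase per line."""
    # Strip surrounding whitespace, drop empty entries, de-duplicate while keeping order.
    seen: set[str] = set()
    cleaned: list[str] = []
    for word in words:
        entry = word.strip()
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)

    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{language}.txt"
    path.write_text("".join(f"{entry}\n" for entry in cleaned), encoding="utf-8")
    return path
```

For example, `write_custom_words("custom_profanity_words", "en", ["mycustomword", "inappropriate phrase"])` produces a directory you can then pass as `custom_words_dir='custom_profanity_words'`.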
using a whitelist
exclude specific words from profanity detection:
```python
# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])
# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')
# Combining custom words with whitelist
>>> st = SafeText(
...     language='en',
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )
```
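The `whitelist` parameter accepts either a list of words or a path to a file with one word per line. The sketch below mirrors those two input forms with a hypothetical `load_whitelist` helper and shows one way a whitelist could be applied to result dicts with a hypothetical `drop_whitelisted`. It is an illustration only: safetext's actual matching rules are not documented here, so the case-insensitive comparison is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def load_whitelist(source: list[str] | str | Path) -> set[str]:
    """Hypothetical helper: accept a list of words or a path to a one-word-per-line file."""
    if isinstance(source, (str, Path)):
        lines = Path(source).read_text(encoding="utf-8").splitlines()
    else:
        lines = list(source)
    # Assumption: matching is case-insensitive and blank lines are ignored.
    return {line.strip().lower() for line in lines if line.strip()}


def drop_whitelisted(matches: list[dict], whitelist: set[str]) -> list[dict]:
    """Remove every detected match whose word appears in the whitelist."""
    return [m for m in matches if m["word"].lower() not in whitelist]
```

Filtering after detection keeps the detector unchanged; the trade-off is that whitelisted words are still scanned, which only matters for very large inputs.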
automated language detection
- from text:
```python
>>> from safetext import SafeText
>>> eng_text = "This story is about to take a dark turn."
>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)
>>> st.language
'en'
```
- from an .srt (subtitle) file:
```python
>>> from safetext import SafeText
>>> turkish_srt_file_path = "turkish.srt"
>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)
>>> st.language
'tr'
```
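`set_language_from_srt` works from a subtitle file rather than plain text. If you need the dialogue yourself (for logging, or to pass to `set_language_from_text`), a standard-library sketch that strips SRT cue numbers and timing lines might look like this. `srt_dialogue` and `srt_file_dialogue` are hypothetical helpers and not how safetext reads subtitles internally; they assume the conventional SRT layout of a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, text lines, and a blank separator.

```python
import re
from pathlib import Path

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMESTAMP = re.compile(
    r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_dialogue(srt_text: str) -> str:
    """Hypothetical helper: return only the subtitle text lines, joined by spaces."""
    kept = []
    for raw in srt_text.splitlines():
        line = raw.lstrip("\ufeff").strip()  # tolerate a leading byte-order mark
        # Skip blank separators, cue numbers and timing lines.
        # Limitation: a dialogue line made only of digits is skipped too.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


def srt_file_dialogue(path: str) -> str:
    """Read an .srt file (assumed UTF-8, BOM optional) and extract its dialogue."""
    return srt_dialogue(Path(path).read_text(encoding="utf-8-sig"))
```

With the documented API, `st.set_language_from_text(srt_file_dialogue("turkish.srt"))` would then detect the language from the extracted text instead of the file.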
🌍 supported languages
safetext currently supports profanity detection in 13 languages:
| Flag | ISO 639-1 Code | Language Name |
|---|---|---|
| 🇸🇦 | ar | Arabic |
| 🇦🇿 | az | Azerbaijani |
| 🇩🇪 | de | German |
| 🇬🇧 | en | English |
| 🇪🇸 | es | Spanish |
| 🇮🇷 | fa | Persian (Farsi) |
| 🇫🇷 | fr | French |
| 🇮🇳 | hi | Hindi |
| 🇯🇵 | ja | Japanese |
| 🇵🇹 | pt | Portuguese |
| 🇷🇺 | ru | Russian |
| 🇹🇷 | tr | Turkish |
| 🇨🇳 | zh | Chinese |
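If the language code comes from user input or configuration, it can help to validate it against the table before constructing `SafeText`. The mapping below is transcribed from the supported-languages table; `SUPPORTED_LANGUAGES` and `require_supported` are hypothetical names, and the set of codes may change between safetext releases, so treat this as a sketch for the version documented here.

```python
# Transcribed from the supported-languages table; may drift from future releases.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "az": "Azerbaijani",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian (Farsi)",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
    "zh": "Chinese",
}


def require_supported(code: str) -> str:
    """Normalize an ISO 639-1 code and raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {supported}")
    return normalized
```

Failing early with the list of valid codes gives a clearer error than whatever the library raises for an unknown language.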
🤝 contribute to safetext
join our mission in refining content moderation!
contribute by:
- adding new languages: create a folder named with the language's ISO 639-1 code and include a words.txt file in it.
- enhancing word lists: improve detection accuracy.
- sharing feedback: your ideas can shape safetext.
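For the first contribution path, a small scaffolding sketch follows. The text above only says to create a folder named with the ISO 639-1 code containing a `words.txt`; where that folder lives in the repository is not stated here, so the `root` argument is an assumption you should point at the existing language folders. `scaffold_language` is a hypothetical helper.

```python
from __future__ import annotations

import re
from pathlib import Path

# Two lowercase ASCII letters, e.g. "it" for Italian.
ISO_639_1 = re.compile(r"[a-z]{2}")


def scaffold_language(root: str | Path, code: str, words: list[str]) -> Path:
    """Hypothetical helper: create <root>/<code>/words.txt with one entry per line."""
    if not ISO_639_1.fullmatch(code):
        raise ValueError(f"{code!r} is not a two-letter lowercase ISO 639-1 code")
    path = Path(root) / code / "words.txt"
    if path.exists():
        # Refuse to overwrite an existing list; edit that file directly instead.
        raise FileExistsError(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sort and de-duplicate so the resulting diff is easy to review.
    entries = sorted({w.strip() for w in words if w.strip()})
    path.write_text("".join(f"{e}\n" for e in entries), encoding="utf-8")
    return path
```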
see our contributing guidelines for the development workflow, the test documentation for running tests, and the scripts guide for automation tools.
🏆 contributors
meet our awesome contributors who make safetext better every day!
Download files
Source Distribution
Built Distribution
File details
Details for the file safetext-0.3.3.tar.gz.
File metadata
- Download URL: safetext-0.3.3.tar.gz
- Upload date:
- Size: 91.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb7029be79f443c51fa668f139e1947d19d16f8d56dbbd959e8e166f487b9ee9 |
| MD5 | 751e7e9a3c52aa98755fed0337e5373d |
| BLAKE2b-256 | a4b587b6a597c3fb63a469d9333ad5753c04eab187368f392a07578915a6fd09 |
File details
Details for the file safetext-0.3.3-py3-none-any.whl.
File metadata
- Download URL: safetext-0.3.3-py3-none-any.whl
- Upload date:
- Size: 82.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3612329352a23da30ccfcf0e497d9f57bd3a88db983e5d42d3eb7e8b16ccf0fb |
| MD5 | e1f867e69811686a155d11762a3c8c29 |
| BLAKE2b-256 | 39a5c1781cabec361044f4ef5058d9605099518d333f5df233deeb219faad209 |
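To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` module is enough. `sha256_of` and `verify_download` are illustrative names, and the constant below is the wheel's SHA256 digest copied from its table; the file path you pass in is up to you.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digest published for safetext-0.3.3-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "3612329352a23da30ccfcf0e497d9f57bd3a88db983e5d42d3eb7e8b16ccf0fb"


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str | Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify_download("safetext-0.3.3-py3-none-any.whl", EXPECTED_WHEEL_SHA256)` should return `True` for an intact download. pip can also enforce this at install time through its hash-checking mode, by adding `--hash=sha256:...` entries to a requirements line and installing with `--require-hashes`.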