
Fast profanity filtering tool for multiple languages


🤔 why safetext?

Fast profanity detection and filtering for 13 languages.

  • Multi-format Detection: Single words, phrases, and contextual profanity
  • Custom Word Lists: Extend built-in lists with your own profanity words
  • Whitelisting: Exclude specific words from detection
  • Auto Language Detection: From text or subtitle files
  • Precise Filtering: Exact position tracking and custom censoring
  • Simple Integration: One-line setup with clean API

📦 installation

easily install safetext with pip:

pip install safetext
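To confirm the install and see which release you got, you can read the installed package metadata with Python's standard `importlib.metadata` module (Python 3.8+). This is a minimal sketch; `installed_version` is our own helper name, not part of safetext.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "safetext") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "safetext is not installed")
```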

for development setup, see our scripts documentation.

🎯 quickstart

check and censor profanity

>>> from safetext import SafeText

>>> st = SafeText(language='en')

>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]

>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
"Some text with ***."

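Each result from `check_profanity` carries `start` and `end` character offsets, and in the output above `text[15:31]` is exactly the matched `'<profanity-word>'`. That makes it straightforward to build your own masking style on top of the results. The sketch below is a hypothetical helper, not part of safetext's API: it either masks each span character by character (keeping the original length) or swaps in a fixed replacement string, working from the end of the text backwards so that earlier offsets stay valid.

```python
from typing import Any, Iterable, Mapping, Optional


def mask_spans(
    text: str,
    results: Iterable[Mapping[str, Any]],
    mask_char: str = "*",
    replacement: Optional[str] = None,
) -> str:
    """Censor the character spans reported in check_profanity-style results.

    Each result must provide integer 'start' and 'end' offsets into `text`
    (spans are assumed not to overlap). If `replacement` is given, every span
    becomes that string; otherwise it is masked with `mask_char`, keeping its length.
    """
    # Replace spans from right to left so that changing one span never shifts
    # the offsets of spans that appear earlier in the string.
    spans = sorted(((int(r["start"]), int(r["end"])) for r in results), reverse=True)
    for start, end in spans:
        new = replacement if replacement is not None else mask_char * (end - start)
        text = text[:start] + new + text[end:]
    return text


if __name__ == "__main__":
    text = "Some text with <profanity-word>."
    # Result shape copied from the check_profanity example above.
    results = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]
    print(mask_spans(text, results, replacement="***"))  # Some text with ***.
```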
extending profanity lists with custom words

Add your own profanity words by providing a custom words directory:

# Directory structure:
# custom_profanity_words/
# ├── en.txt              # English custom words
# ├── tr.txt              # Turkish custom words
# └── es.txt              # Spanish custom words

>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')

>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]

Custom word files should contain one word/phrase per line:

# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
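If your custom terms already live in Python (for example, loaded from a config file or database), a small script can write them out in the layout described above: one `<language-code>.txt` file per language, one word or phrase per line. This is a sketch; `write_custom_words` is a hypothetical helper, and we assume safetext reads these files as plain UTF-8 text with one entry per line.

```python
from pathlib import Path
from typing import Iterable


def write_custom_words(directory: str, language: str, words: Iterable[str]) -> Path:
    """Write custom words to <directory>/<language>.txt, one entry per line.

    Entries are stripped, empty entries are dropped, and duplicates are removed
    while keeping first-seen order. Returns the path of the written file.
    """
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    # dict.fromkeys de-duplicates while preserving insertion order.
    cleaned = dict.fromkeys(w.strip() for w in words if w.strip())
    path = out_dir / f"{language}.txt"
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path
```

For example, `write_custom_words("custom_profanity_words", "en", ["mycustomword", "inappropriate phrase"])` produces a `custom_profanity_words/en.txt` that you can then pass to `SafeText(language='en', custom_words_dir='custom_profanity_words')`.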

using a whitelist

exclude specific words from profanity detection:

# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])

# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')

# Combining custom words with whitelist
>>> st = SafeText(
...     language='en', 
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )
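The `whitelist` argument accepts either a list of words or a path to a file with one word per line. If you want the same behaviour in your own post-processing (say, a per-request allow list applied on top of a shared `SafeText` instance), the sketch below loads a whitelist from either form and drops matching entries from `check_profanity`-style results. Both helpers are hypothetical, and the case-insensitive comparison is our assumption rather than documented safetext behaviour.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, List, Set, Union


def load_whitelist(source: Union[str, Path, Iterable[str]]) -> Set[str]:
    """Accept a file path (one word per line) or an iterable of words; return a lowercased set."""
    if isinstance(source, (str, Path)):
        # A string is treated as a file path, mirroring the two documented forms.
        lines = Path(source).read_text(encoding="utf-8").splitlines()
    else:
        lines = list(source)
    return {w.strip().lower() for w in lines if w.strip()}


def apply_whitelist(results: List[Dict[str, Any]], allowed: Set[str]) -> List[Dict[str, Any]]:
    """Drop results whose matched word (compared in lowercase) is in the allow list."""
    return [r for r in results if str(r["word"]).lower() not in allowed]
```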

automatic language detection

  • from text:
>>> from safetext import SafeText

>>> eng_text = "This story is about to take a dark turn."

>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)

>>> st.language
'en'
  • from .srt (subtitle) file:
>>> from safetext import SafeText

>>> turkish_srt_file_path = "turkish.srt"

>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)

>>> st.language
'tr'
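`set_language_from_srt` takes a path to a subtitle file. If you already hold subtitle content in memory, or want to see exactly which text a detector would receive, you can strip the cue numbers and timestamps yourself and pass the remaining dialogue to `set_language_from_text`. The parser below is a minimal sketch of the common SubRip layout (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text lines, blank separator); `srt_to_text` is a hypothetical helper, and we make no claim that safetext parses files this way.

```python
import re

# A SubRip timing line, e.g. "00:00:01,000 --> 00:00:04,500" (cue settings may follow).
_TIMING = re.compile(r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_text(srt_content: str) -> str:
    """Return subtitle dialogue as one space-joined string, without cue numbers or timings."""
    # Drop a UTF-8 byte-order mark and normalize Windows line endings.
    content = srt_content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    dialogue = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Remove a leading numeric cue index, then the timing line if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and _TIMING.match(lines[0]):
            lines = lines[1:]
        dialogue.extend(lines)
    return " ".join(dialogue)
```

For example, assuming a UTF-8 file: `st.set_language_from_text(srt_to_text(open('turkish.srt', encoding='utf-8').read()))`.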

🌍 supported languages

safetext currently supports profanity detection in 13 languages:

Flag | ISO 639-1 Code | Language Name
🇸🇦 | ar | Arabic
🇦🇿 | az | Azerbaijani
🇩🇪 | de | German
🇬🇧 | en | English
🇪🇸 | es | Spanish
🇮🇷 | fa | Persian (Farsi)
🇫🇷 | fr | French
🇮🇳 | hi | Hindi
🇯🇵 | ja | Japanese
🇵🇹 | pt | Portuguese
🇷🇺 | ru | Russian
🇹🇷 | tr | Turkish
🇨🇳 | zh | Chinese
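When a language code comes from user input or an external detector, it can be worth checking it against the table above before constructing `SafeText`. The mapping below simply mirrors the 13 codes listed on this page; `SUPPORTED_LANGUAGES` and `require_supported` are our own names, and the mapping would need updating if safetext adds languages.

```python
from typing import Dict

# Mirrors the supported-languages table on this page (update if safetext adds languages).
SUPPORTED_LANGUAGES: Dict[str, str] = {
    "ar": "Arabic", "az": "Azerbaijani", "de": "German", "en": "English",
    "es": "Spanish", "fa": "Persian (Farsi)", "fr": "French", "hi": "Hindi",
    "ja": "Japanese", "pt": "Portuguese", "ru": "Russian", "tr": "Turkish",
    "zh": "Chinese",
}


def require_supported(code: str) -> str:
    """Normalize an ISO 639-1 code and raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {supported}")
    return normalized
```

Usage might look like `SafeText(language=require_supported(user_code))`, which fails early with a readable message instead of deep inside the library.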

🤝 contribute to safetext

join our mission in refining content moderation!

contribute by:

  • adding new languages: create a folder named with the ISO 639-1 code and include a words.txt file.
  • enhancing word lists: improve detection accuracy.
  • sharing feedback: your ideas can shape safetext.
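For new languages and word-list improvements, a small script can keep a list tidy before you open a pull request. It creates the `<iso-code>/words.txt` layout described above if it is missing and rewrites the file with whitespace stripped, blank lines and duplicates removed, and entries sorted. This is a hypothetical helper: `root` stands in for wherever the language folders live in the repository (this page does not say), and sorting is our own choice for stable diffs rather than a stated project rule.

```python
from pathlib import Path


def normalize_word_list(root: str, iso_code: str) -> Path:
    """Create <root>/<iso_code>/words.txt if needed and clean its entries in place."""
    if len(iso_code) != 2 or not iso_code.isalpha():
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {iso_code!r}")
    folder = Path(root) / iso_code.lower()
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "words.txt"
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    # Strip whitespace, drop blanks and duplicates, then sort for reviewable diffs.
    entries = sorted({line.strip() for line in existing if line.strip()})
    path.write_text("".join(f"{e}\n" for e in entries), encoding="utf-8")
    return path
```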

see our contributing guidelines for the development workflow, the test documentation for running tests, and the scripts guide for automation tools.


🏆 contributors

meet our awesome contributors who make safetext better every day!


follow us for more!

LinkedIn, Hugging Face, X



Download files


Source Distribution

safetext-0.3.3.tar.gz (91.5 kB)

Uploaded Source

Built Distribution


safetext-0.3.3-py3-none-any.whl (82.5 kB)

Uploaded Python 3

File details

Details for the file safetext-0.3.3.tar.gz.

File metadata

  • Download URL: safetext-0.3.3.tar.gz
  • Upload date:
  • Size: 91.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for safetext-0.3.3.tar.gz
Algorithm Hash digest
SHA256 cb7029be79f443c51fa668f139e1947d19d16f8d56dbbd959e8e166f487b9ee9
MD5 751e7e9a3c52aa98755fed0337e5373d
BLAKE2b-256 a4b587b6a597c3fb63a469d9333ad5753c04eab187368f392a07578915a6fd09
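To check that a downloaded archive matches the digests published here, compare its SHA256 against the listed value. pip can compute this for you with `pip hash <file>`; the Python sketch below does the same with the standard-library `hashlib`, reading in chunks so large files are not loaded into memory at once. `sha256_of` and `matches_published_digest` are our own helper names.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for safetext-0.3.3.tar.gz, copied from the hash table above.
SDIST_SHA256 = "cb7029be79f443c51fa668f139e1947d19d16f8d56dbbd959e8e166f487b9ee9"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an untampered download, `matches_published_digest("safetext-0.3.3.tar.gz", SDIST_SHA256)` should return `True`.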


File details

Details for the file safetext-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: safetext-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 82.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for safetext-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 3612329352a23da30ccfcf0e497d9f57bd3a88db983e5d42d3eb7e8b16ccf0fb
MD5 e1f867e69811686a155d11762a3c8c29
BLAKE2b-256 39a5c1781cabec361044f4ef5058d9605099518d333f5df233deeb219faad209

