Automatically remove profanity and toxic content from audio files using Whisper and Detoxify
PodangelEX
Automatically clean audio files by removing profanity and toxic content.
PodangelEX uses OpenAI's Whisper speech-to-text model combined with the Detoxify toxicity detector to automatically identify and remove profane, toxic, and harmful content from audio files.
Installation
Requirements: Python 3.8 or later, FFmpeg
pip install podangelex_JustAnotherCoderTheThird
System Requirements
- FFmpeg: Install via:
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: sudo apt-get install ffmpeg
  - Windows: Download from ffmpeg.org
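Before running podangel, you can confirm that FFmpeg is reachable on your PATH. This is a minimal standard-library sketch; the helper names are hypothetical and are not part of PodangelEX.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def ffmpeg_version_line() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is missing."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    if ffmpeg_available():
        print(ffmpeg_version_line())
    else:
        print("FFmpeg not found on PATH; install it before running podangel.")
```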
Quick Start
- Install the package (see above)
- Run the setup wizard: podangel
- First run: The app will auto-create:
  - Configuration directory at ~/.podangelex/
  - Workspace at ~/podangelex_data/ with folders:
    - Input/ - Place audio files here
    - Output/ - Cleaned audio files appear here
    - .bridge/ - Temporary processing files
- Add audio files to the Input/ folder
- Run again: podangel to clean your files
- Get results from the Output/ folder
How It Works
Step 1: Transcription
Whisper transcribes your audio file to text with word-level timestamps.
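With the openai-whisper package, word-level timestamps come from calling transcribe() with word_timestamps=True: each entry in the result's "segments" list then carries a "words" list whose items have "word", "start" and "end" (seconds). The sketch below only flattens such a result into a simple list; it is an illustration, not PodangelEX's own code, and the audio path in the comment is hypothetical.

```python
from typing import Any, Dict, List, NamedTuple


class Word(NamedTuple):
    text: str     # the word, with Whisper's leading space stripped
    start: float  # start time in seconds
    end: float    # end time in seconds


def flatten_words(result: Dict[str, Any]) -> List[Word]:
    """Collect word-level timestamps from a Whisper-style transcription result."""
    words: List[Word] = []
    for segment in result.get("segments", []):
        # "words" is only present when transcribe() was called with word_timestamps=True
        for w in segment.get("words", []):
            words.append(Word(w["word"].strip(), float(w["start"]), float(w["end"])))
    return words


# With openai-whisper installed (not executed here; the file name is hypothetical):
#   import whisper
#   model = whisper.load_model("small")
#   result = model.transcribe("Input/episode.mp3", word_timestamps=True)
#   words = flatten_words(result)
```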
Step 2: Toxicity Detection
Two-phase approach:
- Exact matching: Checks transcribed words against a built-in profanity list
- Context-aware: Uses the Detoxify model to detect toxic phrases even when they are not on the word list
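The two-phase idea can be sketched as follows. This is illustrative only: the banned-word set, the sliding-window size, and the `is_toxic` callable are hypothetical stand-ins, and scoring overlapping word windows is an assumption about how context might be fed to the model. The scorer is injected so the sketch runs without downloading Detoxify.

```python
import re
from typing import Callable, List, Sequence, Set, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)
Interval = Tuple[float, float]   # (start_seconds, end_seconds)


def normalize(token: str) -> str:
    """Lower-case and drop punctuation so 'Darn!' matches 'darn'."""
    return re.sub(r"[^\w']", "", token.lower())


def exact_matches(words: Sequence[Word], banned: Set[str]) -> List[Interval]:
    """Phase 1: flag individual words that appear on the word list."""
    return [(start, end) for text, start, end in words if normalize(text) in banned]


def contextual_matches(
    words: Sequence[Word],
    is_toxic: Callable[[str], bool],
    window: int = 5,
) -> List[Interval]:
    """Phase 2: score overlapping windows of words; flag a whole window if toxic."""
    if window < 1:
        raise ValueError("window must be at least 1")
    flagged: List[Interval] = []
    # Half-window stride: a short phrase cut by one window boundary appears whole in another.
    step = max(1, window // 2)
    for i in range(0, len(words), step):
        chunk = words[i : i + window]
        text = " ".join(t for t, _, _ in chunk)
        if is_toxic(text):
            flagged.append((chunk[0][1], chunk[-1][2]))
    return flagged


def detect(words, banned, is_toxic, window: int = 5) -> List[Interval]:
    """Union of both phases, sorted by start time (overlaps are merged later)."""
    return sorted(exact_matches(words, banned) + contextual_matches(words, is_toxic, window))
```

To use Detoxify as the scorer, `is_toxic` could wrap `Detoxify("original").predict(text)`, which returns a dict of per-category scores, and compare those scores against the configured thresholds.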
Step 3: Audio Cutting
FFmpeg extracts only the clean portions of audio and concatenates them.
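One way to express this step, assuming the flagged material is available as (start, end) times in seconds: merge overlapping flagged spans, take their complement as the stretches to keep, and build a single FFmpeg call that trims each kept stretch with the atrim filter, resets timestamps with asetpts, and joins them with concat. This is a sketch; PodangelEX's actual FFmpeg invocation (for example, whether it writes intermediate files) may differ, and the `pad` margin is a hypothetical option.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def merge(intervals: Sequence[Interval], pad: float = 0.0) -> List[Interval]:
    """Widen each flagged interval by `pad` seconds and merge any overlaps."""
    merged: List[Interval] = []
    for start, end in sorted((max(0.0, s - pad), e + pad) for s, e in intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_intervals(flagged: Sequence[Interval], duration: float, pad: float = 0.0) -> List[Interval]:
    """Return the clean stretches of [0, duration] not covered by any flagged interval."""
    keep: List[Interval] = []
    cursor = 0.0
    for start, end in merge(flagged, pad):
        seg_end = min(start, duration)
        if seg_end > cursor:
            keep.append((cursor, seg_end))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def ffmpeg_cut_command(src: str, dst: str, keep: Sequence[Interval]) -> List[str]:
    """Build one ffmpeg call that trims each kept stretch and concatenates them."""
    if not keep:
        raise ValueError("nothing left to keep")
    filters = [
        f"[0:a]atrim=start={s:.3f}:end={e:.3f},asetpts=PTS-STARTPTS[a{i}]"
        for i, (s, e) in enumerate(keep)
    ]
    labels = "".join(f"[a{i}]" for i in range(len(keep)))
    filters.append(f"{labels}concat=n={len(keep)}:v=0:a=1[out]")
    return ["ffmpeg", "-y", "-i", src, "-filter_complex", ";".join(filters), "-map", "[out]", dst]
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)`. Building one filter graph avoids re-encoding each piece separately, at the cost of a long command line for files with many cuts.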
Configuration
On first run, you'll be asked to configure:
Model Size (Whisper model; approximate VRAM required in parentheses)
- tiny (1GB) - Fastest, ~60% accuracy
- base (1GB) - Fast, ~70% accuracy
- small (2GB) - Balanced, ~75% accuracy (recommended)
- medium (5GB) - Better, ~80% accuracy
- large (10GB) - Best, ~85% accuracy
- turbo (6GB) - Latest, ~80% accuracy
Workers
Number of files to process in parallel. Use more workers if you have many files and enough VRAM to run several at once.
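A minimal sketch of the worker idea, assuming each file can be cleaned independently by some function `clean_file(src, dst)` (hypothetical; it stands in for the transcribe, detect and cut steps). It uses a thread pool; whether PodangelEX uses threads or processes is not stated here, and the extension filter is likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical filter for which files in Input/ count as audio.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_inputs(input_dir: Path) -> List[Path]:
    """List audio files in the input folder, sorted by name."""
    return sorted(p for p in input_dir.iterdir() if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS)


def process_all(
    input_dir: Path,
    output_dir: Path,
    clean_file: Callable[[Path, Path], Path],
    workers: int = 2,
) -> Dict[str, Path]:
    """Clean every input file using up to `workers` files at a time."""
    output_dir.mkdir(parents=True, exist_ok=True)
    files = find_inputs(input_dir)
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        # Each output keeps the input's file name; map() preserves input order.
        outputs = list(pool.map(lambda src: clean_file(src, output_dir / src.name), files))
    return dict(zip((f.name for f in files), outputs))
```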
Thresholds
Fine-tune what gets flagged as toxic (0-1 scale, higher = stricter):
- toxicity (t): General profanity
- severe_toxicity (st): Severe language
- obscene (o): Obscene content
- threat (th): Threats
- insult (i): Insults
- identity_attack (id): Slurs/hate speech
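The short codes above line up with the score names returned by Detoxify's "original" model (toxicity, severe_toxicity, obscene, threat, insult, identity_attack). The sketch below applies per-category thresholds to such a score dict. It assumes a category is flagged when its score is at or above the threshold; check PodangelEX's own comparison before relying on this, since the direction determines whether a higher value flags more or less.

```python
from typing import Dict, List

# Short codes from the configuration menu mapped to Detoxify's output keys.
CATEGORY_CODES = {
    "t": "toxicity",
    "st": "severe_toxicity",
    "o": "obscene",
    "th": "threat",
    "i": "insult",
    "id": "identity_attack",
}


def flagged_categories(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the categories whose score is at or above the configured threshold.

    `thresholds` is keyed by short code (t, st, ...); categories without a
    threshold are ignored, and missing scores count as 0.0.
    """
    hits = []
    for code, threshold in thresholds.items():
        category = CATEGORY_CODES[code]
        if scores.get(category, 0.0) >= threshold:
            hits.append(category)
    return hits


# With Detoxify installed (not executed here):
#   from detoxify import Detoxify
#   scores = Detoxify("original").predict("some transcribed phrase")
#   flagged_categories(scores, {"t": 0.8, "i": 0.7})
```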
Environment Variables
Optional customization:
- PODANGELEX_HOME - Custom config directory (default: ~/.podangelex/)
- PODANGELEX_WORKSPACE - Custom workspace location (default: ~/podangelex_data/)
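The lookup described above (environment variable first, otherwise the default) can be sketched as follows, using the workspace folder names from Quick Start. This mirrors the documented defaults but is not PodangelEX's own code; treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_paths(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Resolve config and workspace locations from the environment or the defaults."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the documented default.
    home = Path(env.get("PODANGELEX_HOME") or "~/.podangelex").expanduser()
    workspace = Path(env.get("PODANGELEX_WORKSPACE") or "~/podangelex_data").expanduser()
    return {
        "config": home,
        "workspace": workspace,
        "input": workspace / "Input",
        "output": workspace / "Output",
        "bridge": workspace / ".bridge",
    }


def ensure_dirs(paths: Dict[str, Path]) -> None:
    """Create every resolved directory if it does not exist yet."""
    for p in paths.values():
        p.mkdir(parents=True, exist_ok=True)
```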
Troubleshooting
"ffmpeg: command not found"
Install FFmpeg using the commands above.
"ModuleNotFoundError: No module named 'whisper'"
Reinstall the package: pip install --upgrade podangelex_JustAnotherCoderTheThird
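If the error persists, one common cause is that pip installed into a different environment than the Python interpreter that runs podangel. A small standard-library check (a hypothetical helper, not part of PodangelEX) reports which of the underlying libraries the current interpreter can find:

```python
import importlib.util
from typing import Iterable, List

# Top-level import names; pip distribution names can differ
# (Whisper is installed from PyPI as "openai-whisper").
DEFAULT_MODULES = ("whisper", "detoxify")


def missing_modules(names: Iterable[str] = DEFAULT_MODULES) -> List[str]:
    """Return the names the current interpreter cannot find.

    Only pass top-level names: for dotted names, find_spec imports the parent
    package and raises if that parent is missing.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    import sys

    print(sys.executable, "missing:", missing_modules())
```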
Audio not being cleaned properly
Adjust toxicity thresholds by re-running podangel and selecting option (1) to reconfigure.
License
CC0 1.0 Universal - Public Domain
Download files
Source Distribution
Built Distribution
File details
Details for the file podangelex_justanothercoderthethird-1.0.1.tar.gz.
File metadata
- Download URL: podangelex_justanothercoderthethird-1.0.1.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b234128229b5309dbdd3e9c515404bc63ff1909e2aaf8e65b69efcd8fb8ef841 |
| MD5 | ca3d5193d6e1d560bbe9b220c1b98efe |
| BLAKE2b-256 | 20115fb83454979f7a79a7af898b253f8c1422409e39d63835c77b761096c4f2 |
File details
Details for the file podangelex_justanothercoderthethird-1.0.1-py3-none-any.whl.
File metadata
- Download URL: podangelex_justanothercoderthethird-1.0.1-py3-none-any.whl
- Upload date:
- Size: 14.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cae17e13f6dfb81a447f6b9f4b04ef4dfc9c0713dd3469c8808b770659cfe047 |
| MD5 | 79aaf18bed917367cd52b5a695f68907 |
| BLAKE2b-256 | 6b24fc1f262ff3365e5ea45ae2759236d5141dc6aa288f383c36cd5b6214d94c |
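To check a downloaded file against the SHA256 digests listed above, a short standard-library sketch (the helper names are hypothetical) reads the file in chunks and compares hex digests:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Compare against an expected digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example for the wheel, using the SHA256 digest from its table:
#   verify("podangelex_justanothercoderthethird-1.0.1-py3-none-any.whl",
#          "cae17e13f6dfb81a447f6b9f4b04ef4dfc9c0713dd3469c8808b770659cfe047")
```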