Automatically remove profanity and toxic content from audio files using Whisper and Detoxify

PodangelEX

Automatically clean audio files by removing profanity and toxic content.

PodangelEX uses OpenAI's Whisper speech-to-text model combined with the Detoxify toxicity detector to automatically identify and remove profane, toxic, and harmful content from audio files.

Installation

Requirements: Python 3.8 or later, FFmpeg

pip install podangelex_JustAnotherCoderTheThird

System Requirements

  • FFmpeg: Install via:
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt-get install ffmpeg
    • Windows: Download from ffmpeg.org

Quick Start

  1. Install the package (see above)
  2. Run the setup wizard:
    podangel
    
  3. On first run, the app creates the following automatically:
    • Configuration directory at ~/.podangelex/
    • Workspace at ~/podangelex_data/ with folders:
      • Input/ - Place audio files here
      • Output/ - Cleaned audio files appear here
      • .bridge/ - Temporary processing files
  4. Add audio files to the Input/ folder
  5. Run podangel again to clean your files
  6. Get results from the Output/ folder

How It Works

Step 1: Transcription

Whisper transcribes your audio file to text with word-level timestamps.
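
As a rough sketch of this step, Whisper's Python API returns per-word timings when `word_timestamps=True` is passed to `transcribe`. The helper names `flatten_words` and `transcribe_words` are illustrative and are not taken from PodangelEX's source; the default model name `small` is an assumption as well.

```python
from typing import Any, Dict, List


def flatten_words(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the per-word entries of a Whisper result into one flat list."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append({
                "word": w["word"].strip(),  # Whisper pads words with a leading space
                "start": float(w["start"]),
                "end": float(w["end"]),
            })
    return words


def transcribe_words(audio_path: str, model_name: str = "small") -> List[Dict[str, Any]]:
    """Transcribe a file and return its words with start/end times in seconds."""
    import whisper  # imported lazily so flatten_words works without the model installed

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, word_timestamps=True)
    return flatten_words(result)
```

Keeping the flattening separate from the model call makes the timestamp handling easy to check against a hand-written result dictionary.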

Step 2: Toxicity Detection

Two-phase approach:

  • Exact matching: Checks transcribed words against a built-in profanity list
  • Context-aware: Uses the Detoxify model to detect toxic phrases even when none of their words are on the list
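
The two phases could be sketched as follows. This is a minimal illustration, not PodangelEX's actual code: the function names, the fixed-size word window, and the `is_toxic` callable (which in practice might wrap a Detoxify score check) are all assumptions.

```python
import string
from typing import Any, Callable, Dict, List, Set, Tuple

Word = Dict[str, Any]        # {"word": str, "start": float, "end": float}
Span = Tuple[float, float]   # (start_seconds, end_seconds)


def exact_matches(words: List[Word], blocklist: Set[str]) -> List[Span]:
    """Phase 1: flag every word whose normalized form is on the blocklist."""
    spans = []
    for w in words:
        token = str(w["word"]).lower().strip(string.punctuation + " ")
        if token in blocklist:
            spans.append((float(w["start"]), float(w["end"])))
    return spans


def context_matches(words: List[Word], is_toxic: Callable[[str], bool],
                    window: int = 5) -> List[Span]:
    """Phase 2: flag whole windows of consecutive words that the classifier rejects."""
    spans = []
    for i in range(0, len(words), window):
        chunk = words[i:i + window]
        text = " ".join(str(w["word"]) for w in chunk)
        if is_toxic(text):
            spans.append((float(chunk[0]["start"]), float(chunk[-1]["end"])))
    return spans


def detect(words: List[Word], blocklist: Set[str],
           is_toxic: Callable[[str], bool], window: int = 5) -> List[Span]:
    """Combine both phases into one sorted list of flagged time spans."""
    return sorted(exact_matches(words, blocklist) + context_matches(words, is_toxic, window))
```

Passing the classifier in as a plain callable keeps the word-list phase usable on its own and lets the model-backed phase be swapped or stubbed.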

Step 3: Audio Cutting

FFmpeg extracts only the clean portions of audio and concatenates them.
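
Working out which portions are "clean" amounts to taking the complement of the flagged time spans. The sketch below shows one way to do that; the function names and the optional `pad` margin are hypothetical and not part of PodangelEX's documented behavior.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_spans(spans: List[Span], pad: float = 0.0) -> List[Span]:
    """Pad each flagged span, then merge any that overlap or touch."""
    merged: List[Span] = []
    for start, end in sorted((max(0.0, s - pad), e + pad) for s, e in spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def clean_spans(flagged: List[Span], duration: float, pad: float = 0.0) -> List[Span]:
    """Return the parts of [0, duration] not covered by any flagged span."""
    keep: List[Span] = []
    cursor = 0.0
    for start, end in merge_spans(flagged, pad):
        start, end = min(start, duration), min(end, duration)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep
```

Each kept span could then be cut with FFmpeg (for instance with its `-ss` and `-to` options) and the pieces joined with its concat demuxer, though the exact commands PodangelEX runs are not documented here.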

Configuration

On first run, you'll be asked to configure:

Model Size

  • tiny (~1 GB VRAM) - Fastest, ~60% accuracy
  • base (~1 GB VRAM) - Fast, ~70% accuracy
  • small (~2 GB VRAM) - Balanced, ~75% accuracy (recommended)
  • medium (~5 GB VRAM) - Better, ~80% accuracy
  • large (~10 GB VRAM) - Best, ~85% accuracy
  • turbo (~6 GB VRAM) - Latest, ~80% accuracy

Workers

Number of files to process in parallel. Use more workers if you have plenty of VRAM and many files to process.
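
To illustrate what a worker count controls, here is a minimal sketch that fans files out over a thread pool. It assumes a hypothetical per-file function `clean_one`; whether PodangelEX uses threads, processes, or something else internally is not stated in this document.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_all(paths: Iterable[str], clean_one: Callable[[str], str],
                workers: int = 2) -> Dict[str, str]:
    """Run clean_one on every path, with at most `workers` files in flight at once."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        results = list(pool.map(clean_one, paths))  # map preserves input order
    return dict(zip(paths, results))
```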

Thresholds

Fine-tune what gets flagged as toxic (0-1 scale, higher = stricter):

  • toxicity (t): General profanity
  • severe_toxicity (st): Severe language
  • obscene (o): Obscene content
  • threat (th): Threats
  • insult (i): Insults
  • identity_attack (id): Slurs/hate speech

Environment Variables

Optional customization:

  • PODANGELEX_HOME - Custom config directory (default: ~/.podangelex/)
  • PODANGELEX_WORKSPACE - Custom workspace location (default: ~/podangelex_data/)
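
The lookup order implied here (environment variable first, documented default otherwise) might look like the sketch below. The function name `resolve_dirs` and the returned dictionary keys are illustrative; only the variable names, defaults, and folder names come from this document.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_dirs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Resolve config and workspace paths, falling back to the documented defaults."""
    env = os.environ if env is None else env
    home = Path(env.get("PODANGELEX_HOME") or "~/.podangelex").expanduser()
    workspace = Path(env.get("PODANGELEX_WORKSPACE") or "~/podangelex_data").expanduser()
    return {
        "config": home,
        "input": workspace / "Input",
        "output": workspace / "Output",
        "bridge": workspace / ".bridge",
    }
```

For example, `resolve_dirs({"PODANGELEX_WORKSPACE": "/mnt/audio"})` would place the Input/ and Output/ folders under /mnt/audio while leaving the config directory at its default.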

Troubleshooting

"ffmpeg: command not found"

Install FFmpeg using the commands listed under System Requirements, then make sure the ffmpeg binary is on your PATH.
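
A quick way to confirm that FFmpeg is visible to Python is to look it up on the PATH. This is a generic check using the standard library, not something PodangelEX is known to expose; the function name is hypothetical.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the current PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    print("FFmpeg found" if ffmpeg_available() else "FFmpeg not found on PATH")
```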

"ModuleNotFoundError: No module named 'whisper'"

Reinstall the package: pip install --upgrade podangelex_JustAnotherCoderTheThird

Audio not being cleaned properly

Adjust toxicity thresholds by re-running podangel and selecting option (1) to reconfigure.

License

CC0 1.0 Universal - Public Domain
