Automatically remove profanity and toxic content from audio files using Whisper and Detoxify

PodangelEX

Automatically clean audio files by removing profanity and toxic content.

PodangelEX uses OpenAI's Whisper speech-to-text model combined with the Detoxify toxicity detector to automatically identify and remove profane, toxic, and harmful content from audio files.

Installation

Requirements: Python 3.8 or later, FFmpeg

pip install podangelex_JustAnotherCoderTheThird

System Requirements

  • FFmpeg: Install via:
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt-get install ffmpeg
    • Windows: Download from ffmpeg.org

Quick Start

  1. Install the package (see above)
  2. Run the setup wizard:
    podangel
    
  3. On first run, the app automatically creates:
    • Configuration directory at ~/.podangelex/
    • Workspace at ~/podangelex_data/ with folders:
      • Input/ - Place audio files here
      • Output/ - Cleaned audio files appear here
      • .bridge/ - Temporary processing files
  4. Add audio files to the Input/ folder
  5. Run podangel again to clean your files
  6. Get results from the Output/ folder

How It Works

Step 1: Transcription

Whisper transcribes your audio file to text with word-level timestamps.
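
As a rough illustration of what "word-level timestamps" means in practice: with the openai-whisper package, calling transcribe(..., word_timestamps=True) returns a dictionary whose segments each carry a list of timed words. The sketch below flattens such a result into a simple word list. The flatten_words helper and the sample data are hypothetical and hand-written for this example; they are not taken from PodangelEX's source.

```python
from typing import Any, Dict, List


def flatten_words(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect every timed word from a Whisper-style transcription result."""
    words: List[Dict[str, Any]] = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append(
                {
                    # Whisper words usually carry a leading space; drop it.
                    "text": w["word"].strip(),
                    "start": float(w["start"]),  # seconds from file start
                    "end": float(w["end"]),
                }
            )
    return words


# Hand-written stand-in for a transcription result (assumed shape).
sample_result = {
    "text": " well darn it",
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "words": [
                {"word": " well", "start": 0.0, "end": 0.4},
                {"word": " darn", "start": 0.4, "end": 0.8},
                {"word": " it", "start": 0.8, "end": 1.2},
            ],
        }
    ],
}

for word in flatten_words(sample_result):
    print(word)
```

Each word's start and end times are what later steps need in order to know which stretch of audio to cut.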

Step 2: Toxicity Detection

Two-phase approach:

  • Exact matching: Checks transcribed words against a built-in profanity list
  • Context-aware: Uses the Detoxify model to detect toxic phrases even when none of the words are on the list
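
The two phases can be sketched roughly as follows. This is a minimal illustration, not PodangelEX's actual code: flag_spans, the two-word window, the one-word blocklist, and toy_classifier (a stand-in for the machine learning model) are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Word = Dict[str, Any]       # {"text": str, "start": float, "end": float}
Span = Tuple[float, float]  # (start_seconds, end_seconds)


def flag_spans(
    words: List[Word],
    blocklist: Set[str],
    is_toxic: Callable[[str], bool],
    window: int = 2,
) -> List[Span]:
    """Return sorted, de-duplicated time spans that should be cut."""
    spans: Set[Span] = set()

    # Phase 1: exact matching of single words against the list.
    for w in words:
        if w["text"].lower().strip(".,!?") in blocklist:
            spans.add((w["start"], w["end"]))

    # Phase 2: hand short runs of consecutive words to the classifier.
    last_start = max(len(words) - window, 0)
    for i in range(last_start + 1):
        chunk = words[i:i + window]
        if chunk and is_toxic(" ".join(w["text"] for w in chunk)):
            spans.add((chunk[0]["start"], chunk[-1]["end"]))

    return sorted(spans)


def toy_classifier(phrase: str) -> bool:
    """Hypothetical stand-in for a real toxicity model."""
    return "shut up" in phrase.lower()


sample_words = [
    {"text": "oh", "start": 0.0, "end": 0.5},
    {"text": "darn", "start": 0.5, "end": 1.0},
    {"text": "just", "start": 1.0, "end": 1.5},
    {"text": "shut", "start": 1.5, "end": 2.0},
    {"text": "up", "start": 2.0, "end": 2.5},
    {"text": "now", "start": 2.5, "end": 3.0},
]

print(flag_spans(sample_words, {"darn"}, toy_classifier))
# [(0.5, 1.0), (1.5, 2.5)]
```

Here "darn" is caught by the exact match, while "shut up" is caught only by the phrase-level check because neither word is on the list by itself. Overlapping spans are not merged in this sketch.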

Step 3: Audio Cutting

FFmpeg extracts only the clean portions of audio and concatenates them.
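
One way to picture this step: invert the flagged time spans into "keep" intervals, then ask FFmpeg to trim each interval and join the pieces. The sketch below builds such a command using FFmpeg's atrim, asetpts, and concat filters. The helper names and file names are hypothetical, and PodangelEX may well invoke FFmpeg differently.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def keep_intervals(flagged: List[Span], duration: float) -> List[Span]:
    """Return the parts of [0, duration] not covered by any flagged span."""
    keep: List[Span] = []
    cursor = 0.0
    for start, end in sorted(flagged):
        # Clamp each span to the file's length.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def build_filter(keep: List[Span]) -> str:
    """Build a filter graph that trims each interval and concatenates them."""
    if not keep:
        raise ValueError("nothing left to keep")
    parts = [
        f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]"
        for i, (start, end) in enumerate(keep)
    ]
    labels = "".join(f"[a{i}]" for i in range(len(keep)))
    parts.append(f"{labels}concat=n={len(keep)}:v=0:a=1[out]")
    return ";".join(parts)


def build_command(src: str, dst: str, keep: List[Span]) -> List[str]:
    """Assemble an ffmpeg argument list (pass it to subprocess.run to execute)."""
    return ["ffmpeg", "-y", "-i", src,
            "-filter_complex", build_filter(keep),
            "-map", "[out]", dst]


keep = keep_intervals([(0.5, 1.0), (1.5, 2.5)], duration=3.0)
print(keep)  # [(0.0, 0.5), (1.0, 1.5), (2.5, 3.0)]
print(build_command("Input/episode.mp3", "Output/episode.mp3", keep))
```

The command is only printed here, so the example runs without FFmpeg installed.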

Configuration

On first run, you'll be asked to configure:

Model Size

  • tiny (~1 GB VRAM) - Fastest, ~60% accuracy
  • base (~1 GB VRAM) - Fast, ~70% accuracy
  • small (~2 GB VRAM) - Balanced, ~75% accuracy (recommended)
  • medium (~5 GB VRAM) - Better, ~80% accuracy
  • large (~10 GB VRAM) - Best, ~85% accuracy
  • turbo (~6 GB VRAM) - Latest, ~80% accuracy

Workers

Number of files to process in parallel. Use more workers if you have plenty of VRAM and many files to clean.
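
To make the idea of a worker count concrete, here is a minimal sketch of processing several files at once with a fixed-size pool from Python's standard library. The process_all and fake_clean functions are hypothetical placeholders; how PodangelEX actually schedules its workers is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_all(
    files: Iterable[str],
    clean_one: Callable[[str], str],
    workers: int = 2,
) -> Dict[str, str]:
    """Run clean_one on every file, at most `workers` files at a time."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    file_list = list(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with file_list.
        results = list(pool.map(clean_one, file_list))
    return dict(zip(file_list, results))


def fake_clean(path: str) -> str:
    """Placeholder for the real work: just report where output would go."""
    return path.replace("Input/", "Output/")


print(process_all(["Input/a.mp3", "Input/b.mp3", "Input/c.mp3"], fake_clean, workers=2))
```

With workers=2, the third file waits until one of the first two finishes.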

Thresholds

Fine-tune what gets flagged as toxic (0-1 scale, higher = stricter):

  • toxicity (t): General profanity
  • severe_toxicity (st): Severe language
  • obscene (o): Obscene content
  • threat (th): Threats
  • insult (i): Insults
  • identity_attack (id): Slurs/hate speech
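
The short codes above map one-to-one onto the category names. As an illustration only, the sketch below expands a set of short-code thresholds into full names and checks that each value sits on the 0-1 scale. The expand_thresholds helper and the sample values are hypothetical; the format PodangelEX uses to store thresholds is not described here.

```python
from typing import Dict

# Short code -> full category name, as listed above.
SHORT_CODES = {
    "t": "toxicity",
    "st": "severe_toxicity",
    "o": "obscene",
    "th": "threat",
    "i": "insult",
    "id": "identity_attack",
}


def expand_thresholds(short: Dict[str, float]) -> Dict[str, float]:
    """Translate short codes to category names and validate the 0-1 range."""
    full: Dict[str, float] = {}
    for code, value in short.items():
        if code not in SHORT_CODES:
            raise KeyError(f"unknown threshold code: {code!r}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{code} must be between 0 and 1, got {value}")
        full[SHORT_CODES[code]] = float(value)
    return full


# Example values only; not PodangelEX defaults.
print(expand_thresholds({"t": 0.5, "th": 0.8, "id": 0.3}))
# {'toxicity': 0.5, 'threat': 0.8, 'identity_attack': 0.3}
```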

Environment Variables

Optional customization:

  • PODANGELEX_HOME - Custom config directory (default: ~/.podangelex/)
  • PODANGELEX_WORKSPACE - Custom workspace location (default: ~/podangelex_data/)
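
The override-with-default behavior described above can be sketched like this. The resolve_dirs and workspace_folders helpers are hypothetical and only illustrate the lookup order (environment variable first, documented default otherwise); they are not PodangelEX's own functions.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Tuple


def resolve_dirs(env: Optional[Mapping[str, str]] = None) -> Tuple[Path, Path]:
    """Return (config_dir, workspace_dir), honoring the two variables."""
    if env is None:
        env = os.environ
    home = Path(env.get("PODANGELEX_HOME", "~/.podangelex/")).expanduser()
    workspace = Path(env.get("PODANGELEX_WORKSPACE", "~/podangelex_data/")).expanduser()
    return home, workspace


def workspace_folders(workspace: Path) -> List[Path]:
    """The three folders listed under Quick Start."""
    return [workspace / "Input", workspace / "Output", workspace / ".bridge"]


config_dir, workspace_dir = resolve_dirs()
print(config_dir)
for folder in workspace_folders(workspace_dir):
    print(folder)
```

Passing a plain dictionary instead of the real environment, for example resolve_dirs({"PODANGELEX_HOME": "/tmp/cfg"}), is a convenient way to try out overrides without changing your shell.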

Troubleshooting

"ffmpeg: command not found"

Install FFmpeg using the commands under System Requirements, and make sure the ffmpeg executable is on your PATH.
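
To confirm whether FFmpeg is visible from Python, a quick check with the standard library is enough. The find_tool helper is a hypothetical convenience wrapper around shutil.which and is not part of PodangelEX.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


path = find_tool("ffmpeg")
if path is None:
    print("ffmpeg was not found on PATH; install it and reopen your terminal")
else:
    print(f"ffmpeg found at {path}")
```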

"ModuleNotFoundError: No module named 'whisper'"

Reinstall the package: pip install --upgrade podangelex_JustAnotherCoderTheThird

Audio not being cleaned properly

Adjust toxicity thresholds by re-running podangel and selecting option (1) to reconfigure.

License

CC0 1.0 Universal - Public Domain

AI Declaration

I used some AI to help debug the code, write commit messages on GitHub, and organize the files for package uploading.
