Automatically remove profanity and toxic content from audio files using Whisper and Detoxify
Project description
PodAngelEX
PodAngelEX is a passion project of mine, built specifically to mute inappropriate words and segments in audio.
How it works
PodAngel takes input files and transcribes them using OpenAI's Whisper model. Depending on your hardware, you can run multiple workers asynchronously, which can drastically increase how many files get 'cleaned' in a given amount of time.
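As a rough illustration of the worker idea, here is a minimal sketch of a fixed-size pool of asynchronous workers pulling files from a queue. It is not PodAngel's actual code: `clean_file`, `worker`, and `run_pool` are hypothetical names, and `clean_file` is only a placeholder for the transcribe, filter, and mute steps.

```python
import asyncio


async def clean_file(path: str) -> str:
    # Hypothetical placeholder for the real per-file work
    # (transcription, filtering, muting).
    await asyncio.sleep(0)
    return path + ".clean"


async def worker(queue: asyncio.Queue, results: list) -> None:
    # Each worker keeps taking files until it is cancelled.
    while True:
        path = await queue.get()
        try:
            results.append(await clean_file(path))
        finally:
            queue.task_done()


async def run_pool(paths: list[str], worker_count: int) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    for path in paths:
        queue.put_nowait(path)

    results: list[str] = []
    tasks = [
        asyncio.create_task(worker(queue, results))
        for _ in range(worker_count)
    ]

    await queue.join()  # wait until every queued file is processed
    for task in tasks:
        task.cancel()   # idle workers are no longer needed
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_pool(["a.mp3", "b.mp3", "c.mp3"], worker_count=2)))
```

Model inference is blocking work, so a real pipeline would likely hand each file to a thread or process (for example with `asyncio.to_thread`) rather than awaiting it directly.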
Once it has a list of all words and sentences from the audio file, the program first compares each word against a list of swears. It 'makes note' of the start and end time of each swear it finds. Then, so as not to skew the context catching, it 'erases' those swear words from the transcription.
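The word-level pass could be sketched roughly like this. It assumes each transcribed word arrives with a start and end time; the `Word` class, `normalize`, and `split_swears` are hypothetical names for illustration, not PodAngel's real internals.

```python
import string
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def normalize(token: str) -> str:
    # Ignore case and surrounding punctuation when comparing words.
    return token.strip(string.punctuation + string.whitespace).lower()


def split_swears(words, swears):
    """Return (mute_spans, cleaned_text).

    mute_spans holds the (start, end) time of every listed word;
    cleaned_text is the transcription with those words left out.
    """
    mute_spans = []
    kept = []
    for word in words:
        if normalize(word.text) in swears:
            mute_spans.append((word.start, word.end))
        else:
            kept.append(word.text.strip())
    return mute_spans, " ".join(kept)


if __name__ == "__main__":
    words = [Word("Well,", 0.0, 0.4), Word("darn", 0.4, 0.8), Word("it.", 0.8, 1.0)]
    print(split_swears(words, {"darn"}))
```

The cleaned text is what would then be handed to the context check, while the spans are kept for muting.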
Then, the 'new' transcription is passed to Detoxify. Detoxify reads a sentence and assigns multiple values to it, denoting how vulgar it is. If any of those values passes the user-set threshold, the segment is flagged for muting.
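A minimal sketch of the threshold check, assuming the scores for a segment are available as a plain dictionary of label to value (Detoxify's `predict` returns a mapping of this kind). The function name `flag_segment`, the label names used in the example, and the strict `>` comparison are assumptions made for illustration.

```python
def flag_segment(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Return True if any score exceeds its user-set threshold.

    Labels missing from `scores` are treated as 0.0 (an assumption).
    """
    return any(
        scores.get(label, 0.0) > limit
        for label, limit in thresholds.items()
    )


if __name__ == "__main__":
    scores = {"toxicity": 0.91, "insult": 0.20}
    print(flag_segment(scores, {"toxicity": 0.7, "insult": 0.7}))  # flagged
    print(flag_segment(scores, {"toxicity": 1.0, "insult": 1.0}))  # nothing can exceed 1.0
```

With this comparison, a threshold of 1.0 never flags anything and a threshold of 0.0 flags any segment with a non-zero score.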
Once we have the words and segments to be muted, the start and end times for each are passed to FFmpeg, which cuts and concatenates the audio file accordingly, resulting in a much more socially appropriate file.
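To show how a list of time spans can turn into an FFmpeg call, here is a small sketch. Note that it uses a different technique from the cut-and-concatenate approach described above: it silences the spans in place with FFmpeg's `volume` filter and its `enable` timeline option. The helper names `merge_spans` and `build_mute_command` are hypothetical, and the code only builds the argument list without running FFmpeg.

```python
def merge_spans(spans):
    """Sort (start, end) spans and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_mute_command(src, dst, spans):
    """Build an ffmpeg argument list that silences the given spans."""
    windows = "+".join(
        f"between(t,{start:.3f},{end:.3f})"
        for start, end in merge_spans(spans)
    )
    if not windows:
        # Nothing to mute: plain conversion without a filter.
        return ["ffmpeg", "-i", src, dst]
    # The filter is active (volume=0) whenever t falls inside any window.
    audio_filter = f"volume=enable='{windows}':volume=0"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    print(build_mute_command("in.mp3", "out.mp3", [(1.5, 3.0), (1.0, 2.0), (5.0, 6.0)]))
```

The resulting list can be passed to `subprocess.run` as-is; because no shell is involved, the single quotes are read by FFmpeg's own filter parser.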
*Please note: I cannot promise absolute accuracy. In my experience with Whisper's Turbo model, the program occasionally misses one or two swears.
What can be configured
I made PodAngel to be highly configurable. You can configure:
1. Worker count. There is, functionally, no limit to the number of workers you can have active at one time. They still use VRAM though, so don't just set it to a hundred and let it run if your hardware can't support that.
2. Toxicity thresholds. Each threshold from Detoxify can be increased or decreased in severity. Each one is a float from 0.0 to 1.0, and the lower the number, the stricter the context catching: 1.0 will let everything through, and 0.0 will let almost nothing through.
3. File paths. You can set a file path if you'd like to move PodAngel's 'workspace'. This will move every file/folder that PodAngel relies on to that new path, so if you change it, maybe put it in a fresh folder.
Install and How to Run
To install, simply run the following command:

    pip install podangelex-JustAnotherCoderTheThird

Then, to run it, just run this command in your terminal:

    podangel

On the first run, it'll guide you through setting up the program, while also creating all the files and folders it needs. After that, just put files in the input folder, run the program, wait a bit, and enjoy your clean audio.
License
CC0 1.0 Universal - Public Domain
AI Declaration
I used some AI to help debug the code, write commit messages on GitHub, and organize the files for package uploading.
Download files
File details
Details for the file podangelex_justanothercoderthethird-1.0.4.tar.gz.
File metadata
- Download URL: podangelex_justanothercoderthethird-1.0.4.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a93e863fc4dbbd5df06c3efca385d9f1d958feb73b81d8bd28054a2cea1d71f2 |
| MD5 | 47cb4622f9852c8b310970ac74e23c7d |
| BLAKE2b-256 | d5884c60fc326faf19a04d5fc47d1f688aca6e986ead5de79400592a614a6f61 |
File details
Details for the file podangelex_justanothercoderthethird-1.0.4-py3-none-any.whl.
File metadata
- Download URL: podangelex_justanothercoderthethird-1.0.4-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 87bf7356db8225e42f7e031a2b93a2e91d93f705d66a3968de5ab96636469860 |
| MD5 | 5f4fc5ae06420a26bcb6cf2fe6bcd39e |
| BLAKE2b-256 | 0e1060fc7ddbbfc01037d1d38fe4ce5b2e466e7ccb4b4dd6bb8699ecfbf10716 |