Automatically remove profanity and toxic content from audio files using Whisper and Detoxify

PodAngelEX

PodAngelEX is a passion project of mine, made specifically for muting inappropriate words and segments in audio.

How it works

PodAngel takes input files and transcribes them using OpenAI's Whisper model. Depending on your hardware, you can run multiple workers asynchronously, which can drastically increase how many files get 'cleaned' in a given amount of time.
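As a rough illustration of the worker idea, here is a minimal sketch of a pool of asynchronous workers pulling files from a shared queue. It is not PodAngel's actual code: `fake_transcribe`, `clean_all`, and `run` are hypothetical names, and the stub stands in for a real Whisper call so the example runs without a model.

```python
import asyncio
from pathlib import Path
from typing import Callable, Dict, List


def fake_transcribe(path: Path) -> str:
    """Hypothetical stand-in for a real Whisper transcription call."""
    return f"transcript of {path.name}"


async def clean_all(
    files: List[Path],
    workers: int,
    transcribe: Callable[[Path], str] = fake_transcribe,
) -> Dict[str, str]:
    """Transcribe every file using `workers` concurrent workers."""
    queue: asyncio.Queue = asyncio.Queue()
    for f in files:
        queue.put_nowait(f)
    results: Dict[str, str] = {}

    async def worker() -> None:
        # Each worker keeps taking files until the queue is empty.
        while True:
            try:
                path = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            # Run the blocking transcription in a thread so workers overlap.
            results[path.name] = await asyncio.to_thread(transcribe, path)

    await asyncio.gather(*(worker() for _ in range(max(1, workers))))
    return results


def run(files: List[Path], workers: int) -> Dict[str, str]:
    """Synchronous entry point for the sketch above."""
    return asyncio.run(clean_all(files, workers))
```

Calling `run([Path("a.mp3"), Path("b.mp3")], workers=2)` processes both files at once. With a real model, each extra worker would also need its own share of VRAM.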

Once there is a list of all words and sentences from the audio file, the program first compares each word against a list of swears. For every swear it finds, it 'makes note' of the start and end time of that word. Then, so as not to skew the context catching, it 'erases' those swear words from the transcription.
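The word pass described above could look something like this. It is only a sketch: the `Word` shape (text plus start and end times in seconds), the function name `split_swears`, and the punctuation handling are my assumptions here, not the package's real data structures.

```python
import string
from typing import List, Set, Tuple, TypedDict


class Word(TypedDict):
    """Hypothetical shape of one transcribed word with timestamps."""
    text: str
    start: float
    end: float


def split_swears(words: List[Word], swears: Set[str]) -> Tuple[List[Tuple[float, float]], str]:
    """Return (time spans to mute, transcript with the swears erased)."""
    mute_spans: List[Tuple[float, float]] = []
    kept: List[str] = []
    for w in words:
        # Normalize so "Darn," still matches "darn" in the swear list.
        token = w["text"].strip().strip(string.punctuation).lower()
        if token in swears:
            mute_spans.append((w["start"], w["end"]))
        else:
            kept.append(w["text"].strip())
    return mute_spans, " ".join(kept)
```

The second return value is the 'new' transcription that goes on to the context check, with the swears already taken out.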

Then, the 'new' transcription is passed to Detoxify. Detoxify reads a sentence and assigns multiple scores to it, denoting how vulgar it is. If any of those scores exceeds the user-set threshold, the segment is flagged for muting.
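Here is a small sketch of the threshold check, under a few assumptions. The scores are taken to be a label-to-float dictionary, which is what the Detoxify package returns from `Detoxify("original").predict(sentence)`. The function name `exceeded_labels` and the example numbers are made up for illustration, and treating a missing threshold as 1.0 is my choice, not necessarily PodAngel's.

```python
from typing import Dict, List


def exceeded_labels(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the labels whose score is above the configured threshold.

    Labels without a configured threshold default to 1.0, so they never flag.
    """
    return sorted(
        label
        for label, score in scores.items()
        if score > thresholds.get(label, 1.0)
    )


# Made-up scores for one sentence (not real model output).
example_scores = {"toxicity": 0.91, "insult": 0.40, "obscene": 0.05}
example_thresholds = {"toxicity": 0.8, "insult": 0.5}

# A segment would be flagged for muting if this list is non-empty.
flagged = exceeded_labels(example_scores, example_thresholds)
```

With these numbers only `toxicity` goes over its threshold, so the segment would be flagged.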

Once we have the words and segments to be muted, the start and end times for each are passed to FFmpeg, which cuts and concatenates the audio file accordingly, resulting in a much more socially appropriate file.
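Before any cutting can happen, overlapping mute times (say, a swear inside an already-flagged sentence) need to be merged, and the parts to keep worked out. The sketch below covers only that interval arithmetic. It does not show how PodAngel actually calls FFmpeg, and `merge_spans` and `kept_spans` are hypothetical names.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def merge_spans(spans: List[Span]) -> List[Span]:
    """Sort spans and merge any that overlap or touch."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def kept_spans(mute: List[Span], duration: float) -> List[Span]:
    """Return the stretches of audio that fall outside every mute span."""
    keep: List[Span] = []
    cursor = 0.0
    for start, end in merge_spans(mute):
        # Clamp to the file length in case a timestamp runs past the end.
        start, end = max(0.0, start), min(duration, end)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep
```

For a 10 second file with mute spans `(1.0, 2.0)`, `(1.5, 3.0)` and `(5.0, 6.0)`, the first two merge into `(1.0, 3.0)` and the kept stretches are 0 to 1, 3 to 5, and 6 to 10 seconds.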

*Please note, I cannot promise absolute accuracy. In my experience, using Whisper's Turbo model, the program may on rare occasions miss one or two swears.

What can be configured

I made PodAngel to be highly configurable. You can configure:

1. Worker amount. There is, functionally, no limit to the number of workers you can have active at one time. They still use VRAM though, so don't just set it to a hundred and let it run if your hardware can't support that.

2. Toxicity thresholds. Each Detoxify threshold can be made more or less strict. Each one is a float from 0.0 to 1.0, and the lower the number, the stricter the context catching: 1.0 will let everything through, and 0.0 will let almost nothing through.

3. File paths. You can set a file path if you'd like to move PodAngel's 'workspace'. This will move every file/folder that PodAngel relies on to that new path, so if you change it, consider pointing it at a fresh folder.
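To make the three options concrete, here is a sketch of how such settings might be grouped and sanity-checked. The `Settings` class, its field names, and its default values are all hypothetical. PodAngel's real config format is whatever its first-run setup creates, and may look nothing like this.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class Settings:
    """Hypothetical container for the three configurable options."""
    workers: int = 1
    thresholds: Dict[str, float] = field(default_factory=lambda: {"toxicity": 0.8})
    workspace: Path = Path("podangel_workspace")

    def __post_init__(self) -> None:
        if self.workers < 1:
            raise ValueError("workers must be at least 1")
        for label, value in self.thresholds.items():
            # Thresholds are floats from 0.0 (strictest) to 1.0 (most lenient).
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"threshold {label!r} must be between 0.0 and 1.0")
        # Accept plain strings for the workspace path as well.
        self.workspace = Path(self.workspace)
```

For example, `Settings(workers=2, thresholds={"toxicity": 0.5}, workspace="clean")` passes the checks, while `Settings(workers=0)` raises a `ValueError`.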

Install and How to Run

To install, simply run the following command:

    pip install podangelex-JustAnotherCoderTheThird

Then, to run it, just run this command in your terminal:

    podangel

On the first run, it'll guide you through setting up the program, while also making all the necessary files/folders it will need. Then, after initialization, just put files in the input folder, run the program, wait a bit, and enjoy your clean audio.

License

CC0 1.0 Universal - Public Domain

AI Declaration

I used some AI to help debug the code, provide commit messages on GitHub, and organize the files for package uploading.
