Automatically remove profanity and toxic content from audio files using Whisper and Detoxify

Project description

PodAngelEX

PodAngelEX is a passion project of mine, made specifically for muting inappropriate words and segments in audio.

How it works

PodAngel takes input files and transcribes them using OpenAI's Whisper model. Depending on your specs, you can run multiple workers asynchronously, which drastically increases how many files get 'cleaned' in a set amount of time.
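As a rough illustration of the worker idea, here is a minimal sketch of several asynchronous workers draining a shared queue of files. The `transcribe` function is a hypothetical stand-in that only simulates work; it is not PodAngel's real code and does not call Whisper.

```python
import asyncio


async def transcribe(path: str) -> str:
    """Hypothetical stand-in for a real transcription call; it only simulates work."""
    await asyncio.sleep(0.01)
    return f"transcript of {path}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker keeps pulling files until the queue is empty.
    while True:
        try:
            path = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        results[path] = await transcribe(path)


async def clean_all(paths: list[str], worker_count: int) -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    for path in paths:
        queue.put_nowait(path)
    results: dict = {}
    # Start worker_count workers and wait for all of them to finish.
    await asyncio.gather(*(worker(queue, results) for _ in range(worker_count)))
    return results


if __name__ == "__main__":
    print(asyncio.run(clean_all(["a.mp3", "b.mp3", "c.mp3"], worker_count=2)))
```

With real transcription, each extra worker would also hold its own share of GPU memory, which is why the worker count should match what the hardware can support.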

Once there is a list of all words and sentences from the audio file, the program first compares each word with a list of swears. For every swear it finds, it 'makes note' of the start and end time of that word. Then, so as to not skew the context catching, it 'erases' those swear words from the transcription.
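The word-matching step could look roughly like the sketch below. It assumes each transcribed word arrives as a dict with `word`, `start`, and `end` keys, which is an assumption about the data shape, and the `SWEARS` set is a placeholder rather than PodAngel's actual word list.

```python
import string

SWEARS = {"darn", "heck"}  # placeholder list, for illustration only


def find_swears(words):
    """Return (mute_spans, cleaned_text) for a list of timestamped words."""
    mute_spans = []
    kept = []
    for w in words:
        # Normalize so that " Darn," still matches "darn".
        normalized = w["word"].strip().strip(string.punctuation).lower()
        if normalized in SWEARS:
            mute_spans.append((w["start"], w["end"]))  # note the timing
        else:
            kept.append(w["word"].strip())  # keep the word for the context check
    return mute_spans, " ".join(kept)


words = [
    {"word": " Well", "start": 0.0, "end": 0.4},
    {"word": " darn,", "start": 0.4, "end": 0.8},
    {"word": " okay", "start": 0.8, "end": 1.2},
]
spans, cleaned = find_swears(words)
```

Here `spans` holds the one time range to mute, and `cleaned` is the transcription with the swear removed, ready for the context check.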

Then, the 'new' transcription is passed to Detoxify. Detoxify reads a sentence and assigns multiple values to it, denoting how vulgar it is. If any of those values pass the user-set threshold, the segment is flagged for muting.
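To make the threshold check concrete, here is a small sketch assuming the scores arrive as a dict mapping category names to floats between 0.0 and 1.0. The `flag_segment` helper and the example numbers are hypothetical, not taken from PodAngel's code.

```python
def flag_segment(scores, thresholds):
    """Return the categories whose score exceeds the user-set threshold."""
    return [
        name
        for name, score in scores.items()
        if name in thresholds and score > thresholds[name]
    ]


# Example scores for one sentence (made-up numbers).
scores = {"toxicity": 0.91, "insult": 0.40, "threat": 0.02}
thresholds = {"toxicity": 0.8, "insult": 0.8, "threat": 0.8}

flagged = flag_segment(scores, thresholds)
should_mute = bool(flagged)  # mute if any category is over its threshold
```

In this sketch a segment is muted as soon as one category is over its limit, so lowering any single threshold makes the filter stricter.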

Once we have the words and segments to be muted, the start and end times for each are passed to FFmpeg, which cuts and concatenates the audio file accordingly, resulting in a much more socially appropriate file.
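Before any audio is cut, the flagged word and segment times have to be combined, since a muted word can sit inside a muted segment. The sketch below shows one way to merge overlapping spans and work out which parts of the file to keep. Both helper names are hypothetical, and the actual FFmpeg call PodAngel makes is not shown here.

```python
def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans into a sorted list."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_ranges(spans, duration):
    """Return the parts of the file that fall outside every flagged span."""
    keep = []
    cursor = 0.0
    for start, end in merge_spans(spans):
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


flagged = [(5.0, 6.0), (1.0, 2.0), (1.5, 2.5)]
print(keep_ranges(flagged, duration=10.0))
```

Each kept range could then be cut out and joined back together by an audio tool such as FFmpeg.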

*Please note: I cannot promise absolute accuracy. In my experience using Whisper's Turbo model, the program may occasionally miss one or two swears.

What can be configured

I made PodAngel with the intent of being highly configurable. You can configure:

1. Worker count. There is, functionally, no limit to the number of workers you can have active at one time. They still use VRAM though, so don't just set it to a hundred and let it run if you can't support that.

2. Toxicity thresholds. Each threshold from Detoxify can be increased or decreased in severity. The lower the number, the stricter the context catching. It's a float from 0.0 to 1.0. 1.0 will let everything through, and 0.0 will let almost nothing through.

3. File paths. You can set a file path if you'd like to move PodAngel's 'workspace'. This will move every file/folder that PodAngel relies on to that new path, so if you change it, maybe put it in a fresh folder. 
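The three settings above could be pictured as a small settings object, as in this sketch. The `Settings` class, its field names, the default values, and the `input` subfolder name are all assumptions made for illustration; PodAngel's real configuration format may look different.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Settings:
    """Hypothetical settings object; not PodAngel's actual config format."""

    workers: int = 1
    thresholds: dict = field(default_factory=lambda: {"toxicity": 0.8})
    workspace: Path = Path("podangel_workspace")

    def validate(self):
        if self.workers < 1:
            raise ValueError("workers must be at least 1")
        for name, value in self.thresholds.items():
            # Thresholds are floats from 0.0 (strictest) to 1.0 (most lenient).
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"threshold {name!r} must be between 0.0 and 1.0")
        return self

    @property
    def input_dir(self) -> Path:
        # Folders live under the workspace, so moving the workspace moves them too.
        return self.workspace / "input"


settings = Settings(workers=2, thresholds={"toxicity": 0.5}, workspace=Path("ws")).validate()
```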

Install and How to Run

To install (I don't know how to do the fancy formatting), simply run the following command:

1. pip install podangelex-JustAnotherCoderTheThird

Then, to run it, just run this command in your terminal:

2. podangel

On the first run, it'll guide you through setting up the program, while also making all the necessary files/folders it will need. Then, after initialization, just put files in the input folder, run the program, wait a bit, and enjoy your clean audio.

License

CC0 1.0 Universal - Public Domain

AI Declaration

I used some AI to help debug the code, provide commit messages on GitHub, and organize the files for package uploading.

Download files

Source Distribution

podangelex_justanothercoderthethird-1.0.6.tar.gz (15.2 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file podangelex_justanothercoderthethird-1.0.6.tar.gz.

File hashes

Hashes for podangelex_justanothercoderthethird-1.0.6.tar.gz
Algorithm Hash digest
SHA256 b2897449271cfcf48b6c4557695539a01df90fc2f7dd6ac214579fbcbc6783c6
MD5 8f0f83f2986dcd745981620ef9b8b599
BLAKE2b-256 00eb2b274bb2aa222b42988d568f4ee562d7e7e8756312748744cd2587134ea0

File details

Details for the file podangelex_justanothercoderthethird-1.0.6-py3-none-any.whl.

File hashes

Hashes for podangelex_justanothercoderthethird-1.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 33ef8a82edb33c15f121eaab9a8a59aa2ddb254f63d6976118b9e8144ac7709d
MD5 b5e6f882129f6499b01622ebe421847f
BLAKE2b-256 f58622a3eb7ca7c3f22bc252a32193044c88ddd4f1749bfeb1fde701c5a16b08