Profanity checking for your audio, on steroids.
Project description
Audio profanity can be a big headache. It may seem like a small issue, but if you're working on age-sensitive research or projects, you will want to rate your audio and know more about what it contains. That's where this simple project comes into play: it uses OpenAI's Whisper model to segment audio and lets you know about profanity before it becomes a headache.
What can be done?
Honestly, tons! For starters, I have written a simple substring-based matching algorithm that compares the words in your audio against a list of curse words released by CMU (Carnegie Mellon University). More info: https://www.cs.cmu.edu/~biglou/resources/bad-words.txt
- Segment the audio and extract the transcriptions (a byproduct rather than the main goal)
- Extract a word list from your audio
- Match the word list against the bad-words list
I have more ideas in mind and am going to maintain this like a dedicated religion, because I have a newfound interest in audio segmentation.
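The last two steps can be sketched roughly as below. This is a minimal illustration and not the package's actual code: the helper names `words_from_segments` and `substring_matches` are hypothetical, the segment layout (`start`, `end`, `text`) is assumed to resemble Whisper-style output, and the two mild placeholder words stand in for the CMU list.

```python
import re

def words_from_segments(segments):
    """Collect lowercase word tokens from transcript segments (assumed shape)."""
    words = []
    for segment in segments:
        # Keep letters and apostrophes; punctuation and digits are dropped.
        words.extend(re.findall(r"[a-z']+", segment["text"].lower()))
    return words

def substring_matches(words, bad_words):
    """Return (word, bad_word) pairs where bad_word occurs inside word."""
    matches = []
    for word in words:
        for bad in bad_words:
            if bad and bad in word:
                matches.append((word, bad))
                break  # one hit per word is enough
    return matches

# Placeholder data; a real run would use the transcript and the CMU list.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Well, darn it!"},
    {"start": 2.5, "end": 4.0, "text": "What the heck happened?"},
]
bad_words = ["darn", "heck"]

words = words_from_segments(segments)
print(substring_matches(words, bad_words))
```

Keep in mind that substring matching also flags harmless words that merely contain a listed word (for example, "heckle" would match "heck"), so the hits may be worth reviewing by hand.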
How does it work?
Good question!
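from audiocencesored import *

# file names are not hardcoded; these paths are placeholders, use your own
audio_file = "audio.wav"
transcript_file = "transcript.json"
word_list_file = "words.txt"
bad_words_file = "bad_words.txt"

# transcribe your audio with timestamps
transcribe_timestamps(audio_file, transcript_file)
# extract the words from the transcript
extract_words(transcript_file, word_list_file)
# download the CMU list
download_list(output_file=bad_words_file)
# check the score against a rating
check_profanity(word_list_file, bad_words_file, rating="R")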
Anything to keep in mind?
Certainly! Have your audio files in .wav format.
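If you are unsure whether a file really is a WAV file, a quick check like the one below may help before running the pipeline. It is only a sketch using Python's standard-library `wave` module; the helper name `is_readable_wav` is hypothetical and not part of this package, and `wave` handles uncompressed PCM files, so some otherwise valid WAV variants may be rejected.

```python
import wave

def is_readable_wav(path):
    """Return True if the standard-library wave module can parse the file."""
    try:
        with wave.open(path, "rb") as wav:
            # Reading the frame count confirms the header was parsed.
            return wav.getnframes() >= 0
    except (wave.Error, EOFError, OSError):
        # Not a RIFF/WAVE file, truncated, missing, or unreadable.
        return False
```

For example, `is_readable_wav("audio.wav")` returns `False` when the file is missing or is not a WAV file that `wave` can parse.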
Disclaimer: This is meant to be a fun project, though providing support and features is supposed to be a religion for me. Drop a hi on GitHub if you have some features in mind: https://github.com/sleepingcat4/audio-profanity
File details
Details for the file audiocencesored-0.2.tar.gz.
File metadata
- Download URL: audiocencesored-0.2.tar.gz
- Upload date:
- Size: 3.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b76455706ade85117466dec9367a7162ac922bcdaa9f122b2a1cb22bd6704af9 |
| MD5 | bbb06386517ea043d062c5f3bcb8fcf4 |
| BLAKE2b-256 | df417f297ddd0698a9a3e22c7de62f5c0a7acf676d749a73418af33f096f8a77 |
File details
Details for the file audiocencesored-0.2-py3-none-any.whl.
File metadata
- Download URL: audiocencesored-0.2-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e9dff77a8c78e804cad01d6a5c06722ff384bb9a1fb5dd1a7730aa4789805fc3 |
| MD5 | 95f2a667f403c6ec37fb13355051ca4d |
| BLAKE2b-256 | 5d431afe967fc226f26aa87c4c9a0c4f54dbfc2ac96d802b6633a50a630f3604 |