Profanity checking for your audio, on steroids.

Project description

Audio profanity is a big headache. It may seem like a small thing, but if you're working on age-sensitive research or projects, you'll want to rate your audio and know more about what's in it. That's where this simple project comes into play: it uses OpenAI's Whisper model to segment the audio and let you know about profanity before it becomes a headache.

What can be done?

Honestly, tons! For starters, I have written a simple substring-based matching algorithm that compares the words in your audio against a list of curse words released by CMU (Carnegie Mellon University). More info: https://www.cs.cmu.edu/~biglou/resources/bad-words.txt
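The package's own matcher isn't shown on this page, so the following is only a minimal sketch of substring-based matching in plain Python. `normalize` and `find_profanity` are hypothetical names for illustration, not part of audiocencesored.

```python
import string


def normalize(word: str) -> str:
    """Lowercase a token and strip surrounding punctuation, e.g. 'Darn!' -> 'darn'."""
    return word.lower().strip(string.punctuation)


def find_profanity(words, bad_words, whole_word=False):
    """Return (original_word, matched_entry) pairs for every flagged word.

    By default an entry matches if it appears anywhere inside the word
    (substring matching); with whole_word=True the whole token must match.
    """
    # Deduplicate and sort the entries so results are deterministic
    entries = sorted({normalize(b) for b in bad_words if normalize(b)})
    entry_set = set(entries)
    hits = []
    for word in words:
        token = normalize(word)
        if not token:
            continue  # skip pure punctuation such as "--"
        if whole_word:
            if token in entry_set:
                hits.append((word, token))
            continue
        for entry in entries:
            if entry in token:
                hits.append((word, entry))
                break  # report only the first matching entry per word
    return hits
```

Substring matching also flags innocent words that merely contain an entry (`classic` contains `ass`), the so-called Scunthorpe problem. The `whole_word=True` path avoids those false positives but misses variants such as `darned`.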

  1. Segment the audio and extract the transcriptions (not the goal, rather a byproduct)
  2. Extract a word list from your audio
  3. Match that word list against the bad-words list
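Steps 2 and 3 depend on what the transcript looks like. As an assumption, the sketch below treats it like the dictionary returned by openai-whisper's `transcribe()`, with a `segments` list whose items carry `start`, `end` and `text`; whether audiocencesored's JSON file has exactly this shape is not confirmed here. `load_transcript` and `extract_timed_words` are hypothetical helpers.

```python
import json


def load_transcript(path: str) -> dict:
    """Read a transcript JSON file (assumed Whisper-style) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def extract_timed_words(transcript: dict) -> list[tuple[str, float, float]]:
    """Split each segment's text into words, tagging every word with the
    segment's start and end time in seconds (assumed Whisper-style keys)."""
    words = []
    for segment in transcript.get("segments", []):
        for token in segment.get("text", "").split():
            words.append((token, segment["start"], segment["end"]))
    return words
```

The timestamps here are per segment, not per word, so a flagged word can only be located to within its segment; that is usually enough to jump to the right spot in the audio.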

I have more ideas in mind and am going to maintain this like a dedicated religion, because I have a newfound interest in audio segmentation.

How does it work?

Good question!

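from audiocencesored import transcribe_timestamps, extract_words, download_list, check_profanity

# Placeholder file names: nothing is hardcoded, so pass your own
audio_file = "interview.wav"         # input audio, must be .wav
transcript_json = "transcript.json"  # timestamped transcript written by step 1
word_list_file = "words.txt"         # word list written by step 2
bad_words_file = "bad_words.txt"     # CMU list written by step 3

# 1. Transcribe the audio with Whisper and save the timestamped transcript
transcribe_timestamps(audio_file, transcript_json)

# 2. Extract the words from the transcript
extract_words(transcript_json, word_list_file)

# 3. Download the CMU bad-words list
download_list(output_file=bad_words_file)

# 4. Check the words against the list and get the score for the "R" rating
check_profanity(word_list_file, bad_words_file, rating="R")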
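To show what the download step involves, here is a stdlib-only sketch of fetching the CMU list and reading it back, assuming the file holds one entry per line. `fetch_word_list` and `load_word_list` are hypothetical stand-ins, not audiocencesored's `download_list`.

```python
import urllib.request
from pathlib import Path

CMU_URL = "https://www.cs.cmu.edu/~biglou/resources/bad-words.txt"


def fetch_word_list(url: str = CMU_URL, output_file: str = "bad_words.txt") -> Path:
    """Download a plain-text word list and save it to output_file (needs network access)."""
    path = Path(output_file)
    with urllib.request.urlopen(url, timeout=30) as response:
        path.write_bytes(response.read())
    return path


def load_word_list(path) -> list[str]:
    """Read one entry per line, lowercased and trimmed, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [line.strip().lower() for line in text.splitlines() if line.strip()]
```

Saving the list to a local file (the example above uses `bad_words.txt`) means later runs can read it from disk instead of downloading it again.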

Anything to keep in mind?

Certainly! Your audio files must be in .wav format.
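A quick way to confirm a file really is WAV before transcribing it is Python's built-in `wave` module. This is a minimal sketch and `describe_wav` is a hypothetical helper. Note that `wave` reads PCM WAV only, so some valid WAV variants (32-bit float, for example) may be rejected even though other tools accept them.

```python
import wave


def describe_wav(source) -> dict:
    """Return basic properties of a WAV file (path or binary file object).

    Raises wave.Error if the data is not a WAV file the stdlib can read,
    or EOFError if the file is truncated.
    """
    with wave.open(source, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Calling `describe_wav("interview.wav")` on each file up front surfaces format problems early, before any time is spent on transcription.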

Disclaimer: This is meant to be a fun project, but providing support and features is supposed to be a religion for me. Drop a hi on GitHub if you have some features in mind: https://github.com/sleepingcat4/audio-profanity


Download files

Source Distribution

audiocencesored-0.2.tar.gz (3.0 kB)

Uploaded Source

Built Distribution

audiocencesored-0.2-py3-none-any.whl (4.0 kB)

Uploaded Python 3

File details

Details for the file audiocencesored-0.2.tar.gz.

File metadata

  • Download URL: audiocencesored-0.2.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.8

File hashes

Hashes for audiocencesored-0.2.tar.gz
  • SHA256: b76455706ade85117466dec9367a7162ac922bcdaa9f122b2a1cb22bd6704af9
  • MD5: bbb06386517ea043d062c5f3bcb8fcf4
  • BLAKE2b-256: df417f297ddd0698a9a3e22c7de62f5c0a7acf676d749a73418af33f096f8a77

File details

Details for the file audiocencesored-0.2-py3-none-any.whl.

File metadata

  • Download URL: audiocencesored-0.2-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.8

File hashes

Hashes for audiocencesored-0.2-py3-none-any.whl
  • SHA256: e9dff77a8c78e804cad01d6a5c06722ff384bb9a1fb5dd1a7730aa4789805fc3
  • MD5: 95f2a667f403c6ec37fb13355051ca4d
  • BLAKE2b-256: 5d431afe967fc226f26aa87c4c9a0c4f54dbfc2ac96d802b6633a50a630f3604
