This is a python API which allows you to check for swear words in a youtube video, srt file, text file, custom source with multi language support. There are additional features like getting youtube transcript of a video, srt parser etc.
Project description
Profanity Police
This is a python API which allows you to check for swear words in a youtube video, srt file, text file, custom source with multi language support. There are additional features like getting youtube transcript of a video, srt parser etc.
Install
Install package using pip
pip install profanity-police
If you want to use it from source, you'll have to install the dependencies manually:
pip install -r requirements.txt
API
Youtube
Basic implementation to check for swear words in a youtube video in a particular language
from profanity_police.transcript_checker import TranscriptChecker
checker = TranscriptChecker()
print(checker.check_transcript(source = "youtube", url = "https://www.youtube.com/watch?v=Vev2ybF2Z6g", language_code = "en"))
# video id can be passed directly instead of url if needed
# print(checker.check_transcript(source = "youtube", video_id = "Vev2ybF2Z6g", language_code = "hi"))
This would print something like this:-
[
{
"text":"The whole fucking bruhaha was happening..",
"start":298.91,
"duration":2.06,
"found":[
"fucking"
]
},
{
"text":"What the fuck is happening?",
"start":330.99,
"duration":0.91,
"found":[
"fuck"
]
},
{
"text":"Shit scripts how do you say it's shit?",
"start":1218.77,
"duration":1.63,
"found":[
"shit"
]
}
]
The duration is depicted in seconds.
Get youtube transcript for a video
from profanity_police.youtube import YoutubeTranscript
y_transcript = YoutubeTranscript(url = "https://www.youtube.com/watch?v=Vev2ybF2Z6g")
# Get the original transcripts available for a video
y_transcript.get_original_languages()
# Get the languages to which the video can be translated to
y_transcript.get_translation_languages()
transcript_en = y_transcript.get_transcript(language_code = "en-GB")
transcript_fr = y_transcript.get_transcript(language_code = "fr")
transcript_hi = y_transcript.get_transcript(language_code = "hi")
Custom File
SRT file
from profanity_police.transcript_checker import TranscriptChecker
checker = TranscriptChecker()
swear_phrases = checker.check_transcript(source = "file", file_path = "sample_srt_files/panchayat_episode_6.srt", file_type = "srt", language_code = "en")
print(swear_phrases)
Text file
from profanity_police.transcript_checker import TranscriptChecker
checker = TranscriptChecker()
swear_phrases = checker.check_transcript(source = "file", file_path = "y", file_type = "txt", language_code = "en")
print(swear_phrases)
Additional APIs
Custom checker
from profanity_police.checker import Checker
checker = Checker()
transcript = [{"text": "What is your name?"}, {"text": "shut the fuck up"}]
language_code = "en"
# `transcript` needs to be a list of dictionaries with one mandatory key - `text`
swear_words_in_transcript = checker.check_swear_word(transcript, language_code)
SRT text Extractor
from profanity_police.srt_extractor import SrtExtractor
file_path = "sample_srt_files/panchayat_episode_3.srt"
transcript = SrtExtractor().extract_text(file_path)
"""
transcript is a list of dictionary with the below format
[
{"text": "what is your name?", "start": 10, "end": 12}
]
start and end are in seconds
"""
Languages
For swear word checker
Name | Code |
---|---|
English | en |
French | fr |
Hindi | hi |
Italian | it |
Korean | ko |
Portuguese | pt |
Russian | ru |
Spanish | es |
For youtube translation:- All languages supported by youtube.
If you liked it and it was helpful, then
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file profanity_police-1.0.0.tar.gz
.
File metadata
- Download URL: profanity_police-1.0.0.tar.gz
- Upload date:
- Size: 34.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e92a33dc11677ab33da16f834c9faa126b7d1bc51985a2263153798a6e8cb3c0 |
|
MD5 | 0265e1b487ecf7f53943fba93a70b588 |
|
BLAKE2b-256 | 02abc33745d510d390ea1afa80de700cb42323ab059f9aee254eecb38d064385 |