A simple YouTube video transcriber and summarizer

Project description

# Audiutor

This project was developed for the Computer Programming course at the University of Bolzano, Ed Faculty. It automatically transcribes and summarizes English-language YouTube videos using the IBM Watson Speech to Text API.

YouTube is full of interesting lectures and educational videos, but not everyone has time to watch them. This program was developed with one goal in mind: to let students and working people retrieve the information in such videos faster. At the moment it supports English videos only; it also works with larger files.

Before running this program, you will need to create your own Watson Speech to Text API key and URL, which takes about two minutes. You can do so by (last updated June 2021):

  1. Registering for free here: https://cloud.ibm.com/registration

  2. Creating your Watson Speech to Text API key and URL. Just hit the CREATE button (in the lower right-hand corner) on this page: https://cloud.ibm.com/catalog/services/speech-to-text
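
Once the key and URL exist, they need to reach the program without being hard-coded. The following is a minimal sketch of one way to do that, using only the standard library. The environment variable names `WATSON_STT_APIKEY` and `WATSON_STT_URL` and the helper `load_watson_credentials` are hypothetical, chosen for illustration; the package itself may expect the values to be supplied differently.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WatsonCredentials:
    """Holds the two values created in the IBM Cloud console."""
    api_key: str
    url: str


def load_watson_credentials(env=None):
    """Read the API key and service URL from environment variables.

    WATSON_STT_APIKEY and WATSON_STT_URL are hypothetical names used only
    in this sketch; they are not defined by the package or by IBM.
    """
    env = os.environ if env is None else env
    api_key = env.get("WATSON_STT_APIKEY", "").strip()
    url = env.get("WATSON_STT_URL", "").strip()

    if not api_key:
        raise ValueError("WATSON_STT_APIKEY is not set")

    # Reject anything that is not a complete https:// URL.
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("WATSON_STT_URL must be an https:// URL")

    return WatsonCredentials(api_key=api_key, url=url)
```

Validating both values up front means a missing or mistyped credential is reported before any video is downloaded or converted.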

Call the following functions:

  • audiutor() to run the program
  • transcript_wordcloud() to create a wordcloud
  • kws_extraction() to extract keywords
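
To give a rough idea of what sits behind a word cloud, the sketch below counts the most frequent non-trivial words in a transcript. It is not the package's implementation: `word_frequencies` and the tiny `STOP_WORDS` set are hypothetical, and the package itself imports NLTK's English stop words and the `wordcloud` library for this step.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration only; NLTK's English
# list, which the package imports, is far more complete.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "that"}


def word_frequencies(transcript, top_n=5):
    """Return the top_n (word, count) pairs, ignoring stop words."""
    # Lowercase the text and keep runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(
        w for w in words if w not in STOP_WORDS and len(w) > 1
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    text = "The lecture is about sorting. Sorting a list is fun, and sorting is useful."
    print(word_frequencies(text, top_n=3))
```

A word cloud renderer then scales each word by its count, so the most frequent terms in the transcript appear largest.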

In order for the program to function correctly, please do not change the filenames.

Used dependencies:

from pytube import YouTube
import os
import moviepy.editor as mp
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import subprocess
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('stopwords')
nltk.download('punkt')
from wordcloud import WordCloud
from string import punctuation
import re
import string
import matplotlib.pyplot as plt
from gensim.summarization import summarize
from random import randrange
from time import sleep
import pprint
import spacy
import textacy.ke
from textacy import *
from docx import Document
import sys
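
Because the list of third-party imports is long, it can help to check which ones are missing before running the program. The sketch below is one way to do that with the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list contains only the top-level module names taken from the imports above.

```python
from importlib.util import find_spec

# Top-level third-party modules named in the imports above.
REQUIRED_MODULES = [
    "pytube", "moviepy", "ibm_watson", "ibm_cloud_sdk_core", "nltk",
    "wordcloud", "matplotlib", "gensim", "spacy", "textacy", "docx",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Two details are worth keeping in mind when installing: the `docx` module is provided by the `python-docx` distribution, and `gensim.summarization` was removed in gensim 4.0, so an older gensim release is likely needed for the `summarize` import to work.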

Download files

Source Distribution

Audiutor-PatBe-0.0.5.tar.gz (3.0 kB)

Built Distribution

Audiutor_PatBe-0.0.5-py3-none-any.whl (3.2 kB)
