
A small, lightweight package that summarizes the transcript of an eligible YouTube video. The video must have well-formatted closed captions so that its transcript can be summarized with one of several techniques.

Project description

YouTube Transcript Summarizer: Python Package

Our code is available on GitHub here. You can use it by installing it with pip!

This package sends GET requests to our Flask back-end server API, which summarizes the transcripts.

More details about our backend can be read at our back-end repository here.

YouTube Video Transcript Summarization using pip: Yes, you heard it right! This package is available on PyPI. Read on to learn how you can integrate it into your own project.
Whenever you invoke a function of our package, it sends an API call to our Flask server, and the server responds with the summarized text. You can then display the received result to the user.
Because the package calls our back-end through the requests library, it needs an internet connection to receive the summarized transcript.
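
To make the request flow above more concrete, here is a minimal sketch of how a GET URL for such a server could be assembled. The base URL `https://example.com/summarize/` is a placeholder, not the real server address, and the idea that the back-end reads query parameters named `id`, `choice`, and `percent` is an assumption based on the field names documented on this page, not on the package source.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the real back-end address is not given here.
HYPOTHETICAL_API_URL = "https://example.com/summarize/"


def build_request_url(video_id, choice="gensim-sum", percent=20,
                      base_url=HYPOTHETICAL_API_URL):
    """Return a GET URL that carries the three inputs as query parameters."""
    # Parameter names are assumed from the documented field names.
    query = urlencode({"id": video_id, "choice": choice, "percent": percent})
    return base_url + "?" + query


print(build_request_url("zhUgaKb0s5A"))
```

The package builds and sends the real request for you, so this is only meant to show what kind of information travels to the server.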

Requirements
  • Python >=3.5 (It might still run on versions below 3.5, but you may face some errors since the package was not tested on them)
  • requests (Used to make API calls to our server)

Prerequisite Knowledge: YouTube is an American free-to-use online video sharing and social media platform launched in February 2005. It is currently one of the biggest video platforms, where users watch more than 1 billion hours of video every day.
Closed captions are text derived from the video, intended to give the viewer more detail (such as dialogue, speech translation, and non-speech elements). They are widely used to follow a video without relying on its audio.

Use Case Scenario: YouTube has a very large number of videos that have transcripts. Summarization is especially helpful when videos are long and different parts vary in importance. In such cases, summarizing the video saves the viewer's time and improves productivity, since the viewer can focus only on the important text spoken in the video.

Aim

This repository is part of our project, which includes a back-end server built with the Flask framework. The back-end also has a browser-based summarizer, but the package in this repository shows how a client-server setup makes efficient use of our code.
When you install the package and invoke its function, the package only makes the request; the back-end summarizes the transcript and sends the response back in JSON format.
This package returns a tuple which has a summary and a dictionary storing some insights about your request. Read below for more details.

Installation and Usage

Open your terminal and make sure that pip is installed. Then simply type:

pip install yt_trans_sum

Once the installation is complete, you can import and use the package like this:

from yt_trans_sum import YouTubeTranscriptSummarizer

if __name__ == "__main__":
    # Simplest Call Example
    my_summary, my_summary_insights = YouTubeTranscriptSummarizer().get_by_url('https://www.youtube.com/watch?v=zhUgaKb0s5A')
    print("My Summary:", my_summary) # String
    print("My Summary Insights: ", my_summary_insights) # Dictionary

Here, `my_summary_insights` is a dictionary of key-value pairs with insights about your request. The snippet below shows the values inside this dictionary.

# There are 4 values inside this dictionary for now. The snippet is self explanatory.
print("Characters in Transcript:", my_summary_insights['length_original'])
print("Sentences in Transcript:", my_summary_insights['sentence_original'])
print("Characters in Summary:", my_summary_insights['length_summary'])
print("Sentences in Summary:", my_summary_insights['sentence_summary'])
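
As a small illustration of what these four values can be used for, the sketch below derives how much of the transcript survived into the summary. The helper name `summarize_insights` and the sample numbers are made up for this example, and it assumes the four values are numeric; neither the helper nor the numbers come from the package.

```python
def summarize_insights(insights):
    """Derive 'percentage kept' figures from the four documented insight keys."""
    chars_kept = 100 * insights["length_summary"] / insights["length_original"]
    sentences_kept = 100 * insights["sentence_summary"] / insights["sentence_original"]
    return {
        "chars_kept_percent": round(chars_kept, 1),
        "sentences_kept_percent": round(sentences_kept, 1),
    }


# Made-up numbers for illustration only; a real call returns its own values.
sample = {
    "length_original": 12000,
    "sentence_original": 150,
    "length_summary": 2400,
    "sentence_summary": 30,
}
print(summarize_insights(sample))
```

The sketch does not guard against an empty transcript, where the original counts would be zero.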

More Examples of Usage

Because the back-end needs a video ID, an algorithm, and a percentage to summarize the transcript, this package takes the same inputs.

Mandatory Field (Video ID/URL)
  • id : Video ID of the YouTube video. Each video has its own unique ID in its URL. For example, 9No-FiEInLA is the Video ID in https://www.youtube.com/watch?v=9No-FiEInLA.
    You can pass the video ID directly with get_by_id() or pass the complete URL with get_by_url(), as shown in the sample snippets.
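
The relation between a URL and its video ID can be sketched with the standard library. The helper `extract_video_id` is hypothetical and is not necessarily how get_by_url() works internally; it only handles URLs of the `watch?v=` form shown above.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(video_url):
    """Return the value of the 'v' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(video_url).query)
    values = query.get("v")
    return values[0] if values else None


print(extract_video_id("https://www.youtube.com/watch?v=9No-FiEInLA"))
```

Other URL shapes, such as shortened share links, would need extra handling that this sketch leaves out.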

Optional Field(s)

  • choice : The algorithm used to summarize the transcript. Only six values are accepted for this variable.
    These choices are written along with algorithm names as follows:
    • gensim-sum : Text Rank Algorithm Based using Gensim.
    • spacy-sum : Frequency Based Approach using spaCy.
    • nltk-sum : Frequency Based Summarization using NLTK.
    • sumy-lsa-sum : Latent Semantic Analysis Based using Sumy.
    • sumy-luhn-sum : Luhn Algorithm Based using Sumy.
    • sumy-text-rank-sum : Text Rank Algorithm Based using Sumy.
  • percent : The approximate percentage of the transcript's lines to keep in the summary. Values between 20 and 30 give better results.

NOTE: By default, the algorithm is gensim-sum and the percentage is 20. You can change these values as shown below.
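
If you want to catch a mistyped option before any request is sent, a client-side check along these lines is one possibility. The helper `check_options` is hypothetical, and the rule that percent must be greater than 0 and at most 100 is an assumption; this page does not say how the package or the back-end validates these values.

```python
# The six accepted algorithm names listed above.
ACCEPTED_CHOICES = {
    "gensim-sum",
    "spacy-sum",
    "nltk-sum",
    "sumy-lsa-sum",
    "sumy-luhn-sum",
    "sumy-text-rank-sum",
}


def check_options(choice="gensim-sum", percent=20):
    """Raise ValueError for an unknown algorithm or an out-of-range percentage."""
    if choice not in ACCEPTED_CHOICES:
        raise ValueError("Unknown choice: {!r}".format(choice))
    # Assumed range: the accepted bounds are not documented.
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100], got {!r}".format(percent))
    return choice, percent


print(check_options())
print(check_options(choice="sumy-lsa-sum", percent=10))
```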

  1. Print logs while we request the summary. While debug_logs is True, the package prints the current status of the request as well.
from yt_trans_sum import YouTubeTranscriptSummarizer

if __name__ == "__main__":
    # Debug Logs turned on.
    my_summary, my_summary_insights = YouTubeTranscriptSummarizer(debug_logs=True).get_by_url('https://www.youtube.com/watch?v=zhUgaKb0s5A')
    print("My Summary:", my_summary)
    print("My Summary Insights: ", my_summary_insights)
  2. Change percentage and algorithm for the summary request:
from yt_trans_sum import YouTubeTranscriptSummarizer

if __name__ == "__main__":
    # Full control of arguments
    my_summary, my_summary_insights = YouTubeTranscriptSummarizer().get_by_url(video_url='https://www.youtube.com/watch?v=zhUgaKb0s5A', percent=10, choice='sumy-lsa-sum')
    print("My Summary:", my_summary)
    print("My Summary Insights: ", my_summary_insights)
  3. Summarization request by using video ID instead of video URL
from yt_trans_sum import YouTubeTranscriptSummarizer

if __name__ == "__main__":
    # get_by_id() is called instead of get_by_url()
    my_summary, my_summary_insights = YouTubeTranscriptSummarizer().get_by_id(video_id='zhUgaKb0s5A', percent=10, choice='sumy-lsa-sum')
    print("My Summary:", my_summary)
    print("My Summary Insights: ", my_summary_insights)
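
Building on the calls above, one way to compare several algorithms on the same video is to loop over the choices. This is only a sketch: `compare_choices` is a hypothetical helper, and `fake_get_by_id` is a stand-in so that the example runs without a network connection. It assumes the real get_by_id() accepts the keyword arguments video_id, percent, and choice, as in the snippets above.

```python
def compare_choices(get_by_id, video_id, choices, percent=20):
    """Call get_by_id once per algorithm and collect the results by choice."""
    results = {}
    for choice in choices:
        summary, insights = get_by_id(video_id=video_id, percent=percent, choice=choice)
        results[choice] = (summary, insights)
    return results


# Stand-in for YouTubeTranscriptSummarizer().get_by_id (assumption, offline only).
def fake_get_by_id(video_id, percent, choice):
    return "summary via " + choice, {"length_summary": len(choice)}


results = compare_choices(fake_get_by_id, "zhUgaKb0s5A", ["nltk-sum", "sumy-lsa-sum"])
for name, (summary, _insights) in results.items():
    print(name, "->", summary)
```

With the real package you would pass YouTubeTranscriptSummarizer().get_by_id instead of the stand-in. Each choice then triggers its own request to the server, so keep the list short.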

More information about the backend

You can click here to know more about how backend receives and sends data.

Feel free to improve this package or our back-end, add comments, and ask any questions you have.

The back-end uses an undocumented part of the YouTube API, which is called by the YouTube web client. So there is no guarantee that it won't stop working tomorrow if they change how things work. If that happens, I will do my best to make things work again as soon as possible. So if it stops working, let me know!
This is not an official package from YouTube. I have built this package for my final year project.
