Python script for extracting and cleaning YouTube video transcripts for preprocessing in machine learning.

Project description

Tube-Data: YouTube Video Transcript Extractor

Tube-Data is a Python script that extracts and cleans YouTube video transcripts so they can be used as preprocessed input for machine learning. It simplifies collecting text data from YouTube videos for natural language processing tasks such as sentiment analysis, as well as for speech recognition work.

Features

  • Extracts video transcripts from YouTube videos.
  • Cleans transcripts by removing non-speech elements such as music and applause markers.
  • Saves cleaned transcripts into separate text files.
  • Supports individual video URLs, batch processing from a list of URLs, and entire playlists.
  • Streamlines the dataset collection process for machine learning applications.
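
The cleaning step listed above can be pictured as stripping non-speech cues from caption text. The following is a minimal sketch under stated assumptions, not Tube-Data's actual implementation: the function name `clean_transcript` is hypothetical, and it assumes cues appear in the transcript as bracketed or parenthesised words such as `[Music]` or `(Applause)`.

```python
import re

# Assumed cue format: a single keyword wrapped in [] or (), e.g. [Music] or (Applause).
# The keyword list is illustrative and can be extended.
_CUE = re.compile(r"[\[(]\s*(?:music|applause|laughter)\s*[\])]", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    """Remove non-speech cues and collapse the whitespace they leave behind."""
    without_cues = _CUE.sub(" ", text)
    # split() with no arguments drops runs of spaces, tabs, and newlines.
    return " ".join(without_cues.split())


print(clean_transcript("[Music] hello there (Applause) friends"))
```

A regular expression keeps the sketch short; a real cleaner may need to handle other cue styles or languages.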

Installation

Install the package and its dependencies with pip:

pip install tubelearns

Usage

Extract Transcripts from a List of Video URLs

from tubelearns import text_link

# Provide a path to a text file containing YouTube video URLs.
text_link('path_to_file.txt', name='output_folder_name')
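
The input file for `text_link` has to exist before the call. The helper below is a sketch of one way to prepare it. It assumes the file holds one URL per line, which the description above does not state explicitly; the name `write_url_list` and the placeholder URLs are hypothetical.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write unique, non-empty URLs to `path`, one per line, keeping their order.

    Returns the number of URLs written.
    """
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Example with placeholder URLs; the duplicate and the blank entry are skipped.
count = write_url_list(
    [
        "https://www.youtube.com/watch?v=VIDEO_ID_1",
        "https://www.youtube.com/watch?v=VIDEO_ID_2",
        "https://www.youtube.com/watch?v=VIDEO_ID_1",
        "",
    ],
    "path_to_file.txt",
)
```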

Extract Transcript from a Single Video URL

from tubelearns import url_grab

# Provide a single YouTube video URL.
url_grab('video_url', name='output_folder_name')
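
Before passing a URL to `url_grab`, it can help to check that it actually points at a video. The sketch below uses only the standard library and is independent of tubelearns. The function name `video_id` is hypothetical, and which URL forms `url_grab` itself accepts is not documented here, so treat this as an optional pre-check.

```python
from urllib.parse import parse_qs, urlparse


def video_id(url: str):
    """Return the video ID from a watch or youtu.be URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the `v` query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```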

Extract Transcripts from a YouTube Playlist

from tubelearns import playlist_grab

# Provide the URL of a YouTube playlist.
playlist_grab('playlist_url', name='output_folder_name')

Convert Playlist Video Links to Text File

from tubelearns import play2text

# Provide the URL of a YouTube playlist.
play2text('playlist_url')

Development Status

This project is currently in the planning stage.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributions

Contributions are welcome! Please feel free to open issues or submit pull requests.

Contact

For any inquiries or feedback, please contact KabilPreethamK.

Download files

Source Distribution

tubelearns-1.1.1.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

tubelearns-1.1.1-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file tubelearns-1.1.1.tar.gz.

File metadata

  • Download URL: tubelearns-1.1.1.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for tubelearns-1.1.1.tar.gz
Algorithm Hash digest
SHA256 f359d7e79391661a0efc11e37422ad0b3960aaefe805ae94e52d757f2516bb48
MD5 eaaa3b4a2f9c5e4f12bbe859cbb000af
BLAKE2b-256 656492cb372a8b23e45ff3a9e271229b11261875257b990d7ae0cdb08f5f4082

File details

Details for the file tubelearns-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: tubelearns-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for tubelearns-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 11eb791f93ea70ce1196604771f79d7c8457e4488ef4ef84a6f15d0905991f4b
MD5 f6a0b698d4b4be49fe97c09d747e7e0a
BLAKE2b-256 a90589c47e95f87212458c689fbc946313c4a544695bf9fcb8bb68e200592099
