TubeFrames - A YouTube Data Analysis Library

A Python package for retrieving YouTube data, including video statistics, captions, and channel information. TubeFrames outputs results in a user-friendly pandas DataFrame format, making it ideal for data analysis workflows — especially in Jupyter Notebooks.

Features

  • 🔍 YouTube Search: Query and retrieve results in DataFrame format
  • 📊 Video Statistics: View, like, and comment counts
  • 📝 Caption Extraction: Extract video captions in multiple languages
  • 📺 Channel Information: Data collection from specific channels

Attribution

This project uses the YouTube Data API and is not affiliated with or endorsed by YouTube or Google. All YouTube content and trademarks are the property of their respective owners.

Setup

Requirements

  • Python 3.6+
  • YouTube Data API key
  • Required dependencies are installed automatically

API Key Setup

To use tubeframes, create a YouTube Data API key following the official Google documentation.

Setting as Environment Variable

Linux: edit ~/.profile and add:

export YOUTUBE_DEVELOPER_KEY=<YOUR_YOUTUBE_DEVELOPER_KEY>

Windows: Set via System Properties → Environment Variables (under User variables)
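
Before running a search, it can help to confirm that the key is actually visible to Python. The sketch below is a minimal check using only the standard library. The helper name `resolve_developer_key` is hypothetical and is not part of tubeframes; how the library itself reads the variable is not shown here.

```python
import os


def resolve_developer_key(explicit_key=None, env_var="YOUTUBE_DEVELOPER_KEY"):
    """Return the explicit key if given, otherwise the environment variable.

    Hypothetical helper for illustration; not part of the tubeframes API.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var}."
        )
    return key
```

Calling `resolve_developer_key()` in a fresh shell or notebook raises a clear error if the variable was not exported, which is easier to diagnose than a failed API request.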

Installation

pip install tubeframes

Usage

Basic Search

Create a search object to retrieve video information:

import tubeframes as yt

# Replace the placeholder with your own API key (passed as a string)
tubeframes_search = yt.Search("Test", developer_key="<YOUR_YOUTUBE_DEVELOPER_KEY>")
tubeframes_search.df  # DataFrame with video information (likes, views, title, etc.)

Results include:

| videoId | publishedAt | channelId | title | viewCount | likeCount | favoriteCount | commentCount |
| --- | --- | --- | --- | --- | --- | --- | --- |
| abcde1234 | 2021-06-01 10:00:00+00:00 | abcde1234abc | Video title example 1 | 100000 | 6000 | 0 | 200 |
| abcde1235 | 2021-06-01 11:00:00+00:00 | abcde1234abc | Video title example 2 | 200000 | 5000 | 1 | 210 |
| abcde1236 | 2021-06-01 12:00:00+00:00 | abcde1234abd | Video title example 3 | 100000 | 4000 | 0 | 150 |
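
Because the result is an ordinary pandas DataFrame, follow-up analysis needs no tubeframes-specific code. The sketch below ranks the sample rows above by likes per view. The DataFrame is built by hand as a stand-in for `tubeframes_search.df`, the `like_rate` column is a name introduced here, and the numeric conversion is a defensive assumption in case counts arrive as strings.

```python
import pandas as pd

# Stand-in for tubeframes_search.df, copied from the sample rows above.
df = pd.DataFrame(
    {
        "videoId": ["abcde1234", "abcde1235", "abcde1236"],
        "title": [
            "Video title example 1",
            "Video title example 2",
            "Video title example 3",
        ],
        "viewCount": ["100000", "200000", "100000"],
        "likeCount": ["6000", "5000", "4000"],
        "commentCount": ["200", "210", "150"],
    }
)

# Assumption: counts may be strings, so convert them before doing arithmetic.
count_columns = ["viewCount", "likeCount", "commentCount"]
df[count_columns] = df[count_columns].apply(pd.to_numeric, errors="coerce")

# Likes per view, then the videos ordered from highest to lowest rate.
df["like_rate"] = df["likeCount"] / df["viewCount"]
top = df.sort_values("like_rate", ascending=False)
print(top[["videoId", "title", "like_rate"]])
```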

Working with Captions

To include video captions in your results, use the argument caption=True:

import tubeframes as yt

# developer_key can be omitted if YOUTUBE_DEVELOPER_KEY is set as an environment variable
tubeframes_search = yt.Search("Test", caption=True)
tubeframes_search.df  # Includes an extra "video_caption" column with the captions

Results with captions:

| videoId | publishedAt | channelId | title | commentCount | video_caption |
| --- | --- | --- | --- | --- | --- |
| abcde1234 | 2021-06-01 10:00:00+00:00 | abcde1234abc | Video title example 1 | 200 | What they say; words and more words; thanks for watching |
| abcde1235 | 2021-06-01 11:00:00+00:00 | abcde1234abc | Video title example 2 | 210 | None |
| abcde1236 | 2021-06-01 12:00:00+00:00 | abcde1234abd | Video title example 3 | 150 | Words and more words and more words; thanks for watching |
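
As the second row shows, some videos come back without a caption, so text analysis usually starts by dropping those rows. The sketch below does that on a hand-built stand-in for the DataFrame above. It assumes missing captions are stored as `None`/`NaN` rather than the literal string "None", and the `caption_words` column is a name introduced here.

```python
import pandas as pd

# Stand-in for tubeframes_search.df with caption=True (sample rows from above).
df = pd.DataFrame(
    {
        "videoId": ["abcde1234", "abcde1235", "abcde1236"],
        "video_caption": [
            "What they say; words and more words; thanks for watching",
            None,  # assumption: a missing caption is represented as None
            "Words and more words and more words; thanks for watching",
        ],
    }
)

# Keep only the videos that have a caption.
with_captions = df.dropna(subset=["video_caption"]).copy()

# Rough caption length: number of whitespace-separated tokens.
with_captions["caption_words"] = (
    with_captions["video_caption"].str.split().str.len()
)
print(with_captions[["videoId", "caption_words"]])
```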

Channel Search

To search for channels instead of videos:

import tubeframes as yt
tubeframes_search = yt.Search("Test", item_type="channel")
tubeframes_search.df  # DataFrame with YouTube channel information

Channel search results:

| channelId | publishedAt | title | description | channelTitle | publishTime |
| --- | --- | --- | --- | --- | --- |
| abcde1234abc | 2021-06-01 10:00:00+00:00 | Example channel 1 | Description of example channel 1 | Example channel 1 | 2021-06-01 10:00:00+00:00 |
| abcde1234abd | 2021-06-01 11:00:00+00:00 | Example channel 2 | Description of example channel 2 | Example channel 2 | 2021-06-01 11:00:00+00:00 |

Channel Information

To get video information and captions for one or more specific channels, use the ChannelInfo class:

import tubeframes as yt
channel_info = yt.ChannelInfo(
    channel_ids=["<A CHANNEL ID>"],
    max_results=10,
    accepted_caption_lang=['pt', 'en'],
)
channel_info.df  # DataFrame with video information and captions

Channel information results:

| channelId | videoId | title | publishedAt | caption | thumbnailUrl |
| --- | --- | --- | --- | --- | --- |
| EXAMPLE_CHANNEL_ID1 | EXAMPLE_VIDEO_ID1 | Example Video Title 1 | 2025-03-22 22:00:39+00:00 | Example caption text; More example text; Thanks... | https://example.com/sddefault.jpg |
| EXAMPLE_CHANNEL_ID1 | EXAMPLE_VIDEO_ID2 | Example Video Title 2 | 2025-03-22 18:00:22+00:00 | Example caption text; Follow us on social media... | https://example.com/maxresdefault.jpg |
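
When several channel IDs are passed, a per-channel summary is often the first thing to compute. The sketch below counts videos and finds the most recent upload per channel, using a hand-built stand-in for `channel_info.df`. The summary column names are introduced here, and parsing `publishedAt` with `pd.to_datetime` is a defensive assumption in case the column holds strings.

```python
import pandas as pd

# Stand-in for channel_info.df, copied from the sample rows above.
videos = pd.DataFrame(
    {
        "channelId": ["EXAMPLE_CHANNEL_ID1", "EXAMPLE_CHANNEL_ID1"],
        "videoId": ["EXAMPLE_VIDEO_ID1", "EXAMPLE_VIDEO_ID2"],
        "publishedAt": [
            "2025-03-22 22:00:39+00:00",
            "2025-03-22 18:00:22+00:00",
        ],
    }
)

# Assumption: timestamps may be strings, so parse them as UTC datetimes.
videos["publishedAt"] = pd.to_datetime(videos["publishedAt"], utc=True)

# One row per channel: number of videos and the latest upload time.
summary = videos.groupby("channelId").agg(
    video_count=("videoId", "count"),
    latest_upload=("publishedAt", "max"),
)
print(summary)
```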

Parameter Reference

Search Class

The Search class accepts the following arguments:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| term | string | Yes | - | YouTube search term |
| caption | boolean | No | False | Whether to include video captions |
| maxres | integer | No | 50 | Maximum number of results to return |
| accepted_caption_lang | list | No | ['en'] | List of accepted languages for captions |
| item_type | string | No | "video" | Type of search: "video" or "channel" |
| developer_key | string | No | - | YouTube API key (optional if set as environment variable) |

Example with all parameters:

import tubeframes as yt

tubeframes_search = yt.Search(
    term="Python Tutorial",
    caption=True,
    maxres=100,
    accepted_caption_lang=['pt', 'en'],
    item_type="video",
    developer_key="<YOUR_DEVELOPER_KEY>"
)

# Access the resulting DataFrame
df = tubeframes_search.df

ChannelInfo Class

The ChannelInfo class accepts the following arguments:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| channel_ids | string/list | Yes | - | Channel ID or list of channel IDs |
| max_results | integer | No | 10 | Maximum number of results per channel |
| accepted_caption_lang | list | No | ['pt', 'en'] | List of accepted languages for captions |
| developer_key | string | No | - | YouTube API key (optional if set as environment variable) |

Example with all parameters:

import tubeframes as yt

channel_info = yt.ChannelInfo(
    channel_ids=["<CHANNEL ID 1>", "<CHANNEL ID 2>"],
    max_results=20,
    accepted_caption_lang=['pt', 'en', 'es'],
    developer_key="<YOUR_DEVELOPER_KEY>"
)

# Access the resulting DataFrame
df = channel_info.df

Applications

TubeFrames is particularly useful for:

  • Sentiment Analysis: Extract captions for sentiment analysis
  • Text Mining: Identify keywords and topics from YouTube channels
  • Academic Research: Dataset creation for video engagement studies
  • Content Marketing: Channel performance analysis and strategy optimization
  • Competitor Research: Tracking metrics of competitor channels
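
As a small illustration of the text-mining use case, the sketch below counts the most frequent words across a list of captions using only the standard library. The caption strings are the sample values from the captions table above, the stop-word set is a tiny placeholder chosen for this example, and `keyword_counts` is a hypothetical helper rather than part of tubeframes.

```python
import re
from collections import Counter

# Placeholder stop-word list for this example; real analyses need a fuller one.
STOP_WORDS = {"and", "for", "they", "what", "the"}


def keyword_counts(captions):
    """Count lowercase word tokens across captions, skipping stop words."""
    counts = Counter()
    for text in captions:
        if not text:  # skip videos without a caption
            continue
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


captions = [
    "What they say; words and more words; thanks for watching",
    None,
    "Words and more words and more words; thanks for watching",
]
print(keyword_counts(captions).most_common(2))
# → [('words', 5), ('more', 3)]
```

In practice the `captions` list would come from the caption column of a tubeframes DataFrame, for example via `df["video_caption"].tolist()`.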

Contributing

Contributions are welcome! Open an issue or submit a pull request on GitHub.

License

This project is licensed under the GNU General Public License v3.0.
