TubeFrames - A YouTube Data Analysis Library

A Python package for retrieving YouTube data, including video statistics, captions, and channel information. TubeFrames returns results as pandas DataFrames, which suits data analysis workflows, especially in Jupyter Notebooks.

Features

  • 🔍 YouTube Search: Query and retrieve results in DataFrame format
  • 📊 Video Statistics: View, like, and comment counts
  • 📝 Caption Extraction: Extract video captions in multiple languages
  • 📺 Channel Information: Data collection from specific channels

Attribution

This project uses the YouTube Data API and is not affiliated with or endorsed by YouTube or Google. All YouTube content and trademarks are the property of their respective owners.

Setup

Requirements

  • Python 3.6+
  • YouTube Data API key
  • Required dependencies are installed automatically

API Key Setup

To use tubeframes, create a YouTube Data API key following the official Google documentation.

Setting as Environment Variable

Linux: edit ~/.profile and add:

export YOUTUBE_DEVELOPER_KEY="<YOUR_YOUTUBE_DEVELOPER_KEY>"

Windows: Set via System Properties → Environment Variables (under User variables)
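Before running a search, it can help to confirm that the variable is actually visible to Python. The sketch below is not part of tubeframes; `get_developer_key` is a hypothetical helper that reads `YOUTUBE_DEVELOPER_KEY` and fails with a readable message when it is missing.

```python
import os

ENV_VAR = "YOUTUBE_DEVELOPER_KEY"


def get_developer_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; tubeframes does not provide it.
    """
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{ENV_VAR} is not set; export it or pass developer_key explicitly."
        )
    return key


# Typical use: key = get_developer_key()
```

If the call raises, open a new terminal (or restart the notebook kernel) so that the updated environment is picked up.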

Installation

pip install tubeframes

Usage

Basic Search

Create a search object to retrieve video information:

import tubeframes as yt

# The key must be passed as a string
tubeframes_search = yt.Search("Test", developer_key="<YOUR_YOUTUBE_DEVELOPER_KEY>")
tubeframes_search.df  # DataFrame with video info (likes, views, title, etc.)

Results include:

| videoId | publishedAt | channelId | title | viewCount | likeCount | favoriteCount | commentCount |
| --- | --- | --- | --- | --- | --- | --- | --- |
| abcde1234 | 2021-06-01 10:00:00+00:00 | abcde1234abc | Video title example 1 | 100000 | 6000 | 0 | 200 |
| abcde1235 | 2021-06-01 11:00:00+00:00 | abcde1234abc | Video title example 2 | 200000 | 5000 | 1 | 210 |
| abcde1236 | 2021-06-01 12:00:00+00:00 | abcde1234abd | Video title example 3 | 100000 | 4000 | 0 | 150 |
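Because the result is an ordinary pandas DataFrame, the usual pandas operations apply. The sketch below ranks videos by likes per thousand views. It uses a hand-built stand-in for `tubeframes_search.df` with the sample values shown above, so it runs without an API key; the `likes_per_1k_views` column name is introduced here for illustration, and the numeric coercion step assumes the counts might arrive as strings.

```python
import pandas as pd

# Stand-in for tubeframes_search.df, using the sample rows from the table above.
df = pd.DataFrame(
    {
        "videoId": ["abcde1234", "abcde1235", "abcde1236"],
        "viewCount": [100000, 200000, 100000],
        "likeCount": [6000, 5000, 4000],
        "commentCount": [200, 210, 150],
    }
)

# Coerce counts to numbers in case they arrive as strings (an assumption).
count_cols = ["viewCount", "likeCount", "commentCount"]
df[count_cols] = df[count_cols].apply(pd.to_numeric, errors="coerce")

# Likes per 1,000 views; the column name is made up for this example.
df["likes_per_1k_views"] = df["likeCount"] * 1000 / df["viewCount"]

ranked = df.sort_values("likes_per_1k_views", ascending=False)
print(ranked[["videoId", "likes_per_1k_views"]])
```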

Working with Captions

To include video captions in your results, use the argument caption=True:

import tubeframes as yt
# developer_key can be omitted when YOUTUBE_DEVELOPER_KEY is set as an environment variable
tubeframes_search = yt.Search("Test", caption=True)
tubeframes_search.df  # Adds a "video_caption" column with the caption text

Results with captions:

| videoId | publishedAt | channelId | title | commentCount | video_caption |
| --- | --- | --- | --- | --- | --- |
| abcde1234 | 2021-06-01 10:00:00+00:00 | abcde1234abc | Video title example 1 | 200 | What they say; words and more words; thanks for watching |
| abcde1235 | 2021-06-01 11:00:00+00:00 | abcde1234abc | Video title example 2 | 210 | None |
| abcde1236 | 2021-06-01 12:00:00+00:00 | abcde1234abd | Video title example 3 | 150 | Words and more words and more words; thanks for watching |
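Some videos have no caption in an accepted language, as the second row shows. A minimal sketch of filtering those rows out before text analysis follows. It assumes missing captions are stored as null values (Python `None`) rather than the string "None", and uses a hand-built stand-in for the search result; `caption_words` is a column name introduced here.

```python
import pandas as pd

# Stand-in for a caption=True search result, using the sample rows above.
df = pd.DataFrame(
    {
        "videoId": ["abcde1234", "abcde1235", "abcde1236"],
        "video_caption": [
            "What they say; words and more words; thanks for watching",
            None,  # assumed representation of a missing caption
            "Words and more words and more words; thanks for watching",
        ],
    }
)

# Keep only rows that have a caption, then count whitespace-separated words.
with_captions = df.dropna(subset=["video_caption"]).copy()
with_captions["caption_words"] = with_captions["video_caption"].str.split().str.len()

print(with_captions[["videoId", "caption_words"]])
```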

Channel Search

To search for channels instead of videos:

import tubeframes as yt
tubeframes_search = yt.Search("Test", item_type="channel")
tubeframes_search.df  # DataFrame with YouTube channel information

Channel search results:

| channelId | publishedAt | title | description | channelTitle | publishTime |
| --- | --- | --- | --- | --- | --- |
| abcde1234abc | 2021-06-01 10:00:00+00:00 | Example channel 1 | Description of example channel 1 | Example channel 1 | 2021-06-01 10:00:00+00:00 |
| abcde1234abd | 2021-06-01 11:00:00+00:00 | Example channel 2 | Description of example channel 2 | Example channel 2 | 2021-06-01 11:00:00+00:00 |

Channel Information

To get information and captions from videos of specific channel(s), use the ChannelInfo class:

import tubeframes as yt
channel_info = yt.ChannelInfo(
    channel_ids=["<A CHANNEL ID>"],
    max_results=10,
    accepted_caption_lang=['pt', 'en'],
)
channel_info.df  # DataFrame with video information and captions

Channel information results:

| channelId | videoId | title | publishedAt | caption | thumbnailUrl |
| --- | --- | --- | --- | --- | --- |
| EXAMPLE_CHANNEL_ID1 | EXAMPLE_VIDEO_ID1 | Example Video Title 1 | 2025-03-22 22:00:39+00:00 | Example caption text; More example text; Thanks... | https://example.com/sddefault.jpg |
| EXAMPLE_CHANNEL_ID1 | EXAMPLE_VIDEO_ID2 | Example Video Title 2 | 2025-03-22 18:00:22+00:00 | Example caption text; Follow us on social media... | https://example.com/maxresdefault.jpg |
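In the sample output, caption segments appear to be joined with semicolons. Assuming that holds for real results, a small helper can split a caption back into segments for line-by-line processing. `split_caption` is a hypothetical function written for this example, not part of tubeframes.

```python
def split_caption(caption, sep=";"):
    """Split a caption string into trimmed, non-empty segments.

    Assumes segments are separated by `sep`, as in the sample output.
    Returns an empty list for a missing or empty caption.
    """
    if not caption:
        return []
    return [part.strip() for part in caption.split(sep) if part.strip()]


segments = split_caption("Example caption text; More example text; Thanks")
print(segments)
```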

Parameter Reference

Search Class

The Search class accepts the following arguments:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| term | string | Yes | - | YouTube search term |
| caption | boolean | No | False | Whether to include video captions |
| maxres | integer | No | 50 | Maximum number of results to return |
| accepted_caption_lang | list | No | ['en'] | List of accepted languages for captions |
| item_type | string | No | "video" | Type of search: "video" or "channel" |
| developer_key | string | No | - | YouTube API key (optional if set as environment variable) |

Example with all parameters:

import tubeframes as yt

tubeframes_search = yt.Search(
    term="Python Tutorial",
    caption=True,
    maxres=100,
    accepted_caption_lang=['pt', 'en'],
    item_type="video",
    developer_key="<YOUR_DEVELOPER_KEY>"
)

# Access the resulting DataFrame
df = tubeframes_search.df

ChannelInfo Class

The ChannelInfo class accepts the following arguments:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| channel_ids | string/list | Yes | - | Channel ID or list of channel IDs |
| max_results | integer | No | 10 | Maximum number of results per channel |
| accepted_caption_lang | list | No | ['pt', 'en'] | List of accepted languages for captions |
| developer_key | string | No | - | YouTube API key (optional if set as environment variable) |

Example with all parameters:

import tubeframes as yt

channel_info = yt.ChannelInfo(
    channel_ids=["<CHANNEL ID 1>", "<CHANNEL ID 2>"],
    max_results=20,
    accepted_caption_lang=['pt', 'en', 'es'],
    developer_key="<YOUR_DEVELOPER_KEY>"
)

# Access the resulting DataFrame
df = channel_info.df

Applications

TubeFrames is particularly useful for:

  • Sentiment Analysis: Extract captions for sentiment analysis
  • Text Mining: Identify keywords and topics from YouTube channels
  • Academic Research: Dataset creation for video engagement studies
  • Content Marketing: Channel performance analysis and strategy optimization
  • Competitor Research: Tracking metrics of competitor channels
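As a rough illustration of the text mining use case, the sketch below counts the most frequent words across a list of caption strings using only the standard library. The stopword list is a tiny made-up sample, and `top_keywords` is a hypothetical helper; a real analysis would likely use a fuller stopword list or a dedicated NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; not exhaustive.
STOPWORDS = {"and", "for", "the", "they", "what", "more"}


def top_keywords(captions, n=3):
    """Return the n most common non-stopword words across caption strings."""
    counts = Counter()
    for text in captions:
        if not text:  # skip missing captions
            continue
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


captions = [
    "What they say; words and more words; thanks for watching",
    None,
    "Words and more words and more words; thanks for watching",
]
keywords = top_keywords(captions)
print(keywords)
```

In practice, `captions` could come from the caption column of a result DataFrame, for example via `df["video_caption"].tolist()`.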

Contributing

Contributions are welcome! Open an issue or submit a pull request on GitHub.

License

This project is licensed under the GNU General Public License v3.0.

Download files

Source Distribution

tubeframes-0.3.2.tar.gz (25.0 kB)

Uploaded Source

Built Distribution

tubeframes-0.3.2-py3-none-any.whl (24.9 kB)

Uploaded Python 3

File details

Details for the file tubeframes-0.3.2.tar.gz.

File metadata

  • Download URL: tubeframes-0.3.2.tar.gz
  • Size: 25.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for tubeframes-0.3.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | fafb4df41f1748252a6355c08ce2e45ea8d77ab8a91e0e845814a17796492d7f |
| MD5 | ae4b347fa3ec608209eed974144e37aa |
| BLAKE2b-256 | 6ddead596566205bf17352764226a655ae36b558b071468a89d938f4a482a6a8 |

File details

Details for the file tubeframes-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: tubeframes-0.3.2-py3-none-any.whl
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for tubeframes-0.3.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 289963bf8136e87ab69a0174af250fa6811d12d52e43cc921e93a0da6058d44c |
| MD5 | 0ab21311520aa22f07d43d5b4d38e53c |
| BLAKE2b-256 | de73acee5bd256cf1d85151891f0ad5c6f90655085652d95687c1ae19e4f6b5c |
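To check a manually downloaded file against the published digest, the SHA256 value can be recomputed locally with Python's `hashlib`. The sketch below assumes the wheel was saved in the current directory under its original name; `sha256_of` is a helper written for this example, and the check is skipped when the file is not present.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for tubeframes-0.3.2-py3-none-any.whl
EXPECTED_SHA256 = "289963bf8136e87ab69a0174af250fa6811d12d52e43cc921e93a0da6058d44c"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed local file name; adjust the path to wherever the file was saved.
wheel = Path("tubeframes-0.3.2-py3-none-any.whl")
if wheel.exists():
    print("match" if sha256_of(wheel) == EXPECTED_SHA256 else "MISMATCH")
```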
