TubeFrames - A YouTube Data Analysis Library
A Python package for retrieving YouTube data, including video statistics, captions, and channel information. TubeFrames outputs results in a user-friendly pandas DataFrame format, making it ideal for data analysis workflows — especially in Jupyter Notebooks.
Features
- 🔍 YouTube Search: Query and retrieve results in DataFrame format
- 📊 Video Statistics: View, like, and comment counts
- 📝 Caption Extraction: Extract video captions in multiple languages
- 📺 Channel Information: Data collection from specific channels
Attribution
This project uses the YouTube Data API and is not affiliated with or endorsed by YouTube or Google. All YouTube content and trademarks are the property of their respective owners.
Setup
Requirements
- Python 3.6+
- YouTube Data API key
- Required dependencies are installed automatically
API Key Setup
To use tubeframes, create a YouTube Data API key following the official Google documentation.
Setting as Environment Variable
Linux: edit ~/.profile and add:
export YOUTUBE_DEVELOPER_KEY=<YOUR_YOUTUBE_DEVELOPER_KEY>
Windows: Set via System Properties → Environment Variables (under User variables)
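Before running a search, it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch using only the standard library; `get_developer_key` is a hypothetical helper and not part of tubeframes.

```python
import os

ENV_VAR = "YOUTUBE_DEVELOPER_KEY"  # name used in the export line above


def get_developer_key(environ=os.environ):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; tubeframes does not provide it.
    """
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{ENV_VAR} is not set; export it or pass developer_key explicitly"
        )
    return key


# Usage (raises RuntimeError if the variable is missing):
# key = get_developer_key()
```

If the helper raises, open a new shell or log in again so that changes to ~/.profile take effect.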
Installation
pip install tubeframes
Usage
Basic Search
Create a search object to retrieve video information:
import tubeframes as yt
tubeframes_search = yt.Search("Test", developer_key="<YOUR_YOUTUBE_DEVELOPER_KEY>")
tubeframes_search.df  # DataFrame with YouTube information (likes, views, title, etc.)
Results include:
| videoId | publishedAt | channelId | title | … | viewCount | likeCount | favoriteCount | commentCount |
|---|---|---|---|---|---|---|---|---|
| abcde1234 | 2021-06-01 10:00:00+00:00 | abcde1234abc | Video title example 1 | … | 100000 | 6000 | 0 | 200 |
| abcde1235 | 2021-06-01 11:00:00+00:00 | abcde1234abc | Video title example 2 | … | 200000 | 5000 | 1 | 210 |
| abcde1236 | 2021-06-01 12:00:00+00:00 | abcde1234abd | Video title example 3 | … | 100000 | 4000 | 0 | 150 |
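Because the result is a regular pandas DataFrame, the statistics columns can be combined directly. The sketch below rebuilds the sample rows from the table above instead of calling the API. The `like_rate` column is a hypothetical metric chosen for illustration, and the numeric coercion assumes the counts may arrive as strings.

```python
import pandas as pd

# Sample rows copied from the table above; a real run would use tubeframes_search.df.
df = pd.DataFrame(
    {
        "videoId": ["abcde1234", "abcde1235", "abcde1236"],
        "viewCount": [100000, 200000, 100000],
        "likeCount": [6000, 5000, 4000],
        "commentCount": [200, 210, 150],
    }
)

# Coerce to numbers in case the counts are strings (an assumption about dtypes).
count_cols = ["viewCount", "likeCount", "commentCount"]
df[count_cols] = df[count_cols].apply(pd.to_numeric, errors="coerce")

# Likes per view: a simple, hypothetical engagement metric.
df["like_rate"] = df["likeCount"] / df["viewCount"]

# Rank videos from highest to lowest like rate.
top = df.sort_values("like_rate", ascending=False).reset_index(drop=True)
print(top[["videoId", "viewCount", "likeCount", "like_rate"]])
```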
Working with Captions
To include video captions in your results, use the argument caption=True:
import tubeframes as yt
# developer_key is not necessary if YOUTUBE_DEVELOPER_KEY is set as an environment variable
tubeframes_search = yt.Search("Test", caption=True)
tubeframes_search.df  # Includes a new "video_caption" column with the captions
Results with captions:
| videoId | publishedAt | channelId | title | … | commentCount | video_caption |
|---|---|---|---|---|---|---|
| abcde1234 | 2021-06-01 10:00:00+00:00 | abcde1234abc | Video title example 1 | … | 200 | What they say; words and more words; thanks for watching |
| abcde1235 | 2021-06-01 11:00:00+00:00 | abcde1234abc | Video title example 2 | … | 210 | None |
| abcde1236 | 2021-06-01 12:00:00+00:00 | abcde1234abd | Video title example 3 | … | 150 | Words and more words and more words; thanks for watching |
| … | … | … | … | … | … | … |
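In the sample rows above, caption text appears as segments joined by semicolons, and videos without captions show None. Assuming that layout holds in general, a small helper can split a caption into segments and treat missing values as empty. `caption_segments` is a hypothetical name, not part of tubeframes.

```python
def caption_segments(caption, sep=";"):
    """Split one caption string into trimmed segments.

    Returns an empty list for None, NaN, or any other non-string value.
    The ';' separator is an assumption based on the sample rows above.
    """
    if not isinstance(caption, str):
        return []
    return [part.strip() for part in caption.split(sep) if part.strip()]


# Sample values copied from the table above.
samples = [
    "What they say; words and more words; thanks for watching",
    None,
]
for text in samples:
    print(caption_segments(text))
```

With a real result, the same function could be applied per row, for example `tubeframes_search.df["video_caption"].apply(caption_segments)`.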
Channel Search
To search for channels instead of videos:
import tubeframes as yt
tubeframes_search = yt.Search("Test", item_type="channel")
tubeframes_search.df # DataFrame with YouTube channel information
Channel search results:
| channelId | publishedAt | title | description | channelTitle | publishTime |
|---|---|---|---|---|---|
| abcde1234abc | 2021-06-01 10:00:00+00:00 | Example channel 1 | Description of example channel 1 | Example channel 1 | 2021-06-01 10:00:00+00:00 |
| abcde1234abd | 2021-06-01 11:00:00+00:00 | Example channel 2 | Description of example channel 2 | Example channel 2 | 2021-06-01 11:00:00+00:00 |
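The channelId column from a channel search can serve as input for the ChannelInfo class described in the next section. The sketch below only prepares the ID list; `unique_channel_ids` is a hypothetical helper, and the hand-off shown in the final comment has not been tested against the live API.

```python
def unique_channel_ids(channel_ids):
    """Return string IDs in first-seen order, skipping missing values and duplicates."""
    return list(
        dict.fromkeys(cid for cid in channel_ids if isinstance(cid, str) and cid)
    )


# Sample IDs from the table above; a real run would pass tubeframes_search.df["channelId"].
ids = unique_channel_ids(["abcde1234abc", "abcde1234abd", "abcde1234abc", None])
print(ids)

# Untested hand-off to ChannelInfo:
# channel_info = yt.ChannelInfo(channel_ids=ids, max_results=10)
```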
Channel Information
To get information and captions from videos of specific channel(s), use the ChannelInfo class:
import tubeframes as yt
channel_info = yt.ChannelInfo(
    channel_ids=["<A CHANNEL ID>"],
    max_results=10,
    accepted_caption_lang=['pt', 'en'],
)
channel_info.df # DataFrame with video information and captions
Channel information results:
| channelId | videoId | title | publishedAt | caption | thumbnailUrl |
|---|---|---|---|---|---|
| EXAMPLE_CHANNEL_ID1 | EXAMPLE_VIDEO_ID1 | Example Video Title 1 | 2025-03-22 22:00:39+00:00 | Example caption text; More example text; Thanks... | https://example.com/sddefault.jpg |
| EXAMPLE_CHANNEL_ID1 | EXAMPLE_VIDEO_ID2 | Example Video Title 2 | 2025-03-22 18:00:22+00:00 | Example caption text; Follow us on social media... | https://example.com/maxresdefault.jpg |
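When several channels are requested, a per-channel summary is often the first step. This sketch uses the sample rows above rather than a live call. The aggregate names `videos`, `with_caption`, and `latest` are illustrative, and it assumes missing captions are stored as None or NaN.

```python
import pandas as pd

# Sample rows copied from the table above; a real run would use channel_info.df.
df = pd.DataFrame(
    {
        "channelId": ["EXAMPLE_CHANNEL_ID1", "EXAMPLE_CHANNEL_ID1"],
        "videoId": ["EXAMPLE_VIDEO_ID1", "EXAMPLE_VIDEO_ID2"],
        "publishedAt": ["2025-03-22 22:00:39+00:00", "2025-03-22 18:00:22+00:00"],
        "caption": [
            "Example caption text; More example text; Thanks...",
            "Example caption text; Follow us on social media...",
        ],
    }
)

# Parse timestamps so that "max" compares dates rather than strings.
df["publishedAt"] = pd.to_datetime(df["publishedAt"], utc=True)

# "count" ignores missing values, so with_caption counts only videos that have captions.
summary = df.groupby("channelId").agg(
    videos=("videoId", "count"),
    with_caption=("caption", "count"),
    latest=("publishedAt", "max"),
)
print(summary)
```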
Parameter Reference
Search Class
The Search class accepts the following arguments:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| term | string | Yes | - | YouTube search term |
| caption | boolean | No | False | Whether to include video captions |
| maxres | integer | No | 50 | Maximum number of results to return |
| accepted_caption_lang | list | No | ['en'] | List of accepted languages for captions |
| item_type | string | No | "video" | Type of search: "video" or "channel" |
| developer_key | string | No | - | YouTube API key (optional if set as environment variable) |
Example with all parameters:
import tubeframes as yt
tubeframes_search = yt.Search(
    term="Python Tutorial",
    caption=True,
    maxres=100,
    accepted_caption_lang=['pt', 'en'],
    item_type="video",
    developer_key="<YOUR_DEVELOPER_KEY>"
)
# Access the resulting DataFrame
df = tubeframes_search.df
ChannelInfo Class
The ChannelInfo class accepts the following arguments:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| channel_ids | string/list | Yes | - | Channel ID or list of channel IDs |
| max_results | integer | No | 10 | Maximum number of results per channel |
| accepted_caption_lang | list | No | ['pt', 'en'] | List of accepted languages for captions |
| developer_key | string | No | - | YouTube API key (optional if set as environment variable) |
Example with all parameters:
import tubeframes as yt
channel_info = yt.ChannelInfo(
    channel_ids=["<CHANNEL ID 1>", "<CHANNEL ID 2>"],
    max_results=20,
    accepted_caption_lang=['pt', 'en', 'es'],
    developer_key="<YOUR_DEVELOPER_KEY>"
)
# Access the resulting DataFrame
df = channel_info.df
Applications
TubeFrames is particularly useful for:
- Sentiment Analysis: Extract captions for sentiment analysis
- Text Mining: Identify keywords and topics from YouTube channels
- Academic Research: Dataset creation for video engagement studies
- Content Marketing: Channel performance analysis and strategy optimization
- Competitor Research: Tracking metrics of competitor channels
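As a small illustration of the text-mining use case, the sketch below counts frequent words across caption strings using only the standard library. `top_keywords` and the tiny stopword list are hypothetical and chosen for the sample data; a real analysis would likely use a fuller stopword list or a dedicated NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; not a complete one.
STOPWORDS = {"and", "for", "the", "a", "of", "to", "they"}


def top_keywords(captions, n=3):
    """Return the n most common non-stopword words across an iterable of captions.

    Non-string entries (None, NaN) are skipped, so a DataFrame column can be passed as-is.
    """
    counts = Counter()
    for text in captions:
        if not isinstance(text, str):
            continue
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


# Sample captions copied from the caption table in the Usage section.
captions = [
    "What they say; words and more words; thanks for watching",
    None,
    "Words and more words and more words; thanks for watching",
]
print(top_keywords(captions, n=2))  # [('words', 5), ('more', 3)]
```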
Contributing
Contributions are welcome! Open an issue or submit a pull request on GitHub.
License
This project is licensed under the GNU General Public License v3.0.