A Python library for scraping YouTube video data
NGTube
A comprehensive Python library for scraping YouTube data, including videos, comments, and channel profiles.
⚠️ Disclaimer
This library is provided for educational and research purposes only. Scraping YouTube data may violate YouTube's Terms of Service. Use at your own risk. The authors are not responsible for any misuse or legal consequences. Always respect robots.txt and implement appropriate rate limiting.
Features
- Video Extraction: Extract detailed metadata from YouTube videos (title, views, likes, duration, tags, description, etc.)
- Comment Extraction: Extract comments from videos, including loading additional comments via YouTube's internal API
- Channel Extraction: Extract complete channel profile data (subscribers, description, featured video, video list with continuation support)
- Flexible Video Loading: Load specific number of videos or all available videos from a channel
- Clean Data Output: Structured JSON-compatible data output
- Modular Design: Separate classes for different extraction tasks
Installation
Option 1: Install as Package (Recommended)
- Clone or download the repository.
- Navigate to the project directory.
- Install the package using pip:
pip install .
This will install NGTube as a Python package with all dependencies automatically handled.
Option 2: Manual Installation
- Clone or download the repository.
- Ensure you have Python 3.6+ installed.
- Install required dependencies:
pip install requests demjson3
- Copy the NGTube folder to your project directory or add it to your Python path.
Using setup.py
The setup.py file is used for packaging and installation. You can also install manually:
python setup.py install
However, using pip install . is recommended as it handles modern Python packaging better.
Quick Start
Extract Video Metadata
from NGTube import Video
url = "https://www.youtube.com/watch?v=y1XrJyFF1O0"
video = Video(url)
metadata = video.extract_metadata()
print("Title:", metadata['title'])
print("Views:", metadata['view_count'])
print("Likes:", metadata['like_count'])
print("Duration:", metadata['duration_seconds'], "seconds")
Extract Comments
from NGTube import Comments
url = "https://www.youtube.com/watch?v=y1XrJyFF1O0"
comments = Comments(url)
comment_data = comments.get_comments()
print(f"Total comments: {len(comment_data['comments'])}")
for comment in comment_data['comments'][:3]:
    print(f"{comment['author']}: {comment['text'][:50]}...")
Extract Channel Profile
from NGTube import Channel
url = "https://www.youtube.com/@HandOfUncut"
channel = Channel(url)
# Load first 10 videos
profile = channel.extract_profile(max_videos=10)
print("Channel Title:", profile['title'])
print("Subscribers:", profile['subscribers'])
print("Videos loaded:", profile['loaded_videos_count'])
# Load all videos
profile_all = channel.extract_profile(max_videos='all')
print("Total videos:", profile_all['loaded_videos_count'])
Detailed Usage
Video Class
from NGTube import Video
video = Video("https://www.youtube.com/watch?v=VIDEO_ID")
metadata = video.extract_metadata()
# Available metadata keys:
# - title, view_count, like_count, duration_seconds
# - channel_name, channel_id, subscriber_count
# - description, tags, category, is_private
# - upload_date, published_time_text
Comments Class
from NGTube import Comments
comments = Comments("https://www.youtube.com/watch?v=VIDEO_ID")
data = comments.get_comments()
# Returns dictionary with:
# - 'top_comment': list of top comments
# - 'comments': list of regular comments
# Each comment contains:
# - author, text, like_count, published_time_text
# - author_thumbnail, comment_id, reply_count
Channel Class
from NGTube import Channel
channel = Channel("https://www.youtube.com/@ChannelHandle")
# Extract profile with specific number of videos
profile = channel.extract_profile(max_videos=50)
# Extract profile with all videos (may take time)
profile = channel.extract_profile(max_videos='all')
# Available profile data:
# - title, description, channel_id, channel_url
# - keywords, is_family_safe, links
# - subscriber_count_text, view_count_text, video_count_text
# - subscribers, total_views, video_count (parsed numbers)
# - featured_video (dict with videoId, title, description)
# - videos (list of video dictionaries)
# - loaded_videos_count
Examples
See the examples/ directory for complete working examples:
- basic_usage.py: Extract video metadata and comments
- batch_processing.py: Process multiple videos
- channel_usage.py: Extract channel profile data
Run any example:
python examples/basic_usage.py
API Reference
Core Classes
YouTubeCore
Base class for YouTube interactions.
- __init__(url: str): Initialize with YouTube URL
- fetch_html() -> str: Fetch HTML content
- extract_ytinitialdata(html: str) -> dict: Extract ytInitialData
- make_api_request(endpoint: str, payload: dict) -> dict: Make API requests
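To illustrate what extracting ytInitialData from fetched HTML can involve, here is a minimal sketch. It is not NGTube's implementation: the function name `find_initial_data`, the regular expression, and the assumption that the page embeds the data as `var ytInitialData = {...};</script>` are all illustrative, and the real markup may differ or change without notice.

```python
import json
import re

# Assumed marker (not confirmed by the library): the page embeds the data
# as `var ytInitialData = {...};</script>`.
_YT_INITIAL_DATA = re.compile(
    r"var ytInitialData\s*=\s*(\{.*?\});\s*</script>", re.DOTALL
)


def find_initial_data(html: str) -> dict:
    """Return the embedded ytInitialData object, or {} if it is not found."""
    match = _YT_INITIAL_DATA.search(html)
    if match is None:
        return {}
    # Assumes the embedded object is valid JSON.
    return json.loads(match.group(1))


# Hand-written sample HTML, so the sketch runs without network access.
sample_html = (
    '<html><script>var ytInitialData = '
    '{"contents": {"items": [1, 2]}};</script></html>'
)
print(find_initial_data(sample_html)["contents"]["items"])
```

Returning an empty dictionary when the marker is missing lets callers treat a layout change as missing data rather than as a crash.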
Video
Extract video metadata.
- __init__(url: str): Initialize with video URL
- extract_metadata() -> dict: Extract and return video metadata
Comments
Extract video comments.
- __init__(url: str): Initialize with video URL
- get_comments() -> dict: Extract and return comments data
Channel
Extract channel profile and videos.
- __init__(url: str): Initialize with channel URL
- extract_profile(max_videos: int | str = 200) -> dict: Extract profile data
  - max_videos: Number of videos to load, or 'all' for all videos
Utils Module
- extract_number(text: str) -> int: Extract numbers from text (handles German formatting)
- extract_links(text: str) -> list: Extract URLs from text
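German-formatted counts use a dot as the thousands separator (for example "159.000 Abonnenten"). The sketch below shows one way such text could be turned into an integer. The function `parse_german_count` is a hypothetical stand-in for illustration, not the library's `extract_number`, and it does not handle abbreviated forms such as "1,4 Mio.".

```python
import re


def parse_german_count(text: str) -> int:
    """Parse the first number in text, treating '.' as a thousands separator."""
    # Prefer dot-grouped digits such as 159.000; fall back to plain digits.
    match = re.search(r"\d{1,3}(?:\.\d{3})+|\d+", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group(0).replace(".", ""))


for sample in ("159.000 Abonnenten", "84.770 Aufrufe", "2583 Videos"):
    print(sample, "->", parse_german_count(sample))
```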
Data Structures
Video Metadata
{
"title": "Video Title",
"view_count": 299955,
"duration_in_seconds": 6994,
"description": "Video description...",
"tags": ["tag1", "tag2"],
"video_id": "VIDEO_ID",
"channel_id": "UC...",
"is_owner_viewing": false,
"is_crawlable": true,
"thumbnail": {...},
"allow_ratings": true,
"author": "Channel Name",
"is_private": false,
"is_unplugged_corpus": false,
"is_live_content": false,
"like_count": 8547,
"channel_name": "Channel Name",
"category": "Gaming",
"publish_date": "2023-12-01",
"upload_date": "2023-12-01",
"family_safe": true,
"channel_url": "https://...",
"subscriber_count": 1400000
}
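Because the metadata is JSON-compatible, it can be written out and read back with the standard `json` module. The following sketch uses a hand-written dictionary shaped like the structure above rather than a live call, and `format_duration` is a hypothetical helper, not part of NGTube.

```python
import json

# Hand-written sample using a subset of the keys shown above.
metadata = {
    "title": "Video Title",
    "view_count": 299955,
    "duration_in_seconds": 6994,
    "tags": ["tag1", "tag2"],
    "is_private": False,
}


def format_duration(seconds: int) -> str:
    """Render a duration in seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Round-trip through JSON; ensure_ascii=False keeps non-ASCII titles readable.
serialized = json.dumps(metadata, indent=2, ensure_ascii=False)
restored = json.loads(serialized)

print(format_duration(restored["duration_in_seconds"]))
```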
Comment Data
{
"top_comment": [...],
"comments": [
{
"author": "Username",
"text": "Comment text",
"likeCount": 196,
"publishedTimeText": "vor 1 Tag",
"authorThumbnail": "https://...",
"commentId": "...",
"replyCount": 1
}
]
}
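A common follow-up step is ranking the returned comments. This sketch sorts a hand-written sample by the `likeCount` key shown in the structure above. `top_liked` is a hypothetical helper, and if the actual output uses a different key name (such as `like_count`), the lookup would need adjusting.

```python
def top_liked(comment_data: dict, limit: int = 3) -> list:
    """Return up to `limit` comments, most-liked first."""
    comments = comment_data.get("comments", [])
    # Treat a missing like count as zero so incomplete entries sort last.
    ranked = sorted(comments, key=lambda c: c.get("likeCount", 0), reverse=True)
    return ranked[:limit]


# Hand-written sample shaped like the structure above.
sample = {
    "top_comment": [],
    "comments": [
        {"author": "A", "text": "first", "likeCount": 3},
        {"author": "B", "text": "second", "likeCount": 196},
        {"author": "C", "text": "third"},
    ],
}

for comment in top_liked(sample, limit=2):
    print(comment["author"], comment.get("likeCount", 0))
```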
Channel Profile
{
"title": "Channel Title",
"description": "Channel description...",
"channelId": "UC...",
"channelUrl": "https://...",
"keywords": "keyword1 keyword2",
"isFamilySafe": true,
"links": ["https://..."],
"subscriberCountText": "159.000 Abonnenten",
"viewCountText": "84.770 Aufrufe",
"videoCountText": "2583 Videos",
"subscribers": 159000,
"total_views": 84770,
"video_count": 2583,
"featured_video": {
"videoId": "...",
"title": "Featured Video Title",
"description": "Featured video description..."
},
"videos": [
{
"videoId": "...",
"title": "Video Title",
"publishedTimeText": "vor 1 Tag",
"viewCountText": "40.773 Aufrufe",
"lengthText": "1:02:58",
"thumbnails": [...]
}
],
"loaded_videos_count": 1
}
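The `lengthText` field is a display string such as "1:02:58". To total or compare durations it helps to convert it to seconds. The helper `length_to_seconds` below is a hypothetical sketch, not a library function, and it assumes the text consists only of colon-separated integers.

```python
def length_to_seconds(length_text: str) -> int:
    """Convert 'H:MM:SS' or 'M:SS' into a number of seconds."""
    seconds = 0
    for part in length_text.split(":"):
        # Each further field shifts the running total by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds


# Hand-written sample shaped like the "videos" list above.
videos = [
    {"videoId": "a", "lengthText": "1:02:58"},
    {"videoId": "b", "lengthText": "4:05"},
]

total = sum(length_to_seconds(v["lengthText"]) for v in videos)
print("Total seconds:", total)
```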
Limitations
- Rate Limiting: YouTube may rate-limit requests. Add delays between requests for bulk operations.
- Comment Limits: Without authentication, typically 40-50 comments can be loaded per video.
- Video Limits: Channel video extraction may be limited by YouTube's pagination.
- Terms of Service: This library is for educational purposes. Respect YouTube's Terms of Service and robots.txt.
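One simple way to add delays between requests in bulk operations is to pause before every call after the first. The helper `fetch_with_delay` below is a hypothetical sketch, not part of NGTube, and the two-second default is an arbitrary assumption rather than a documented safe rate.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_with_delay(
    items: Iterable[T],
    fetch: Callable[[T], R],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call fetch(item) for each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # no pause before the very first request
        results.append(fetch(item))
    return results


# Demonstration with a stand-in fetch function and no real waiting.
urls = ["https://www.youtube.com/watch?v=VIDEO_ID"] * 3
lengths = fetch_with_delay(urls, len, delay_seconds=0)
print(lengths)
```

With the library installed, the stand-in could be replaced by something like `lambda url: Video(url).extract_metadata()`.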
Troubleshooting
- Import Errors: Ensure the NGTube folder is in your Python path.
- API Errors: YouTube changes its internal APIs frequently. The library uses current endpoints as of December 2025.
- Missing Data: Some videos/channels may have restricted data access
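Since some fields may be absent for restricted videos or channels, reading results with `dict.get` and a fallback avoids a `KeyError`. The sketch below assumes missing fields are either absent or `None`; how the library actually represents them is not stated here, and `summarize` is a hypothetical helper.

```python
def summarize(metadata: dict) -> str:
    """Build a one-line summary that tolerates missing fields."""
    title = metadata.get("title") or "(unknown title)"
    views = metadata.get("view_count")
    likes = metadata.get("like_count")
    # Only format values that are real integers; otherwise show a placeholder.
    views_text = f"{views:,}" if isinstance(views, int) else "n/a"
    likes_text = f"{likes:,}" if isinstance(likes, int) else "n/a"
    return f"{title}: {views_text} views, {likes_text} likes"


# Hand-written sample with the like count missing.
print(summarize({"title": "Video Title", "view_count": 299955}))
```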
Contributing
This library is maintained for educational purposes. Feel free to submit issues or improvements.
License
This project can be used by anyone with attribution.