TubeScrape
Scrape YouTube search results, channels, transcripts, and playlists — no API key needed.
Built on YouTube's internal InnerTube API. Three interfaces: Python SDK, CLI, and REST API.
Features
- Search with filters — type, duration, upload date, sort order, features (4K, HDR, live, CC)
- Channel browsing — videos, shorts, playlists, and in-channel search
- Transcripts — fetch, translate, format (SRT / WebVTT / JSON), save to file
- Playlists — full pagination with position tracking and metadata
- Async-first — every method has an async variant for concurrent workloads
- JSON-ready — every result object has .to_dict() for instant serialization
- Proxy rotation — built-in support for residential proxy lists
- Three interfaces — Python SDK, CLI with Rich tables, REST API with Swagger docs
- Zero config — no API key, no OAuth, no setup
- Lightweight — only httpx as a core dependency
Installation
Installing with pip is recommended. The core package can be added to an existing project as a library, and optional extras add the CLI and the REST API server.
# Core SDK only
pip install tubescrape
# With CLI (adds click + rich)
pip install "tubescrape[cli]"
# With REST API server (adds fastapi + uvicorn)
pip install "tubescrape[api]"
# Everything
pip install "tubescrape[all]"
Requirements: Python 3.10+
Quick Start
import json
from tubescrape import YouTube

with YouTube() as yt:
    # Search YouTube
    results = yt.search('python tutorial', max_results=5)
    for video in results.videos:
        print(f'{video.title} — {video.duration} — {video.view_count}')

    # Browse a channel
    channel = yt.get_channel_videos('@lexfridman', max_results=10)
    for video in channel.videos:
        print(f'{video.title} ({video.published_text})')

    # Get a transcript and save as subtitles
    transcript = yt.get_transcript('dQw4w9WgXcQ')
    print(transcript.text)
    transcript.save('subtitles.srt')

    # Every result serializes to JSON instantly
    data = results.to_dict()
    print(json.dumps(data, indent=2))
Every method accepts plain IDs, full URLs, or @handles — parsed automatically.
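To make the idea of automatic input parsing concrete, here is a minimal sketch of how a video reference could be normalized to a bare ID. It is an illustration only: `extract_video_id` is a hypothetical helper, and the URL shapes it handles are assumptions, not a description of tubescrape's internal parser.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a common YouTube URL form.

    Hypothetical helper for illustration; not part of tubescrape.
    """
    value = value.strip()
    if "://" not in value:
        return value  # treated as an already-bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0]

    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v", [])
            if ids:
                return ids[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0]

    raise ValueError(f"Unrecognized YouTube video reference: {value!r}")
```

With the library itself none of this is needed, since the README states that IDs, URLs, and handles are all accepted directly.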
Why tubescrape?
| Feature | tubescrape | youtube-transcript-api | pytube / pytubefix | yt-dlp |
|---|---|---|---|---|
| Search videos | Yes | No | No | Limited |
| Channel browse | Yes | No | No | Yes |
| Transcripts | Yes | Yes | No | Yes |
| Playlists | Yes | No | Yes | Yes |
| Async support | Yes | No | No | No |
| Built-in REST API | Yes | No | No | No |
| CLI tool | Yes | No | No | Yes |
| Core dependencies | 1 (httpx) | 1 (requests) | 0 | Many |
| API key needed | No | No | No | No |
Search
results = yt.search('python tutorial', max_results=5)
for video in results.videos:
    print(f'{video.title} — {video.url}')
    print(f' {video.view_count} | {video.duration} | {video.published_text}')
    print(f' Channel: {video.channel} (verified: {video.is_verified})')
Search Filters
All filters can be combined in a single call:
results = yt.search(
    'podcast interview',
    max_results=10,
    type='video',                  # video | channel | playlist | movie
    duration='long',               # short (<4m) | medium (4-20m) | long (>20m)
    upload_date='this_month',      # last_hour | today | this_week | this_month | this_year
    sort_by='view_count',          # relevance | upload_date | view_count | rating
    features=['hd', 'subtitles'],  # live | 4k | hd | subtitles | cc | creative_commons | 360 | vr180 | 3d | hdr
)
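When filter values come from user input, it can help to check them against the documented options before making a request. The sketch below is an assumption-laden convenience, not library code: `check_filters` is a hypothetical helper, and the allowed values are copied from the comments in the call above. Whether tubescrape performs its own validation is not stated here.

```python
# Allowed values as listed in the search example above.
ALLOWED = {
    "type": {"video", "channel", "playlist", "movie"},
    "duration": {"short", "medium", "long"},
    "upload_date": {"last_hour", "today", "this_week", "this_month", "this_year"},
    "sort_by": {"relevance", "upload_date", "view_count", "rating"},
}
ALLOWED_FEATURES = {
    "live", "4k", "hd", "subtitles", "cc",
    "creative_commons", "360", "vr180", "3d", "hdr",
}


def check_filters(features=(), **filters):
    """Return the filters as a dict, or raise ValueError for unlisted values.

    Hypothetical helper for illustration; not part of tubescrape.
    """
    for name, value in filters.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown filter: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid {name}: {value!r}")

    unknown = set(features) - ALLOWED_FEATURES
    if unknown:
        raise ValueError(f"invalid features: {sorted(unknown)}")

    checked = dict(filters)
    if features:
        checked["features"] = list(features)
    return checked
```

The returned dict can then be unpacked into the call, for example `yt.search('podcast interview', **check_filters(duration='long', features=['hd']))`.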
Channel Browsing
All channel methods accept @handle, channel ID (UC...), or full URL.
# Videos (newest first, with pagination)
videos = yt.get_channel_videos('@lexfridman', max_results=10)
all_videos = yt.get_channel_videos('@lexfridman', max_results=0) # fetch ALL
# Shorts
shorts = yt.get_channel_shorts('@lexfridman')
for short in shorts.shorts:
    print(f'{short.title} — {short.view_count} — {short.url}')
# Playlists
playlists = yt.get_channel_playlists('@lexfridman')
for pl in playlists.playlists:
    print(f'{pl.title} — {pl.video_count} — {pl.url}')
# Search within a channel
results = yt.search_channel('@lexfridman', 'artificial intelligence', max_results=10)
Transcripts
# Fetch transcript (auto-detects best language)
transcript = yt.get_transcript('dQw4w9WgXcQ')
print(transcript.text) # full text as a single string
# Choose language (priority order fallback)
transcript = yt.get_transcript('dQw4w9WgXcQ', languages=['de', 'en'])
# Translate to any language
transcript = yt.get_transcript('dQw4w9WgXcQ', translate_to='es')
# Without timestamps (plain text blob)
transcript = yt.get_transcript('dQw4w9WgXcQ', timestamps=False)
# List available languages
languages = yt.list_transcripts('dQw4w9WgXcQ')
for entry in languages:
    print(f'{entry.language} ({entry.language_code}) — {"auto" if entry.is_generated else "manual"}')
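The "priority order fallback" for `languages=['de', 'en']` can be pictured as picking the first preferred code that the video actually offers. The following is a minimal sketch of that selection rule under that assumption; `pick_language` is a hypothetical name, and the library's real matching (for example, how it treats regional variants or auto-generated tracks) may differ.

```python
from typing import Optional, Sequence


def pick_language(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred language code that is available, else None.

    Hypothetical helper for illustration; not part of tubescrape.
    """
    available_set = set(available)
    for code in preferred:  # earlier entries win
        if code in available_set:
            return code
    return None
```

The `available` list could be built from the listing above, for example `[entry.language_code for entry in yt.list_transcripts('dQw4w9WgXcQ')]`.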
Formatting & Saving
transcript = yt.get_transcript('dQw4w9WgXcQ')
# Format as SRT, WebVTT, JSON, or plain text
srt = YouTube.format_transcript(transcript, fmt='srt')
vtt = YouTube.format_transcript(transcript, fmt='vtt')
# Save to file (format auto-detected from extension)
transcript.save('subtitles.srt')
transcript.save('subtitles.vtt')
transcript.save('transcript.json')
transcript.save('transcript.txt')
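For readers unfamiliar with the SRT layout, the sketch below shows roughly what formatting a transcript as SRT involves: numbered cues, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text. It works on plain `(start, duration, text)` tuples, which is an assumed shape chosen for the example and not the library's segment type; `srt_timestamp` and `to_srt` are hypothetical names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, duration, text) tuples as an SRT document.

    Hypothetical helper for illustration; not part of tubescrape.
    """
    blocks = []
    for index, (start, duration, text) in enumerate(segments, start=1):
        end = start + duration
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

In practice `YouTube.format_transcript(transcript, fmt='srt')` or `transcript.save('subtitles.srt')` already covers this; the sketch only illustrates the output format.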
Playlists
# Accepts playlist ID or full URL
playlist = yt.get_playlist('PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf')
print(f'{playlist.title} by {playlist.channel} — {len(playlist.videos)} videos')
for entry in playlist.videos:
    print(f'#{entry.position} {entry.title} — {entry.duration}')
Serialization (.to_dict())
Every result object converts to a plain Python dictionary — ready for JSON, databases, or APIs:
import json
# Works on every result type
results.to_dict() # SearchResult → dict
video.to_dict() # VideoResult → dict
channel.to_dict() # BrowseResult → dict
shorts.to_dict() # ShortsResult → dict
playlists.to_dict() # ChannelPlaylistsResult → dict
playlist.to_dict() # PlaylistResult → dict
transcript.to_dict() # Transcript → dict
# Sparse output: optional fields excluded when empty/default
# is_verified=False → omitted | badges=[] → omitted | None fields → omitted
print(json.dumps(results.to_dict(), indent=2, ensure_ascii=False))
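The sparse-output rule described in the comments (omit `False` flags, empty lists, and `None` fields) can be sketched with a small dataclass. `VideoStub` and its fields are hypothetical stand-ins invented for this example; the actual result classes and their field names may differ.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class VideoStub:
    """Hypothetical stand-in for a result object; not the library's class."""

    video_id: str
    title: str
    is_verified: bool = False
    badges: list[str] = field(default_factory=list)
    description: Optional[str] = None

    def to_dict(self) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            # Skip None, False flags, and empty containers.
            if value is None or value is False or value == [] or value == {}:
                continue
            out[f.name] = value
        return out
```

One consequence of sparse output is that consumers should read optional keys with `dict.get()` instead of indexing, since a missing key means "empty or default", not an error.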
Async Support
Every method has an async variant with the same name prefixed by the letter a (for example, search becomes asearch). Use them in FastAPI, Discord bots, or any other async application:
import asyncio
from tubescrape import YouTube

async def main():
    async with YouTube() as yt:
        # All methods have async variants
        results = await yt.asearch('python', max_results=5)
        transcript = await yt.aget_transcript('dQw4w9WgXcQ')

        # Run multiple requests concurrently
        r1, r2, r3 = await asyncio.gather(
            yt.asearch('python'),
            yt.asearch('javascript'),
            yt.asearch('rust'),
        )

asyncio.run(main())
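When fanning out over many queries, an unbounded `asyncio.gather` may send every request at once, which could make rate limiting more likely. One common pattern is to cap the number of in-flight calls with a semaphore. The helper below is a generic sketch; `gather_limited` is a hypothetical name and is not provided by tubescrape.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_limited(
    func: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    limit: int = 3,
) -> list[R]:
    """Run func over items concurrently, with at most `limit` calls in flight.

    Hypothetical helper for illustration; not part of tubescrape.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await func(item)

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(run_one(item) for item in items))
```

Inside an `async with YouTube() as yt:` block this could be used as `await gather_limited(yt.asearch, ['python', 'javascript', 'rust'], limit=2)`.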
Proxy Support
# Single proxy
yt = YouTube(proxy='http://user:pass@proxy.example.com:8080')
# Proxy rotation (round-robin per request)
yt = YouTube(proxies=[
    'http://user:pass@proxy1:8080',
    'http://user:pass@proxy2:8080',
])
# SOCKS5
yt = YouTube(proxy='socks5://user:pass@proxy:1080')
# Custom timeout and retries
yt = YouTube(proxy='http://proxy:8080', timeout=60.0, max_retries=5)
Tip: YouTube blocks cloud IPs aggressively. Use rotating residential proxies (BrightData, SmartProxy, Oxylabs) for production.
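"Round-robin per request" means each request takes the next proxy in the list and wraps around at the end. A minimal sketch of that selection rule, assuming nothing about how the library implements it internally (`RoundRobinProxies` is a hypothetical class written for this example):

```python
from itertools import cycle


class RoundRobinProxies:
    """Hand out proxies from a fixed list in round-robin order.

    Hypothetical class for illustration; not part of tubescrape.
    """

    def __init__(self, proxies):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self._cycle = cycle(proxies)

    def next_proxy(self) -> str:
        # Each call advances to the next proxy, wrapping at the end.
        return next(self._cycle)
```

With two proxies, five consecutive requests would use proxy 1, 2, 1, 2, 1, so load is spread evenly but a blocked proxy keeps being retried on its turn.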
CLI
Install with pip install "tubescrape[cli]".
tubescrape search "python tutorial" -n 5
tubescrape search "podcast" --type video --duration long --sort-by view_count
tubescrape search "python" --json # JSON output
tubescrape channel @lexfridman # videos (default)
tubescrape channel @lexfridman shorts # shorts
tubescrape channel @lexfridman playlists # playlists
tubescrape channel @lexfridman search "podcast" # search within channel
tubescrape playlist PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf
tubescrape transcript dQw4w9WgXcQ # plain text
tubescrape transcript dQw4w9WgXcQ --format srt # SRT subtitles
tubescrape transcript dQw4w9WgXcQ --translate es # translate
tubescrape transcript dQw4w9WgXcQ --save output.srt # save to file
tubescrape transcript dQw4w9WgXcQ --list-languages # available languages
tubescrape --proxy http://user:pass@host:port search "python" # with proxy
export TUBESCRAPE_PROXY="http://user:pass@host:port" # env variable
REST API
Install with pip install "tubescrape[api]".
tubescrape serve # starts on localhost:8000
tubescrape serve --host 0.0.0.0 --port 3000
Interactive Swagger docs at http://localhost:8000/docs.
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/search?q=python | Search videos |
| GET | /api/v1/channel/{id}/videos | Channel videos |
| GET | /api/v1/channel/{id}/shorts | Channel shorts |
| GET | /api/v1/channel/{id}/playlists | Channel playlists |
| GET | /api/v1/channel/{id}/search?q=... | Search within channel |
| GET | /api/v1/playlist/{id} | Fetch playlist |
| GET | /api/v1/transcript/{video_id} | Fetch transcript |
| GET | /api/v1/transcript/{video_id}/languages | List languages |
| GET | /health | Health check |
curl "http://localhost:8000/api/v1/search?q=python+tutorial&max_results=5"
curl "http://localhost:8000/api/v1/transcript/dQw4w9WgXcQ?format=srt&translate_to=es"
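The same requests can be issued from Python using only the standard library. The sketch below builds the URLs shown in the curl examples; the helper names are hypothetical, and the shape of the JSON response is not documented here, so `fetch_json` returns whatever the server sends. It assumes a server started with `tubescrape serve` on the default address.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address of `tubescrape serve`


def search_url(query: str, max_results: int = 5, base_url: str = BASE_URL) -> str:
    """Build the URL for the search endpoint."""
    params = urlencode({"q": query, "max_results": max_results})
    return f"{base_url}/api/v1/search?{params}"


def transcript_url(video_id: str, base_url: str = BASE_URL, **params: str) -> str:
    """Build the URL for the transcript endpoint with optional query parameters."""
    url = f"{base_url}/api/v1/transcript/{quote(video_id, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode a JSON body. Requires a running server."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(search_url('python tutorial'))` mirrors the first curl command. A transcript requested with `format='srt'` is presumably not JSON, so it would need to be read as text instead.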
Error Handling
All exceptions inherit from YouTubeError:
YouTubeError
├── RequestError
│ ├── RateLimitError # HTTP 429
│ └── BotDetectedError # HTTP 403
├── VideoUnavailableError # private, deleted, region-locked
│ └── AgeRestrictedError
├── TranscriptsDisabledError
├── TranscriptsNotAvailableError
├── TranscriptFetchError
├── TranslationNotAvailableError
├── ChannelNotFoundError
├── PlaylistNotFoundError
├── APIKeyNotFoundError
└── ParsingError
from tubescrape import YouTube, YouTubeError, RateLimitError

with YouTube() as yt:
    try:
        results = yt.search('python')
    except RateLimitError:
        # Catch the specific subclass before the YouTubeError base class
        print('Rate limited — use a proxy')
    except YouTubeError as e:
        print(f'YouTube error: {e}')
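Beyond printing a message, a caller may want to wait and retry when rate limited. The helper below is a generic exponential-backoff sketch; `retry_with_backoff` is a hypothetical name, and it is independent of the client's own `max_retries` setting, whose exact behavior is not described here.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Type[Exception],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on `retry_on` with exponentially growing delays.

    Hypothetical helper for illustration; not part of tubescrape.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call such as `retry_with_backoff(lambda: yt.search('python'), RateLimitError)` would retry only on rate limiting and let every other `YouTubeError` propagate immediately.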
Full Documentation
For detailed examples, all field references, and advanced usage, see the Complete Usage Guide.
Warning
This library uses YouTube's undocumented InnerTube API, so it may break whenever YouTube changes that API. If it does, please open an issue.
Contributing
git clone https://github.com/zaidkx37/tubescrape.git
cd tubescrape
pip install -e ".[all,dev]"
pytest # run tests
ruff check src/ # lint
mypy src/tubescrape/ # type check
License
MIT License. See LICENSE for details.