
TubeScrape

Scrape YouTube search results, channels, transcripts, and playlists — no API key needed.


Built on YouTube's internal InnerTube API. Three interfaces: Python SDK, CLI, and REST API.


Features

  • Search with filters — type, duration, upload date, sort order, features (4K, HDR, live, CC)
  • Channel browsing — videos, shorts, playlists, and in-channel search
  • Transcripts — fetch, translate, format (SRT / WebVTT / JSON), save to file
  • Playlists — full pagination with position tracking and metadata
  • Async-first — every method has an async variant for concurrent workloads
  • JSON-ready — every result object has .to_dict() for instant serialization
  • Proxy rotation — built-in support for residential proxy lists
  • Three interfaces — Python SDK, CLI with Rich tables, REST API with Swagger docs
  • Zero config — no API key, no OAuth, no setup
  • Lightweight — only httpx as a core dependency

Installation

Install with pip. Use the core SDK as a library in an existing project, or add extras for the CLI and REST API server:

# Core SDK only
pip install tubescrape

# With CLI (adds click + rich)
pip install "tubescrape[cli]"

# With REST API server (adds fastapi + uvicorn)
pip install "tubescrape[api]"

# Everything
pip install "tubescrape[all]"

Requirements: Python 3.10+


Quick Start

import json
from tubescrape import YouTube

with YouTube() as yt:
    # Search YouTube
    results = yt.search('python tutorial', max_results=5)
    for video in results.videos:
        print(f'{video.title} | {video.duration} | {video.view_count}')

    # Browse a channel
    channel = yt.get_channel_videos('@lexfridman', max_results=10)
    for video in channel.videos:
        print(f'{video.title} ({video.published_text})')

    # Get a transcript and save as subtitles
    transcript = yt.get_transcript('dQw4w9WgXcQ')
    print(transcript.text)
    transcript.save('subtitles.srt')

    # Every result serializes to JSON instantly
    data = results.to_dict()
    print(json.dumps(data, indent=2))

Every method accepts plain IDs, full URLs, or @handles — parsed automatically.
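As an illustration of what that normalization implies, here is a rough sketch. This is not tubescrape's actual parser; the helper name and regex are invented for the example:

```python
import re

def extract_video_id(value: str) -> str:
    """Illustrative normalizer, NOT tubescrape's internal parser."""
    # Match the 11-character video ID after common URL markers
    m = re.search(r'(?:v=|youtu\.be/|shorts/)([A-Za-z0-9_-]{11})', value)
    return m.group(1) if m else value  # assume a bare 11-char ID otherwise

print(extract_video_id('https://www.youtube.com/watch?v=dQw4w9WgXcQ'))  # dQw4w9WgXcQ
print(extract_video_id('https://youtu.be/dQw4w9WgXcQ'))                 # dQw4w9WgXcQ
print(extract_video_id('dQw4w9WgXcQ'))                                  # dQw4w9WgXcQ
```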


Why tubescrape?

                    tubescrape   youtube-transcript-api   pytube / pytubefix   yt-dlp
Search videos       Yes          No                       No                   Limited
Channel browse      Yes          No                       No                   Yes
Transcripts         Yes          Yes                      No                   Yes
Playlists           Yes          No                       Yes                  Yes
Async support       Yes          No                       No                   No
Built-in REST API   Yes          No                       No                   No
CLI tool            Yes          No                       No                   Yes
Core dependencies   1 (httpx)    1 (requests)             0                    Many
API key needed      No           No                       No                   No

Search

results = yt.search('python tutorial', max_results=5)

for video in results.videos:
    print(f'{video.title} | {video.url}')
    print(f'  {video.view_count} | {video.duration} | {video.published_text}')
    print(f'  Channel: {video.channel} (verified: {video.is_verified})')

Search Filters

All filters can be combined in a single call:

results = yt.search(
    'podcast interview',
    max_results=10,
    type='video',              # video | channel | playlist | movie
    duration='long',           # short (<4m) | medium (4-20m) | long (>20m)
    upload_date='this_month',  # last_hour | today | this_week | this_month | this_year
    sort_by='view_count',      # relevance | upload_date | view_count | rating
    features=['hd', 'subtitles'],  # live | 4k | hd | subtitles | cc | creative_commons | 360 | vr180 | 3d | hdr
)

Channel Browsing

All channel methods accept @handle, channel ID (UC...), or full URL.

# Videos (newest first, with pagination)
videos = yt.get_channel_videos('@lexfridman', max_results=10)
all_videos = yt.get_channel_videos('@lexfridman', max_results=0)  # fetch ALL

# Shorts
shorts = yt.get_channel_shorts('@lexfridman')
for short in shorts.shorts:
    print(f'{short.title} | {short.view_count} | {short.url}')

# Playlists
playlists = yt.get_channel_playlists('@lexfridman')
for pl in playlists.playlists:
    print(f'{pl.title} | {pl.video_count} | {pl.url}')

# Search within a channel
results = yt.search_channel('@lexfridman', 'artificial intelligence', max_results=10)

Transcripts

# Fetch transcript (auto-detects best language)
transcript = yt.get_transcript('dQw4w9WgXcQ')
print(transcript.text)  # full text as a single string

# Choose language (priority order fallback)
transcript = yt.get_transcript('dQw4w9WgXcQ', languages=['de', 'en'])

# Translate to any language
transcript = yt.get_transcript('dQw4w9WgXcQ', translate_to='es')

# Without timestamps (plain text blob)
transcript = yt.get_transcript('dQw4w9WgXcQ', timestamps=False)

# List available languages
languages = yt.list_transcripts('dQw4w9WgXcQ')
for entry in languages:
    print(f'{entry.language} ({entry.language_code}) — {"auto" if entry.is_generated else "manual"}')

Formatting & Saving

transcript = yt.get_transcript('dQw4w9WgXcQ')

# Format as SRT, WebVTT, JSON, or plain text
srt = YouTube.format_transcript(transcript, fmt='srt')
vtt = YouTube.format_transcript(transcript, fmt='vtt')

# Save to file (format auto-detected from extension)
transcript.save('subtitles.srt')
transcript.save('subtitles.vtt')
transcript.save('transcript.json')
transcript.save('transcript.txt')
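
Independent of the library, the SRT format saved above uses HH:MM:SS,mmm timestamps (note the comma before milliseconds). A minimal conversion sketch:

```python
def srt_timestamp(seconds: float) -> str:
    # SRT timestamps are HH:MM:SS,mmm — comma, not dot, before milliseconds
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(75.5))  # 00:01:15,500
```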

Playlists

# Accepts playlist ID or full URL
playlist = yt.get_playlist('PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf')

print(f'{playlist.title} by {playlist.channel}: {len(playlist.videos)} videos')

for entry in playlist.videos:
    print(f'#{entry.position}  {entry.title} | {entry.duration}')

Serialization (.to_dict())

Every result object converts to a plain Python dictionary — ready for JSON, databases, or APIs:

import json

# Works on every result type
results.to_dict()       # SearchResult → dict
video.to_dict()         # VideoResult → dict
channel.to_dict()       # BrowseResult → dict
shorts.to_dict()        # ShortsResult → dict
playlists.to_dict()     # ChannelPlaylistsResult → dict
playlist.to_dict()      # PlaylistResult → dict
transcript.to_dict()    # Transcript → dict

# Sparse output: optional fields excluded when empty/default
# is_verified=False → omitted | badges=[] → omitted | None fields → omitted
print(json.dumps(results.to_dict(), indent=2, ensure_ascii=False))
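
A common downstream use of .to_dict() is writing one JSON object per line (JSON Lines) for databases or log pipelines. This sketch uses a hand-built dict in place of a live result; the field names are illustrative, not guaranteed by the library:

```python
import json

# Hypothetical dict shaped roughly like a search result's .to_dict() output
result = {
    "query": "python tutorial",
    "videos": [
        {"id": "abc123", "title": "Intro to Python", "view_count": "1.2M views"},
        {"id": "def456", "title": "Async in Python", "view_count": "300K views"},
    ],
}

# One JSON object per line — append-friendly for files and ingestion pipelines
lines = [json.dumps(v, ensure_ascii=False) for v in result["videos"]]
print(lines[0])
```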

Async Support

Every method has an async variant: the same name prefixed with a (asearch, aget_transcript, and so on). Use them in FastAPI, Discord bots, or any async application:

import asyncio
from tubescrape import YouTube

async def main():
    async with YouTube() as yt:
        # All methods have async variants
        results = await yt.asearch('python', max_results=5)
        transcript = await yt.aget_transcript('dQw4w9WgXcQ')

        # Run multiple requests concurrently
        r1, r2, r3 = await asyncio.gather(
            yt.asearch('python'),
            yt.asearch('javascript'),
            yt.asearch('rust'),
        )

asyncio.run(main())
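
The gather pattern above can be extended with a concurrency cap so large batches don't fire all at once. A generic sketch using a stub coroutine in place of yt.asearch so it runs offline:

```python
import asyncio

async def fake_search(query: str) -> str:
    # Stand-in for yt.asearch(query) so this sketch needs no network
    await asyncio.sleep(0)
    return f"results for {query}"

async def bounded_gather(queries, limit=3):
    sem = asyncio.Semaphore(limit)  # at most `limit` requests in flight

    async def one(q):
        async with sem:
            return await fake_search(q)

    return await asyncio.gather(*(one(q) for q in queries))

print(asyncio.run(bounded_gather(["python", "rust", "go", "zig"], limit=2)))
```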

Proxy Support

# Single proxy
yt = YouTube(proxy='http://user:pass@proxy.example.com:8080')

# Proxy rotation (round-robin per request)
yt = YouTube(proxies=[
    'http://user:pass@proxy1:8080',
    'http://user:pass@proxy2:8080',
])

# SOCKS5
yt = YouTube(proxy='socks5://user:pass@proxy:1080')

# Custom timeout and retries
yt = YouTube(proxy='http://proxy:8080', timeout=60.0, max_retries=5)

Tip: YouTube blocks cloud IPs aggressively. Use rotating residential proxies (BrightData, SmartProxy, Oxylabs) for production.
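
The round-robin rotation described above can be illustrated generically with itertools.cycle; this is a sketch of the behavior, independent of tubescrape's internals:

```python
from itertools import cycle

proxies = ["http://proxy1:8080", "http://proxy2:8080", "http://proxy3:8080"]
rotation = cycle(proxies)  # endless round-robin over the list

# Each request takes the next proxy; after the last, it wraps to the first
picked = [next(rotation) for _ in range(5)]
print(picked)
```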


CLI

Install with pip install "tubescrape[cli]".


tubescrape search "python tutorial" -n 5
tubescrape search "podcast" --type video --duration long --sort-by view_count
tubescrape search "python" --json                    # JSON output

tubescrape channel @lexfridman                       # videos (default)
tubescrape channel @lexfridman shorts                # shorts
tubescrape channel @lexfridman playlists             # playlists
tubescrape channel @lexfridman search "podcast"      # search within channel

tubescrape playlist PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf

tubescrape transcript dQw4w9WgXcQ                    # plain text
tubescrape transcript dQw4w9WgXcQ --format srt       # SRT subtitles
tubescrape transcript dQw4w9WgXcQ --translate es      # translate
tubescrape transcript dQw4w9WgXcQ --save output.srt  # save to file
tubescrape transcript dQw4w9WgXcQ --list-languages   # available languages
tubescrape --proxy http://user:pass@host:port search "python"  # with proxy
export TUBESCRAPE_PROXY="http://user:pass@host:port"           # env variable

REST API

Install with pip install "tubescrape[api]".

tubescrape serve                          # starts on localhost:8000
tubescrape serve --host 0.0.0.0 --port 3000

Interactive Swagger docs at http://localhost:8000/docs.

Method  Endpoint                                     Description
GET     /api/v1/search?q=python                      Search videos
GET     /api/v1/channel/{id}/videos                  Channel videos
GET     /api/v1/channel/{id}/shorts                  Channel shorts
GET     /api/v1/channel/{id}/playlists               Channel playlists
GET     /api/v1/channel/{id}/search?q=...            Search within channel
GET     /api/v1/playlist/{id}                        Fetch playlist
GET     /api/v1/transcript/{video_id}                Fetch transcript
GET     /api/v1/transcript/{video_id}/languages     List languages
GET     /health                                      Health check

curl "http://localhost:8000/api/v1/search?q=python+tutorial&max_results=5"
curl "http://localhost:8000/api/v1/transcript/dQw4w9WgXcQ?format=srt&translate_to=es"
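
Query strings like the ones above are easiest to build with urllib.parse.urlencode, which handles spaces and special characters for you:

```python
from urllib.parse import urlencode

base = "http://localhost:8000/api/v1/search"
params = {"q": "python tutorial", "max_results": 5}

# urlencode turns the space into '+' and joins pairs with '&'
url = f"{base}?{urlencode(params)}"
print(url)  # http://localhost:8000/api/v1/search?q=python+tutorial&max_results=5
```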

Error Handling

All exceptions inherit from YouTubeError:

YouTubeError
├── RequestError
│   ├── RateLimitError          # HTTP 429
│   └── BotDetectedError        # HTTP 403
├── VideoUnavailableError       # private, deleted, region-locked
│   └── AgeRestrictedError
├── TranscriptsDisabledError
├── TranscriptsNotAvailableError
├── TranscriptFetchError
├── TranslationNotAvailableError
├── ChannelNotFoundError
├── PlaylistNotFoundError
├── APIKeyNotFoundError
└── ParsingError

from tubescrape import YouTube, YouTubeError, RateLimitError

try:
    results = yt.search('python')
except RateLimitError:
    print('Rate limited — use a proxy')
except YouTubeError as e:
    print(f'YouTube error: {e}')
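
A common pattern on top of this hierarchy is retrying with exponential backoff when rate limited. A self-contained sketch with a stand-in exception class (this is not the library's own retry logic):

```python
import time

class RateLimitError(Exception):
    """Stand-in for tubescrape.RateLimitError in this offline sketch."""

def with_retries(fn, attempts=3, base_delay=1.0):
    # Exponential backoff: base_delay, 2x, 4x ... between attempts
    for i in range(attempts):
        try:
            return fn()
        except RateLimitError:
            if i == attempts - 1:
                raise  # out of attempts — let the caller handle it
            time.sleep(base_delay * 2 ** i)

# Demo: a callable that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_retries(flaky, base_delay=0.0))  # ok
```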

Full Documentation

For detailed examples, all field references, and advanced usage, see the Complete Usage Guide.


Warning

This library uses YouTube's undocumented InnerTube API. It may break if YouTube changes their internal API. If it does, please open an issue.


Contributing

git clone https://github.com/zaidkx37/tubescrape.git
cd tubescrape
pip install -e ".[all,dev]"

pytest                    # run tests
ruff check src/           # lint
mypy src/tubescrape/      # type check

License

MIT License. See LICENSE for details.
